
Leveraging Algorithms for a User-Centric Jira Project Migration

A migration to Atlassian Cloud is a journey, and everyone who embarks on it has to answer three critical questions: What am I migrating? When am I migrating it? How will I migrate it? During the planning phase of your migration, you'll spend a lot of time thinking about how you'll migrate, influenced by what and when you're migrating. For large or more complex organizations, the "how" is also influenced by factors such as the feasibility of large-scale change management, uptime requirements for business-critical teams, and limited migration resourcing. In many cases, migrators opt to divide their data into smaller, more manageable parts, known as clusters, to address these factors.

Today, we see migrators "cluster" their data by org structure, for example, by project, team, department, or business unit. However, modern teams aren't siloed; they're incredibly cross-functional, working on multiple projects owned by different departments, something not reflected in the org structure. You can uncover these connections by analyzing attributes of product data that indicate “connectedness.” For example, shared assignees on a Jira project are a strong indication of connectedness. By clustering based on connectedness, you can more easily decide which projects should be placed on the same site or need to be migrated at the same time to ensure your end-users won’t be working across different sites or deployments (i.e., DC or Cloud).

How to identify connected projects in Jira

In order to cluster projects based on their connectedness, you need to determine the criteria for considering two projects as connected. You’ll want to adopt a user-centric approach so you can ensure a seamless end-user experience; therefore, you can use shared assignees as an indication of connectedness between Jira projects. Consequently, the greater the number of assignees that two projects have in common, the stronger their connection will be.

You'll need to establish a relationship between each pair of projects in the form of a "connectedness factor" that increases with the number of shared assignees the two projects have. You can extract this information from your Jira Server or Data Center on-premises database using the following query, written for Postgres:

with projects as (
    -- number of issues per (project, assignee)
    select p.id, p.pkey, ji.assignee, count(ji.id) as issues
    from project p
    join jiraissue ji on p.id = ji.project
    where ji.assignee is not null
    group by 1, 2, 3
)
select
    p1.pkey as source,
    p2.pkey as target,
    -- add up the per-assignee products so each project pair yields one row
    sum(p1.issues * p2.issues) as weight
from projects p1
join projects p2 on p1.id < p2.id and p1.assignee = p2.assignee
group by 1, 2;

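For each pair of projects, the query multiplies the number of issues each shared assignee has in the two projects and adds up the results. For example: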

  • Project ABC has 5 issues assigned to userA and 6 issues assigned to userB

  • Project XYZ has 3 issues assigned to userA and 7 issues assigned to userB

  • Meaning the connectedness weight between ABC and XYZ = (5 x 3) + (6 x 7) = 57

The more issues two projects have with common assignees, the faster the connectedness weight grows, because the per-assignee issue counts are multiplied; for example, with just two more issues assigned to shared assignees, we see the following:

  • Project ABC has 5 issues assigned to userA and 7 issues assigned to userB

  • Project XYZ has 4 issues assigned to userA and 7 issues assigned to userB

  • Connectedness weight between ABC and XYZ = (5 x 4) + (7 x 7) = 69
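The arithmetic in the examples above can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the original query: the function name `connectedness_weight` and the dictionaries of per-assignee issue counts are illustrative assumptions.

```python
def connectedness_weight(issues_a, issues_b):
    """Sum, over shared assignees, the product of their issue counts in each project."""
    shared = issues_a.keys() & issues_b.keys()
    return sum(issues_a[user] * issues_b[user] for user in shared)


# Hypothetical per-assignee issue counts taken from the first example above.
abc = {"userA": 5, "userB": 6}
xyz = {"userA": 3, "userB": 7}
print(connectedness_weight(abc, xyz))  # 57
```

Assignees who appear in only one of the two projects contribute nothing to the weight, which mirrors the join on `assignee` in the SQL query.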


The result of the above query is a weighted project-to-project mapping table with one row for every pair of projects that shares at least one assignee. Here’s a redacted example of a CSV export of the query results:

source,target,weight
GHS,JPO,7743377
SDS,JPO,1321898
CSP,STRD,242104
JST,SDS,5164252
SSP,PSSRV,2114693
CA,TRELLO,13994003
CSP,JPO,51813
JSP,PSP,313915
CLV,FEEDBACK,1456
PTGS,DATA,838554
JPNS,MOVE,42243

Clustering

The table above maps directly onto a weighted undirected network (also known as a “graph”) where the “nodes” are the projects and each row is an edge. The connectedness weight represents the thickness/strength of the edge between two nodes.
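To make the graph representation concrete, here is a minimal sketch that loads source,target,weight rows into an adjacency mapping using only the Python standard library. The function name `load_graph` and the inlined sample are assumptions for illustration; a real run would read the exported CSV file instead.

```python
import csv
import io
from collections import defaultdict


def load_graph(csv_file):
    """Read source,target,weight rows into a symmetric adjacency mapping."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != ["source", "target", "weight"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    graph = defaultdict(dict)
    for row in reader:
        a, b, w = row["source"], row["target"], int(row["weight"])
        # Undirected: store the edge in both directions, summing duplicate rows.
        graph[a][b] = graph[a].get(b, 0) + w
        graph[b][a] = graph[b].get(a, 0) + w
    return graph


# A few rows from the example export above, inlined so the sketch is self-contained.
sample = io.StringIO(
    "source,target,weight\n"
    "GHS,JPO,7743377\n"
    "SDS,JPO,1321898\n"
    "CSP,JPO,51813\n"
)
graph = load_graph(sample)
print(sorted(graph["JPO"]))  # ['CSP', 'GHS', 'SDS']
```

Each project key becomes a node, and `graph[a][b]` holds the strength of the edge between projects `a` and `b`.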

Once we establish a network, we can use a clustering algorithm to find clusters or groups within it. One of the most popular is the Louvain method. We can apply this algorithm to our network using the Gephi tool or the Python NetworkX library.
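The Louvain method searches for a partition that maximizes modularity, a score that compares the edge weight inside each cluster with what would be expected if edges were placed at random. The sketch below computes that score for a given partition; it does not perform the Louvain search itself. The function name, the toy graph, and the edge-list format are assumptions for illustration, and the `resolution` argument plays the same role as the parameter of the same name in NetworkX.

```python
def modularity(edges, communities, resolution=1.0):
    """Weighted modularity of a partition of an undirected graph.

    edges: list of (u, v, weight); communities: list of sets of nodes.
    """
    community_of = {node: i for i, nodes in enumerate(communities) for node in nodes}
    total = 0.0     # m: total edge weight
    internal = {}   # L_c: weight of edges inside community c
    degree = {}     # d_c: summed weighted degree of the nodes in community c
    for u, v, w in edges:
        total += w
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    # Q = sum over c of  L_c / m  -  resolution * (d_c / 2m)^2
    return sum(
        internal.get(c, 0.0) / total - resolution * (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Toy graph: two triangles of projects joined by a single bridge edge.
edges = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("C", "D", 1),
]
split = [{"A", "B", "C"}, {"D", "E", "F"}]
merged = [{"A", "B", "C", "D", "E", "F"}]
print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, merged), 3))  # 0.0
```

At the default resolution the two-triangle split scores higher than a single merged cluster. With a much lower value, such as `resolution=0.1`, the merged cluster scores higher instead, which is why lowering the resolution tends to produce fewer, larger clusters.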

A/ The Gephi method

Import the data into Gephi, run the Louvain algorithm, and apply a few manual display steps to produce a colored view of the clusters:

  1. File > Import spreadsheet

  2. Select “Finish”

  3. Change “Graph Type” to “Undirected” then OK

  4. On the right side, Community Detection > Modularity > Run with the default settings. This is Gephi’s implementation of the Louvain algorithm.

  5. Close the “Run Report”.

  6. On the left side, Appearance > Nodes > Partition, choose the “Modularity class” attribute, then Apply. This will color the nodes based on their cluster. Now, each color represents one of the 4 clusters.

  7. Adjust the display settings at the bottom bar to display node IDs and change the background. You may also move nodes around manually.

B/ The Python method

The approach with Python is a lot easier and only requires a run of the script below against the CSV file:

Python script

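import networkx as nx
import networkx.algorithms.community as nx_comm
import pandas as pd

# Load the source,target,weight edge list exported from the database.
df = pd.read_csv('issue_assignees.csv')

# Build a weighted undirected graph with one node per project.
G = nx.from_pandas_edgelist(df, edge_attr='weight', create_using=nx.Graph())

# Tune resolution to change cluster size: values below 1 favor larger
# clusters, values above 1 favor smaller ones. Louvain is randomized,
# so pass seed=<int> if you need the same result on every run.
communities = nx_comm.louvain_communities(G, weight='weight', resolution=0.5)
for community in communities:
    print(community)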

Example execution output with resolution = 0.5

{'HALP', 'SPSP', 'JST', 'AVP', 'DATA', 'CWDSUP', 'PSP', 'STSP', 'PCS', 'MOVE', 'PSCLOUD', 'TRELLO', 'BBS', 'COMM', 'PTGS', 'CES', 'SECREP', 'OGSP', 'HCP', 'HCSP', 'CSP', 'STRD', 'CQS'}
{'BSP', 'CRC', 'CLV', 'CSTADM', 'JPO', 'ACE', 'PSSRV', 'ECSP', 'BONS', 'JSP', 'DMCA', 'SUPTST', 'GHS', 'POINTA', 'SSP', 'FEEDBACK', 'FSH', 'SDS', 'ALIGNSP', 'IDESP', 'PS', 'KNOW'}
{'UNI', 'BACCESS', 'PSD', 'TE', 'JPNS', 'PFD', 'CEO', 'CA', 'DDS', 'PA', 'CEOARCHIVE'}

With thousands of projects, displaying the graph adds little value, and the Python script is much quicker. You may use the resolution attribute to get smaller or bigger clusters, which also changes the number of clusters generated.
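When comparing the clusters produced by different resolution values, one simple check is how much of the total connectedness weight stays inside clusters. The following is a minimal sketch under assumed inputs: the function name `internal_weight_share` and the toy project keys and weights are hypothetical, not taken from the export above.

```python
def internal_weight_share(edges, communities):
    """Return the fraction of total edge weight that stays inside clusters."""
    community_of = {node: i for i, nodes in enumerate(communities) for node in nodes}
    total = sum(w for _, _, w in edges)
    kept = sum(w for u, v, w in edges if community_of[u] == community_of[v])
    return kept / total if total else 0.0


# Hypothetical edge list (source, target, weight) and a candidate clustering.
edges = [("ABC", "XYZ", 57), ("ABC", "DEF", 3), ("DEF", "GHI", 40)]
communities = [{"ABC", "XYZ"}, {"DEF", "GHI"}]
print(internal_weight_share(edges, communities))  # 0.97
```

A share close to 1 suggests that few strongly connected projects would end up on different sites or in different migration phases. A single giant cluster trivially scores 1, so this number is best read alongside the number and size of the clusters.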

Summary of the Python method

  1. Run the query below and export the results to a CSV file:

    SELECT p1.pkey AS source, p2.pkey AS target, COUNT(ji1.assignee) AS weight
    FROM jiraissue ji1
    INNER JOIN jiraissue ji2 ON ji1.assignee = ji2.assignee
    INNER JOIN project p1 ON ji1.project = p1.id
    INNER JOIN project p2 ON ji2.project = p2.id
    WHERE p1.id < p2.id
    AND ji1.assignee IS NOT NULL
    GROUP BY p1.pkey, p2.pkey;
  2. Validate the CSV file is called issue_assignees.csv and has a source,target,weight header.

  3. Run the Python script below from the same directory as the CSV file:

    import networkx as nx
    import networkx.algorithms.community as nx_comm
    import pandas as pd

    # Load the source,target,weight edge list exported from the database.
    df = pd.read_csv('issue_assignees.csv')

    # Build a weighted undirected graph with one node per project.
    G = nx.from_pandas_edgelist(df, edge_attr='weight', create_using=nx.Graph())

    # Tune resolution to change cluster size: values below 1 favor larger
    # clusters, values above 1 favor smaller ones.
    communities = nx_comm.louvain_communities(G, weight='weight', resolution=0.5)
    for community in communities:
        print(community)
  4. Adjust the resolution attribute until you're satisfied with how the projects are clustered (number of clusters, specific projects being in the same cluster, etc.). You may also add an issue count per project and count the total number of issues per cluster.
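The issue count per cluster mentioned in the last step could be computed along these lines. This is a sketch under assumptions: the function name `issues_per_cluster`, the project keys, and the issue counts are hypothetical, and in practice the counts would come from a separate per-project query.

```python
def issues_per_cluster(communities, issue_counts):
    """Total the per-project issue counts for each cluster, in cluster order."""
    return [
        sum(issue_counts.get(project, 0) for project in cluster)
        for cluster in communities
    ]


# Hypothetical clusters and per-project issue counts.
communities = [{"ABC", "XYZ"}, {"DEF"}]
issue_counts = {"ABC": 11, "XYZ": 10, "DEF": 4}
print(issues_per_cluster(communities, issue_counts))  # [21, 4]
```

Comparing these totals across clusters gives a rough sense of how evenly the migration workload would be split.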

What else is possible?

Clustering scope

It is worth noting that clustering does not have to be applied to all Jira data in a single pass. For instance, you can first cluster all data to determine the number of sites you want to have, then cluster the data within each site to identify the projects for each migration phase.

Connectedness factors

Shared assignees are only one attribute that can establish a connectedness factor between Jira projects. Many other factors may be considered, such as:

  • shared time loggers, commenters, reporters, etc…

  • shared presence in filter queries

  • shared presence in board queries

  • issue links

You may consider using several connectedness factors to produce a weighted undirected graph. If you normalize the weights of each factor so they all fall between 0 and 100, you can combine the factors into a final graph representation that considers multiple aspects of how two Jira projects are connected. You may even assign a weight to each attribute; for example, you may say shared assignees are 10 times as important as issue links.
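One way to combine factors is sketched below, assuming each factor is available as a mapping from a project pair to its raw weight. The function names, the factor names, and the sample numbers are hypothetical, and scaling by the maximum is only one of several possible normalizations.

```python
def normalize(weights, scale=100.0):
    """Rescale edge weights so the largest becomes `scale` (100 by default)."""
    peak = max(weights.values(), default=0)
    if peak == 0:
        return {edge: 0.0 for edge in weights}
    return {edge: scale * w / peak for edge, w in weights.items()}


def combine(factors, importance):
    """Weighted sum of normalized factors.

    Assumes every project pair is keyed in a consistent (source, target) order.
    """
    combined = {}
    for name, weights in factors.items():
        for edge, w in normalize(weights).items():
            combined[edge] = combined.get(edge, 0.0) + importance[name] * w
    return combined


# Hypothetical raw weights for two connectedness factors.
factors = {
    "assignees": {("ABC", "XYZ"): 50, ("ABC", "DEF"): 25},
    "issue_links": {("ABC", "DEF"): 8, ("DEF", "XYZ"): 2},
}
# Shared assignees count 10 times as much as issue links.
importance = {"assignees": 10, "issue_links": 1}
print(combine(factors, importance))
# {('ABC', 'XYZ'): 1000.0, ('ABC', 'DEF'): 600.0, ('DEF', 'XYZ'): 25.0}
```

The combined mapping can then be exported as source,target,weight rows and clustered in the same way as the single-factor graph.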

Alternative clustering algorithms

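The Louvain algorithm isn’t the only clustering algorithm out there. K-means, for instance, lets you decide exactly how many clusters you want to have, although it operates on feature vectors rather than on a graph, so the project graph would first need a vector representation. This could be useful if you know how many migration phases you wish to have.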

Feel free to comment on the post below if you’d like to share your thoughts and feedback with us.

3 comments

Adam Rypel _MoroSystems_
Community Leader
October 25, 2023

Hi @Arbi Dridi ,

Thanks for the interesting article. 

However if I understand correctly, during the migration you basically can't ever keep all the connections in case of thousands of projects, right? The method of clustering is to just minimize the data loss.

To be specific, I would create a cluster 1 of projects A, B, C that have the highest connectedness between each other and then cluster 2 of projects D, E, F. However, the project A can still be connected to project D, therefore I lose this connection during the migration. 

Thanks, Adam

Arbi Dridi
Atlassian Team
October 25, 2023

@Adam Rypel _MoroSystems_ Thanks for your comment. You're correct. The value behind clustering is to move the most connected projects together in the same migration phase or to the same target site. There will always be cases where links between two issues will be temporarily lost before the reconciliation takes place. 

Branimir Kain
Atlassian Team
November 22, 2023

Just wanted to cross-post Portfolio Analyzer 2.0 which is an EAP under construction by our Migration Tooling team very much related to this topic!
