Leveraging Algorithms for a User-Centric Jira Project Migration

October 24, 2023

A migration to Atlassian Cloud is a journey, and everyone who embarks on that journey will have to answer three critical questions: What am I migrating? When am I migrating it? How will I migrate it? During the planning phase of your migration, you'll spend a lot of time thinking about how you'll migrate, influenced by what and when you're migrating. For large or more complex organizations, the "how" is also influenced by factors such as the feasibility of large-scale change management, uptime requirements for business-critical teams, and limited migration resourcing. In many cases, migrators will opt to divide their data into smaller, more manageable parts known as clusters to address these factors.

Today, we see migrators "cluster" their data by org structure, for example, by project, team, department, or business unit. However, modern teams aren't siloed; they're incredibly cross-functional, working on multiple projects owned by different departments, something not reflected in the org structure. You can uncover these connections by analyzing attributes of product data that indicate “connectedness.” For example, shared assignees on a Jira project are a strong indication of connectedness. By clustering based on connectedness, you can more easily decide which projects should be placed on the same site or need to be migrated at the same time to ensure your end-users won’t be working across different sites or deployments (i.e., DC or Cloud).

How to identify connected projects in Jira

To cluster projects based on their connectedness, you first need to decide what makes two projects connected. A user-centric approach helps ensure a seamless end-user experience, so you can use shared assignees as the indication of connectedness between Jira projects: the more assignees two projects have in common, the stronger their connection.

You'll need to establish a relationship between each pair of projects in the form of a "connectedness factor" that increases with the number of shared assignees the two projects have. You can extract this information from your Jira Server or Data Center on-premises database using the query below, written for Postgres:

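with projects as (
    -- number of issues per (project, assignee) pair
    select p.id, p.pkey, ji.assignee, count(ji.id) as issues
    from project p
    join jiraissue ji on p.id = ji.project
    where ji.assignee is not null
    group by 1, 2, 3
)
select
    p1.pkey as source, p2.pkey as target, sum(p1.issues * p2.issues) as weight
from projects p1
-- p1.id < p2.id keeps each pair of projects once and skips self-pairs
join projects p2 on p1.id < p2.id and p1.assignee = p2.assignee
group by 1, 2;

For each pair of projects, the query multiplies the two projects' issue counts for every shared assignee and sums those products into a single weight. For example: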

Project ABC has 5 issues assigned to userA and 6 issues assigned to userB
Project XYZ has 3 issues assigned to userA and 7 issues assigned to userB
Meaning the connectedness weight between ABC and XYZ = (5 x 3) + (6 x 7) = 57

Because the weight is a sum of products, it grows quickly as two projects share more work. For example, with just two more issues assigned to shared assignees, we see the following:

Project ABC has 5 issues assigned to userA and 7 issues assigned to userB
Project XYZ has 4 issues assigned to userA and 7 issues assigned to userB
Connectedness weight between ABC and XYZ = (5 x 4) + (7 x 7) = 69
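The arithmetic above can be reproduced with a minimal Python sketch. The function name connectedness_weight and the dictionaries of per-assignee issue counts are illustrative assumptions, not part of the query:

```python
def connectedness_weight(issues_a, issues_b):
    """Sum, over shared assignees, the product of the two projects' issue counts."""
    shared_assignees = issues_a.keys() & issues_b.keys()
    return sum(issues_a[user] * issues_b[user] for user in shared_assignees)


# First example: (5 x 3) + (6 x 7)
abc = {"userA": 5, "userB": 6}
xyz = {"userA": 3, "userB": 7}
first = connectedness_weight(abc, xyz)
print(first)  # 57

# Second example: one more issue on each side for a shared assignee
abc = {"userA": 5, "userB": 7}
xyz = {"userA": 4, "userB": 7}
second = connectedness_weight(abc, xyz)
print(second)  # 69
```

Assignees who appear in only one of the two projects contribute nothing, which matches the join on a common assignee in the query.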

The result of the above query will be a weighted project-to-project mapping table with one row for every pair of projects that shares at least one assignee. Here’s a redacted example of a CSV export of the query results:

source,target,weight
GHS,JPO,7743377
SDS,JPO,1321898
CSP,STRD,242104
JST,SDS,5164252
SSP,PSSRV,2114693
CA,TRELLO,13994003
CSP,JPO,51813
JSP,PSP,313915
CLV,FEEDBACK,1456
PTGS,DATA,838554
JPNS,MOVE,42243

Clustering

The table above is a perfect data source to represent a weighted undirected network (also known as a “graph”) where the “nodes” are the projects. The connectedness weight will represent the thickness/strength of the edge between two nodes.
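As a rough illustration of that mapping, the sketch below builds a small NetworkX graph from four rows of the example table and inspects it. The choice of rows and the variable names are assumptions made for the example:

```python
import networkx as nx

# A few (source, target, weight) rows taken from the example table above.
rows = [
    ("GHS", "JPO", 7743377),
    ("SDS", "JPO", 1321898),
    ("JST", "SDS", 5164252),
    ("CSP", "JPO", 51813),
]

G = nx.Graph()  # undirected: ABC-XYZ and XYZ-ABC are the same edge
G.add_weighted_edges_from(rows)  # the third value is stored as the 'weight' attribute

print(G.number_of_nodes(), G.number_of_edges())
print(G["JPO"]["GHS"]["weight"])  # same edge, looked up in the opposite direction

# Strongest connection in this small sample
strongest = max(G.edges(data="weight"), key=lambda edge: edge[2])
print(strongest)
```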

Once we establish a network, we can use a clustering (community detection) algorithm to find clusters or groups within it. One of the most popular ones is the Louvain method. We can apply this algorithm to our network using the Gephi tool or the Python NetworkX library.

A/ The Gephi method

To generate a colored view of the clusters, import the data into Gephi and run the Louvain algorithm with the following steps:

File > Import spreadsheet
Select “Finish”
Change “Graph Type” to “Undirected” then OK
On the right side, Community Detection > Modularity > Run with the default settings. This is Gephi’s implementation of the Louvain algorithm.
Close the “Run Report.”
On the left side, Appearance > Nodes > Partition, choose the “Modularity class” attribute, then Apply. This will color the nodes based on their cluster. In this example, each color now represents one of the 4 clusters.
Adjust the display settings at the bottom bar to display node IDs and change the background. You may also move nodes around manually.

B/ The Python method

The Python approach is a lot easier and only requires running the script below against the CSV file:

Python script

import networkx as nx
import networkx.algorithms.community as nx_comm
import pandas as pd

# Load the source,target,weight edge list exported from the query.
df = pd.read_csv('issue_assignees.csv')

# Build an undirected graph; 'source' and 'target' are the default column names.
G = nx.from_pandas_edgelist(df, edge_attr='weight', create_using=nx.Graph())

# You may tune the resolution attribute to generate smaller or bigger clusters.
communities = nx_comm.louvain_communities(G, weight='weight', resolution=0.5)
for community in communities:
    print(community)

Example execution output with resolution = 0.5

{'HALP', 'SPSP', 'JST', 'AVP', 'DATA', 'CWDSUP', 'PSP', 'STSP', 'PCS', 'MOVE', 'PSCLOUD', 'TRELLO', 'BBS', 'COMM', 'PTGS', 'CES', 'SECREP', 'OGSP', 'HCP', 'HCSP', 'CSP', 'STRD', 'CQS'}
{'BSP', 'CRC', 'CLV', 'CSTADM', 'JPO', 'ACE', 'PSSRV', 'ECSP', 'BONS', 'JSP', 'DMCA', 'SUPTST', 'GHS', 'POINTA', 'SSP', 'FEEDBACK', 'FSH', 'SDS', 'ALIGNSP', 'IDESP', 'PS', 'KNOW'}
{'UNI', 'BACCESS', 'PSD', 'TE', 'JPNS', 'PFD', 'CEO', 'CA', 'DDS', 'PA', 'CEOARCHIVE'}

With thousands of projects, displaying the graph brings little value, and the Python script is much quicker. You may use the resolution attribute to get smaller or bigger clusters. This will also influence the number of clusters generated.
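One way to explore the effect of the resolution attribute is to run the algorithm for several values and compare the resulting cluster sizes. The sketch below is only an illustration: the helper sweep_resolution, the toy graph, and the fixed seed (used so repeated runs are comparable) are assumptions, and it presumes a NetworkX release that includes louvain_communities.

```python
import networkx as nx
import networkx.algorithms.community as nx_comm


def sweep_resolution(G, resolutions, seed=42):
    """Return {resolution: cluster sizes, largest first} for each value tried."""
    summary = {}
    for resolution in resolutions:
        communities = nx_comm.louvain_communities(
            G, weight="weight", resolution=resolution, seed=seed
        )
        summary[resolution] = sorted((len(c) for c in communities), reverse=True)
    return summary


# Hypothetical toy graph: two tightly connected groups joined by one weak edge.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 50), ("B", "C", 50), ("A", "C", 50),
    ("D", "E", 50), ("E", "F", 50), ("D", "F", 50),
    ("C", "D", 1),
])

for resolution, sizes in sweep_resolution(G, [0.25, 0.5, 1.0, 2.0]).items():
    print(resolution, len(sizes), sizes)
```

On real data, replace the toy graph with the one built from the CSV file and pick the resolution whose cluster count and sizes best fit your migration plan.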

Summary of the Python method

Run the query below and export the results to a CSV file:

SELECT p1.pkey AS source, p2.pkey AS target, COUNT(ji1.assignee) AS weight
FROM jiraissue ji1
INNER JOIN jiraissue ji2 ON ji1.assignee = ji2.assignee
INNER JOIN project p1 ON ji1.project = p1.id
INNER JOIN project p2 ON ji2.project = p2.id
WHERE p1.id < p2.id
AND ji1.assignee IS NOT NULL
GROUP BY p1.pkey, p2.pkey;

Check that the CSV file is named issue_assignees.csv and has a source,target,weight header row.

Run the Python script below from the same directory as the CSV file:

import networkx as nx
import networkx.algorithms.community as nx_comm
import pandas as pd

# Load the source,target,weight edge list exported from the query.
df = pd.read_csv('issue_assignees.csv')

# Build an undirected graph; 'source' and 'target' are the default column names.
G = nx.from_pandas_edgelist(df, edge_attr='weight', create_using=nx.Graph())

# You may tune the resolution attribute to generate smaller or bigger clusters.
communities = nx_comm.louvain_communities(G, weight='weight', resolution=0.5)
for community in communities:
    print(community)

Adjust the resolution attribute until you're satisfied with how the projects are clustered (number of clusters, specific projects being in the same cluster, etc.). You may also add an issue count per project and count the total number of issues per cluster.
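The issue totals per cluster could be computed along these lines. This is a minimal sketch: the helper issues_per_cluster is an assumption, the two clusters are subsets of the example output above, and the issue counts are made-up placeholders you would replace with a per-project count from your own database.

```python
def issues_per_cluster(communities, issue_counts):
    """Total issue count for each cluster; projects without a count add 0."""
    return [
        sum(issue_counts.get(project, 0) for project in community)
        for community in communities
    ]


# Subsets of two clusters from the example output.
communities = [{"CSP", "STRD"}, {"GHS", "JPO", "SDS"}]

# Hypothetical issue counts per project key (not real data).
issue_counts = {"CSP": 1200, "STRD": 300, "GHS": 4500, "JPO": 2500, "SDS": 1000}

totals = issues_per_cluster(communities, issue_counts)
print(totals)  # [1500, 8000]
```

Comparing these totals across clusters gives a rough sense of how evenly the migration workload is split.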

What else is possible?

Clustering scope

It is worth noting that clustering does not have to be applied to all Jira data at once. For instance, you can cluster all data to determine the number of sites you want to have, then cluster the data within each site to identify the projects for each migration phase.
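A two-level approach like this might be sketched as follows: cluster the whole graph into sites, then run Louvain again on the subgraph of each site. The helper cluster_within, the toy graph, the seed, and the resolution values are all assumptions chosen for illustration.

```python
import networkx as nx
import networkx.algorithms.community as nx_comm


def cluster_within(G, projects, resolution=1.0, seed=42):
    """Run Louvain only on the subgraph induced by the given projects."""
    site_graph = G.subgraph(projects).copy()
    return nx_comm.louvain_communities(
        site_graph, weight="weight", resolution=resolution, seed=seed
    )


# Hypothetical toy graph standing in for the full project network.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 50), ("B", "C", 50), ("A", "C", 50),
    ("D", "E", 50), ("E", "F", 50), ("D", "F", 50),
    ("C", "D", 1),
])

# First pass: candidate sites across all projects.
sites = nx_comm.louvain_communities(G, weight="weight", resolution=0.5, seed=42)

# Second pass: candidate migration phases inside each site.
for site in sites:
    phases = cluster_within(G, site)
    print(sorted(site), "->", [sorted(phase) for phase in phases])
```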

Connectedness factors

Shared assignees are only one attribute that can establish a connectedness factor between Jira projects. Many other factors may be considered, such as:

shared time loggers, commenters, reporters, etc…
shared presence in filter queries
shared presence in board queries
issue links

You may consider using several connectedness factors to produce a weighted undirected graph. If you normalize the weights of each factor so they all fall between 0 and 100, you can combine those factors into a final graph representation that considers multiple aspects of how two Jira projects are connected. You may even assign a weight to each attribute; for example, you may say shared assignees are 10 times as important as issue links.
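Combining several factors could look like the sketch below. It assumes one particular normalization (scaling each factor linearly so its largest weight becomes 100) and made-up edge weights; the function names normalize and combine and the importance values are likewise illustrative.

```python
def normalize(edges, top=100.0):
    """Scale weights linearly so the largest weight of this factor becomes `top`."""
    peak = max(edges.values())
    return {pair: top * weight / peak for pair, weight in edges.items()}


def combine(factors, importance):
    """Sum the normalized weights of every factor, scaled by its importance."""
    combined = {}
    for name, edges in factors.items():
        for pair, weight in normalize(edges).items():
            key = tuple(sorted(pair))  # undirected: (XYZ, ABC) equals (ABC, XYZ)
            combined[key] = combined.get(key, 0.0) + importance[name] * weight
    return combined


# Hypothetical raw weights per connectedness factor.
factors = {
    "assignees": {("ABC", "XYZ"): 57, ("ABC", "DEF"): 19},
    "issue_links": {("XYZ", "ABC"): 4, ("DEF", "XYZ"): 8},
}
# Shared assignees count 10 times as much as issue links.
importance = {"assignees": 10, "issue_links": 1}

combined = combine(factors, importance)
for pair, weight in sorted(combined.items()):
    print(pair, round(weight, 2))
```

The combined dictionary can then be written out as source,target,weight rows and fed to the same clustering script.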

Alternative clustering algorithms

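The Louvain algorithm isn’t the only clustering algorithm out there. K-means, for instance, lets you decide exactly how many clusters you want to have, although it works on numeric features per project rather than directly on a weighted graph. Fixing the number of clusters could be useful if you know how many migration phases you wish to have.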
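As an illustration of fixing the number of clusters while staying on the graph, the sketch below does not use K-means. It swaps in NetworkX's greedy modularity algorithm, whose cutoff and best_n parameters can be set to the same value to request that many communities. The toy graph and the helper name fixed_cluster_count are assumptions, and the parameters require a reasonably recent NetworkX release.

```python
import networkx as nx
import networkx.algorithms.community as nx_comm


def fixed_cluster_count(G, k):
    """Greedy modularity clustering that keeps merging until k communities remain."""
    return nx_comm.greedy_modularity_communities(
        G, weight="weight", cutoff=k, best_n=k
    )


# Hypothetical toy graph: two tightly connected groups joined by one weak edge.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 50), ("B", "C", 50), ("A", "C", 50),
    ("D", "E", 50), ("E", "F", 50), ("D", "F", 50),
    ("C", "D", 1),
])

phases = fixed_cluster_count(G, 2)
for phase in phases:
    print(sorted(phase))
```

Forcing a cluster count can split or merge groups that the data would otherwise keep as they are, so it is worth comparing the result with the Louvain output before committing to it.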

Feel free to comment on the post below if you’d like to share your thoughts and feedback with us.
