A migration to Atlassian Cloud is a journey, and everyone who embarks on it has to answer three critical questions: What am I migrating? When am I migrating it? How will I migrate it? During the planning phase of your migration, you'll spend a lot of time thinking about how you'll migrate, influenced by what and when you're migrating. For large or more complex organizations, the "how" is also influenced by factors such as the feasibility of large-scale change management, uptime requirements for business-critical teams, and limited migration resourcing. To address these factors, migrators often divide their data into smaller, more manageable parts known as clusters.
Today, we see migrators "cluster" their data by org structure, for example, by project, team, department, or business unit. However, modern teams aren't siloed; they're incredibly cross-functional, working on multiple projects owned by different departments, something not reflected in the org structure. You can uncover these connections by analyzing attributes of product data that indicate “connectedness.” For example, shared assignees on a Jira project are a strong indication of connectedness. By clustering based on connectedness, you can more easily decide which projects should be placed on the same site or need to be migrated at the same time to ensure your end-users won’t be working across different sites or deployments (i.e., DC or Cloud).
In order to cluster projects based on their connectedness, you need to determine the criteria for considering two projects as connected. You’ll want to adopt a user-centric approach so you can ensure a seamless end-user experience; therefore, you can use shared assignees as an indication of connectedness between Jira projects. Consequently, the greater the number of assignees that two projects have in common, the stronger their connection will be.
You'll need to establish a relationship between each pair of projects in the form of a "connectedness factor" that increases with the number of shared assignees the two projects have. You can extract this information from your Jira Server or Data Center on-premises database using the query below, written for Postgres:
with projects as ( |
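The above query computes a connectedness weight for each pair of projects: for every assignee the two projects share, it multiplies the number of issues assigned to that user in each project, then sums those products. For example: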
Project ABC has 5 issues assigned to userA and 6 issues assigned to userB
Project XYZ has 3 issues assigned to userA and 7 issues assigned to userB
Meaning the connectedness weight between ABC and XYZ = (5 x 3) + (6 x 7) = 57
The more assignees two projects share, the faster the connectedness weight grows, because the per-assignee issue counts are multiplied rather than added; for example, with just two more issues sharing an assignee, we see the following:
Project ABC has 5 issues assigned to userA and 7 issues assigned to userB
Project XYZ has 4 issues assigned to userA and 7 issues assigned to userB
Connectedness weight between ABC and XYZ = (5 x 4) + (7 x 7) = 69
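The two worked examples above can be reproduced with a minimal Python sketch. The function name `connectedness_weight` and the dictionaries of per-assignee issue counts are illustrative assumptions, not part of the original query; the sketch only mirrors the arithmetic described in the text.

```python
from collections import Counter


def connectedness_weight(issues_a, issues_b):
    """Sum, over shared assignees, the product of their issue counts in each project."""
    shared_assignees = issues_a.keys() & issues_b.keys()
    return sum(issues_a[user] * issues_b[user] for user in shared_assignees)


# First example: issue counts per assignee for projects ABC and XYZ.
abc = Counter({"userA": 5, "userB": 6})
xyz = Counter({"userA": 3, "userB": 7})
first = connectedness_weight(abc, xyz)  # (5 x 3) + (6 x 7)

# Second example: two more issues share an assignee.
abc_more = Counter({"userA": 5, "userB": 7})
xyz_more = Counter({"userA": 4, "userB": 7})
second = connectedness_weight(abc_more, xyz_more)  # (5 x 4) + (7 x 7)

print(first, second)
```

An assignee who appears in only one of the two projects contributes nothing to the weight, which is why only the shared keys are iterated.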
The result of the above query will be a weighted project-to-project mapping table of all possible project-to-project combinations. Here’s a redacted example of a CSV export of the query results:
source | target | weight |
---|---|---|
The table above is a perfect data source to represent a weighted undirected network (also known as a “graph”) where the “nodes” are the projects. The connectedness weight will represent the thickness/strength of the edge between two nodes.
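As a rough illustration of how the mapping table becomes an undirected weighted graph, the sketch below loads `source,target,weight` rows into an adjacency dictionary using only the standard library. The sample rows and the helper names `load_undirected_edges` and `to_adjacency` are hypothetical. It also assumes that, if the export contains both A,B and B,A rows, they describe the same edge, so the larger weight is kept.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the source,target,weight layout.
SAMPLE_CSV = """source,target,weight
ABC,XYZ,57
XYZ,ABC,57
ABC,OPS,4
"""


def load_undirected_edges(csv_file):
    """Collapse (A, B) and (B, A) rows into one undirected edge per project pair."""
    edges = {}
    for row in csv.DictReader(csv_file):
        if row["source"] == row["target"]:
            continue  # a project is not connected to itself
        pair = frozenset((row["source"], row["target"]))
        # Assumption: mirrored rows describe the same edge, so keep the larger weight.
        edges[pair] = max(edges.get(pair, 0), int(row["weight"]))
    return edges


def to_adjacency(edges):
    """Store each edge in both directions so lookups work from either project."""
    adjacency = defaultdict(dict)
    for pair, weight in edges.items():
        a, b = sorted(pair)
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return dict(adjacency)


edges = load_undirected_edges(io.StringIO(SAMPLE_CSV))
adjacency = to_adjacency(edges)
print(adjacency)
```

To read a real export, pass an open file handle in place of the `io.StringIO` wrapper.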
Once we establish a network, we can use a clustering algorithm to find clusters or groups within it. One of the most popular is the Louvain method. We can apply this algorithm to our network using the Gephi tool or the Python NetworkX library.
Importing the data into Gephi and running the Louvain algorithm with a few other manual steps will generate the view below:
File > Import spreadsheet
Select “Finish”
Change “Graph Type” to “Undirected” then OK
On the right side, Community Detection > Modularity > Run with the default settings. This is Gephi’s implementation of the Louvain algorithm.
Close the “Run Report.”
On the left side, Appearance > Nodes > Partition, choose the “Modularity class” attribute, then Apply. This will color the nodes based on their cluster. Now, each color represents one of the 4 clusters.
Adjust the display settings at the bottom bar to display node IDs and change the background. You may also move nodes around manually.
The approach with Python is a lot easier and only requires a run of the script below against the CSV file:
Python script
import networkx as nx |
Example execution output with resolution = 0.5
{'HALP', 'SPSP', 'JST', 'AVP', 'DATA', 'CWDSUP', 'PSP', 'STSP', 'PCS', 'MOVE', 'PSCLOUD', 'TRELLO', 'BBS', 'COMM', 'PTGS', 'CES', 'SECREP', 'OGSP', 'HCP', 'HCSP', 'CSP', 'STRD', 'CQS'} |
With thousands of projects, displaying the graph brings little value, and the Python script is much quicker. You can use the resolution parameter to get smaller or bigger clusters, which also changes the number of clusters generated.
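To make the role of the resolution parameter concrete, here is a minimal sketch of Louvain clustering with NetworkX on a small, made-up graph. It is not the original script: the project keys, weights, and the `cluster_projects` helper are assumptions for illustration, and `louvain_communities` requires a reasonably recent NetworkX release.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical project pairs and connectedness weights: two tightly
# connected groups joined by a single weak edge (OPS-HR).
toy_edges = [
    ("ABC", "XYZ", 57), ("ABC", "OPS", 40), ("XYZ", "OPS", 35),
    ("HR", "FIN", 60), ("HR", "LEGAL", 45), ("FIN", "LEGAL", 50),
    ("OPS", "HR", 1),
]

graph = nx.Graph()  # undirected graph
graph.add_weighted_edges_from(toy_edges)  # stored under the "weight" edge attribute


def cluster_projects(graph, resolution=1.0, seed=42):
    """Run Louvain and return clusters as sorted lists for stable output."""
    communities = louvain_communities(
        graph, weight="weight", resolution=resolution, seed=seed
    )
    return sorted(sorted(community) for community in communities)


clusters = cluster_projects(graph)
for number, cluster in enumerate(clusters, start=1):
    print(number, cluster)
```

In NetworkX, a resolution below 1 favors larger communities and a resolution above 1 favors smaller ones, so rerunning `cluster_projects` with different values is one way to explore the trade-off. Fixing `seed` keeps runs repeatable.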
Summary of the Python method
Run the query below and export the results to a CSV file:
SELECT p1.pkey AS source, p2.pkey AS target, COUNT(ji1.assignee) AS weight |
Validate that the CSV file is called issue_assignees.csv and has a source,target,weight header.
Run the Python script below from the same directory as the CSV file:
import networkx as nx |
Adjust the resolution attribute until you're satisfied with how the projects are clustered (number of clusters, specific projects being in the same cluster, etc.). You may also add an issue count per project and count the total number of issues per cluster.
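One possible way to total issues per cluster is sketched below. The cluster contents, the per-project issue counts, and the `issues_per_cluster` helper are hypothetical; in practice the counts would come from a separate query against your Jira database.

```python
def issues_per_cluster(clusters, issue_counts):
    """Return the total issue count for each cluster, in cluster order."""
    return [
        sum(issue_counts.get(project, 0) for project in cluster)
        for cluster in clusters
    ]


# Hypothetical clustering output and per-project issue counts.
example_clusters = [{"ABC", "XYZ", "OPS"}, {"HR", "FIN"}]
example_issue_counts = {"ABC": 1200, "XYZ": 800, "OPS": 150, "HR": 400, "FIN": 950}

totals = issues_per_cluster(example_clusters, example_issue_counts)
for cluster, total in zip(example_clusters, totals):
    print(sorted(cluster), total)
```

Comparing these totals across clusters gives a quick sense of whether the migration phases are balanced in size.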
It is worth noting that clustering does not have to be applied to all Jira issues at once. For instance, you can first cluster all data to determine the number of sites you want to have, then cluster the data within each site to identify the projects for each migration phase.
Shared assignees are only one attribute that can establish a connectedness factor between Jira projects. Many other factors may be considered, such as:
shared time loggers, commenters, reporters, etc…
shared presence in filter queries
shared presence in board queries
issue links
You may consider using several connectedness factors to produce a weighted undirected graph. If you normalize the weight on each edge so that all weights are between 0 and 100, you can combine those connectedness factors into a final graph representation that considers multiple aspects of how two Jira projects are connected. You may even assign a weight to each attribute; for example, you may say shared assignees are 10 times as important as issue links.
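The combination step could look something like the sketch below. It assumes one simple normalization (dividing by the largest weight so the maximum becomes 100); other schemes are equally valid. The factor names, sample weights, and the `normalize` and `combine` helpers are hypothetical.

```python
def normalize(edge_weights):
    """Scale weights so the largest becomes 100 (one possible normalization)."""
    top = max(edge_weights.values(), default=0)
    if top == 0:
        return {pair: 0.0 for pair in edge_weights}
    return {pair: 100 * weight / top for pair, weight in edge_weights.items()}


def combine(factors, importance):
    """Add up normalized weights per project pair, scaled by each factor's importance."""
    combined = {}
    for name, edge_weights in factors.items():
        for pair, weight in normalize(edge_weights).items():
            combined[pair] = combined.get(pair, 0.0) + importance[name] * weight
    return combined


# Hypothetical raw weights for two connectedness factors.
factors = {
    "assignees": {frozenset(("ABC", "XYZ")): 57, frozenset(("ABC", "OPS")): 19},
    "issue_links": {frozenset(("ABC", "XYZ")): 2, frozenset(("XYZ", "OPS")): 8},
}
# Shared assignees count 10 times as much as issue links.
importance = {"assignees": 10, "issue_links": 1}

combined = combine(factors, importance)
for pair, weight in combined.items():
    print(sorted(pair), round(weight, 1))
```

Using `frozenset` keys keeps each project pair unordered, which matches the undirected graph the clustering step expects.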
The Louvain algorithm isn’t the only clustering algorithm out there. K-means, for instance, lets you decide exactly how many clusters you want to have. This could be useful if you know how many migration phases you wish to have.
Feel free to comment on the post below if you’d like to share your thoughts and feedback with us.
Arbi Dridi
Sr. Enterprise Technical Architect
Atlassian