
Jira Align Performance Data Framework

Application performance. As an end-user, it can be throw-the-computer-at-the-wall annoying, and as an engineer, being faced with troubleshooting a performance problem can feel like finding out you’ve drawn Captain Freedom.


 

When facing a performance issue, there are many variables that could be causing the problem: things we can control, like code issues; things the end-user's tech team can control, like VPN issues; and things no one can control, like poor ISP network configuration/performance or unrealistic expectations (maybe it is supposed to take 5 seconds). To troubleshoot, you need to be able to rule all of these, and many others, out to get to a solution, and the only way to do that is with good data.

Required: Problem Statement

An accurate problem statement is probably the most important piece of information when investigating a performance issue, as it sets the entire tone for the investigation. It does not need to be verbose; in fact, long problem statements often cause confusion. Rather, it should be one or two specific, quantifiable lines about the problem. Not only does this give support a starting point for the investigation and something they can relay to engineering quickly if required, but it also stops a lot of back-and-forth questions.

A bad problem statement

"It is slow"

Whilst probably accurate, it gives the technical teams trying to assist absolutely nothing to go on and will lead to numerous follow-on questions before the investigation can even begin. Another issue is that if the problem is intermittent, by the time support has an idea of what the problem is and what data they will need, the problem could be gone again, leaving everyone frustrated.

 

A good problem statement

A good problem statement is going to contain a few pieces of key information, for example, but not limited to: where the problem is happening, comparative time measurements (these will be approximate, because who starts timing this stuff until it's a problem, but should not be exaggerated), when it started happening, and who is experiencing the problem. Some imagined scenarios as examples:

 

"After Friday’s production release, the Epic Grid is taking 20 seconds to load for Team X. The Epic Grid loads within 5 seconds for Team Y."

 

"When User A navigates to page X, the page loads for approx 60 seconds before displaying a “Whoops…” error. This started happening after [we changed a toggle|post release|some other change]. We have confirmed that other users with the same role [do|do not] experience the same problem."

Required: Steps to reproduce

Next to the problem statement in importance are clear steps to reproduce the problem. Sometimes they will be detailed, other times they can be a single line, but without them, the technical teams will be making assumptions when trying to reproduce internally. To be able to fix the problem, we will need to be able to reproduce it internally in debugging environments, so the sooner we get this locked in, the sooner repro attempts can be made. Sometimes we are still unable to reproduce because of some external factor, but that is OK; it just points the investigation in a different direction. The length and format are user preference, but what typically works best is a numbered list.

 

  1. Log in as or impersonate a user with the Team Leads role assigned

  2. Select Portfolio A and Release B from T1

  3. Navigate to Program Board

  4. Select [example 1| example 2] columns from the Columns Shown menu

  5. Observe behavior from problem statement

 

Again, it does not need to be overly detailed. What we are trying to achieve here is a list of steps showing what is required to make the problem show itself.

 

Required: Recording showing the issue

It is entirely possible that the performance issue being experienced is specific to this Jira Align instance and we will never be able to reproduce the problem precisely, possibly due to data size or layout - remember, we do not have access to your data, so we cannot just ‘take a copy’ for reproduction purposes. A recording helps support and engineering understand the problem if we are unable to reproduce it internally. If you are unable to capture a recording, support may request a screen share or access to the instance to see for themselves by following the steps to reproduce above.

 

Data to collect

When we have the above three items, support will then require some of the data points below. Support may require more data, but in most cases the below is sufficient to progress the investigation to a point where we understand where the problem is, even if there is not an immediate fix. Depending on the problem, the data set may differ, so please use the table below to select the symptom that is the closest match to what you are seeing, then collect the data marked with the green circle/checkmark in that column. Anything with a question mark in the column, support will ask you for if they need it - feel free to have it ready, but it may not be needed unless deeper investigation is required. Instructions on how and when to collect the data follow after the table.

NOTE: This table is not set in stone and does not cover every scenario. It is here to illustrate some common issues and the data needed in most cases.

[Image: PerfDataMatrix.png - matrix of symptoms and the data points to collect for each]

* = note what makes this team or set of users different from the rest, if anything (different roles, for example).

If you are seeing the "spinning wheel" that never goes away, it is possible that an error is hidden behind it. Please open the Console and attempt the navigation again; if you see a 500 error, use the same Ajax script you would use for a Whoops error.

T1 Config

Sometimes the T1 selection can impact performance due to the volume of data behind it. What is the T1 config (which portfolios, etc.) when the issue is present? Does changing the T1 config make any difference? What specific selections do you have here?

 

Work Item Count

When performance is impacting a certain team, much like the T1 config, it can be related to the amount of data that the team needs to deal with. By work item counts, we mean navigating to the individual work item grids and getting a count of the items this team is dealing with, for example on the Epic Grid (but we should get a count of every item type the team would work on):

[Image: WorkItemCounts.png - item count shown on the Epic Grid]

 

HAR File

A HAR file typically will not show the answer, but it will offer a number of clues. It will show Support and Engineering what navigation steps were taken to get to the problem, call timings (specifically latent calls), and so on.

 

Start the capture before you reproduce the issue. Capturing a HAR file when the issue is not present will not provide any useful information. It is better to wait for the problem to be present than to capture misleading data.

For detailed instructions and troubleshooting, please see this Atlassian KB, but if you are using Chrome it is straightforward:

  1. Click on the three dots in the top right to open the menu, More tools, Developer tools

  2. In the window that opens, click on the Network tab: [Image: NetworkTab.png]

  3. Recreate the problem

  4. When you are done, right-click inside Developer Tools and then Save all as HAR with Content: [Image: SaveAs.png]
  5. Attach the file to your Support ticket
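
Once you have the file, you can do a first pass yourself before attaching it: a HAR file is just JSON, so a few lines of Node.js will list the slowest calls. This is a minimal sketch, not an official tool; it assumes Node.js is installed and that your capture is saved as capture.har:

// slowest-calls.js - list the 10 slowest requests in a HAR capture
// Usage: node slowest-calls.js capture.har
const fs = require("fs");

const har = JSON.parse(fs.readFileSync(process.argv[2], "utf8"));
har.log.entries
  .map(e => ({ url: e.request.url, ms: Math.round(e.time) }))   // e.time is total elapsed ms per the HAR spec
  .sort((a, b) => b.ms - a.ms)                                  // slowest first
  .slice(0, 10)
  .forEach(({ url, ms }) => console.log(ms + " ms  " + url));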

 

Console Output

Sometimes the Console gives away some clues as to the underlying problem. To get to it, just open Developer Tools in Chrome and click the Console tab:

[Image: Console1.png - the Console tab in Chrome Developer Tools]

 

The red stuff is usually bad 😞 . Take a screenshot and attach it to the ticket.

 

Ajax Console Script

If you are here, then you are probably seeing a Whoops screen (or you saw a 500 error in the Console tab). Take a read of The Dreaded Whoops Error post, but for the engineering teams, the Whoops itself tells us nothing. The script below will pull out the actual error behind it.

 

You need to reproduce the issue BEFORE running this, including all navigation steps. If you get a blank output, we are either chasing the wrong kind of error OR the error was not reproduced properly.

  1. Navigate to the page throwing the error i.e. reproduce the problem

  2. Open Developer Tools and go to the Console tab

  3. Select the empty line and paste in the ajax script:

[Image: Console2.png - pasting the Ajax script into the Console]

$.ajax({type : "POST", url: "AjaxFiles/AjaxTag?Type=264", async: false}).responseText.replace("|-|", ",")
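
As a fallback, if $ (jQuery) happens not to be available in your console session, a plain fetch call should return the same response. This variant is an assumption on my part, not part of the official instructions; it targets the same AjaxFiles/AjaxTag?Type=264 endpoint and relies on the browser sending your existing session cookies:

// Assumption: run from a logged-in Jira Align tab so the request is
// same-origin; hits the same endpoint as the jQuery one-liner above.
fetch("AjaxFiles/AjaxTag?Type=264", { method: "POST" })
  .then(r => r.text())
  .then(t => console.log(t.replace("|-|", ",")));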

 

Sample output

The output can be long and run off the side of the screen, like the screenshot below. In that case, please copy/paste it into the ticket or a text file.

[Image: ScriptOut.png - sample script output]

 

Network Stats

This needs to be done while the issue is present and impacting the application. Results taken at any other time may skew the investigation.

The following will show whether there is a networking problem between the client and the cloud-based instance.

For Windows:

Open a command prompt and enter the following, where the site name is the name of your Jira Align instance (please do not prefix the address with https://):

tracert <sitename>.agilecraft.com

Take a screenshot of the result and attach it to the ticket. Hops with a sudden, sustained jump in latency are the ones worth calling out.

Linux/Mac Users:

traceroute --resolve-hostnames <sitename>.agilecraft.com
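
If traceroute is blocked on your network, a rough timing check can also be run from the browser console of a tab that is already on your Jira Align instance. This is just a sketch, not an official diagnostic; it measures the full request time (network plus server processing), so treat it as a coarse signal and run it a few times:

// Assumption: run in the Console of a tab already on your Jira Align
// instance, so the request stays same-origin and sends your cookies.
(async () => {
  const t0 = performance.now();
  await fetch(window.location.origin + "/", { cache: "no-store" }); // bypass the HTTP cache
  console.log("Round trip: " + Math.round(performance.now() - t0) + " ms");
})();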

 

What Happens Next?

Support will take everything provided and give it an initial analysis. As we are talking SaaS here, support may be limited in what they can actually do directly, but given the right data, they can get it to the correct people to have the problem addressed as quickly as possible.

 
