Introducing Safeguards for Jira index

Hi everyone! We’re glad to announce improvements to Jira indexing that will help alleviate pressure on the indexing queues.

What is changing?

We’re introducing limits to the number of issue-related entities that can be reindexed at a time.

This feature will be enabled by default with the limit set to 1,000 of the newest entities (comments, change history records, and work logs). However, Jira admins can change the default value as well as turn the feature off. For more information and technical details, see our DAC announcement. 

With the feature enabled, we guarantee that the 1,000 newest comments, change history records, and work logs will be reindexed. We don’t deindex entities that fall out of this pool, so the actual number of indexed entities may be greater than 1,000, but these extra entities won’t be reindexed in the future.
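
The behavior described above can be pictured with a small sketch. It is illustrative only: the function name `reindex_issue`, the integer ids, and the in-memory set standing in for the index are assumptions made for this example, not Jira internals.

```python
def reindex_issue(entity_ids, indexed, limit=1000):
    """Simulate one reindex of an issue's entities (for example, its comments).

    entity_ids: ids ordered from oldest to newest (hypothetical data model).
    indexed:    ids currently present in the index.
    Returns (refreshed, indexed_after).
    """
    if limit < 1:
        raise ValueError("this sketch only models a positive limit")
    # Only the newest `limit` entities are written to the index again.
    refreshed = set(entity_ids[-limit:])
    # Older entities are not removed from the index; they just stop being refreshed.
    return refreshed, indexed | refreshed


# An issue with 1,200 comments, indexed for the first time with the safeguard on.
comments = list(range(1, 1201))
refreshed, index = reindex_issue(comments, set())

# Fifty more comments arrive and the issue is reindexed.
comments += list(range(1201, 1251))
refreshed, index = reindex_issue(comments, index)

print(len(refreshed))  # 1000
print(len(index))      # 1050
```

In this toy run the index ends up holding 1,050 comments, although only the newest 1,000 were refreshed by the last reindex.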

The improvements are included in Jira 8.22.2. Following the 8.22.x release, they will also be available in Jira 9.0.1 and 9.1.0.

Why is this changing?

Jira reindexing quality and index consistency improved considerably between Jira 6.x and Jira 8.x. Many things were fixed, and we can safely assume that the quality of reindexing is good. Although reindexing has become more reliable, it can still be a very expensive operation. When triggered multiple times, it can overload the indexing queue and lead to significant performance problems (high CPU usage, timeouts for end users), sometimes ending in instance failure.

You can find more details in the following ticket:

JSDSERVER-10886 - SLA configuration changes create indexing pressure on the instance CLOSED

As part of the feature, we've added stats to help you gather information about the indexing of issue-related entities. You can find more details in this KB article: Jira indexing-limits stats.

Andrzej Kotas
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
May 25, 2022

Index Safeguards extension: Filtering out items with unsupported fields

Hello again Data Center Community!

In Jira 8.22.3, we’re extending the functionality of Safeguards in Jira index. We’re glad to introduce an update to the default behavior of change item indexing.

Now, change items with unsupported fields will be filtered out before they’re collected into change groups. As a result, a change group that contains only change items with unsupported fields won’t be indexed, so no index document will be created for it.

The feature is enabled by default and allows for indexing only six fields in change items: 

  • Assignee
  • Fix Version
  • Priority
  • Reporter
  • Resolution
  • Status  
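
The filtering step can be sketched roughly as follows. This is a simplified illustration rather than Jira's implementation: the `ChangeItem` class, the `indexable_change_groups` function, and the exact-match comparison on the field names listed above are all assumptions.

```python
from dataclasses import dataclass

# The six supported fields, written as they appear in the list above.
SUPPORTED_FIELDS = frozenset(
    {"Assignee", "Fix Version", "Priority", "Reporter", "Resolution", "Status"}
)


@dataclass(frozen=True)
class ChangeItem:
    """Hypothetical stand-in for a single change item in an issue's history."""
    field_name: str
    old_value: str
    new_value: str


def indexable_change_groups(change_groups):
    """Drop unsupported items first, then drop groups that end up empty."""
    result = []
    for group in change_groups:
        supported = [item for item in group if item.field_name in SUPPORTED_FIELDS]
        if supported:  # a group with unsupported fields only is not indexed
            result.append(supported)
    return result


history = [
    [ChangeItem("Status", "Open", "In Progress"), ChangeItem("Description", "a", "b")],
    [ChangeItem("Description", "b", "c")],
]
groups = indexable_change_groups(history)
print(len(groups))                              # 1
print([item.field_name for item in groups[0]])  # ['Status']
```

The second group contains only a Description change, so it produces nothing to index.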

Why have we done it?

In Jira 8.22.2, we introduced Safeguards to alleviate pressure on the instance caused by the overload of indexing queues. According to our analysis, the new restriction on the number of fields will remove about 94% of the redundant Jira indexes for change items. This is vital to the health of the instance and will only impact large and long-running issues.

Tips for administrators

In specific cases, Jira administrators can change the new default behavior. For example, an administrator can disable the filtering of unsupported fields in change items, but this may degrade performance for large issues.

Learn more about how to modify the default behavior here: Introducing Safeguards to Jira indexation KB 

The article "What's changed in Jira after the implementation of indexing limits" answers some common questions related to Jira indexing limits and explains how Jira works differently since the update.

Kamil Cichy
Atlassian Team
May 25, 2022

Index Safeguards adjustment: New values for default limits

We're adjusting the default limits of the topN issue-related entities.

What is changing?

Following performance tests, we decided to adjust the default Index Safeguards limits. The new limits are as follows:

  • comments: 500
  • worklogs: 100
  • changehistory: 100

Why is it changing?

We performed extensive performance tests. We set up an 8-node cluster and we used 30 threads to constantly send requests to each node (240 threads in total). The requests consist of both reads and writes, simulating heavy user traffic. We then used Jira Indexing Queue stats to observe how the indexing performance changes with different limit values.

Choosing the right limit values is a balancing act between functionality and usability. At one extreme, a user can search all 50,000 comments of an issue, but it takes ages to index anything. At the other extreme, the system is very fast, but searching for comments doesn't work. With this in mind, we chose the new default limits.

Jira will index the latest 500 comments. Comment search is more popular than change history or worklog search, so we allowed a higher number here to lean towards functionality. Also, comments are indexed incrementally (adding a new comment causes only that comment to be indexed), so a more relaxed limit is safe here.

Jira will index the latest 100 change history items and worklogs. These are searched for less often, so we can lean towards indexing performance here. Moreover, all change items of an issue are always indexed at once, which makes indexing them much more expensive and justifies a stricter limit.

When is it changing?

The new default limits are introduced in Jira 8.22.4.

The impact

We looked at Jira stats from a production instance of one of our large clients to assess the impact the new limits would have on them. We used a sample of 1.3M indexing operations. We observed that in only 0.3% of cases did the number of comments exceed the limit, meaning not all comments were indexed. For worklogs this was 0%, and for changehistory 0.1%. For exact numbers, head down to the bottom of this article.

We believe this trade-off is fair, taking into account the increased Jira indexing stability.

Jira stats deep dive

This section is only for those who love numbers. 🤓

We ran our performance tests to observe how the Jira indexer copes with different limits.

No limit (-1)

Without any limits, the cluster quickly became unstable. Change history items waited in the queue for 743ms on average before being indexed (many waited for over 10s!), and the index update often took over 1s, meaning that fewer than one update per second was completed in those cases.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 743,
        "distributionCounter":
        {
            "0": 2773,
            "1": 90,
            "10": 38,
            "100": 214,
            "1000": 573,
            "10000": 727,
            "20000": 44,
            "30000": 0,
            "9223372036854775807": 1
        }
    },
    "timeToUpdateIndexMillis":
    {
        "avg": 158,
        "distributionCounter":
        {
            "0": 3749,
            "1": 27,
            "10": 161,
            "50": 59,
            "100": 36,
            "500": 113,
            "1000": 147,
            "9223372036854775807": 167
        }
    },
    "totalTimeMillis":
    {
        "avg": 796,
        "distributionCounter":
        {
            "0": 1215,
            "1": 1023,
            "10": 393,
            "100": 313,
            "1000": 641,
            "10000": 820,
            "20000": 53,
            "30000": 0
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 76,
    },
}
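
If you want to read such an entry programmatically, a sketch along these lines may help. It rests on several assumptions: that each `distributionCounter` key is the upper bound of a bucket in milliseconds, that `9223372036854775807` (Java's `Long.MAX_VALUE`) marks the open-ended last bucket, and that the trailing commas seen in the excerpts in this article need to be stripped before the payload parses as JSON. The helper name `parse_stats` is made up for this example.

```python
import json
import re

# Java's Long.MAX_VALUE; assumed to label the open-ended last bucket.
LONG_MAX = "9223372036854775807"

# Abbreviated copy of the entry above.
SAMPLE = """[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeToUpdateIndexMillis":
    {
        "avg": 158,
        "distributionCounter":
        {
            "0": 3749,
            "1": 27,
            "10": 161,
            "50": 59,
            "100": 36,
            "500": 113,
            "1000": 147,
            "9223372036854775807": 167
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 76,
    },
}"""


def parse_stats(log_entry):
    """Extract the JSON-like payload that follows the log prefix."""
    payload = log_entry[log_entry.index("{"):]
    # Remove trailing commas such as `"count": 76,` so json.loads accepts it.
    payload = re.sub(r",\s*([}\]])", r"\1", payload)
    return json.loads(payload)


stats = parse_stats(SAMPLE)
over_one_second = stats["timeToUpdateIndexMillis"]["distributionCounter"][LONG_MAX]
timeouts = stats["totalTimeTimedOutMillis"]["count"]
print(over_one_second)  # 167
print(timeouts)         # 76
```

Read this way, the last bucket of `timeToUpdateIndexMillis` is the number of index updates that took longer than 1s.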

Limit 1000

With the previous default limit of 1000, the situation was much better, but when we reached the maximum load, the indexer still couldn't withstand the amount of work. The time to update the index still exceeded 1s for 155 updates.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 515,
        "distributionCounter":
        {
            "0": 3190,
            "1": 249,
            "10": 239,
            "100": 622,
            "1000": 2683,
            "10000": 1041,
            "20000": 10,
            "30000": 7
        }
    },
    "timeToUpdateIndexMillis":
    {
        "avg": 93,
        "distributionCounter":
        {
            "0": 6665,
            "1": 196,
            "10": 194,
            "50": 77,
            "100": 27,
            "500": 424,
            "1000": 303,
            "9223372036854775807": 155
        }
    },
    "totalTimeMillis":
    {
        "avg": 609,
        "distributionCounter":
        {
            "0": 1329,
            "1": 294,
            "10": 1460,
            "100": 863,
            "1000": 2812,
            "10000": 1264,
            "20000": 9,
            "30000": 9
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 1,
    },
}

Limit 500

The limit of 500 brought another improvement. The average time to update the index fell to 27ms, allowing for ~37 issue updates per second. Unfortunately, there were still 21 updates that spent over a second on the critical path. There were no more timeouts, though.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 79,
        "distributionCounter":
        {
            "0": 5616,
            "1": 170,
            "10": 219,
            "100": 2019,
            "1000": 2774,
            "10000": 80,
            "20000": 0,
            "30000": 0
        }
    }, 
    "timeToUpdateIndexMillis":
    {
        "avg": 27,
        "distributionCounter":
        {
            "0": 9300,
            "1": 44,
            "10": 101,
            "50": 100,
            "100": 81,
            "500": 1211,
            "1000": 20,
            "9223372036854775807": 21
        }
    },
    "totalTimeMillis":
    {
        "avg": 108,
        "distributionCounter":
        {
            "0": 2611,
            "1": 482,
            "10": 2054,
            "100": 2028,
            "1000": 3595,
            "10000": 108,
            "20000": 0,
            "30000": 0
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 0,
    },
}
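
The ~37 updates per second quoted for the limit of 500 follows from the average update time. As a back-of-the-envelope sketch, assuming a single writing thread that handles updates one after another (the function name is made up for this example):

```python
def max_updates_per_second(avg_update_millis):
    """Upper bound on updates per second for one writer working serially."""
    if avg_update_millis <= 0:
        raise ValueError("average update time must be positive")
    return 1000 / avg_update_millis


# 27ms is the average timeToUpdateIndexMillis measured with the limit of 500.
print(int(max_updates_per_second(27)))  # 37
```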


Limit 100

The limit of 100 brought what we were aiming for: a cluster that is stable under heavy load. There were no timeouts, the time spent in the indexer queue and on the writing thread was short (3ms on average for both), and only a single update took over a second.

[JIRA-STATS] [INDEXING-QUEUE]  index:CHANGE_HISTORY, total primary queue stats: {
    "timeInQueueMillis":
    {
        "avg": 3,
        "distributionCounter":
        {
            "0": 9440,
            "1": 201,
            "10": 530,
            "100": 1120,
            "1000": 18,
            "10000": 0,
            "20000": 0,
            "30000": 0
        }
    },
    "timeToUpdateIndexMillis":
    {
        "avg": 3,
        "distributionCounter":
        {
            "0": 9875,
            "1": 23,
            "10": 125,
            "50": 1203,
            "100": 75,
            "500": 6,
            "1000": 1,
            "9223372036854775807": 1
        }
    },
    "totalTimeMillis":
    {
        "avg": 8,
        "distributionCounter":
        {
            "0": 3795,
            "1": 2879,
            "10": 2036,
            "100": 2548,
            "1000": 50,
            "10000": 1,
            "20000": 0,
            "30000": 0
        }
    },
    "totalTimeTimedOutMillis":
    {
        "count": 0,
    },
}

Real production instance indexing-limits stats

[JIRA-STATS] [INDEXING-LIMITS] total stats: ... data={
 ...
     "numberOfComments":{
          "count":1334984,
          "min":0,
          "max":2632,
          "sum":6588589,
          "avg":4,
          "distributionCounter":{
             "0":296176,
             "1":525645,
             "10":434660,
             "100":69058,
             "1000":9064,
             "10000":381,
             "20000":0,
             "50000":0
          }
       },
       "numberOfWorklogs":{
          "count":1334964,
          "min":0,
          "max":9,
          "sum":163,
          "avg":0,
          "distributionCounter":{
             "0":1334861,
             "1":78,
             "10":25,
             "100":0,
             "1000":0,
             "10000":0,
             "20000":0,
             "50000":0
          }
       },
       "numberOfChangeHistory":{
          "count":1334964,
          "min":1,
          "max":2455,
          "sum":10138251,
          "avg":7,
          "distributionCounter":{
             "0":0,
             "1":90395,
             "10":1111568,
             "100":131177,
             "1000":1469,
             "10000":355,
             "20000":0,
             "50000":0
          }
       },
...
}
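
To relate these histograms to the limits, you can sum the buckets that lie above a given bound. The sketch below assumes that each `distributionCounter` key is the inclusive upper bound of its bucket, which is consistent with the `count`, `max`, and `sum` values shown but is not stated in the output itself. The function name `share_above` is made up for this example.

```python
def share_above(counter, bound):
    """Fraction of indexing operations that fall in buckets above `bound`."""
    total = sum(counter.values())
    above = sum(count for upper, count in counter.items() if int(upper) > bound)
    return above / total


# distributionCounter values copied from the stats above.
change_history = {"0": 0, "1": 90395, "10": 1111568, "100": 131177,
                  "1000": 1469, "10000": 355, "20000": 0, "50000": 0}
worklogs = {"0": 1334861, "1": 78, "10": 25, "100": 0,
            "1000": 0, "10000": 0, "20000": 0, "50000": 0}
comments = {"0": 296176, "1": 525645, "10": 434660, "100": 69058,
            "1000": 9064, "10000": 381, "20000": 0, "50000": 0}

print(f"{share_above(change_history, 100):.1%}")  # 0.1%
print(f"{share_above(worklogs, 100):.1%}")        # 0.0%
# The comment limit of 500 falls inside the (100, 1000] bucket, so the
# histogram only gives an upper bound for the share above that limit.
print(f"{share_above(comments, 100):.1%}")        # 0.7%
```

The change history and worklog results match the 0.1% and 0% quoted earlier. For comments, the buckets are too coarse to reproduce the 0.3% figure, which needs the exact per-operation counts.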

Jira Stats

Jira Stats proved to be very useful for our performance assessment. You can use them too to evaluate the performance of your test and production environments! Find out more in this knowledge base article.
