Like many of you, we on the Data Center team always have performance and scale top of mind. For us, the performance, scalability, and reliability of your products are not just nice-to-haves, but requirements.
So we wanted to share with you all of the investments that we’ve made in this space over the last year and some of the new capabilities we have coming soon.
As your Jira Software and Jira Service Management Data Center instances grow, indexing starts to take longer. Our focus in 2020 was to reduce the time it takes to index and, in fact, make indexing even faster than before.
This year, we introduced document-based replication to lower the index update distribution time in Jira Software and Jira Service Management clusters. Now, index replication isn’t delayed and remains consistent across your cluster. We saw the average index time go from 2.5 seconds down to 6 milliseconds!
By making this fix, we immediately saw the index consistency across nodes improve. It became more stable and faster than it was before. This also helped distribute changes in the index more quickly across nodes.
Instead of every node repeating the same database operations to get the correct data into its index, the originating node serializes the prepared document and propagates it to the other nodes in your cluster. This reduces the load that apps and document preparation place on your database.
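To make the idea concrete, here's a minimal sketch of document-based replication. It is not Jira's implementation: the `Node` class, the JSON serialization, and the `db_reads` counter are all assumptions made for illustration. The point it shows is that only the originating node reads from the database, while the other nodes simply apply the serialized document.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster node holding its own copy of the index."""
    name: str
    index: dict = field(default_factory=dict)
    db_reads: int = 0  # counts how often this node hits the database

    def build_document(self, database: dict, issue_key: str) -> dict:
        # Preparing a document requires a database read on this node.
        self.db_reads += 1
        row = database[issue_key]
        return {"key": issue_key, "summary": row["summary"], "status": row["status"]}

    def apply_serialized(self, payload: str) -> None:
        # Receiving nodes only deserialize and store; no database access.
        doc = json.loads(payload)
        self.index[doc["key"]] = doc


def replicate(origin: Node, replicas: list, database: dict, issue_key: str) -> None:
    """Build the document once on the origin, then ship it to every replica."""
    doc = origin.build_document(database, issue_key)
    origin.index[issue_key] = doc
    payload = json.dumps(doc, sort_keys=True)  # serialization format is an assumption
    for node in replicas:
        node.apply_serialized(payload)


database = {"DEMO-1": {"summary": "Fix login", "status": "Open"}}
origin = Node("node-1")
replicas = [Node("node-2"), Node("node-3")]
replicate(origin, replicas, database, "DEMO-1")
```

In this sketch, the database is read once regardless of cluster size, and every index copy ends up with identical data because all nodes store the same serialized document.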
Additionally, we saw a number of other improvements.
Ability to handle higher throughput - allowing better horizontal scaling.
Database traffic was reduced, thus reducing the chance of a database failure.
All index copies receive exactly the same data during updates.
Before making this fix, in heavy load situations, the index replication process could take up to 30 minutes depending on the number of custom fields, apps, and load size.
We know that custom fields are an important part of your teams' workflows. However, the amount of time it took to reindex your instance was impacted by the number and configuration of custom fields in it.
In 8.10, we made two large improvements to reduce this performance overhead:
Custom fields with local context are indexed only for relevant issues.
Empty custom fields are skipped during the indexing process.
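The two rules above can be sketched as a simple filter. This is an illustration only: the `CustomField` class, the `projects` attribute standing in for a field's context, and the definition of "empty" are assumptions, not Jira's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomField:
    name: str
    # None means a global context; otherwise the project keys the field
    # is scoped to (a hypothetical stand-in for a local context).
    projects: Optional[frozenset] = None


def fields_to_index(issue_project: str, values: dict, fields: list) -> dict:
    """Return only the custom field values worth indexing for one issue."""
    indexed = {}
    for f in fields:
        # Rule 1: a field with a local context is indexed only for relevant issues.
        if f.projects is not None and issue_project not in f.projects:
            continue
        # Rule 2: empty values are skipped entirely.
        value = values.get(f.name)
        if value is None or value == "" or value == []:
            continue
        indexed[f.name] = value
    return indexed


fields = [
    CustomField("Team"),
    CustomField("Severity", frozenset({"OPS"})),
    CustomField("Sprint goal"),
]
values = {"Team": "Platform", "Severity": "High", "Sprint goal": ""}
result = fields_to_index("DEV", values, fields)
```

Here an issue in the hypothetical `DEV` project ends up with a single indexed field: `Severity` is out of context and `Sprint goal` is empty, so neither adds indexing work.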
A clustered architecture is critical for organizations that need high availability. However, for it to be effective, you need to understand the status of each of the nodes in your cluster.
In 8.6, we added a cluster overview page that indicates the status of your nodes - which you can also track in your audit logs if you have advanced auditing enabled.
With this new page, you can see if your nodes are Active, No Heartbeat, or Offline. This enables you to make informed decisions about the administration of your cluster. But we didn’t stop there. If a node doesn’t have a heartbeat, it is now automatically removed from your cluster so that it doesn’t consume resources or impact performance.
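As a rough sketch of how node states and automatic removal could fit together, consider the following. The three status names come from the cluster overview page; everything else is an assumption for illustration, including both thresholds, the `shut_down` flag used to mark a node as Offline, and the function names. The product's actual rules may differ.

```python
from enum import Enum


class NodeStatus(Enum):
    ACTIVE = "Active"
    NO_HEARTBEAT = "No Heartbeat"
    OFFLINE = "Offline"


# Both thresholds are hypothetical values chosen for this example.
HEARTBEAT_TIMEOUT_S = 5 * 60
REMOVAL_AFTER_S = 2 * 24 * 3600


def classify(last_heartbeat: float, shut_down: bool, now: float) -> NodeStatus:
    """Classify one node from its last heartbeat timestamp (in seconds)."""
    if shut_down:
        return NodeStatus.OFFLINE
    if now - last_heartbeat > HEARTBEAT_TIMEOUT_S:
        return NodeStatus.NO_HEARTBEAT
    return NodeStatus.ACTIVE


def prune_stale(nodes: dict, now: float) -> dict:
    """Drop nodes that have been silent too long; return the status of the rest."""
    kept = {}
    for name, (last_heartbeat, shut_down) in nodes.items():
        status = classify(last_heartbeat, shut_down, now)
        silent_for = now - last_heartbeat
        if status is NodeStatus.NO_HEARTBEAT and silent_for > REMOVAL_AFTER_S:
            continue  # removed from the cluster automatically
        kept[name] = status
    return kept


now = 1_000_000.0
nodes = {
    "node-1": (now - 10, False),
    "node-2": (now - 600, False),
    "node-3": (now - 3 * 24 * 3600, False),
    "node-4": (now - 50, True),
}
cluster = prune_stale(nodes, now)
```

In this example `node-3` has been silent for three days, so it is dropped, while the other three nodes are reported as Active, No Heartbeat, and Offline respectively.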
License checks are more lightweight and average response time improved.
Your instance will no longer freeze when you remove a user from a group or project, because the lock that caused the freeze has been eliminated. The removal process went from 12 seconds to approximately 1 second.
Sequential project creation is 65% faster in synthetic tests.
Improved performance and stability by optimizing epic searches.
Sped up the Favorite Filter gadget by restricting the number of issues that load.
We also focused on making significant changes to Confluence Data Center’s architecture to make indexing faster, and we prioritized performance improvements in other areas of the product.
As your Confluence instance gets larger, so does the index. In Confluence 7.9, we made an architectural change to the index and split it in two: one for content and one for changes. For every piece of content that is indexed, there is at least one change indexed - more if there are lots of edits to a piece of content. It’s uncommon that your teams would need to search both content and changes at the same time.
By splitting the indexes, we’ve seen performance increase and a reduction in both memory and CPU consumption.
Previously, reindexing your instance required you to take it down while the changes were fully propagated. For some teams, that could take up to 48 hours to complete. Having your instance down - for even just a few minutes - could limit your teams' ability to be productive.
We improved the process by adding a new admin UI trigger that triggers the process at runtime and automatically propagates the new index across your nodes seamlessly. The best part? When you reindex using the new UI, you don’t experience any downtime.
We introduced advanced user permissions in 7.3 to help you manage, troubleshoot, and audit permissions in your instance. However, we saw that as their databases grew, some customers experienced performance challenges because of how permissions were checked.
So, we introduced denormalized permissions. We also added database tables for space permissions and separated different types of permissions into their own tables so that your database could handle permissions more efficiently.
By making these changes, we improved the performance of searches, dashboard renderings, lists of visible spaces, and macros that list visible spaces. This not only makes page rendering faster but also decreases database load. We even extended these performance enhancements at the page level to make it easier for you to check page permissions.
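The general idea behind denormalizing permissions can be sketched as follows. This is a simplified illustration and not Confluence's schema: the group-based model, the dictionary "tables", and the function names are all assumptions.

```python
def can_view_normalized(user: str, space: str, user_groups: dict, space_grants: dict) -> bool:
    """Normalized check: walk the user's groups on every single lookup."""
    granted = space_grants.get(space, set())
    return any(group in granted for group in user_groups.get(user, set()))


def denormalize(user_groups: dict, space_grants: dict) -> dict:
    """Precompute, per user, the set of spaces that user can see.

    The result plays the role of a denormalized table: it trades extra
    storage (and the need to refresh it when permissions change) for
    cheap reads.
    """
    visible = {}
    for user, groups in user_groups.items():
        visible[user] = {
            space for space, granted in space_grants.items() if groups & granted
        }
    return visible


user_groups = {"alice": {"dev", "ops"}, "bob": {"dev"}}
space_grants = {"DEV": {"dev"}, "OPS": {"ops"}, "HR": {"hr"}}

visible = denormalize(user_groups, space_grants)
alice_spaces = sorted(visible["alice"])  # a "list of visible spaces" becomes one lookup
```

With the precomputed mapping, listing a user's visible spaces no longer requires evaluating group memberships against every space at read time, which is the kind of work that grows with the size of the database.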
In 7.6, we updated the cache architecture and saw dramatically improved performance under high load conditions. In fact, it was 4 times faster under simulated high loads.
Previously, when deployed in a clustered architecture, Confluence used a distributed cache - evenly partitioning data across all the nodes in the cluster rather than replicating the data. To improve cluster resilience and to unlock additional horizontal scaling capabilities, we switched some specific caches to local caching with remote invalidation.
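Here's a minimal sketch of the local-caching-with-remote-invalidation pattern. It is a generic illustration of the technique rather than Confluence's cache code: `InvalidationBus`, `LocalCache`, and the in-process "bus" that stands in for cluster messaging are all hypothetical.

```python
class InvalidationBus:
    """Stands in for cluster messaging; delivers eviction notices to other nodes."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, cache) -> None:
        self.subscribers.append(cache)

    def publish(self, sender, key) -> None:
        for cache in self.subscribers:
            if cache is not sender:
                cache.evict(key)


class LocalCache:
    """Each node keeps its own full copy of the entries it has used."""

    def __init__(self, bus: InvalidationBus, loader):
        self.bus = bus
        self.loader = loader  # loads a value from the backing store
        self.entries = {}
        bus.subscribe(self)

    def get(self, key):
        # Reads are served locally; the backing store is hit only on a miss.
        if key not in self.entries:
            self.entries[key] = self.loader(key)
        return self.entries[key]

    def evict(self, key) -> None:
        self.entries.pop(key, None)

    def invalidate(self, key) -> None:
        # Called after a write: drop the local copy and tell the other nodes.
        self.evict(key)
        self.bus.publish(self, key)


store = {"space:DEV": "v1"}
bus = InvalidationBus()
node_a = LocalCache(bus, lambda k: store[k])
node_b = LocalCache(bus, lambda k: store[k])

first_read = node_b.get("space:DEV")   # node_b caches the old value
store["space:DEV"] = "v2"              # a write happens via node_a
node_a.invalidate("space:DEV")         # other nodes are told to evict
second_read = node_b.get("space:DEV")  # node_b reloads the fresh value
```

The design choice in this sketch is that only small invalidation messages cross the network; the cached data itself is never partitioned across nodes, so losing one node doesn't take part of the cache with it.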
Used External Process Pool to improve stability of HTML conversion for Word and Office documents that are viewed with the Office Word and Office Excel macros.
In Bitbucket Data Center, we added a new data management capability to help you clean up your instance.
Pull requests have a source and a target branch and, typically, the target branch is Main. Whenever the target branch changes, all pull requests to Main need to be recalculated to review the differences between the source and target branch. This is a computationally expensive operation that can consume a lot of memory and CPU.
We released automatic decline of stale pull requests in 7.7. This capability enables you to decline any open pull requests that are considered stale, which helps to lower the resources you’re using.
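The logic of declining stale pull requests can be sketched in a few lines. This is an illustration under stated assumptions, not Bitbucket's implementation: the `PullRequest` class, the state strings, and the four-week default for `max_inactive` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    id: int
    state: str  # "OPEN" or "DECLINED" in this sketch
    last_activity: datetime


def decline_stale(prs: list, now: datetime, max_inactive: timedelta = timedelta(weeks=4)) -> list:
    """Decline open pull requests with no activity for longer than max_inactive.

    Returns the ids of the pull requests that were declined.
    """
    declined = []
    for pr in prs:
        if pr.state == "OPEN" and now - pr.last_activity > max_inactive:
            pr.state = "DECLINED"
            declined.append(pr.id)
    return declined


now = datetime(2020, 12, 1)
prs = [
    PullRequest(1, "OPEN", now - timedelta(days=3)),
    PullRequest(2, "OPEN", now - timedelta(days=60)),
    PullRequest(3, "DECLINED", now - timedelta(days=90)),
]
declined_ids = decline_stale(prs, now)
```

Every pull request declined this way is one fewer that has to be recalculated the next time its target branch changes.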
We've moved some gears and sprockets, and made Crowd handle nested group hierarchies in a more efficient way. This small change brings significant performance improvements for user authentication, permission checks, and the User groups screen.
Last, but not least, we prioritized optimizing full synchronization in Crowd Data Center.
Managing your user groups is important to maintain the security of your instance and the productivity of your teams. To help keep your user groups synced, we introduced the new canonicality checker.
The canonicality checker pre-fetches your user and group names and shares them during the membership synchronization. We also optimized the existing non-shared mode of the checker.
With the canonicality checker, we saw:
Memory consumption decreased during full sync of memberships by ~300MB.
Synchronization time was shortened by ~1h 40m when compared to the updated non-shared mode and approximately 2 hours compared to the old non-shared mode.
Improved overall sync time by 86% (Canonically) and 98% (Batched).
As you can see, we did a lot to optimize and improve the performance and scalability of our products and we plan on continuing to focus on supporting these needs for you in the new year. Here’s a sneak peek at some of the features we’re working on.
When your teams come to rely on products for their day-to-day activities, it’s not surprising that they generate a lot of data. At the enterprise level, that only becomes compounded. That’s why we’re adding more data management - or clean-up - capabilities to our products, which will help you manage your data more effectively, reduce resource consumption, and ultimately improve the overall experience of your teams.
Another key performance update coming from Crowd is access-based synchronization. With this improvement, only those users who have access to a given application will be synchronized, allowing you to save time in the synchronization process and improve performance by reducing the amount of user data that needs to be processed.
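Since this feature is still upcoming, the following is only a conceptual sketch of what filtering by access could look like. It assumes, for illustration, that access to an application is granted through group membership; the function name and data shapes are hypothetical.

```python
def users_to_sync(directory_users: list, memberships: dict, app_groups: set) -> list:
    """Keep only users who belong to at least one group with access to the app."""
    return sorted(
        user
        for user in directory_users
        if memberships.get(user, set()) & app_groups
    )


directory_users = ["alice", "bob", "carol", "dave"]
memberships = {
    "alice": {"jira-users"},
    "bob": {"confluence-users"},
    "carol": {"jira-users", "confluence-users"},
}
to_sync = users_to_sync(directory_users, memberships, {"jira-users"})
```

In this example only two of the four directory users would be synchronized for the application, which is where the time savings would come from.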
If you’re interested in learning more about other Data Center features that we’re working on, check out the Data Center roadmap.
Gaby Cardona
Technical PMM, Enterprise Marketing
Atlassian