The aim of this article is to create awareness of circumstances that influence the time a user has to wait for a complex database query to produce results. For this, we introduce the query footprint. We want a small footprint to gain quick results.
To get a measure of the footprint, we look at how many comparisons the computer has to make to produce the results. This is illustrated with examples using a model database with a limited amount of objects. In these examples, we see how the computer evaluates objects by their properties by comparing their property values to desired target values. The total number of these conditional comparisons (conditionals) in a query determines the overall footprint.
By comparing the performance footprint of different types of queries, we gain insight into key factors that influence the query footprint and why in some cases it is beneficial to use additional techniques such as Automation assisted querying to obtain the same results at a much quicker rate.
When polling the content of a database using a single conditional, a certain property of all the objects in that database is compared to the target value. Example 1 shows how the performance footprint is defined in this case.
Let's assume we have a database with objects that have a property called Shape. The database is filled with 60 objects of a certain shape. For this exercise the database is filled with objects with their respective Shape property set to:
|
We have the following query to find all objects that are round or square:
Shape = square
To compute the answer to this query, the computer iterates over the complete database of 60 objects to compare if the shape of that object meets the desired condition (in this case square). We define the footprint of a query to be the number of comparisons the computer has to make: In this query, that's 60, so the footprint of this query has a value of 60.
When a query consists of multiple conditionals, each of these is executed against all objects of the database. The results are subsequently processed into an end result, which is presented to the user. The execution time increases linearly with an increasing number of conditionals, which is explained in example 2 by calculating the footprint that was defined earlier.
We have the same object database as in example 1. But now we want to find the objects that are round or square:
Shape = round or Shape = square
To compute the answer to this query, the computer first iterates over the database to find the round objects, followed by a second iteration over the database to find the star shaped objects. The query performance footprint is therefore 60 plus 60 equals 120.
Note that the above query can be somewhat optimized if the program allows for a value-checking array of property values, as in:
Shape in (round, star)
This query only needs one iteration over the complete database, yet is a bit slower than example 1, because the objects that don't have a round shape need to be checked consecutively for being star shaped, giving it an performance footprint of 60+50 = 110.
Top tip:
As a value checks out to be a target value, the remainder of the value-checking array is skipped. This means that it is worthwhile to populate the value-checking array in order of most common value to least common value.
Things can get obstipated quickly when having to query multiple conditions over multiple properties. Despite what one might think is that the order of execution is not a key factor here, though syntax would suggest otherwise. It is in fact the approach of the problem that can make a decisive difference in how things pan out in terms of the performance footprint needed. Typical use case is the filter used by management when constructing a Plan (A.K.A. Roadmap view) in Jira that would have the Jira work items (epics, stories, tasks) of multiple Jira projects that hierarchically fall under management initiatives/features.
Let's assume that the objects in our database have two properties: Shape and Color. As in the previous examples given above we have the same 60 object, yet each has a random color value chosen from the values red, white and blue.
Our goal is to find the red colored circles and red colored triangular objects.
There are a couple of different approaches to this problem: the hardest way (a), the hard way (b) and the smart way (c). Each approach will start be explained with the model database example.
This approach is based on gathering all possible objects and subsequently eliminating the objects that do not match an other property.
(Shape = circle or Shape = triangles) and (Color != white or Color !=blue)
This query needs 120 iterations for the Shapes, followed by 120 iterations to assess all the unwanted objects by color, followed by 20 iterations to check the combination of found shapes against the objects having the unwanted colors, giving this query a total footprint of a whopping 260.
This approach is based on gathering all possible objects and subsequently eliminating the objects that match an other property.
(Shape = circle or Shape = triangle) and (Color = red)
This query needs 120 iterations for the Shape, followed by 60 iterations to assess all objects with the wanted color, followed by 20 iterations to check the combination of found shapes against the objects having the correct color, giving it a total footprint of 200. So these tactics are marginally better than approach A.
Create an automation that checks on a Color change. When an object is changed to red and it happens to be a circle object or a triangle object, fill a newly created property Target with the value "true" and solely query this property. Yes, this feels like cheating, but it is in fact a pre-marking of the wanted objects at a time the user isn't using the query to select the objects. The user experience during querying is greatly enhanced as the query is simplified drastically to:
Target = true
Similar to example 1, this query yields the correct results with a total performance footprint of 60. Yes, it also needs the execution time of the automation, and we need an initial script to tag targets that are already in the database, but the execution of these automations does NOT coincide with the moment that the user wants to execute the query.
Finding the correct trigger for the automation isn't that hard: a parent link change trigger can be used to pinpoint work items that should be tagged the moment they are attached to a management initiative. Somewhat more of a challenge would be dealing with multiple management domains (read: multiple tags). Especially when project contributions span several domains, a parent link change can also lead to having to delete the proper tags.
There are circumstances when querying requires extra attention:
In the extreme cases where combinations of the above occur, a tailored solution such as approach C is preferred to prevent severe degradation in Jira query performance. Coordination with your Jira Administrator is in these cases highly recommended.
Dick
Jira Administrator for Wageningen University and Research
Wageningen University
Netherlands
70 accepted answers
Online forums and learning are now in one easy-to-use experience.
By continuing, you accept the updated Community Terms of Use and acknowledge the Privacy Policy. Your public name, photo, and achievements may be publicly visible and available in search engines.
6 comments