We have over 14 million records in the Changelogs table in Jira. A record is created whenever most field values in a ticket change. I use this table to create dashboards measuring the time between status changes in a ticket. The process to pull in this data nightly has become unmanageable. The table has a field for Issue Created Date; I would like to delete all records older than a specific date. Can this be done?
Welcome to the community.
As far as I know it is not possible via the API. You could do it by deleting those records in the changegroup and changeitem database tables, but that is always tricky and not recommended.
I do not think it is a good idea to calculate that data straight from the database, as it would not perform well and would also hurt database statistics/index hits. Instead, you should use the Java/REST API to get the changelogs and calculate the durations.
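As a minimal sketch of that approach: the REST API returns an issue's changelog with `GET /rest/api/2/issue/{key}?expand=changelog`, and you can compute time-in-status from the `histories` entries. The payload below is illustrative sample data, not real records.

```python
from datetime import datetime

# Illustrative changelog histories, shaped like the "changelog.histories"
# array Jira's REST API returns (expand=changelog). Sample data only.
histories = [
    {"created": "2023-01-02T09:00:00.000+0000",
     "items": [{"field": "status", "fromString": "To Do", "toString": "In Progress"}]},
    {"created": "2023-01-05T17:30:00.000+0000",
     "items": [{"field": "status", "fromString": "In Progress", "toString": "Done"}]},
]

def parse_ts(ts):
    # Jira timestamps look like 2023-01-02T09:00:00.000+0000
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")

def status_changes(histories):
    """Yield (timestamp, from_status, to_status) for every status change."""
    for h in histories:
        for item in h["items"]:
            if item["field"] == "status":
                yield parse_ts(h["created"]), item["fromString"], item["toString"]

changes = list(status_changes(histories))
duration = changes[-1][0] - changes[0][0]
print(duration)  # elapsed time from the first to the last status change
```

Fetching the issues themselves (with an HTTP client and authentication) is left out here; the point is that the duration math happens in your own code, not in SQL against the Jira schema.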
As an alternative, you can use an app for that data without a need to delete any data.
I hope it helps
I understand your answer, but in my case it does not apply. I am pulling the data into Sisense using a data connector and duplicating the log table in a Sisense data model. This connector uses the Jira APIs, and I can't access those APIs or the Jira database directly. I can use a query in the connector to exclude records, but that creates another problem: the connection times out because millions of records are being scanned and bypassed. I need a way to actually shrink this table in the Jira database. The process now takes 5 hours; reducing the size of the table is the only way to solve it.
If it is a database table, it does not cause any problem as long as you do not access bulk data. Generally the changegroup and changeitem tables are accessed via the API, and only for specific issues.
But if it is a log file (though I do not know of a log file that contains status changes), you can limit its size: by default it is 20 MB, and when it exceeds that it rolls over into separate files.
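For reference, that rollover is configured in `log4j.properties` in the Jira installation directory. The appender name and values below reflect a typical Jira 8 setup and may differ in your version, so treat this as a hedged example rather than exact settings:

```properties
# log4j.properties (Jira install directory) -- typical settings;
# the appender name and defaults can vary by Jira version.
log4j.appender.filelog=com.atlassian.jira.logging.JiraHomeAppender
log4j.appender.filelog.File=atlassian-jira.log
log4j.appender.filelog.MaxFileSize=20480KB
log4j.appender.filelog.MaxBackupIndex=5
```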
Long story short, I need a bit clarification please!
I use the changed "from" and "to" values to determine when a specific status change occurred, and the date/time to measure how long it took from one specific change to another. The data is valuable, but we now have over two years of these records and we don't need more than six months' worth. Meanwhile, the bigger the table gets, the longer it takes to import into our Sisense BI tool.
The data is imported through the CDATA Jira connector which uses the native APIs. As I explained earlier, I have no option but to pull in bulk data and it is not a problem as long as the size of this table can be controlled.
Now it is clear: the tables in this context are the changegroup and changeitem tables. These tables do not store unnecessary log info, and I strongly recommend not deleting records from them. You need to think very carefully before you do this.
These tables are well indexed, both at the database level and at the application layer (by Apache Lucene). Your issues might still need that historical information; for instance, you might be using an app that displays certain date/user custom fields derived from this history.
My suggestion: do not modify these tables directly! Let your BI system read from them incrementally. Then you can work with offline copies, even trimming the old denormalized data there.
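The idea above can be sketched as follows: each nightly run appends only the freshly pulled rows to a local copy, then trims anything older than six months from that copy. Jira's own tables are never touched. The row shape and field names here are assumptions for illustration, not the connector's actual schema.

```python
from datetime import datetime, timedelta, timezone

# Retention window on the OFFLINE copy (assumption: six months ~ 180 days).
RETENTION = timedelta(days=180)

def refresh(local_rows, new_rows, now):
    """Append freshly pulled rows, then drop rows past the retention window."""
    merged = local_rows + new_rows
    cutoff = now - RETENTION
    return [r for r in merged if r["created"] >= cutoff]

# Illustrative data: one stale local row, one recent local row, one new pull.
now = datetime(2023, 6, 1, tzinfo=timezone.utc)
local = [{"issue": "ABC-1", "created": datetime(2022, 1, 1, tzinfo=timezone.utc)},
         {"issue": "ABC-2", "created": datetime(2023, 3, 1, tzinfo=timezone.utc)}]
pulled = [{"issue": "ABC-3", "created": datetime(2023, 5, 30, tzinfo=timezone.utc)}]

trimmed = refresh(local, pulled, now)
print([r["issue"] for r in trimmed])  # the stale ABC-1 row is trimmed away
```

In practice the "local rows" would live in your Sisense model (or a staging database it reads from), and the nightly pull would be scoped to recently updated issues so the Jira-side query stays small.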
I hope it was helpful