Hi Community,
I am designing a scalable data pipeline to ingest Jira data into Google BigQuery for advanced analytics. My goal is to move beyond simple reporting and build a robust Data Warehouse structure.
The challenge I am facing is extracting complete data (Issues + All Comments + All Worklogs + Full Changelog/History) without hitting rate limits or facing severe performance degradation on large instances (50k+ issues).
The Architecture I am considering:
Incremental Loading: Using JQL updated >= last_run_date to fetch only modified issues.
The "Search" API for bulk data: Using POST /rest/api/3/search with fields like ['*all', 'worklog', 'comment'] and expand=['changelog'].
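For what it's worth, those two pieces can be sketched together as a request builder (a sketch only: the helper name `build_search_payload` is illustrative, while the body shape follows the documented parameters of `POST /rest/api/3/search`):

```python
import json
from datetime import datetime

def build_search_payload(last_run: datetime, start_at: int = 0) -> dict:
    """Request body for POST /rest/api/3/search: incremental JQL on the
    'updated' field plus the nested fields and changelog expansion."""
    # JQL accepts "yyyy-MM-dd HH:mm" timestamps (instance timezone).
    jql = f'updated >= "{last_run:%Y-%m-%d %H:%M}" ORDER BY updated ASC'
    return {
        "jql": jql,
        "startAt": start_at,
        "maxResults": 100,
        "fields": ["*all", "worklog", "comment"],
        "expand": ["changelog"],
    }

payload = build_search_payload(datetime(2024, 5, 1, 0, 0))
print(json.dumps(payload, indent=2))
```

Ordering by `updated ASC` lets you checkpoint the last seen timestamp after each page, so an interrupted run can resume without re-reading everything.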
My Bottlenecks/Questions:
Truncation of Nested Arrays: The Search API seems to truncate nested fields (e.g., it only returns the first 20 comments or worklogs, and limited history).
Question: Is the standard pattern to inspect the total vs maxResults property for each issue in the search response, and then trigger a separate GET /issue/{id}/comment call only for those specific "heavy" issues? Or is there a way to paginate nested collections within the Search API?
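Assuming the response shape described above (each nested collection carries `total` and `maxResults`), the fallback decision is a one-line comparison. A minimal sketch; `needs_full_fetch` is an illustrative name, not a Jira SDK function:

```python
def needs_full_fetch(issue: dict, field: str) -> bool:
    """Return True if a nested collection (e.g. 'comment' or 'worklog')
    was truncated in the search response, so a follow-up call like
    GET /rest/api/3/issue/{key}/comment is needed for this issue."""
    container = issue.get("fields", {}).get(field) or {}
    return container.get("total", 0) > container.get("maxResults", 0)

# Example search-response fragment: 45 comments exist, only 20 returned.
issue = {
    "key": "PROJ-1",
    "fields": {"comment": {"total": 45, "maxResults": 20, "comments": []}},
}
print(needs_full_fetch(issue, "comment"))  # True -> queue a follow-up fetch
```

This keeps the follow-up calls proportional to the number of "heavy" issues rather than N+1 over the whole instance.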
Changelog Performance: fetching expand=changelog significantly slows down the Search response.
Question: For a Data Warehouse pipeline, is it better to fetch the "Current State" (Issue fields) via Search, and fetch the History/Changelog via a separate asynchronous process?
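If you do split the changelog into its own async process, the per-issue endpoint `GET /rest/api/3/issue/{key}/changelog` paginates with `startAt`/`maxResults` and returns `values`/`total`. A sketch of the paging loop, with the fetch function injected so the logic runs without a live instance (`fake_fetch` below is a stand-in for the real authenticated HTTP call):

```python
def iter_changelog(fetch_page, page_size=100):
    """Yield changelog entries across all pages.

    fetch_page(start_at, max_results) -> response dict; in production
    it would wrap an authenticated call to
    GET /rest/api/3/issue/{key}/changelog.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        values = page.get("values", [])
        yield from values
        start += len(values)
        if not values or start >= page.get("total", 0):
            break

# Stand-in for the real HTTP call, to show the paging shape.
def fake_fetch(start_at, max_results):
    entries = [{"id": str(i)} for i in range(5)]
    return {"total": len(entries), "values": entries[start_at:start_at + max_results]}

print(len(list(iter_changelog(fake_fetch, page_size=2))))  # 5 entries across 3 pages
```

Running this worker off a queue of issue keys (fed by the incremental search) keeps the slow history fetches out of the main ingest path.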
Deletions:
Question: Since JQL only finds existing issues, what is the best practice to detect deleted issues or deleted worklogs to keep the Data Warehouse in sync?
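Since JQL can only return existing issues, one common pattern is a periodic lightweight "key scan" (a paginated search requesting only `fields=["key"]`) reconciled against the warehouse by set difference; where webhooks are available, events such as `jira:issue_deleted` and `worklog_deleted` can supplement this in near real time. A minimal sketch of the reconciliation step (`detect_deleted` is an illustrative name):

```python
def detect_deleted(warehouse_keys, live_keys):
    """Keys present in the warehouse but absent from a fresh key-only
    export have been deleted (or moved out of scope) in Jira."""
    return sorted(set(warehouse_keys) - set(live_keys))

warehouse = ["PROJ-1", "PROJ-2", "PROJ-3"]
live = ["PROJ-1", "PROJ-3"]  # e.g. from a paginated search with fields=["key"]
print(detect_deleted(warehouse, live))  # ['PROJ-2']
```

The key-only scan is cheap enough to run nightly even on 50k+ issues, so deletions surface within a day even without webhook access.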
I am trying to avoid the "N+1 request" pattern where I loop through every issue ID, as that won't scale.
Any advice on the most efficient "Bulk Export" strategy for these datasets would be greatly appreciated.
Thanks!
Hi @Flester Iulian ,
What you’re trying to do, ingesting complete Jira data including all comments, worklogs, and the full changelog/history for large instances, is not straightforward with the standard Jira REST APIs. As you’ve noticed, the Search API truncates nested fields, returning only the first page of comments and worklogs, and expand=changelog can be extremely slow on large datasets. Detecting deletions is also tricky, because JQL only returns existing issues. Doing all of this efficiently at scale without hitting rate limits is very challenging.
Our team has developed a set of Jira apps that address exactly these challenges:
Issue History Reporter for Jira
Captures the full history/changelog of issues over time.
Allows you to export historical snapshots of issues to CSV for analysis.
Handles large datasets efficiently, avoiding the N+1 request pattern.
Time in Status Reporter for Jira
Calculates how long each issue spent in each status.
Works even for complex workflows with multiple issue types.
Exports data in a format ready for analytics in BigQuery, Excel, or BI tools.
Advanced Comments Search for Jira
Lets you search and filter comments across issues, authors, and dates.
Exports complete comment data reliably to CSV without truncation.
Works on large instances where the API alone would struggle.
Using these apps, you can effectively build a data warehouse pipeline:
Step 1 – Current state of issues: Use the apps to export all issue fields.
Step 2 – Comments: Use Advanced Comments Search to extract all comments without truncation.
Step 3 – History/Changelog: Use Issue History to get full historical snapshots of issues over time.
Step 4 – Time metrics: Optionally, use Time in Status to enrich your dataset with duration metrics per status.
All exports can be done incrementally and reliably, giving you a full picture of your Jira instance without manually looping over thousands of issues or hitting rate limits.
You can find all our apps here: Simitech Marketplace Apps.
This approach is far more scalable than trying to build it entirely with REST API calls.
Regards,
Petru
Hi @Flester Iulian ,
If you need the complete work item history for audits and analytics (what was changed, when, and by whom, including comment changes and deleted work items), then building this entirely on the Jira APIs means a lot of heavy engineering and ongoing maintenance.
This is exactly where Issue History for Jira (Work Item History) by SaaSJet helps. Using this app, you can do the following:
Track all Jira work items updates (field changes, comment edits, status updates, and user actions) in one place.
Export the full change history of work items, including comment changes, to Excel or CSV, ready for audits or loading into BigQuery.
Also, you can track and export deleted work items if needed. This is something Jira APIs don’t handle well at all.
So, you can export audit-ready history, load it into BigQuery, and focus on analytics instead of Jira API limits.
Welcome to the community!
You may want to have a look at a Marketplace app for exporting issue changelogs.
We have built an app to extract changelog data in a simple, easy-to-use interface. The data can be easily exported to a CSV file as well.
It provides complete details of who changed the data, what was changed, and when.
Do give it a try.
Disclaimer: I am part of the app dev team.