Hi Community,
I am designing a scalable data pipeline to ingest Jira data into Google BigQuery for advanced analytics. My goal is to move beyond simple reporting and build a robust Data Warehouse structure.
The challenge I am facing is extracting complete data (Issues + All Comments + All Worklogs + Full Changelog/History) without hitting rate limits or facing severe performance degradation on large instances (50k+ issues).
The Architecture I am considering:
Incremental Loading: Using JQL updated >= last_run_date to fetch only modified issues.
The "Search" API for bulk data: Using POST /rest/api/3/search with fields like ['*all', 'worklog', 'comment'] and expand=['changelog'].
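For what it's worth, those two pieces can be sketched together as a request builder (a sketch only: the helper name `build_search_payload` is illustrative, while the body shape follows the documented parameters of `POST /rest/api/3/search`):

```python
import json
from datetime import datetime

def build_search_payload(last_run: datetime, start_at: int = 0) -> dict:
    """Request body for POST /rest/api/3/search: incremental JQL on the
    'updated' field plus the nested fields and changelog expansion."""
    # JQL accepts "yyyy-MM-dd HH:mm" timestamps (instance timezone).
    jql = f'updated >= "{last_run:%Y-%m-%d %H:%M}" ORDER BY updated ASC'
    return {
        "jql": jql,
        "startAt": start_at,
        "maxResults": 100,
        "fields": ["*all", "worklog", "comment"],
        "expand": ["changelog"],
    }

payload = build_search_payload(datetime(2024, 5, 1, 0, 0))
print(json.dumps(payload, indent=2))
```

Ordering by `updated ASC` lets you checkpoint the last seen timestamp after each page, so an interrupted run can resume without re-reading everything.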
My Bottlenecks/Questions:
Truncation of Nested Arrays: The Search API seems to truncate nested fields (e.g., it only returns the first 20 comments or worklogs, and limited history).
Question: Is the standard pattern to inspect the total vs maxResults property for each issue in the search response, and then trigger a separate GET /issue/{id}/comment call only for those specific "heavy" issues? Or is there a way to paginate nested collections within the Search API?
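Assuming the response shape described above (each nested collection carries `total` and `maxResults`), the fallback decision is a one-line comparison. A minimal sketch; `needs_full_fetch` is an illustrative name, not a Jira SDK function:

```python
def needs_full_fetch(issue: dict, field: str) -> bool:
    """Return True if a nested collection (e.g. 'comment' or 'worklog')
    was truncated in the search response, so a follow-up call like
    GET /rest/api/3/issue/{key}/comment is needed for this issue."""
    container = issue.get("fields", {}).get(field) or {}
    return container.get("total", 0) > container.get("maxResults", 0)

# Example search-response fragment: 45 comments exist, only 20 returned.
issue = {
    "key": "PROJ-1",
    "fields": {"comment": {"total": 45, "maxResults": 20, "comments": []}},
}
print(needs_full_fetch(issue, "comment"))  # True -> queue a follow-up fetch
```

This keeps the follow-up calls proportional to the number of "heavy" issues rather than N+1 over the whole instance.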
Changelog Performance: fetching expand=changelog significantly slows down the Search response.
Question: For a Data Warehouse pipeline, is it better to fetch the "Current State" (Issue fields) via Search, and fetch the History/Changelog via a separate asynchronous process?
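If you do split the changelog into its own async process, the per-issue endpoint `GET /rest/api/3/issue/{key}/changelog` paginates with `startAt`/`maxResults` and returns `values`/`total`. A sketch of the paging loop, with the fetch function injected so the logic runs without a live instance (`fake_fetch` below is a stand-in for the real authenticated HTTP call):

```python
def iter_changelog(fetch_page, page_size=100):
    """Yield changelog entries across all pages.

    fetch_page(start_at, max_results) -> response dict; in production
    it would wrap an authenticated call to
    GET /rest/api/3/issue/{key}/changelog.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        values = page.get("values", [])
        yield from values
        start += len(values)
        if not values or start >= page.get("total", 0):
            break

# Stand-in for the real HTTP call, to show the paging shape.
def fake_fetch(start_at, max_results):
    entries = [{"id": str(i)} for i in range(5)]
    return {"total": len(entries), "values": entries[start_at:start_at + max_results]}

print(len(list(iter_changelog(fake_fetch, page_size=2))))  # 5 entries across 3 pages
```

Running this worker off a queue of issue keys (fed by the incremental search) keeps the slow history fetches out of the main ingest path.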
Deletions:
Question: Since JQL only finds existing issues, what is the best practice to detect deleted issues or deleted worklogs to keep the Data Warehouse in sync?
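Since JQL can only return existing issues, one common pattern is a periodic lightweight "key scan" (a paginated search requesting only `fields=["key"]`) reconciled against the warehouse by set difference; where webhooks are available, events such as `jira:issue_deleted` and `worklog_deleted` can supplement this in near real time. A minimal sketch of the reconciliation step (`detect_deleted` is an illustrative name):

```python
def detect_deleted(warehouse_keys, live_keys):
    """Keys present in the warehouse but absent from a fresh key-only
    export have been deleted (or moved out of scope) in Jira."""
    return sorted(set(warehouse_keys) - set(live_keys))

warehouse = ["PROJ-1", "PROJ-2", "PROJ-3"]
live = ["PROJ-1", "PROJ-3"]  # e.g. from a paginated search with fields=["key"]
print(detect_deleted(warehouse, live))  # ['PROJ-2']
```

The key-only scan is cheap enough to run nightly even on 50k+ issues, so deletions surface within a day even without webhook access.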
I am trying to avoid the "N+1 request" pattern where I loop through every issue ID, as that won't scale.
Any advice on the most efficient "Bulk Export" strategy for these datasets would be greatly appreciated.
Thanks!
Hi @Flester Iulian ,
What you’re trying to do, ingesting complete Jira data including all comments, worklogs, and the full changelog/history for large instances, is not straightforward with the standard Jira REST APIs. As you’ve noticed, the Search API truncates nested fields, returning only the first page of comments and worklogs, and expand=changelog can be extremely slow on large datasets. Detecting deletions is also tricky, because JQL only returns existing issues. Doing all of this efficiently at scale without hitting rate limits is very challenging.
Our team has developed a set of Jira apps that address exactly these challenges:
Issue History Reporter for Jira
Captures the full history/changelog of issues over time.
Allows you to export historical snapshots of issues to CSV for analysis.
Handles large datasets efficiently, avoiding the N+1 request pattern.
Time in Status Reporter for Jira
Calculates how long each issue spent in each status.
Works even for complex workflows with multiple issue types.
Exports data in a format ready for analytics in BigQuery, Excel, or BI tools.
Advanced Comments Search for Jira
Lets you search and filter comments across issues, authors, and dates.
Exports complete comment data reliably to CSV without truncation.
Works on large instances where the API alone would struggle.
Using these apps, you can effectively build a data warehouse pipeline:
Step 1 – Current state of issues: Use the apps to export all issue fields.
Step 2 – Comments: Use Advanced Comments Search to extract all comments without truncation.
Step 3 – History/Changelog: Use Issue History to get full historical snapshots of issues over time.
Step 4 – Time metrics: Optionally, use Time in Status to enrich your dataset with duration metrics per status.
All exports can be done incrementally and reliably, giving you a full picture of your Jira instance without manually looping over thousands of issues or hitting rate limits.
You can find all our apps here: Simitech Marketplace Apps.
This approach is far more scalable than trying to build it entirely with REST API calls.
Regards,
Petru
Hi @Flester Iulian ,
If you need the complete work item history for audits and analytics (what was changed, when, and by whom, including comment changes and deleted work items), then building this entirely on the Jira APIs means a lot of heavy engineering and ongoing maintenance.
This is exactly where Issue History for Jira (Work Item History) by SaaSJet helps. Using this app, you can do the following:
Track all Jira work items updates (field changes, comment edits, status updates, and user actions) in one place.
Export the full change history of work items, including comment changes, to Excel or CSV, ready for audits or loading into BigQuery.
Also, you can track and export deleted work items if needed. This is something Jira APIs don’t handle well at all.
So, you can export audit-ready history, load it into BigQuery, and focus on analytics instead of Jira API limits.
Welcome to the community!
You may want to have a look at a Marketplace app for exporting issue changelogs.
We have built an app to extract changelog data in a simple, easy-to-use interface. The data can be easily exported to a CSV file as well.
It provides complete details of who changed the data, what was changed, and when.
Do give it a try.
Disclaimer: I am part of the app dev team.