I would like to extract data from a Jira instance database and have been advised that the most sensible solution would be to use the API, as it would still allow my connection to work even if my Jira instance is updated.
Could you tell me how this might work and what form of data I can get out?
Hi, I've been looking at extracting data for reporting, and I think you need to think through all the options, and why you are doing it.
People asking for reporting fall into two broad groups - those who don't know what the front-end can do, and those who do and have established that they cannot get what they need. The first thing to do is get your administrators to look at the reports you want to establish what group the reports fall into. I think it is important to get the question clearly defined before you go any further, as a lot of my users ask for reports which turn out to be of little use, because they don't really understand the question.
You then have to decide what the most appropriate route is:
1. Use the front-end
2. Specific reporting plugins
3. Write an external application that pulls Jira data via SOAP, REST or XML
4. Extend Jira output if 3 is not enough
5. Use a generalised reporting plugin (for example, EasyBI, Tempo)
6. You can access the database directly (Last resort, can be painful to set up and maintain)
Ok, Business Objects is a good way to approach point 6. It will be powerful, but you need to be very careful, because:
It lets people write reports that stress the database. Use a copy of the live data somehow (Regular nightly backup, replication, etc)
You will need to do data mapping. That will need to be checked and updated on upgrades, and possibly even just changes to data.
It can bypass all the security, unless you build that into your data map (not an issue for some people, huge for others)
It lets people write reports that are not useful because they can do what they think they want, without really understanding the data.
I would definitely talk to your administrators if you want to do this - you'll need their help with all of the worries above.
I've got several standby DBs to use, so thanks for that. I also have colleagues who are very close to the data, so have a validation plan as well.
What I'd be really keen on is a method of extracting the data I need via the API so I don't have to check the data mapping on upgrades. Is that a possibility?
It depends on how you mean "via the API". I'm not a developer, so I don't understand it, but my admin tells me that the API is the way coders ask for things. Everything uses the API one way or another, except the database, but the reason for using the API is to use common functions and ideas. The database has a load of tables with issue data in them, and you need to know how they work together to get data out. In the API, you can just say "look at that issue" and "do things to that issue" and not worry about the database.
To use the API directly, I think you need plugins that live inside Jira, the way he explained it. To extract data with the API means writing new stuff, or using the built-in stuff that is there.
If you don't want to deal with plugins, then you can use the data it publishes - XML is a good way of getting filtered data out (some reporting can be done with an XML feed from Jira being pushed through a XSLT file, so it's reformatted nicely), and I did some stuff with SOAP, which I need to push up to REST soon because that's the way Atlassian are recommending. (That's why I'm in the forum - I've been digging for reporting tricks because I'd rather avoid coding lots of stuff outside Jira)
We are using business objects to create daily reports and we haven't had any issues. We are pulling from the database directly via sql to the business objects server with a few reports that are very db taxing and haven't seen any issues, however: 1. We are using enterprise grade database servers. 2. We run our reports nightly, durring off hours. 3. We have spaced the reports out to minimize overlap. 4. We (the jira admins) control the reports that run. 5. we didn't select * from *
As far as maintainabilty is concered, we just did an upgrade from 3.13 to 5.1 and only had to fix one report. I think it had to do with a change to the way custom field names were being stored.
That sounds very much like one of the users I've got. Their reports were absolutely killing the database, because they were so inefficient. It was a large Jira (I think - tens of thousands of issues sounds big) and they were doing bad things - your "don't do select * from" was a big problem. Narrowing it down from "everything" to "only the issues and fields we actually want to report on" fixed a good part of the problem.
We did see that Business Objects queries were imposing a load 10,000 times higher than that of just Jira alone, which is why we looked at moving to a backup nightly copy of the database. I imagine you've either got quite a small Jira, simple queries, or your "only at night and only one at a time" is minimising the impact.
Very good point about keeping the Jira admins in charge of the queries - that does mean they are best placed to avoid causing issues, and spotting problems if they do it.
With these users, we found that once we'd simplified, clarified and optimised the reports to hand over to the admins, one of them pointed out that there was reporting built into Jira that answered the question. Her answer was a lot better than the optimised report, was in use by other teams without causing problems, and it was real-time (the business objects were reporting off a nightly copy because of the load issues, so there was lag). That's why I've said "talk to your admins" a couple of times - in many cases, I've found the users are really asking for reports that Jira already provides, or reports that aren't realistic. Keep them involved!
We are probably in the same boat as in size of the installation it seems. The worse we had was a report going four sql joins deep which we did optimizing on and made a big difference. Nick, this is where point number 5 comes in, when you select from the database to return a value from a table, only select the values you need, not "*". Since you didn't get my 'select * from *' reference I would recommend you find a dba for assistance or at the very least test in a separate environment before going forward with working from the database directly.
I honestly don't know what kind of load our reports place on the databases as our dba's handle that but, I do believe we are running a cluster, not a single server (this sql cluster is shared by a few other internal apps and is not just used by jira). I think that for best practices your use of a second database for reporting is a good idea and if a report or reporting server goes wild or locks up, it wouldn't impact the production environment.
Most of the reports we are running have to do with grouping and displaying multiple issues across multiple jira projects together by custom field values and release numbers. Most of our reports are for management or auditing purposes.
If you extract data from the JIRA db to another application, it's ok, no matter what data it is (of course, normal DB isolation rules apply, e.g. dirty read, commited, serializable).
Jira has caches, it's true, but modifications are always commited into the database then entered in the caches, so from that point of view you will be safe. That API caution should be employed ONLY when you are modifying data.
If you bypass the API, you can use any ORM you like (Hibernate / myBatis, etc). Since you are doing only reads, your solution of not using API is safe.
The solutions using the API would be : 1/ direct integration with the services JIRA offers (REST/SOAP) if these are covering your needs 2/ create your own services into that JIRA to get your data 3/ raise a read-only "zombie" JIRA on the same DB (needs some digging) and exploit it.
There may be others ...
We're looking for participants for another workshop at Atlassian! We need Jira admins who have interesting custom workflows, issue views, or boards. Think you have a story to sh...
Connect with like-minded Atlassian users at free events near you!Find a group
Connect with like-minded Atlassian users at free events near you!
Unfortunately there are no AUG chapters near you at the moment.Start an AUG
You're one step closer to meeting fellow Atlassian users at your local meet up. Learn more about AUGs