I'm importing a large number of objects from external sources. The objects themselves are fairly simple, just 5-10 attributes each, but there are more than 100,000 of them.
Are there any tried-and-true suggestions for how to handle the import process? The data I use comes from external sources (which are out of my control) and it needs to be refreshed periodically.
Based on my testing of the import feature, this is too much for the CSV importer. I can get 25,000 objects imported/updated from CSV, but the process is still quite slow. How much memory should be given to the JVM? Does anybody have any real-life experience to share? Is the process memory-bound or CPU-bound?
On Linux, splitting the incoming CSV data file into separate chunks is easy. What I find impractical is that after the split I must either a) create a duplicate import configuration for each chunk (5-10, depending on how large the individual chunks are), or b) run some kind of loop that copies or symlinks each chunk in turn to the one known filename the import process looks for, switching to the next chunk after each import finishes. With cron driving the loop and scheduled imports in Insight, this might just be doable, although somewhat annoying as a long-term solution.
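To make option b) concrete, here is a minimal Python sketch of the split-and-stage idea. The function names, the chunk file naming, and the 25,000-row chunk size are my own assumptions for illustration, and nothing here talks to Insight itself: it only prepares files and swaps one chunk into the fixed path that a single import configuration would read.

```python
import csv
import os
import shutil
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk=25000):
    """Split a CSV into numbered chunk files, repeating the header in each.

    Returns the list of chunk paths in order. The file naming scheme is
    hypothetical; use whatever suits your setup.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file, nothing to split
            return chunk_paths
        out = None
        writer = None
        try:
            for index, row in enumerate(reader):
                if index % rows_per_chunk == 0:
                    # Start a new chunk file and write the header first.
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths) + 1:03d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    chunk_paths.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return chunk_paths


def stage_chunk(chunk_path, import_path):
    """Copy one chunk to the fixed path the import configuration reads.

    The copy goes to a temporary name first and is then renamed, so a
    scheduled import does not pick up a half-written file.
    """
    tmp_path = Path(str(import_path) + ".tmp")
    shutil.copyfile(chunk_path, tmp_path)
    os.replace(tmp_path, import_path)
```

A cron job could call `split_csv` once per refresh and then `stage_chunk` for the next chunk before each scheduled import. Keeping track of which chunk is next, and making sure the previous import has finished, is left out of this sketch.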
Should I consider switching to a different importer altogether? Would it make the process faster and less involved if I first imported the CSV as "raw data" into an external database, from which the Insight database importer would then do its job? I don't think there are any benefits to be gained from using the JSON importer, as that is file-based just like the CSV importer.
I would look into the following documentation for performance and tuning regarding Insight: https://documentation.riada.se/insight/latest/system-requirements
Let us know if you find that useful.
It is useful, yes. We will look deeper and try to find an optimal solution.
Just out of curiosity, is the database import any less CPU/memory hungry? Do I gain anything by creating a temporary database table from which Insight could do the importing? The original CSV would be quite easy (and fast!) to dump into a fresh table (i.e. drop table xxx; import into new table xxx from CSV) each time the external data source produces a fresh set of data.
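For reference, the drop-and-reload step I have in mind would look roughly like the Python sketch below. It uses SQLite from the standard library purely as a stand-in so the example is runnable; the real target would be whichever database the Insight database import can connect to, and the table name `import_staging` and the all-TEXT columns are assumptions of mine.

```python
import csv
import sqlite3


def reload_staging_table(db_path, csv_path, table="import_staging"):
    """Drop and recreate a staging table, then load every CSV row into it.

    Column names come from the CSV header and every column is stored as
    TEXT. Returns the number of rows in the table after loading.
    Assumes the header names and the table name contain no double quotes.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            # "with conn" commits on success and rolls back on an error.
            with conn:
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({columns})')
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            return count
        finally:
            conn.close()
```

Calling `reload_staging_table("staging.db", "source.csv")` after each refresh would leave one table holding exactly the latest data set. Whether importing from such a table is actually lighter on CPU or memory than the CSV importer is the open question above; the sketch only shows that the staging step itself is cheap.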