What are the practical limits for Insight (CSV) import?

I'm importing a large number of objects from external sources. The objects themselves are fairly simple (just 5-10 attributes each), but there are more than 100,000 of them.

Are there any tried-and-true suggestions for how to handle the import process? The data I use comes from external sources (which are out of my control) and it needs to be refreshed periodically.

Based on my testing of the import feature, this is too much for the CSV importer. I can get 25,000 objects imported/updated from CSV, but the process is still quite slow. How much memory should be given to the JVM? Does anybody have real-life experiences to share? Is the process memory-bound or CPU-bound?

On Linux, splitting the incoming CSV data file into separate chunks is easy. What I find impractical is what comes after the split: I must either a) create separate duplicate import configurations for each chunk (5-10, depending on how large the individual chunks are), or b) in some kind of looping process copy/symlink the separate files, one at a time, to a single known filename that the import process looks for, switching to the next chunk after each import. With cron driving the loop and scheduled imports on the Insight side this might just be doable, although somewhat annoying as a long-term solution.
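For what it's worth, the chunking step itself has one gotcha: a plain `split` drops the header row from every chunk after the first, which breaks a CSV importer expecting named columns. A small script can repeat the header in each chunk. This is only a sketch; the filenames, prefix, and chunk size are made-up examples:

```python
import csv

def split_csv(src_path, chunk_size=25000, prefix="chunk"):
    """Split src_path into files of at most chunk_size data rows,
    repeating the header row at the top of every chunk."""
    out_files = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, rows, out, writer = 0, 0, None, None
        for row in reader:
            # Start a new chunk file when the current one is full (or first row)
            if writer is None or rows >= chunk_size:
                if out:
                    out.close()
                chunk += 1
                path = f"{prefix}-{chunk:02d}.csv"
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                out_files.append(path)
                rows = 0
            writer.writerow(row)
            rows += 1
        if out:
            out.close()
    return out_files
```

A cron job could then symlink each chunk in turn to the fixed filename the scheduled Insight import watches, as described above.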

Should I consider switching to a different importer altogether? Would it somehow make the process faster and less involved if I first imported the CSV as "raw data" into an external database, from which the Insight DB importer would do its job? I don't think there is anything to be gained from the JSON importer, as it is file-based just like the CSV importer.

1 answer

1 accepted


Hi Tomi,
I would look into the following documentation on system requirements and performance tuning for Insight: https://documentation.riada.se/insight/latest/system-requirements

Let us know if you find that useful.

Best Regards
Alexander

It is useful, yes. We will look deeper and try to find an optimal solution.

Just out of curiosity, is the DB import any less CPU/memory hungry? Would I gain anything by creating a temporary DB table from which Insight could do the importing? The original CSV would be quite easy (and fast!) to dump into a fresh table (i.e. drop table xxx; import into new table xxx from csv) each time the external data source produces a new set of data.
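The drop-and-reload staging table described above can be sketched in a few lines. This uses sqlite3 purely as an illustration (an Insight DB import would point at whatever real database you stage into), and the table name `insight_staging` is a made-up example:

```python
import csv
import sqlite3

def refresh_staging_table(conn, csv_path, table="insight_staging"):
    """Drop and recreate a staging table from a CSV dump, so the
    DB importer always sees a complete, fresh snapshot."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Columns are taken straight from the CSV header, all as TEXT
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        placeholders = ", ".join("?" for _ in header)
        cur = conn.cursor()
        cur.execute(f'DROP TABLE IF EXISTS "{table}"')
        cur.execute(f'CREATE TABLE "{table}" ({cols})')
        cur.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
        conn.commit()

# Example: load the latest dump into the staging database
# conn = sqlite3.connect("staging.db")
# refresh_staging_table(conn, "latest-dump.csv")
```

Re-running the function on each new dump is idempotent: the table is replaced wholesale, matching the "drop table; import from csv" cycle described above.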

Hi Tomi,

I don't know if there is any difference between using the CSV and the DB import; in the end, they both end up creating the same data to import. I think what's more important is to look at the system requirements.

Best Regards
Alexander
