What are the practical limits for Insight (CSV) import?

Tomi Kallio November 21, 2017

I'm importing a large number of objects from external sources. The objects themselves are fairly simple, just 5-10 attributes each, but there are more than 100,000 of them.

Are there any tried-and-true suggestions for how to handle the import process? The data I use comes from external sources (which are out of my control) and it needs to be refreshed periodically.

Based on my testing of the import feature, this is too much for the CSV importer. I can get 25,000 objects imported/updated from CSV, but the process is still quite slow. How much memory should be given to the JVM? Does anybody have any real-life experience to share? Is the process memory-bound or CPU-bound?

On Linux, splitting the incoming CSV data file into separate chunks is easy. What I find impractical is that after the split I must either a) create separate, duplicate import configurations for each chunk (5-10 of them, depending on how large the individual chunks are), or b) in some kind of looping process copy/symlink the separate files, one at a time, to a single known filename which the import process looks for, then switch to the next chunk after each import. With cron driving the loop plus scheduled imports on the Insight side this might just be doable, although it is somewhat annoying as a long-term solution; see the sketch below.
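To make option (b) concrete, something along these lines is what I have in mind. The paths, chunk size and file names are only placeholders; cron would run it once with "split" after each fresh export, then once per scheduled import to publish the next chunk.

#!/usr/bin/env python3
# Rough sketch of option (b): split the source CSV into numbered chunks and,
# on each cron run, publish the oldest remaining chunk under the single
# filename the scheduled Insight import is configured to read.
# All paths and the chunk size below are placeholders, not real settings.
import csv
import os
import shutil
import sys

SOURCE = "/data/export/objects.csv"    # full export from the external source
CHUNK_DIR = "/data/import/chunks"      # where the numbered chunks are kept
ACTIVE = "/data/import/insight.csv"    # filename the Insight import looks for
ROWS_PER_CHUNK = 20000

def write_chunk(chunk_dir, idx, header, rows):
    path = os.path.join(chunk_dir, f"chunk_{idx:03d}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

def split(source=SOURCE, chunk_dir=CHUNK_DIR, rows_per_chunk=ROWS_PER_CHUNK):
    # Break the full export into chunk files, each with its own header row.
    os.makedirs(chunk_dir, exist_ok=True)
    with open(source, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk, idx = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) >= rows_per_chunk:
                write_chunk(chunk_dir, idx, header, chunk)
                chunk, idx = [], idx + 1
        if chunk:
            write_chunk(chunk_dir, idx, header, chunk)

def publish_next(chunk_dir=CHUNK_DIR, active=ACTIVE):
    # Copy the oldest remaining chunk to the filename Insight polls,
    # then delete it so the next cron run picks up the following chunk.
    chunks = sorted(p for p in os.listdir(chunk_dir) if p.endswith(".csv"))
    if not chunks:
        return False
    src = os.path.join(chunk_dir, chunks[0])
    shutil.copy(src, active)
    os.remove(src)
    return True

if __name__ == "__main__":
    if sys.argv[1:] == ["split"]:
        split()
    else:
        publish_next()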

Should I consider switching to a different importer altogether? Would it somehow make the process faster and less involved if I first imported the CSV as "raw data" into an external database from which the Insight DB importer would do its job? I don't think there are any benefits to be gained from using the JSON importer, as that is file-based just like the CSV importer.

1 answer

Answer accepted
Alexander Sundström
November 22, 2017

Hi Tomi,
I would look into the following documentation for performance and tuning regarding Insight: https://documentation.riada.se/insight/latest/system-requirements

Let us know if you find that useful.

Best Regards
Alexander

Tomi Kallio November 22, 2017

It is useful, yes. We will look deeper and try to find an optimal solution.

Just out of curiosity, is the DB import any less CPU/memory hungry? Do I gain anything by creating a temporary DB table from which Insight could do the importing? The original CSV would be quite easy (and fast!) to dump into a fresh table (i.e. drop table xxx; import into new table xxx from csv) each time the external data source produces a fresh set of data.
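As a rough sketch of what I mean, something like the following. I'm using sqlite3 here only so it runs standalone; the table name, paths and column handling are assumptions, and in practice this would target whatever database the Insight DB import is pointed at.

# Sketch of the staging-table idea: drop and recreate a throwaway table,
# then bulk-load the fresh CSV into it so Insight's DB import can read it.
# Table name, paths and column types are assumptions for illustration.
import csv
import sqlite3

def refresh_staging_table(csv_path="/data/export/objects.csv",
                          db_path="/data/import/staging.db"):
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        # Recreate the table from scratch on every refresh.
        con.execute("DROP TABLE IF EXISTS insight_objects")
        con.execute(f"CREATE TABLE insight_objects ({cols})")
        # Bulk insert; assumes every row has the same number of columns
        # as the header.
        con.executemany(
            f"INSERT INTO insight_objects VALUES ({placeholders})",
            reader,
        )
    con.commit()
    con.close()

if __name__ == "__main__":
    refresh_staging_table()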

Alexander Sundström
November 22, 2017

Hi Tomi,

I don't know if there is any difference between the CSV and the DB import; in the end they both end up creating the same data to import. I think what's more important is to look at the system requirements.

Best Regards
Alexander
