Is your Jira instance slowly leaking memory with ScriptRunner?

David Yu
March 6, 2023

So here's an interesting pattern I've been seeing: if we leave our host running for about three months, the system eventually runs out of memory and crashes. (It's a 30 GB instance with a 20 GB Xmx and an off-server database.)

The memory usage climbs to 29 GB and hovers there until the Linux out-of-memory (OOM) killer selects the Jira process and kills it. (I've since configured our service to restart automatically.)
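If you want to confirm it really is the kernel's OOM killer doing the killing (and not the JVM exiting on its own), the kill shows up in the kernel log. Something like this works on most distros (the grep pattern is just illustrative):

# Look for OOM-killer activity in the kernel log
dmesg -T | grep -iE "out of memory|oom-killer"

# Or, on systemd hosts:
journalctl -k | grep -iE "out of memory|oom-killer"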

[Image: sysmemory.jpg (system memory usage)]

Here are the investigation steps I used, in case they're helpful:

First, let's print the heap histogram of the live running service using jcmd. This runs really quickly:

<PATH_TO_JDK>/bin/jcmd <JAVA_PID> GC.class_histogram | more

 num     #instances         #bytes  class name (module)

   1:      13413784      643861632  java.lang.ThreadGroup (java.base@11.0.16.1)
   2:       6956154      577988424  [B (java.base@11.0.16.1)

Look who's #1: java.lang.ThreadGroup.
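For reference, the full sequence looks something like this (the JDK path, PID, and output path are placeholders; adjust for your install):

# List running JVMs to find the Jira PID
<PATH_TO_JDK>/bin/jcmd -l

# Capture the histogram to a timestamped file so later snapshots can be diffed
<PATH_TO_JDK>/bin/jcmd <JAVA_PID> GC.class_histogram > /tmp/histogram-$(date +%F-%H%M).txt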

Next, we can take a heap dump of all live objects with jcmd as well and analyze it in Eclipse Memory Analyzer. With our 20 GB Xmx heap, writing the dump took about a minute, which is not bad at all.
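The dump command looks like this (the output path is just an example; by default jcmd dumps live objects only, and you can add -all to include unreachable ones):

# Write a heap dump of live objects (add -all for unreachable objects too)
<PATH_TO_JDK>/bin/jcmd <JAVA_PID> GC.heap_dump /tmp/jira-heap.hprof

Opening the resulting .hprof in Eclipse Memory Analyzer and running its Leak Suspects report is usually the quickest way to get a first read.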

Eclipse Memory Analyzer also points to java.lang.ThreadGroup as the leak suspect, and when we drill into it, we see tons of references to ClassGraph inside.

[Image: classgraph.jpg (Eclipse Memory Analyzer view of the ClassGraph references)]

Now, to see where ClassGraph is being used, I just do a grep on my plugins:

grep -R "ClassGraph" *
Binary file installed-plugins/plugin.7509544786347732491.groovyrunner-7.7.0.jar matches
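To double-check that the library is actually bundled inside that jar (and not just referenced), listing the jar's contents works too; the jar path here is taken from the grep output above:

# Confirm ClassGraph classes are packaged in the ScriptRunner plugin jar
unzip -l installed-plugins/plugin.7509544786347732491.groovyrunner-7.7.0.jar | grep -i classgraph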

Thinking maybe it's some custom ScriptRunner code that's causing it, I set up a brand-new Jira locally without any custom ScriptRunner scripts (just the built-in ones).

I was able to reproduce the same leak just by accessing ScriptRunner's Listener settings page and printing the live heap histogram... sure enough, java.lang.ThreadGroup keeps growing.
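If you want to watch the growth yourself while clicking around that page, a crude sampling loop does the job (the JDK path and PID are placeholders; adjust the interval to taste):

# Sample the ThreadGroup instance count once a minute
while true; do
  date
  <PATH_TO_JDK>/bin/jcmd <JAVA_PID> GC.class_histogram | grep "java.lang.ThreadGroup"
  sleep 60
done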

I've sent this info to Adaptavist Support, and if I hear any updates, I'll share them. If you've experienced something similar, I'd love to hear about it!

Update from Adaptavist: the bug is fixed! Once it ships, you can check for the release version here.

1 answer

Answer accepted
Reece Lander _ScriptRunner - The Adaptavist Group_
March 6, 2023

Hey @David Yu 

I'm Reece, technical lead for ScriptRunner for Jira.

I admire your investigation skills. This does indeed appear to be a very slow leak, slow enough that nobody has probably made the connection before. Kudos to you!

I believe I have reproduced this locally and have a heap dump. My initial hunch is that this is a bug in the ClassGraph library: it appears not to explicitly destroy the thread groups it creates, nor to mark those groups as daemons.

ScriptRunner makes repeated calls to ClassGraph in one part of the codebase, which is not the standard usage pattern and may explain why this bug in the library has gone unnoticed until now.

I see you have raised a support ticket; I'll chat with the rest of the team tomorrow and get it escalated for you.

Thank you once again for your persistence in finding reproduction steps for us; you've likely saved me days or weeks of effort.

Cheers!

Reece
