Java GC blocks Jira

With the usage of Jira increasing, we notice that Jira is blocking out users for typically 120 to 300 seconds. The timing of this blocking seems to coincide with the garbage collector runs of the Java VM.

This is increasingly critical as users complain that when they have entered lengthy issue descriptions, they loose all their work (sometime in excess of half an hour). The number of user complaints is getting significant.

Is there a method to

a) determine the root cause of these short outages

b) fix it

Martin, Advantest

4 answers

1 accepted

This widget could not be displayed.

a) Options: <kbd>-verbose:gc -XX:+PrintGCDetails</kbd>

b) You have to tune your GC.

1. Keep in mind that most such problems arise because Java heap is too big.

2. You can always use -XX:+UseParallelGC (minor pauses acceptable) or even -XX:+UseConcMarkSweepGC (no pauses)

If you decide for parallel, cosider using -XX:ParallelGCThreads=<N procs - 1> (or around) and use -XX:MaxGCPauseMillis=2000 to indicate to the server that pauses greater than 2s are not desired.

There are plenty of articles describing how to minimize the pauses

[Edit] Try these to cut as much as possible the pauses:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

Actually I was searching for that in both "Answers @ Atlassian" and Google. The majority of articles was just mentioning the GC logging parameters but I have those already.

Are there any specific pointers?

What's in gc.log ?

The one that caused a problem today:

2012-07-27T08:39:15.594+0200: 175993.979: [GC [PSYoungGen: 805630K-&gt;191877K(798656K)] 3554934K-&gt;2941630K(3594880K), 0.0779530 secs] [Times: user=1.12 sys=0.00, real=0.07 secs]
2012-07-27T08:39:58.266+0200: 176036.651: [GC [PSYoungGen: 798597K-&gt;162119K(830976K)] 3548350K-&gt;2912229K(3627200K), 0.0638500 secs] [Times: user=0.98 sys=0.01, real=0.06 secs]
2012-07-27T08:40:00.939+0200: 176039.324: [GC [PSYoungGen: 761095K-&gt;143249K(844288K)] 3511205K-&gt;2914494K(3640512K), 0.0581810 secs] [Times: user=0.94 sys=0.01, real=0.06 secs]
2012-07-27T08:40:04.790+0200: 176043.175: [GC [PSYoungGen: 742225K-&gt;156741K(908160K)] 3513470K-&gt;2928745K(3704384K), 0.0519910 secs] [Times: user=0.79 sys=0.00, real=0.06 secs]
2012-07-27T08:40:05.888+0200: 176044.274: [GC [PSYoungGen: 825669K-&gt;156447K(907136K)] 3597673K-&gt;2929053K(3703360K), 0.0541860 secs] [Times: user=0.86 sys=0.00, real=0.06 secs]
2012-07-27T08:40:20.023+0200: 176058.408: [GC [PSYoungGen: 825375K-&gt;179413K(990016K)] 3597981K-&gt;2952710K(3786240K), 0.0881010 secs] [Times: user=1.05 sys=0.07, real=0.09 secs]
2012-07-27T08:40:28.082+0200: 176066.467: [GC-- [PSYoungGen: 945173K-&gt;945173K(990016K)] 3718470K-&gt;3741389K(3786240K), 2.4579610 secs] [Times: user=1.93 sys=6.64, real=2.46 secs]
2012-07-27T08:40:30.540+0200: 176068.925: [Full GC [PSYoungGen: 945173K-&gt;0K(990016K)] [PSOldGen: 2796216K-&gt;762306K(2541568K)] 3741389K-&gt;762306K(3531584K) [PSPermGen: 240935K-&gt;235637K(249600K)], 73.4757870 secs] [Times: user=4.55 sys=0.76, real=73.46 secs]
2012-07-27T08:41:44.618+0200: 176143.004: [GC [PSYoungGen: 765760K-&gt;34425K(996928K)] 1528066K-&gt;796732K(3538496K), 0.0297050 secs] [Times: user=0.38 sys=0.00, real=0.03 secs]

So this Full GC took 73 seconds which was noticable

This is more of the typical one (takes between 0.5 and 4 sec for the full GC)

2012-07-27T13:34:21.545+0200: 193699.930: [GC [PSYoungGen: 539200K-&gt;33856K(609408K)] 3076719K-&gt;2586239K(3399424K), 0.0915780 secs] [Times: user=0.35 sys=0.02, real=0.09 secs]
2012-07-27T13:34:33.013+0200: 193711.398: [GC [PSYoungGen: 545152K-&gt;20362K(679104K)] 3097535K-&gt;2589388K(3469120K), 0.1007560 secs] [Times: user=0.41 sys=0.01, real=0.10 secs]
2012-07-27T13:34:37.009+0200: 193715.395: [GC [PSYoungGen: 613066K-&gt;30201K(682560K)] 3182092K-&gt;2599976K(3472576K), 0.0562450 secs] [Times: user=0.34 sys=0.04, real=0.06 secs]
2012-07-27T13:34:40.994+0200: 193719.379: [GC [PSYoungGen: 622905K-&gt;21104K(771776K)] 3192680K-&gt;2611651K(3561792K), 0.0600670 secs] [Times: user=0.34 sys=0.02, real=0.06 secs]
2012-07-27T13:34:46.649+0200: 193725.034: [GC [PSYoungGen: 711472K-&gt;39351K(772928K)] 3302019K-&gt;2641011K(3562944K), 0.0705590 secs] [Times: user=0.36 sys=0.02, real=0.07 secs]
2012-07-27T13:34:50.140+0200: 193728.525: [GC [PSYoungGen: 729719K-&gt;28886K(877504K)] 3331379K-&gt;2657883K(3667520K), 0.0467470 secs] [Times: user=0.34 sys=0.03, real=0.05 secs]
2012-07-27T13:34:51.458+0200: 193729.843: [GC [PSYoungGen: 833558K-&gt;2816K(880768K)] 3462555K-&gt;2654100K(3670784K), 0.0295270 secs] [Times: user=0.33 sys=0.00, real=0.03 secs]
2012-07-27T13:34:58.463+0200: 193736.848: [GC [PSYoungGen: 807488K-&gt;77042K(968384K)] 3458772K-&gt;2830436K(3758400K), 0.1496060 secs] [Times: user=0.94 sys=0.13, real=0.15 secs]
2012-07-27T13:34:58.613+0200: 193736.998: [Full GC [PSYoungGen: 77042K-&gt;0K(968384K)] [PSOldGen: 2753394K-&gt;1802531K(2796224K)] 2830436K-&gt;1802531K(3764608K) [PSPermGen: 255608K-&gt;243959K(256704K)], 4.8672610 secs] [Times: user=4.35 sys=0.01, real=4.87 secs]
2012-07-27T13:35:09.438+0200: 193747.823: [GC [PSYoungGen: 891328K-&gt;47021K(1018752K)] 2693859K-&gt;1849553K(3814976K), 0.1264450 secs] [Times: user=0.36 sys=0.02, real=0.13 secs]

As mentioned above: most of the full GCs take less than 2 seconds.

1) I would advise to set -Xms & -Xmx to the same value

2) Can you try adding -XX:NewRatio=4 (or even 5) and see what's happening ? Play also with -XX:MaxGCPauseMillis

3) Revert, and try:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

Any update on this ?

This widget could not be displayed.

Could you please share the java arguments that you're using for running JIRA?

This widget could not be displayed.
-bash-3.2$ ps -ef |grep jira
jira       956 14625  0 11:33 pts/0    00:00:00 ps -ef
jira       957 14625  0 11:33 pts/0    00:00:00 grep jira
jira     12370     1 13 Jul25 ?        07:14:17 /usr/java/latest/bin/java -Djava.util.logging.config.file=/opt/apps/atlassian/jira-enterprise/conf/logging.properties -XX:MaxPermSize=512m -Xms1024m -Xmx4096m -Djava.awt.headless=true -Datlassian.standalone=JIRA -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -Dmail.mime.decodeparameters=true -Dhttp.proxyHost=webproxy.verigy.net -Dhttp.proxyPort=80 -Dhttp.nonProxyHosts=*.verigy.net|localhost -Djira.jelly.on=true -Dmail.mime.decodeparameters=true -XX:+PrintGC -XX:+PrintGCDetails -Xloggc:/opt/apps/atlassian/jira-enterprise/logs/gc.log -XX:+PrintGCDateStamps -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/opt/apps/atlassian/jira-enterprise/endorsed -classpath /opt/apps/atlassian/jira-enterprise/scripts/production/workflow/groovy/common:/opt/apps/atlassian/jira-enterprise/scripts/production/workflow/groovy/postactions:/opt/apps/atlassian/jira-enterprise/scripts/production/workflow/groovy/validators:/opt/atlassian/jira-data/scripts:/opt/apps/atlassian/jira-enterprise/bin/bootstrap.jar -Dcatalina.base=/opt/apps/atlassian/jira-enterprise -Dcatalina.home=/opt/apps/atlassian/jira-enterprise -Djava.io.tmpdir=/opt/apps/atlassian/jira-enterprise/temp org.apache.catalina.startup.Bootstrap start
root     14623 13614  0 09:49 pts/0    00:00:00 su - jira
jira     14625 14623  0 09:49 pts/0    00:00:00 -bash
postgres 31055 23969  0 Jul26 ?        00:03:09 postgres: jirausr jiradb 127.0.0.1(21248) idle
postgres 31072 23969  0 Jul26 ?        00:02:26 postgres: jirausr jiradb 127.0.0.1(21254) idle
postgres 31077 23969  0 Jul26 ?        00:03:54 postgres: jirausr jiradb 127.0.0.1(21258) idle
-bash-3.2$

This widget could not be displayed.

Here is the proposal from Atlassian support. It' slightly different:

Based on the Database statistic, it seems that this is a medium size instance. Based on the GCViewer image as per attached in the screenshot GClog, it seems that the timespent for the GC is really high for on the 27th July 2012. We also notice that the heap usage was high during the day. Since the time taken for the GC full GC is high, this might be cause by the size of the heap that Full GC need to clear is very large.

Therefore, we would like to recommend to adjust the current JVM input parameter to the following. In Linux, please edit the <tt>JIRA_Install/bin/setenv.sh</tt>:

  • Use a parallel collector: <tt>-XX:+UseParallelOldGC</tt>. This will ensure that GC will perform the Garbage Collection more frequently. But, every collection should be small thus the lag time should be reduce.
  • Disable remote clients from triggering a full GC event <tt>-XX:+DisableExplicitGC</tt>
  • Set the Young space up to 30-40% of the overall heap. Eg: <tt>-XX:NewSize=1229m</tt>
  • optionally, reduce the size of the -Xmx to 3GB because as a benchmark we allocate 1 GB of heap to instance with 100,000 of issues. Reducing the -Xmx will reduce the amount of heap that GC need to clear for each Full GC.

Do let us know how it goes.

At the moment we have implemented the solution from Radu (using the concurrent GC). As we are talking about a production instance here, we can only change things like this during a downtime window. This has happened Sunday night. So we are collecting data since about 10 hours.

So far no lengthy GC outages have been observed and the gc.log indicates the longest time that the concurrent GC did spend was about 1 sec.

1) Java heap is too big, agree. 2) Smaller young generation, agree 3) ok for <tt>DisableExplicitGC and </tt>UseParallelOldGC.

Please keep us informed with your case.

TIA,

R

Intellectual curiosity: which solution did you choose ?

Ultimately we are running the solution as proposed by Atlassian support:

<tt>-XX:+UseParallelOldGC </tt><tt>-XX:NewSize=1229m</tt>


and adjusted Min and Max Heap to 1.5 GB / 3 GB, so that the delta is not that big.

Thanks a lot for all the help

maba

Glad you solve it.

Suggest an answer

Log in or Sign up to answer
Community showcase
Posted Tuesday in Jira

What modern development practices are at the heart of how your team delivers software?

Hey Community mates! Claire here from the Software Product Marketing team. We all know software development changes rapidly, and it's often tough to keep up. But from our research, we've found the h...

145 views 1 3
Join discussion

Atlassian User Groups

Connect with like-minded Atlassian users at free events near you!

Find a group

Connect with like-minded Atlassian users at free events near you!

Find my local user group

Unfortunately there are no AUG chapters near you at the moment.

Start an AUG

You're one step closer to meeting fellow Atlassian users at your local meet up. Learn more about AUGs

Groups near you