Java GC blocks Jira

Martin Bayreuther July 26, 2012

As our usage of Jira increases, we notice that Jira locks users out for typically 120 to 300 seconds. The timing of these freezes seems to coincide with garbage collector runs of the Java VM.

This is increasingly critical: users complain that after entering lengthy issue descriptions, they lose all their work (sometimes in excess of half an hour). The number of user complaints is getting significant.

Is there a method to

a) determine the root cause of these short outages

b) fix it

Martin, Advantest

4 answers

1 accepted

2 votes
Answer accepted
Radu Dumitriu
July 26, 2012

a) Options: -verbose:gc -XX:+PrintGCDetails

b) You have to tune your GC.

1. Keep in mind that most such problems arise because the Java heap is too big.

2. You can always use -XX:+UseParallelGC (if minor pauses are acceptable) or even -XX:+UseConcMarkSweepGC (mostly concurrent, with much shorter pauses).

If you go with the parallel collector, consider using -XX:ParallelGCThreads=<number of processors - 1> (or thereabouts) and -XX:MaxGCPauseMillis=2000 to tell the JVM that pauses greater than 2 s are not desired.

There are plenty of articles describing how to minimize the pauses.

[Edit] Try these to cut the pauses as much as possible:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
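
To put both pieces together, here is a minimal sketch of how these options could be wired into Jira's startup script, assuming the standard JIRA_Install/bin/setenv.sh mentioned later in this thread and the JVM_SUPPORT_RECOMMENDED_ARGS variable it exposes; the log path is only an example and should match your installation:

# JIRA_Install/bin/setenv.sh -- sketch, adjust paths and variables to your install
# (a) GC diagnostics: write detailed, timestamped GC events to a log file
GC_LOGGING="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/apps/atlassian/jira-enterprise/logs/gc.log"
# (b) GC tuning: concurrent collector with incremental pacing to shorten stop-the-world pauses
GC_TUNING="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing"
JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} ${GC_LOGGING} ${GC_TUNING}"

Restart Jira afterwards; the new arguments should then show up on the running java process (visible with ps, as in the listing further down).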

Martin Bayreuther July 26, 2012

Actually, I was searching for that in both "Answers @ Atlassian" and Google. The majority of articles just mention the GC logging parameters, but I have those already.

Are there any specific pointers?

Radu Dumitriu
July 26, 2012

What's in gc.log?

Martin Bayreuther July 26, 2012

The one that caused a problem today:

2012-07-27T08:39:15.594+0200: 175993.979: [GC [PSYoungGen: 805630K->191877K(798656K)] 3554934K->2941630K(3594880K), 0.0779530 secs] [Times: user=1.12 sys=0.00, real=0.07 secs]
2012-07-27T08:39:58.266+0200: 176036.651: [GC [PSYoungGen: 798597K->162119K(830976K)] 3548350K->2912229K(3627200K), 0.0638500 secs] [Times: user=0.98 sys=0.01, real=0.06 secs]
2012-07-27T08:40:00.939+0200: 176039.324: [GC [PSYoungGen: 761095K->143249K(844288K)] 3511205K->2914494K(3640512K), 0.0581810 secs] [Times: user=0.94 sys=0.01, real=0.06 secs]
2012-07-27T08:40:04.790+0200: 176043.175: [GC [PSYoungGen: 742225K->156741K(908160K)] 3513470K->2928745K(3704384K), 0.0519910 secs] [Times: user=0.79 sys=0.00, real=0.06 secs]
2012-07-27T08:40:05.888+0200: 176044.274: [GC [PSYoungGen: 825669K->156447K(907136K)] 3597673K->2929053K(3703360K), 0.0541860 secs] [Times: user=0.86 sys=0.00, real=0.06 secs]
2012-07-27T08:40:20.023+0200: 176058.408: [GC [PSYoungGen: 825375K->179413K(990016K)] 3597981K->2952710K(3786240K), 0.0881010 secs] [Times: user=1.05 sys=0.07, real=0.09 secs]
2012-07-27T08:40:28.082+0200: 176066.467: [GC-- [PSYoungGen: 945173K->945173K(990016K)] 3718470K->3741389K(3786240K), 2.4579610 secs] [Times: user=1.93 sys=6.64, real=2.46 secs]
2012-07-27T08:40:30.540+0200: 176068.925: [Full GC [PSYoungGen: 945173K->0K(990016K)] [PSOldGen: 2796216K->762306K(2541568K)] 3741389K->762306K(3531584K) [PSPermGen: 240935K->235637K(249600K)], 73.4757870 secs] [Times: user=4.55 sys=0.76, real=73.46 secs]
2012-07-27T08:41:44.618+0200: 176143.004: [GC [PSYoungGen: 765760K->34425K(996928K)] 1528066K->796732K(3538496K), 0.0297050 secs] [Times: user=0.38 sys=0.00, real=0.03 secs]

So this Full GC took 73 seconds (real=73.46 secs), which was noticeable.

Martin Bayreuther July 26, 2012

This is more typical of what we see (the full GC takes between 0.5 and 4 seconds):

2012-07-27T13:34:21.545+0200: 193699.930: [GC [PSYoungGen: 539200K->33856K(609408K)] 3076719K->2586239K(3399424K), 0.0915780 secs] [Times: user=0.35 sys=0.02, real=0.09 secs]
2012-07-27T13:34:33.013+0200: 193711.398: [GC [PSYoungGen: 545152K->20362K(679104K)] 3097535K->2589388K(3469120K), 0.1007560 secs] [Times: user=0.41 sys=0.01, real=0.10 secs]
2012-07-27T13:34:37.009+0200: 193715.395: [GC [PSYoungGen: 613066K->30201K(682560K)] 3182092K->2599976K(3472576K), 0.0562450 secs] [Times: user=0.34 sys=0.04, real=0.06 secs]
2012-07-27T13:34:40.994+0200: 193719.379: [GC [PSYoungGen: 622905K->21104K(771776K)] 3192680K->2611651K(3561792K), 0.0600670 secs] [Times: user=0.34 sys=0.02, real=0.06 secs]
2012-07-27T13:34:46.649+0200: 193725.034: [GC [PSYoungGen: 711472K->39351K(772928K)] 3302019K->2641011K(3562944K), 0.0705590 secs] [Times: user=0.36 sys=0.02, real=0.07 secs]
2012-07-27T13:34:50.140+0200: 193728.525: [GC [PSYoungGen: 729719K->28886K(877504K)] 3331379K->2657883K(3667520K), 0.0467470 secs] [Times: user=0.34 sys=0.03, real=0.05 secs]
2012-07-27T13:34:51.458+0200: 193729.843: [GC [PSYoungGen: 833558K->2816K(880768K)] 3462555K->2654100K(3670784K), 0.0295270 secs] [Times: user=0.33 sys=0.00, real=0.03 secs]
2012-07-27T13:34:58.463+0200: 193736.848: [GC [PSYoungGen: 807488K->77042K(968384K)] 3458772K->2830436K(3758400K), 0.1496060 secs] [Times: user=0.94 sys=0.13, real=0.15 secs]
2012-07-27T13:34:58.613+0200: 193736.998: [Full GC [PSYoungGen: 77042K->0K(968384K)] [PSOldGen: 2753394K->1802531K(2796224K)] 2830436K->1802531K(3764608K) [PSPermGen: 255608K->243959K(256704K)], 4.8672610 secs] [Times: user=4.35 sys=0.01, real=4.87 secs]
2012-07-27T13:35:09.438+0200: 193747.823: [GC [PSYoungGen: 891328K->47021K(1018752K)] 2693859K->1849553K(3814976K), 0.1264450 secs] [Times: user=0.36 sys=0.02, real=0.13 secs]

As mentioned above: most of the full GCs take less than 2 seconds.

Radu Dumitriu
July 26, 2012

1) I would advise setting -Xms and -Xmx to the same value.

2) Can you try adding -XX:NewRatio=4 (or even 5) and see what happens? Also play with -XX:MaxGCPauseMillis.

3) Revert, and try the concurrent collector instead (both alternatives are sketched below):

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
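
To make the two alternatives concrete, here is a rough sketch of the corresponding option sets. The heap size and thread count are illustrative only (the thread count assumes a 4-core machine), not verified recommendations for this instance:

# Alternative 1: parallel collector, equal -Xms/-Xmx, smaller young generation, pause-time goal
-Xms3072m -Xmx3072m -XX:NewRatio=4 -XX:+UseParallelGC -XX:ParallelGCThreads=3 -XX:MaxGCPauseMillis=2000

# Alternative 2: revert the above and use the concurrent collector with incremental pacing
-Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
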
Radu Dumitriu
July 29, 2012

Any update on this?

1 vote
Martin Bayreuther July 29, 2012

Here is the proposal from Atlassian support. It's slightly different (a rough sketch of the combined settings follows the quoted reply):

Based on the database statistics, this seems to be a medium-sized instance. Based on the GCViewer image attached in the screenshot GClog, the time spent in GC was really high on 27 July 2012. We also noticed that heap usage was high during the day. Since the full GCs take so long, this is likely because the portion of the heap that each full GC has to clear is very large.

Therefore, we recommend adjusting the current JVM input parameters as follows. On Linux, please edit JIRA_Install/bin/setenv.sh:

  • Use a parallel collector: -XX:+UseParallelOldGC. This makes the JVM run garbage collections more frequently, but each collection should be small, so the lag time should be reduced.
  • Disable remote clients from triggering a full GC event: -XX:+DisableExplicitGC
  • Set the young space to 30-40% of the overall heap, e.g. -XX:NewSize=1229m
  • Optionally, reduce -Xmx to 3 GB, because as a benchmark we allocate 1 GB of heap per 100,000 issues. Reducing -Xmx reduces the amount of heap that each full GC needs to clear.

Do let us know how it goes.
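
Taken together, the proposal maps to roughly the following changes in setenv.sh. This is only a sketch: the variable names are the ones a standard Jira setenv.sh exposes (JVM_MINIMUM_MEMORY, JVM_MAXIMUM_MEMORY, JVM_SUPPORT_RECOMMENDED_ARGS), the 3 GB heap follows the optional last bullet, and the equal minimum/maximum follows Radu's earlier advice:

# JIRA_Install/bin/setenv.sh -- sketch of the support proposal, values illustrative
JVM_MINIMUM_MEMORY="3072m"
JVM_MAXIMUM_MEMORY="3072m"
# parallel old-gen collector, no externally triggered full GCs, young gen at ~30-40% of the heap
JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:NewSize=1229m"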

Martin Bayreuther July 29, 2012

At the moment we have implemented Radu's solution (using the concurrent GC). As this is a production instance, we can only change things like this during a downtime window, which happened Sunday night. So we have been collecting data for about 10 hours.

So far no lengthy GC outages have been observed, and gc.log indicates the longest pause the concurrent GC spent was about 1 second.
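
A quick way to keep checking this is to pull the longest wall-clock times out of gc.log. A rough one-liner, assuming the -XX:+PrintGCDetails "[Times: ... real=... secs]" format shown earlier in this thread:

grep -o 'real=[0-9.]*' gc.log | cut -d= -f2 | sort -n | tail -5

This prints the five longest real times, in seconds, recorded so far; with the concurrent collector, keep in mind that entries for the concurrent phases are not actual pauses.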

Radu Dumitriu
July 29, 2012

1) Java heap is too big, agree. 2) Smaller young generation, agree. 3) OK for DisableExplicitGC and UseParallelOldGC.

Please keep us informed about your case.

TIA,

R

Radu Dumitriu
August 8, 2012

Intellectual curiosity: which solution did you choose?

Martin Bayreuther August 8, 2012

Ultimately we are running the solution as proposed by Atlassian support:

-XX:+UseParallelOldGC -XX:NewSize=1229m


and adjusted the minimum and maximum heap to 1.5 GB / 3 GB, so that the delta is not that big.
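
For reference, the resulting GC-related part of the argument line then looks roughly like this (1.5 GB / 3 GB heap as stated above; all other existing options unchanged):

-Xms1536m -Xmx3072m -XX:+UseParallelOldGC -XX:NewSize=1229m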

Thanks a lot for all the help

maba

Radu Dumitriu
August 8, 2012

Glad you solved it.

gf August 13, 2019

Hello, 

Sorry for re-opening this thread, but can anyone assist me with a guide on how to implement the solution from the Atlassian support proposal quoted above?

I've been struggling for over a week and keep running into GC allocation errors like the ones above.

 


0 votes
Martin Bayreuther July 26, 2012
-bash-3.2$ ps -ef |grep jira
jira       956 14625  0 11:33 pts/0    00:00:00 ps -ef
jira       957 14625  0 11:33 pts/0    00:00:00 grep jira
jira     12370     1 13 Jul25 ?        07:14:17 /usr/java/latest/bin/java -Djava.util.logging.config.file=/opt/apps/atlassian/jira-enterprise/conf/logging.properties -XX:MaxPermSize=512m -Xms1024m -Xmx4096m -Djava.awt.headless=true -Datlassian.standalone=JIRA -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -Dmail.mime.decodeparameters=true -Dhttp.proxyHost=webproxy.verigy.net -Dhttp.proxyPort=80 -Dhttp.nonProxyHosts=*.verigy.net|localhost -Djira.jelly.on=true -Dmail.mime.decodeparameters=true -XX:+PrintGC -XX:+PrintGCDetails -Xloggc:/opt/apps/atlassian/jira-enterprise/logs/gc.log -XX:+PrintGCDateStamps -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/opt/apps/atlassian/jira-enterprise/endorsed -classpath /opt/apps/atlassian/jira-enterprise/scripts/production/workflow/groovy/common:/opt/apps/atlassian/jira-enterprise/scripts/production/workflow/groovy/postactions:/opt/apps/atlassian/jira-enterprise/scripts/production/workflow/groovy/validators:/opt/atlassian/jira-data/scripts:/opt/apps/atlassian/jira-enterprise/bin/bootstrap.jar -Dcatalina.base=/opt/apps/atlassian/jira-enterprise -Dcatalina.home=/opt/apps/atlassian/jira-enterprise -Djava.io.tmpdir=/opt/apps/atlassian/jira-enterprise/temp org.apache.catalina.startup.Bootstrap start
root     14623 13614  0 09:49 pts/0    00:00:00 su - jira
jira     14625 14623  0 09:49 pts/0    00:00:00 -bash
postgres 31055 23969  0 Jul26 ?        00:03:09 postgres: jirausr jiradb 127.0.0.1(21248) idle
postgres 31072 23969  0 Jul26 ?        00:02:26 postgres: jirausr jiradb 127.0.0.1(21254) idle
postgres 31077 23969  0 Jul26 ?        00:03:54 postgres: jirausr jiradb 127.0.0.1(21258) idle
-bash-3.2$

0 votes
Dennis Kromhout van der Meer
July 26, 2012

Could you please share the Java arguments that you're using for running JIRA?
