JIRA's GC problems

Hi!

We have a JIRA 5.0.4 instance with a lot of issues (~500K).

Today I ran into a concurrent mode failure:

430135.067: [CMS2015-02-17T10:37:10.868+0300: 430138.228: [CMS-concurrent-mark: 5.335/13.843 secs] [Times: user=53.87 sys=1.92, real=13.84 secs] 
 (concurrent mode failure): 8388608K->8388607K(8388608K), 39.5041120 secs]430174.572: [Class Histogram
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38454698     2420770448  [C
   2:      19789035     2315535216  [Ljava.lang.Object;
   3:      38469185     1231013920  java.lang.String
   4:      23047100     1106260800  org.apache.lucene.document.Field
   5:      34057895     1089852640  java.util.HashMap$Entry
   6:      17937773      430506552  java.util.ArrayList
   7:       4308072      364018032  [Ljava.util.HashMap$Entry;
   8:       4240606      203549088  java.util.HashMap
   9:       5898413      141561912  java.lang.Long
  10:        193169      112915568  [B
  11:        148400      108985416  [I
  12:       1770331       99138536  org.ofbiz.core.entity.GenericValue
  13:            37       79666304  [Ljava.util.Collection;
  14:       1790656       57300992  java.util.Vector
  15:        348579       57289648  <constMethodKlass>
  16:       2021766       48522384  java.util.LinkedList$Entry
  17:        348579       47422664  <methodKlass>
  18:         38630       42971760  <constantPoolKlass>
  19:        560925       31411800  com.atlassian.jira.issue.DocumentIssueImpl
  20:       1881905       30110480  java.lang.Integer
  21:         38630       28972744  <instanceKlassKlass>
  22:       1784142       28546272  java.util.HashMap$EntrySet
  23:        414247       25311624  <symbolKlass>
  24:         33303       24114552  <constantPoolCacheKlass>
  25:        474338       15178816  java.lang.ThreadLocal$ThreadLocalMap$Entry
  26:        584090       14018160  org.apache.lucene.search.ScoreDoc
  27:        560926       13462224  org.apache.lucene.document.Document
  28:         23258       13071016  <methodDataKlass>
  29:        360203       11526496  java.util.concurrent.ConcurrentHashMap$HashEntry
  30:        278817       11152680  java.lang.ref.SoftReference
  31:        217163       10423824  org.apache.velocity.runtime.parser.Token
  32:        236301        9452040  org.apache.lucene.index.TermInfo
  33:        367843        8828232  java.util.concurrent.locks.ReentrantReadWriteLock$Sync$HoldCounter
  34:         89345        7862360  java.lang.reflect.Method
  35:         60222        7186440  [S
  36:          2003        7022208  [J
  37:        164389        6575560  java.util.LinkedHashMap$Entry
  38:        250402        6009648  org.apache.lucene.index.Term
  39:        249530        5988720  java.util.LinkedList
...

This happened several times before I restarted JIRA.

[Attached image: gclog.jpg]

JVM settings are:

JAVA_OPTS="$JAVA_OPTS "-server" "-Xincgc" "-XX:NewSize=2048m" "-XX:MaxNewSize=2048m" "-XX:PermSize=512M" "-XX:+CMSClassUnloadingEnabled" "-XX:+CMSClassUnloadingEnabled


JVM_MINIMUM_MEMORY="10g"


JVM_MAXIMUM_MEMORY="10g"


JVM_EXTRA_ARGS="-XX:+PrintGCDateStamps -XX:ReservedCodeCacheSize=128M -Xloggc:/var/log/jira/gc.log -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=16 -XX:ParallelCMSThreads=12 -XX:+CMSScavengeBeforeRemark -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60"

What happened to the old gen? Why are there so many Lucene objects?

atlassian-jira.log doesn't show anything unusual; there are only the usual warnings and errors, like "RHS of #set statement is null". There is no information about indexing, and there are no exceptions.

Can anybody help me find where the problem is?

Thanks in advance!

Answer accepted

   1:      38454698     2420770448  [C
   2:      19789035     2315535216  [Ljava.lang.Object;
   3:      38469185     1231013920  java.lang.String
   4:      23047100     1106260800  org.apache.lucene.document.Field

This indicates to me that there is a very large search running that is likely trying to pull your entire Lucene database into memory at once. This usually comes from poorly behaved plugins, but there are ways to construct very nasty JQL that could also do this.
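To put a number on that, the four rows quoted above add up to about 7.07 GB out of the 10g heap. Here is a rough sketch of how you could total the #bytes column of a class histogram yourself. The class and method names are made up for illustration, and it assumes the row layout shown in the GC log above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (hypothetical name): totals the #bytes column of a class histogram. */
final class HistogramTotals {
    // Matches rows such as "   4:      23047100     1106260800  org.apache.lucene.document.Field"
    // group(1) = #instances, group(2) = #bytes, group(3) = class name
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)\\s*$");

    /** Sums #bytes over every line that looks like a histogram row; other lines are skipped. */
    static long totalBytes(String histogram) {
        long total = 0;
        for (String line : histogram.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                total += Long.parseLong(m.group(2));
            }
        }
        return total;
    }
}
```

Feeding it only the rows for the Lucene-related classes (or only the top few rows) gives a quick feel for how much of the heap a single search could be pinning.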

The old gen fills because there is a soft values cache that tries to hold onto Lucene data for as long as a particular state of the Lucene index is still in use. This is a huge performance benefit, but the fact that it is a soft values cache means that the JVM will only release the values when the last reference closes or when memory is otherwise exhausted and the GC is forced to release some of it. Normally this is not a problem because in-flight searches don't last long enough for it to build up.
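If the soft values idea is unfamiliar, here is a minimal sketch of such a cache. This is not JIRA's or Lucene's actual code; the class name and the loader parameter are invented for illustration. It only shows why the values hang around until either nobody references them or the JVM gets short on memory.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative only (hypothetical class): a minimal soft-values cache. */
final class SoftValueCache<K, V> {
    // Keys are held strongly; values are held only through SoftReferences.
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    /** Returns the cached value, reloading it if it was never cached or the GC cleared it. */
    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either a first access or the collector reclaimed the value under memory pressure.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    /** Number of entries, including ones whose value may already have been cleared. */
    int size() {
        return map.size();
    }
}
```

As long as some thread still holds a strong reference to a value (for example, a search that is still running), the collector cannot clear it. Softly reachable values are only guaranteed to be cleared before the JVM throws an OutOfMemoryError, so a long-running search can keep a lot of old gen occupied.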

Some things you can try:

Check atlassian-jira-slow-queries.log to see whether any JQL queries are logged as taking an unreasonably long time. It's hard to say exactly what is reasonable, but certainly any value over 30000 ms is going to cause serious problems.
Check your activity log for the requests around the time memory started increasing to see if you can find a culprit.
Generate a thread dump to see if there is a particular thread or set of threads that seems to be stuck in Lucene for a particularly long time. You may be able to track down a plugin or JQL function that is triggering this from the stack traces.
Generate a heap dump to see if a single thread is holding on to the memory.
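
For the thread dump step, something like the sketch below can help you sift through a large dump. It assumes the dump is in the usual HotSpot text format (as produced by `jstack <pid>` or `kill -3 <pid>`), where each thread starts with a line beginning with a double quote followed by its stack frames. The class name is made up, and the file path is whatever you saved the dump as.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name): lists threads that have a Lucene frame on their stack. */
final class LuceneThreadFinder {

    /** Returns the header line of every thread whose stack contains an org.apache.lucene frame. */
    static List<String> threadsInLucene(List<String> dumpLines) {
        List<String> hits = new ArrayList<>();
        String currentHeader = null;   // header of the thread block being scanned
        boolean reported = false;      // avoid listing the same thread twice
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // A new thread block begins, e.g. "http-8080-1" daemon prio=10 ...
                currentHeader = line;
                reported = false;
            } else if (currentHeader != null && !reported
                    && line.trim().startsWith("at org.apache.lucene.")) {
                hits.add(currentHeader);
                reported = true;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java LuceneThreadFinder <thread-dump-file>");
            return;
        }
        for (String header : threadsInLucene(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(header);
        }
    }
}
```

Take several dumps a few seconds apart and run each one through this. A thread that shows up in Lucene in every dump is the one to look at, and the non-Lucene frames further down its stack should point at the plugin or JQL function that started the search.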