We have a JIRA 5.0.4 instance with a lot of issues (~500K).
Today I ran into a concurrent mode failure:
430135.067: [CMS2015-02-17T10:37:10.868+0300: 430138.228: [CMS-concurrent-mark: 5.335/13.843 secs] [Times: user=53.87 sys=1.92, real=13.84 secs]
(concurrent mode failure): 8388608K->8388607K(8388608K), 39.5041120 secs]
430174.572: [Class Histogram
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38454698     2420770448  [C
   2:      19789035     2315535216  [Ljava.lang.Object;
   3:      38469185     1231013920  java.lang.String
   4:      23047100     1106260800  org.apache.lucene.document.Field
   5:      34057895     1089852640  java.util.HashMap$Entry
   6:      17937773      430506552  java.util.ArrayList
   7:       4308072      364018032  [Ljava.util.HashMap$Entry;
   8:       4240606      203549088  java.util.HashMap
   9:       5898413      141561912  java.lang.Long
  10:        193169      112915568  [B
  11:        148400      108985416  [I
  12:       1770331       99138536  org.ofbiz.core.entity.GenericValue
  13:            37       79666304  [Ljava.util.Collection;
  14:       1790656       57300992  java.util.Vector
  15:        348579       57289648  <constMethodKlass>
  16:       2021766       48522384  java.util.LinkedList$Entry
  17:        348579       47422664  <methodKlass>
  18:         38630       42971760  <constantPoolKlass>
  19:        560925       31411800  com.atlassian.jira.issue.DocumentIssueImpl
  20:       1881905       30110480  java.lang.Integer
  21:         38630       28972744  <instanceKlassKlass>
  22:       1784142       28546272  java.util.HashMap$EntrySet
  23:        414247       25311624  <symbolKlass>
  24:         33303       24114552  <constantPoolCacheKlass>
  25:        474338       15178816  java.lang.ThreadLocal$ThreadLocalMap$Entry
  26:        584090       14018160  org.apache.lucene.search.ScoreDoc
  27:        560926       13462224  org.apache.lucene.document.Document
  28:         23258       13071016  <methodDataKlass>
  29:        360203       11526496  java.util.concurrent.ConcurrentHashMap$HashEntry
  30:        278817       11152680  java.lang.ref.SoftReference
  31:        217163       10423824  org.apache.velocity.runtime.parser.Token
  32:        236301        9452040  org.apache.lucene.index.TermInfo
  33:        367843        8828232  java.util.concurrent.locks.ReentrantReadWriteLock$Sync$HoldCounter
  34:         89345        7862360  java.lang.reflect.Method
  35:         60222        7186440  [S
  36:          2003        7022208  [J
  37:        164389        6575560  java.util.LinkedHashMap$Entry
  38:        250402        6009648  org.apache.lucene.index.Term
  39:        249530        5988720  java.util.LinkedList
  ...
This happened several times before I restarted JIRA.
JVM settings are:
JAVA_OPTS="$JAVA_OPTS -server -Xincgc -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:PermSize=512M -XX:+CMSClassUnloadingEnabled"
JVM_MINIMUM_MEMORY="10g"
JVM_MAXIMUM_MEMORY="10g"
JVM_EXTRA_ARGS="-XX:+PrintGCDateStamps -XX:ReservedCodeCacheSize=128M -Xloggc:/var/log/jira/gc.log -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=16 -XX:ParallelCMSThreads=12 -XX:+CMSScavengeBeforeRemark -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60"
What happened to the old gen? Why are there so many Lucene objects?
atlassian-jira.log doesn't show anything unusual: just the usual warnings and errors, like "RHS of #set statement is null". No info about indexing, no exceptions.
Can anybody help me find where the problem is?
Thanks in advance!
1: 38454698 2420770448 [C
2: 19789035 2315535216 [Ljava.lang.Object;
3: 38469185 1231013920 java.lang.String
4: 23047100 1106260800 org.apache.lucene.document.Field
This indicates to me that there is a very large search running that is likely trying to pull your entire Lucene database into memory at once. This usually comes from poorly behaved plugins, but there are ways to construct very nasty JQL that could also do this.
The old gen fills because there is a soft values cache that tries to hold onto Lucene data for as long as a particular state of the Lucene index is still in use. This is a huge performance benefit, but the fact that it is a soft values cache means that the JVM will only release the values when the last reference closes or when memory is otherwise exhausted and the GC is forced to release some of it. Normally this is not a problem because in-flight searches don't last long enough for it to build up.
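To make the soft-values behaviour concrete, here is a minimal sketch of such a cache. This is an illustration of the general `SoftReference` pattern, not JIRA's actual cache class; the class and method names are hypothetical:

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical soft-values cache. Values are held only through
 * SoftReferences, so the GC keeps them as long as heap is plentiful
 * (which is why old gen fills up) and clears them only under memory
 * pressure -- unless something else still holds a strong reference.
 */
public class SoftValueCache {
    private final Map<String, SoftReference<Object>> map = new ConcurrentHashMap<>();

    public void put(String key, Object value) {
        map.put(key, new SoftReference<>(value));
    }

    public Object get(String key) {
        SoftReference<Object> ref = map.get(key);
        // Returns null if the entry was never cached or the GC
        // already reclaimed the softly reachable value.
        return ref == null ? null : ref.get();
    }

    public static void main(String[] args) {
        SoftValueCache cache = new SoftValueCache();
        cache.put("segment-1", "lucene-data");
        System.out.println(cache.get("segment-1"));
    }
}
```

The trouble described above follows directly from this pattern: a long-running search keeps strong references into the cached Lucene data, so the GC cannot clear the soft values even when the old gen is nearly full, and CMS ends up with a concurrent mode failure.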
Some things you can try:
Check atlassian-jira-slow-queries.log to see if any JQL is logged as taking an unreasonably long time. It's hard to say exactly what is reasonable, but certainly any value over 30000 ms is going to cause serious problems.
Thanks for your answer, Chris! The slow-queries log shows a lot of queries from a suspicious plugin: about 15 queries of roughly 500 ms each for a single HTTP request, and there were 3 such requests at the time memory started increasing. There are some other slow queries as well. I have disabled that plugin. Could you please explain how to see which thread is holding on to the memory? I have created a heap dump and opened it in VisualVM. What is the next step?
The slow query log triggers for anything 400ms or longer, but while a 500ms query is "slow", a single one on a page probably would not be all that noticeable. Did disabling the plugin actually improve things? The heap dump tools vary considerably. I usually use Eclipse MAT and don't really know my way around VisualVM. However, there should be some options to find the largest objects by the size of heap that they retain. MAT calls this "dominators" but I don't think VisualVM has any special name for it. When there is a single thread performing a long-running and excessively large search, it usually shows up here.
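If you want to capture a dump for MAT directly from the running JVM (rather than through VisualVM), HotSpot's diagnostic MXBean can write one programmatically; `jmap` produces the same format. This is a sketch under the assumption you are on a HotSpot/OpenJDK JVM, and the output path is only an example:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /**
     * Writes an .hprof heap dump of the current JVM to the given path
     * (same format as `jmap -dump:live,format=b,file=...`). With
     * live=true the JVM runs a full GC first, so the dump contains only
     * reachable objects -- the ones actually retaining memory.
     */
    public static File dump(String path, boolean live) throws Exception {
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        diagnostic.dumpHeap(path, live); // fails if the file already exists
        return new File(path);
    }

    public static void main(String[] args) throws Exception {
        // Example path only; point it at a disk with enough free space
        // for a dump roughly the size of your used heap.
        System.out.println(dump("/tmp/jira-heap-" + System.nanoTime() + ".hprof", true));
    }
}
```

Open the resulting .hprof in MAT and run the dominator tree to see which thread or search object retains the most heap.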
Just re-read your comment... 15 * 500ms would indeed be very suspicious. If the plugin has a configuration page where you can enter the JQL queries it uses, then you may be able to improve the actual queries instead of disabling the plugin. For example, there are some expressions like "filter IN (...)" or "parent(...)" that are particularly expensive because they implicitly perform nested searches. Sometimes these questions can be asked another way that avoids the problem.
Yes, that plugin is configurable. I'll rewrite the search logic and also change the JQL queries used in it. But that was not the real cause of the problem. Thanks a lot for the tip about Eclipse MAT: a powerful tool. Today CMS failed again, and using MAT I definitely found the JQL function which leads to the memory increase. Your answers were very helpful!
Happy to help! Eclipse MAT is very powerful, but it isn't very user-friendly, which may make it harder to get started with. Most of the developers at Atlassian tend to prefer YourKit for this kind of thing. I would be interested in knowing which JQL function was causing the problem in this case.
The following might not be the solution to your problem, but it is my frank idea of what would make sense in this situation.
Considering the size of your instance, I assume it belongs to a larger corporation and has by now gained significant business value and importance. Because of this, I would suggest involving an Atlassian Expert specialized in performance tuning to thoroughly analyse your instance and its resource footprint, and to give proper consultation on how to adjust the configuration in a sustainable manner. Fighting the symptom might work for now, but it might not guarantee stability over the long run.
Being part of a Platinum Expert partner with several experts in this field who also have experience with instances of this volume, feel free to ping me if this is an approach you want to consider.
I hope I don't come off as greedy for the sales opportunity; I'm genuinely trying to be supportive in your troublesome situation.
Philipp, I think it's OK to remind people of what Atlassian Experts can help with. I tell people who I work for (ServiceRocket), or put it in my name field and then add a link to http://experts.atlassian.com. That way we don't all have to chime in and say "me too, me too!"
Hey Matt, thanks for your support. For me, personally, it's a bit of a balancing act to mention Experts and the possible benefits of getting one on board without giving the impression of using this platform to hunt for leads. Of course there are always devs from Atlassian looking into this who would naturally be able to answer/fix the problem. However, as you said, I think sometimes it makes sense to suggest seeking an Expert for bespoke service, to not only tackle the problem but also keep it from recurring in the future and unnecessarily risking tremendous business impact (if applicable). I intentionally didn't mention my company, as I only wanted to make the suggestion of considering getting an Expert on board (and that we *would* be capable :)) instead of, as you say, shouting "come to <company name>, you get cookies on top!". Using the company name in the name field has some side effects that I'm not sure I want to take on, but it's something I'm still considering. :) Regards, Philipp