"Failed to peek." and "failuresCount: 200/200" Errors in JIRA DATA CENTER and low performance

Hamid Gholami January 27, 2019
We run Jira Data Center on three nodes with 200,000 issues. We also use several add-ons, such as Structure, ScriptRunner, and eazyBI, among others.
During some peak hours, CPU usage rises abnormally and the number of established sessions on port 8443 keeps increasing for no apparent reason and never decreases. Jira becomes very slow and stops responding, so we have to restart the application. This happens repeatedly.
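
To put numbers on the session growth described above, the established connections on port 8443 can be sampled at intervals and compared over time. The following is only a minimal sketch, assuming a Linux node where the output of `ss -tn` can be captured as text; the function name `count_established` is hypothetical and not part of Jira.

```python
def count_established(ss_output: str, port: int = 8443) -> int:
    """Count ESTAB connections whose local port equals `port`.

    `ss_output` is assumed to be the text printed by `ss -tn`, where each
    data row reads: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and any connection that is not established.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # rsplit keeps bracketed IPv6 addresses such as [::1]:8443 intact.
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            count += 1
    return count
```

Logging this count once a minute on each node would show whether the sessions pile up on a single node or on all three.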

In the Jira log file we found two errors.
Can anyone help me with this?

1 answer

0 votes
Kiran Panduga {Appfire}
Community Leader
January 27, 2019

Hi @Hamid Gholami

Based on your description, I would suggest checking the logs for errors to identify the plugin or script that is causing the CPU usage to increase.
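
As a rough illustration of that log check, the sketch below tallies ERROR lines by logger name so the noisiest component stands out. It assumes log lines shaped like `timestamp thread ERROR [logger] message`; real `atlassian-jira.log` lines may carry extra fields, so the pattern may need adjusting. The function name `count_errors_by_logger` is hypothetical.

```python
import re
from collections import Counter

# Assumed line layout: "2019-01-27 15:23:30,039 some-thread ERROR      [logger.name] message"
ERROR_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \S+ ERROR\s+\[(?P<logger>[^\]]+)\]"
)


def count_errors_by_logger(lines):
    """Return a Counter mapping logger name to the number of ERROR lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[match.group("logger")] += 1
    return counts
```

For example, `count_errors_by_logger(open(path, encoding="utf-8", errors="replace"))` followed by `.most_common(10)` lists the ten loggers that report the most errors; stack-trace lines are ignored because they do not start with a timestamp.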

If possible, increase the memory allocated to the JVM on each node and check whether the problem persists.

Additionally, go through the Atlassian KB article below, which covers performance-related issues such as the distribution of requests across nodes:

https://confluence.atlassian.com/enterprise/jira-data-center-common-problems-939513809.html

 

Thanks,

Kiran.

Hamid Gholami January 27, 2019

Hello @Kiran Panduga {Appfire}

Thank you for your response.

We have 32 GB of memory in total, and I have allocated 6 GB to the JVM for the Jira application.

 

Also, when the problem occurred, we found two errors in the Jira log file:

 

ERROR NUM1:

2019-01-27 15:30:51,033 localq-reader-3 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] Critical state of local cache replication queue - cannot peek from queue: [queueId=queue_NODE1_1_4b1b4dc8cf38b3c64b1d657da8f5ac8c, queuePath=/opt/atlassian/application-data/jira/localq/queue_NODE1_1_4b1b4dc8cf38b3c64b1d657da8f5ac8c], error: Failed to peek.
com.squareup.tape.FileException: Failed to peek.
        at com.squareup.tape.FileObjectQueue.peek(FileObjectQueue.java:59)
        at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.peek(TapeLocalQCacheOpQueue.java:198)
        at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.peekOrBlock(TapeLocalQCacheOpQueue.java:216)
        at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpQueueWithStats.peekOrBlock(LocalQCacheOpQueueWithStats.java:198)
        at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.peekOrBlock(LocalQCacheOpReader.java:157)
        at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:71)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.ClassNotFoundException: com.codebarrel.jira.plugin.automation.store.CachingAutomationConfigStore$TenantRuleId
        at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpConverter.from(TapeLocalQCacheOpConverter.java:25)
        at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpConverter.from(TapeLocalQCacheOpConverter.java:16)
        at com.squareup.tape.FileObjectQueue.peek(FileObjectQueue.java:57)
        ... 10 more
Caused by: java.lang.ClassNotFoundException: com.codebarrel.jira.plugin.automation.store.CachingAutomationConfigStore$TenantRuleId
        ... 1 filtered
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:628)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpConverter.from(TapeLocalQCacheOpConverter.java:23)
        ... 12 more

ERROR NUM2:

2019-01-27 15:23:30,039 localq-reader-11 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] Abandoning sending: LocalQCacheOp{cacheName='org.marvelution.jji.releasereport.CiBuildReleaseReportColumn', action=REMOVE, key=ERP-2285, value=null, creationTimeInMillis=1548602413120} from cache replication queue: [queueId=queue_NODE2_0_31fe71b00865ae60db401068d5159de9, queuePath=/opt/atlassian/application-data/jira/localq/queue_NODE2_0_31fe71b00865ae60db401068d5159de9], failuresCount: 200/200. Removing from queue. Error: java.rmi.NotBoundException: org.marvelution.jji.releasereport.CiBuildReleaseReportColumn
com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.NotBoundException: org.marvelution.jji.releasereport.CiBuildReleaseReportColumn
        at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:88)
        at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.rmi.NotBoundException: org.marvelution.jji.releasereport.CiBuildReleaseReportColumn
        at sun.rmi.registry.RegistryImpl.lookup(RegistryImpl.java:166)
        at sun.rmi.registry.RegistryImpl_Skel.dispatch(Unknown Source)
        at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:411)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:272)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
        at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:276)
        at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:253)
        at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:379)
        at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
        at com.atlassian.jira.cluster.distribution.localq.rmi.BasicRMICachePeerProvider.lookupRemoteCachePeer(BasicRMICachePeerProvider.java:64)
        at com.atlassian.jira.cluster.distribution.localq.rmi.BasicRMICachePeerProvider.create(BasicRMICachePeerProvider.java:39)
        at com.atlassian.jira.cluster.distribution.localq.rmi.CachingRMICachePeerManager.getCachePeerFor(CachingRMICachePeerManager.java:58)
        at com.atlassian.jira.cluster.distribution.localq.rmi.CachingRMICachePeerManager.withCachePeer(CachingRMICachePeerManager.java:91)
        at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:63)
        ... 6 more
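
Both traces name a class that the node could not resolve (`ClassNotFoundException` in the first, `NotBoundException` in the second), and both classes sit in packages that appear to belong to add-ons rather than to Jira itself. A minimal sketch for pulling those class names out of a log and grouping them is shown below. Grouping by the first two package segments is only a heuristic assumed here to hint at the vendor, and the function name `unresolved_classes_by_package` is hypothetical.

```python
import re
from collections import Counter

# Captures the class name that follows either exception type in a log line.
UNRESOLVED = re.compile(r"(?:ClassNotFoundException|NotBoundException): ([\w.$]+)")


def unresolved_classes_by_package(lines):
    """Count unresolved classes, keyed by their first two package segments."""
    counts = Counter()
    for line in lines:
        match = UNRESOLVED.search(line)
        if match:
            # Heuristic: "com.example.foo.Bar" is grouped under "com.example".
            prefix = ".".join(match.group(1).split(".")[:2])
            counts[prefix] += 1
    return counts
```

Run over a full log file, the resulting counts indicate which add-on packages show up most often in cache replication failures, which narrows down where to look first.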

Kiran Panduga {Appfire}
Community Leader
January 28, 2019

Hi @Hamid Gholami

 

Based on the error messages, it looks like the issue is with the cache replication configuration.

I would suggest going through the KB article below and increasing the queue max size and other options:

https://confluence.atlassian.com/enterprise/jira-data-center-cache-replication-954262828.html

Additionally, I would suggest checking for cluster cache replication timeout issues and following Atlassian's recommendations in the link below:

https://confluence.atlassian.com/jirakb/cluster-cache-replication-healthcheck-fails-due-to-unable-to-complete-within-the-timeout-945545127.html

Thanks,

Kiran.
