JEMH 1.2.68/1.2.71 causing Java high CPU utilization (urgent)

Simon Gao November 27, 2012

Hi,

JEMH is causing 100% CPU utilization on our JIRA server. We believe JEMH auditing is the cause. When JIRA queries the PostgreSQL database for auditing events, some of those queries take more than 10 minutes to complete. During those 10 minutes, JIRA pegs the CPU at 100% until it hangs.

Right now I can't even open the auditing page to clean up events via the web interface.

How can I clean up all the events directly in the database? This is urgent. Our production JIRA server has been unstable for the last four days.

Any prompt help would be highly appreciated.

Simon

1 answer

0 votes
Andy Brook [Plugin People]
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
November 27, 2012

Hi Simon,

Hmm, OK, I'm not aware of any issues that could cause this. Please raise a support JIRA, attaching a logfile of recent activity, and I'll do my best to advise.

Andy Brook [Plugin People]
November 28, 2012

I'll be on hand for another couple of hours today; no traffic yet that I can see.

Simon Gao November 28, 2012

I've identified the root cause of the problem.

JEMH retained too many email log events. The following query against the events table took more than 10 minutes to return a result, if it returned at all:

SELECT * FROM public."AO_78C957_AUDITEVENTS";

This caused the JIRA Java process to run continuously at 100% CPU until it hung. Deleting all the entries in table "AO_78C957_AUDITEVENTS" brought JIRA back to life. Since then, I have dropped the event retention time from 3 months to one day. Overnight, I saw that 205413 email messages had been processed by JEMH (at least that's what JEMH reported).
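For anyone hitting the same problem, here is a minimal sketch of that cleanup, assuming PostgreSQL and the table name from the query above. It is not an official JEMH procedure, and this thread does not say whether other tables reference this one, so treat it as a starting point only.

```sql
-- Count rows on the server instead of pulling every row back with SELECT *.
SELECT COUNT(*) FROM public."AO_78C957_AUDITEVENTS";

-- Delete all audit events inside a transaction so the change can still be
-- rolled back if something looks wrong.
BEGIN;
DELETE FROM public."AO_78C957_AUDITEVENTS";
-- Re-check the count inside the transaction before committing.
SELECT COUNT(*) FROM public."AO_78C957_AUDITEVENTS";
COMMIT;
```

Replacing COMMIT with ROLLBACK leaves the table untouched, which makes for a safer dry run.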

So for a high email volume site like ours, the default 3-month setting for event retention is way too high.


One question I still have is why and how JEMH brought JIRA to its knees.

Please help find out why.

Thanks,

Simon

Andy Brook [Plugin People]
November 29, 2012

Hi Simon, the AUDIT data is scanned for expired content on every email. If you specify 3 months, it's going to retain and scan all email received in that period. It's only doing a timestamp check, but the volume is high. Setting the retention period to much less (a day) will of course reduce the volume of records that need to be scanned on email receipt.

I have generated large volumes (hundreds of thousands of records) before. I will do some more testing at that volume and see what falls out. I would think that a better solution would be a nightly job to remove older content; I'll certainly work to resolve this sooner rather than later.
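As a rough illustration of the nightly purge idea, the statements below sketch what such a job might run against PostgreSQL. The column name "CREATED" is hypothetical (the thread never lists the table's columns), and the sketch assumes that column is a timestamp type; the first query shows how to look up the real column names and types before adapting it.

```sql
-- Inspect the real column names and types first; the thread does not list them.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'AO_78C957_AUDITEVENTS';

-- HYPOTHETICAL column: "CREATED" stands in for whatever timestamp column
-- the table actually has. Deletes events older than the retention period.
DELETE FROM public."AO_78C957_AUDITEVENTS"
WHERE "CREATED" < NOW() - INTERVAL '1 day';
```

Running the delete once a night from a scheduler, rather than on every incoming email, keeps the expiry check off the mail-processing path.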

Simon Gao December 2, 2012

This morning I am having problems accessing the Auditing page again. See attached screenshot. This is after I decreased event retention from 3 months to 1 day last Wednesday.

Any suggestions on what I should do? JIRA is very slow again and is currently consuming 100% CPU constantly.

Simon Gao December 2, 2012

After a while, I got this error when trying to open the Auditing page. Is there a way to disable auditing?

/error/HTTP_SERVICE_UNAVAILABLE.html.var

Could not find what you were looking for. Maybe you should raise an issue.

JIRA home

Andy Brook [Plugin People]
December 4, 2012

Hi Simon,

There isn't a way to stop auditing at the moment. As you mentioned on the issue below, I will be putting measures in place to remove this impact.

The history needs to be purged. Setting the retention period to a day is exactly what needs to be done, but the historic data still has to be removed (documented here). Please verify that the audit tables are empty; the effect of this should be near instant.

I'm tracking this at https://studio.plugins.atlassian.com/browse/JEMH-1067; it's my top priority right now. Please give feedback on that JIRA issue about how the purge works for you.
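One way to check this from the database side is sketched below, assuming PostgreSQL. Only "AO_78C957_AUDITEVENTS" is named in this thread; that the other JEMH audit tables share the AO_78C957 prefix is an assumption based on that name.

```sql
-- List tables with the assumed JEMH prefix and their approximate live row
-- counts. n_live_tup is a statistics estimate, not an exact count.
SELECT relname, n_live_tup
FROM pg_stat_user_tables
WHERE relname LIKE 'AO!_78C957!_%' ESCAPE '!'
ORDER BY relname;

-- Exact count for the one table named in this thread.
SELECT COUNT(*) FROM public."AO_78C957_AUDITEVENTS";
```

The ESCAPE clause makes the underscores match literally instead of acting as single-character wildcards.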
