Fix Found for heartbeat problem

Hi, I'm posting about issue BSP-7948 to share the cause and the fix in case it helps others. Please feel free to close this immediately.

h2. Fix:

Changing the JVM wrapper settings fixed the problem.

h6. Approach

Set up a two-node system on pkdesk and pklaptop.

Set up a bogus never-ending job that runs a cmd shell in an infinite loop.

Used a stress tool to max out the server's disk, CPU, and memory. This did not reproduce the heartbeat problem.

Reviewed the logs and found a pattern (see Situation below): the agent's wrapper tries to restart the JVM, and the JVM complains of a DLL problem on Windows.

* Fix the DLL problem by running all Windows agents with an x86 (32-bit) JDK.

* Fix the JVM restart problem by increasing the time the wrapper allows for the JVM ping:

{code}
wrapper.java.command=d:\Java\x86\jdk1.6.0_26\bin\java.exe
wrapper.ping.timeout=900
wrapper.ping.interval=30
{code}
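To check which ping values an agent is actually running with, its wrapper.conf can be read programmatically. The following is a minimal Python sketch, not part of the original fix: the function names are hypothetical, and the fallback defaults (30 seconds for the timeout, 5 seconds for the interval) are my understanding of the wrapper's defaults and should be verified against the Tanuki documentation for your wrapper version.

```python
# Hypothetical helpers; only the property names and sample values come from the post.
DEFAULT_PING_TIMEOUT = 30   # seconds; assumed wrapper default, verify for your version
DEFAULT_PING_INTERVAL = 5   # seconds; assumed wrapper default, verify for your version


def read_wrapper_properties(text):
    """Parse key=value lines from wrapper.conf-style text, skipping comments and blanks."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def ping_settings(props):
    """Return (timeout, interval) in seconds, falling back to the assumed defaults."""
    timeout = int(props.get("wrapper.ping.timeout", DEFAULT_PING_TIMEOUT))
    interval = int(props.get("wrapper.ping.interval", DEFAULT_PING_INTERVAL))
    return timeout, interval


SAMPLE_CONF = """\
# ping values from the fix described in this post
wrapper.ping.timeout=900
wrapper.ping.interval=30
"""

timeout, interval = ping_settings(read_wrapper_properties(SAMPLE_CONF))
print(f"Wrapper pings the JVM every {interval}s and waits up to {timeout}s "
      f"for a response before treating the JVM as hung")
```

In practice, `SAMPLE_CONF` would be replaced by the contents of the agent's own wrapper.conf, whose location depends on the installation.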

{quote}References

* https://answers.atlassian.com/questions/61674/remote-agent-crashing-when-being-farmed

* http://wrapper.tanukisoftware.com/doc/english/prop-ping-timeout.html

{quote}

Restart the entire system.

h2. Problem:

Agent builds stop and their results are lost after a missed heartbeat. For months this appeared to be a server problem. Re-evaluate:

* Determine if we can replicate the problem with micro environment under load

* Check agent logs looking for a pattern

h2. Situation:

{code}(78) 2013-02-12 11:12:22,785 INFO ActiveMQ Session Task DefaultErrorHandler

Recording error: Agent General Purpose Windows X64 (vmwbuild-03) went offline while building TRUNK-QUICKCOMPILEWINDOWS-2876.

The build results will not be saved. : TRUNK-QUICKCOMPILEWINDOWS

{code}

h3. Agent

{code}

ERROR | wrapper | 2013/02/12 11:08:54 | JVM appears hung: Timed out waiting for signal from JVM.

ERROR | wrapper | 2013/02/12 11:08:55 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:09:10 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:09:34 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:09:34 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:09:39 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:10:08 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:10:08 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:10:13 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:10:42 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:10:42 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:10:47 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:11:16 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:11:16 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:11:21 | Launching a JVM...

INFO | jvm 105 | 2013/02/12 11:11:31 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org

INFO | jvm 105 | 2013/02/12 11:11:31 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.

...

INFO | jvm 105 | 2013/02/12 11:11:32 | WARNING - Unable to load the Wrapper's native library 'wrapper.dll'.

INFO | jvm 105 | 2013/02/12 11:11:32 | The file is located on the path at the following location but

INFO | jvm 105 | 2013/02/12 11:11:32 | could not be loaded:

INFO | jvm 105 | 2013/02/12 11:11:32 | d:\bamboo-agent-home\bin\..\lib\wrapper.dll

INFO | jvm 105 | 2013/02/12 11:11:32 | Please verify that the file is readable by the current user

INFO | jvm 105 | 2013/02/12 11:11:32 | and that the file has not been corrupted in any way.

INFO | jvm 105 | 2013/02/12 11:11:32 | One common cause of this problem is running a 32-bit version

INFO | jvm 105 | 2013/02/12 11:11:32 | of the Wrapper with a 64-bit version of Java, or vica versa.

INFO | jvm 105 | 2013/02/12 11:11:32 | This is a 64-bit JVM.

INFO | jvm 105 | 2013/02/12 11:11:32 | Reported cause:

INFO | jvm 105 | 2013/02/12 11:11:32 | D:\bamboo-agent-home\lib\wrapper.dll: Can't find dependent libraries

INFO | jvm 105 | 2013/02/12 11:11:32 | System signals will not be handled correctly.

INFO | jvm 105 | 2013/02/12 11:11:32 |

INFO | jvm 105 | 2013/02/12 11:11:41 | Agent bootstrap using baseUrl: http://vmbamboo-01:8085/bamb

{code}
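The pattern in the agent log (a hung JVM, repeated failed launches, then the wrapper.dll warning) can be counted automatically across a large log file. Below is a rough Python sketch of that check. The function and category names are hypothetical, and it assumes the pipe-separated `LEVEL | source | timestamp | message` layout shown in the log above.

```python
from collections import Counter

# Message fragments copied from the agent log above; the category names are made up.
PATTERNS = {
    "hung": "JVM appears hung",
    "startup_failed": "Startup failed",
    "launch": "Launching a JVM",
    "dll": "Unable to load the Wrapper's native library",
}


def summarize(log_text):
    """Count occurrences of each known message pattern in wrapper log text."""
    counts = Counter()
    for line in log_text.splitlines():
        # Expected layout: LEVEL | source | timestamp | message
        parts = [part.strip() for part in line.split("|", 3)]
        if len(parts) != 4:
            continue  # skip blank or differently formatted lines
        message = parts[3]
        for name, needle in PATTERNS.items():
            if needle in message:
                counts[name] += 1
    return counts


SAMPLE_LOG = """\
ERROR | wrapper | 2013/02/12 11:08:54 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2013/02/12 11:08:55 | JVM did not exit on request, terminated
STATUS | wrapper | 2013/02/12 11:09:10 | Launching a JVM...
ERROR | wrapper | 2013/02/12 11:09:34 | Startup failed: Timed out waiting for a signal from the JVM.
INFO | jvm 105 | 2013/02/12 11:11:32 | WARNING - Unable to load the Wrapper's native library 'wrapper.dll'.
"""

for name, count in sorted(summarize(SAMPLE_LOG).items()):
    print(f"{name}: {count}")
```

A "hung" count followed by several "startup_failed" entries matches the restart loop described in this post. A non-zero "dll" count points to the 32-bit/64-bit mismatch.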

h2. Accepted answer

Hello there,

This kind of problem can happen when the local JVM is restarted.

To avoid it, you can increase the wrapper's ping timing, like this:

wrapper.ping.timeout=800

wrapper.ping.interval=35

Hope that helps.

Cheers,
