Fix Found for heartbeat problem

Hi, I'm posting about issue BSP-7948 to share the cause and the fix in the hope that it helps others. Please feel free to close this immediately.

Fix:

Changing the JVM wrapper settings fixes the problem.

Approach

  • Set up a 2-node system
  • Set up a bogus never-ending job that runs a cmd shell in an infinite loop
  • Used a stress tool to max out server disk, CPU, and memory; could not trigger the heartbeat problem
  • Reviewed the logs and found a pattern (see Situation). The agent attempts to restart the JVM and complains of a DLL problem on Windows
  • Fixed the DLL problem by running all Windows agents with an x86 JDK
  • Fixed the JVM restart problem by increasing the time allowed for the JVM ping:

{code}
wrapper.java.command=d:\Java\x86\jdk1.6.0_26\bin\java.exe
wrapper.ping.timeout=900
wrapper.ping.interval=30
{code}

References

  • https://answers.atlassian.com/questions/61674/remote-agent-crashing-when-being-farmed
  • http://wrapper.tanukisoftware.com/doc/english/prop-ping-timeout.html

Problem:

Agent builds stop and their results are lost after a missed heartbeat. For months this appeared to be a server problem. Re-evaluated with two steps:

  • Determine if we can replicate the problem with a micro environment under load
  • Check agent logs looking for a pattern

Situation:

{code}
2013-02-12 11:12:22,785 INFO ActiveMQ Session Task DefaultErrorHandler

Recording error: Agent General Purpose Windows X64 (vmwbuild-03) went offline while building TRUNK-QUICKCOMPILEWINDOWS-2876.

The build results will not be saved. : TRUNK-QUICKCOMPILEWINDOWS
{code}

Agent

{code}

ERROR | wrapper | 2013/02/12 11:08:54 | JVM appears hung: Timed out waiting for signal from JVM.

ERROR | wrapper | 2013/02/12 11:08:55 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:09:10 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:09:34 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:09:34 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:09:39 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:10:08 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:10:08 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:10:13 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:10:42 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:10:42 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:10:47 | Launching a JVM...

ERROR | wrapper | 2013/02/12 11:11:16 | Startup failed: Timed out waiting for a signal from the JVM.

ERROR | wrapper | 2013/02/12 11:11:16 | JVM did not exit on request, terminated

STATUS | wrapper | 2013/02/12 11:11:21 | Launching a JVM...

INFO | jvm 105 | 2013/02/12 11:11:31 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org

INFO | jvm 105 | 2013/02/12 11:11:31 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.

...

INFO | jvm 105 | 2013/02/12 11:11:32 | WARNING - Unable to load the Wrapper's native library 'wrapper.dll'.

INFO | jvm 105 | 2013/02/12 11:11:32 | The file is located on the path at the following location but

INFO | jvm 105 | 2013/02/12 11:11:32 | could not be loaded:

INFO | jvm 105 | 2013/02/12 11:11:32 | d:\bamboo-agent-home\bin\..\lib\wrapper.dll

INFO | jvm 105 | 2013/02/12 11:11:32 | Please verify that the file is readable by the current user

INFO | jvm 105 | 2013/02/12 11:11:32 | and that the file has not been corrupted in any way.

INFO | jvm 105 | 2013/02/12 11:11:32 | One common cause of this problem is running a 32-bit version

INFO | jvm 105 | 2013/02/12 11:11:32 | of the Wrapper with a 64-bit version of Java, or vica versa.

INFO | jvm 105 | 2013/02/12 11:11:32 | This is a 64-bit JVM.

INFO | jvm 105 | 2013/02/12 11:11:32 | Reported cause:

INFO | jvm 105 | 2013/02/12 11:11:32 | D:\bamboo-agent-home\lib\wrapper.dll: Can't find dependent libraries

INFO | jvm 105 | 2013/02/12 11:11:32 | System signals will not be handled correctly.

INFO | jvm 105 | 2013/02/12 11:11:32 |

INFO | jvm 105 | 2013/02/12 11:11:41 | Agent bootstrap using baseUrl: http://vmbamboo-01:8085/bamb

{code}
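
For anyone who wants to check their own agent logs for the same pattern, here is a minimal Python sketch. It assumes the wrapper log uses the `LEVEL | source | timestamp | message` layout shown above. The function name `summarize_wrapper_log` and the pattern labels are made up for illustration and are not part of Bamboo or the Java Service Wrapper.

```python
import re

# Matches wrapper log lines shaped like the ones quoted above, e.g.
# "ERROR | wrapper | 2013/02/12 11:08:54 | JVM appears hung: ..."
LINE_RE = re.compile(
    r"^\s*(?P<level>\w+)\s*\|\s*(?P<source>[^|]+?)\s*\|\s*"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*(?P<message>.*)$"
)

# Message fragments taken from the log excerpt; the keys are illustrative labels.
PATTERNS = {
    "hung": "JVM appears hung",
    "startup_failed": "Startup failed",
    "launch": "Launching a JVM",
    "dll": "Unable to load the Wrapper's native library",
}


def summarize_wrapper_log(lines):
    """Count the restart-related messages and note when the first hang was logged."""
    counts = {name: 0 for name in PATTERNS}
    first_hung = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a wrapper log line
        message = match.group("message")
        for name, needle in PATTERNS.items():
            if needle in message:
                counts[name] += 1
                if name == "hung" and first_hung is None:
                    first_hung = match.group("timestamp")
    return counts, first_hung
```

A "hung" entry followed by several "startup_failed" and "launch" entries would match the restart loop in the excerpt above. To try it, pass the lines of the agent's wrapper log file to `summarize_wrapper_log`.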

1 answer

1 accepted

Heartbeat problems can happen when the agent's wrapper restarts the local JVM because the JVM failed to respond in time. We can avoid these restarts by relaxing the schedule for checking the JVM.

wrapper.ping.timeout=900

wrapper.ping.interval=30
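
As a rough sanity check on these two values, the Python sketch below reads them from a properties-style text and reports how many ping intervals fit inside the timeout. The helper names `parse_properties` and `ping_budget` are hypothetical. The sketch assumes positive integer values in seconds, and it is a quick gauge rather than a description of how the wrapper schedules pings internally.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ping_budget(props):
    """Return how many whole ping intervals fit inside the ping timeout."""
    timeout = int(props["wrapper.ping.timeout"])
    interval = int(props["wrapper.ping.interval"])
    if timeout <= 0 or interval <= 0:
        raise ValueError("this sketch only handles positive values")
    return timeout // interval


settings = parse_properties(
    "wrapper.ping.timeout=900\n"
    "wrapper.ping.interval=30\n"
)
print(ping_budget(settings))  # 900 // 30
```

With the settings above, 30 ping intervals fit inside the timeout, so a JVM that is only briefly unresponsive under load is far less likely to be treated as hung.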
