
Bamboo agents losing connectivity to Bamboo sporadically

Serge Dukic December 14, 2011

Hi,

There are times when our Bamboo agents lose connectivity with the main Bamboo server, even though there are no network connectivity issues that we can see.

The exception message is:

org.springframework.jms.UncategorizedJmsException: Uncategorized exception occured during JMS processing; nested exception is javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException; nested exception is org.apache.activemq.transport.RequestTimedOutIOException

It occurs after several unsuccessful heartbeat attempts:

INFO | jvm 4 | 2011/12/15 11:11:51 | 2011-12-15 11:11:51,403 INFO [QuartzScheduler_Worker-4] [AgentHeartBeatJob] Not sending a new heartbeat since an old one is still being sent, last successful transmission time was 44 seconds ago, dropping the current heartbeat...
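The log line describes a guard: if the previous heartbeat is still in flight, the new one is dropped rather than queued. As a rough illustration, here is a minimal Python model of that behaviour. It assumes a single "in flight" lock per agent. `HeartbeatSender`, `tick` and `send_fn` are hypothetical names, not Bamboo's actual `AgentHeartBeatJob` code.

```python
import threading
import time


class HeartbeatSender:
    """Hypothetical single-flight heartbeat job (a model, not Bamboo's code)."""

    def __init__(self, send_fn):
        self._send_fn = send_fn             # transmits one heartbeat; may block
        self._in_flight = threading.Lock()  # held while a send is in progress
        self.last_success = time.monotonic()
        self.dropped = 0

    def tick(self) -> bool:
        """Called on every scheduler tick; returns True if a heartbeat was sent."""
        # Non-blocking acquire: if the previous heartbeat is still being sent,
        # drop this one instead of queueing it behind the stuck send.
        if not self._in_flight.acquire(blocking=False):
            age = time.monotonic() - self.last_success
            self.dropped += 1
            print(f"Not sending a new heartbeat, last success {age:.0f}s ago, dropping")
            return False
        try:
            self._send_fn()
            self.last_success = time.monotonic()
            return True
        finally:
            self._in_flight.release()
```

In this model, a send that never completes (for example, because the server is too busy to answer) makes every later tick get dropped while `last_success` keeps ageing. That is consistent with the message above, which reports a growing "last successful transmission" time rather than a network error.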

This happens to all of our agents at the same time. They usually come back after about 20 minutes, but their builds generally fail with the error "agent has gone offline".

Has anyone run into this problem before? My first thought was a networking problem, but I can confirm that the network connection between Bamboo and its agents stays up the whole time, even while the agents are disconnected from the server.

Thanks

Answer accepted
Carl Lewis December 14, 2011

You might want to take a look at this, although I'm not sure it will help in your case:

http://confluence.atlassian.com/plugins/viewsource/viewpagesrc.action?pageId=216957427

Also I would suggest carefully monitoring the load on the Bamboo server host. If the host CPU is overloaded or the disk IO bandwidth is maxed out, it could cause agent connectivity problems.

This can particularly be a problem when running many remote agents.
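As a starting point for watching CPU load on the server host, here is a small Python check. It is a sketch under stated assumptions: `os.getloadavg()` exists only on Unix-like systems, and the threshold of 1.0 runnable task per core is an illustrative assumption, not a Bamboo guideline. The function names are hypothetical.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalise a load average by the number of CPU cores."""
    return load_avg / max(cores, 1)


def cpu_overloaded(threshold_per_core: float = 1.0, load_avg=None, cores=None) -> bool:
    """Return True if the 1-minute load average per core exceeds the threshold.

    By default this reads live values via os.getloadavg() (Unix only) and
    os.cpu_count(); both can be passed in explicitly, e.g. for testing.
    The 1.0 default threshold is an assumption, not a Bamboo recommendation.
    """
    if load_avg is None:
        load_avg = os.getloadavg()[0]
    if cores is None:
        cores = os.cpu_count() or 1
    return load_per_core(load_avg, cores) > threshold_per_core


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    print("Bamboo host CPU overloaded:", cpu_overloaded())
```

The standard library does not expose disk IO utilisation portably. For the disk side, a platform tool such as `iostat` on Linux or Performance Monitor on Windows is the more practical option.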

Serge Dukic December 14, 2011

Thanks for that. We have 19 agents at the moment and have recently added a few, so it may well be the Bamboo server host. I've disabled the timeout to see whether that fixes the problem, but moving to a faster server and/or reducing the number of agents will likely be the solution.

Carl Lewis December 14, 2011

We're currently running 47 remote agents off a single Bamboo server with 20-25 running builds at any time. Our server is a several-year-old 1RU Dell: dual-core, 3GB RAM, a single 7200RPM drive, running Windows Server 2003 R2 x64. So nothing special.

All the agents are stable now, although we've had problems similar to yours in the past. Ours seemed to be related to disk IO. A couple of things have helped:

(1) Disabling the virus scanner on the Bamboo home directory (duh!)

(2) Defragmenting the drive and ensuring there is plenty of free space.
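To keep an eye on point (2), a free-space check on the volume holding the Bamboo home directory is easy to script. This is a minimal sketch. The 20% threshold is an assumed value, not one given in this thread, and the function names are hypothetical.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_plenty_of_space(path: str, min_free_fraction: float = 0.2) -> bool:
    """True if at least `min_free_fraction` of the volume is free.

    The 20% default is an assumed threshold, not a Bamboo requirement.
    """
    return free_fraction(path) >= min_free_fraction
```

For example, `has_plenty_of_space("/var/atlassian/bamboo-home")` would check the volume holding that directory. The path is hypothetical, so substitute your actual Bamboo home directory. `shutil.disk_usage` works on both Windows and Unix-like systems.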

If your server specs are comparable, you should have no problem running 19 agents.

Serge Dukic December 14, 2011

I disabled six of the agents yesterday afternoon and the problem hasn't reappeared, so it does look like Bamboo server performance is the cause. We're running Bamboo on a VM on a Dell blade server, but we were already planning to migrate it to a more powerful machine, and this gives us one more reason to do so. Thank you for your answers, they were really helpful.
