There are times when our Bamboo agents lose connectivity with the main Bamboo server, even though there are no network connectivity issues that we can see.
The exception message is:
org.springframework.jms.UncategorizedJmsException: Uncategorized exception occured during JMS processing; nested exception is javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException; nested exception is org.apache.activemq.transport.RequestTimedOutIOException
It occurs after several unsuccessful heartbeat attempts:
INFO | jvm 4 | 2011/12/15 11:11:51 | 2011-12-15 11:11:51,403 INFO [QuartzScheduler_Worker-4] [AgentHeartBeatJob] Not sending a new heartbeat since an old one is still being sent, last successful transmission time was 44 seconds ago, dropping the current heartbeat...
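The heartbeat log line suggests that the agent will not start a new heartbeat while a previous one is still in flight. If one send stalls, for example because the server is too busy to answer, every later heartbeat is dropped until the stalled request gives up. That would fit the RequestTimedOutIOException eventually reported, but this is an inference. The sketch below shows that kind of guard. The `HeartbeatSender` class, its `tick()` method and the `send_fn` callback are hypothetical names. This is not Bamboo's actual `AgentHeartBeatJob` code.

```python
import threading
import time


class HeartbeatSender:
    """Hypothetical sketch (not Bamboo code): drop a heartbeat while a previous one is still being sent."""

    def __init__(self, send_fn):
        self._send_fn = send_fn      # callable that transmits one heartbeat; may block
        self._lock = threading.Lock()
        self._in_flight = False
        self.last_success = None     # time.monotonic() of the last successful send
        self.dropped = 0             # number of heartbeats skipped because one was in flight

    def tick(self):
        """Called by a scheduler. Returns True if a heartbeat was sent, False if it was dropped."""
        with self._lock:
            if self._in_flight:
                # An older heartbeat is still being sent: drop this one.
                self.dropped += 1
                return False
            self._in_flight = True
        try:
            self._send_fn()          # if this raises, last_success is not updated
            self.last_success = time.monotonic()
            return True
        finally:
            with self._lock:
                self._in_flight = False
```

If `send_fn` blocks, each scheduled `tick()` returns `False` and `dropped` keeps growing, which matches the repeated "dropping the current heartbeat" messages.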
This happens to all of our agents at the same time. They usually come back after about 20 minutes, but their builds often fail with "agent has gone offline" as the error.
Has anyone run into this problem before? My first thought was a networking error, but I can confirm that the network connection between Bamboo and its agents stays up the whole time, even while the agents are disconnected from the server.
You might want to take a look at this, although I'm not sure whether it will help in your case:
Also, I would suggest carefully monitoring the load on the Bamboo server host. If the host CPU is overloaded or the disk IO bandwidth is maxed out, agents can lose their connection to the server.
This can be a particular problem when running many remote agents.
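One simple way to watch CPU load is to compare the load average with the number of cores. The sketch below assumes a Unix-like host, because Python's `os.getloadavg()` is not available on Windows. It also uses an arbitrary threshold of one runnable process per core. The function names and the threshold are illustrative assumptions, not Bamboo guidance, and the sketch does not measure disk IO.

```python
import os


def cpu_overloaded(load_avg, cpu_count, per_core_threshold=1.0):
    """Return True if the load average per core exceeds the (assumed) threshold."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count > per_core_threshold


def check_host():
    """Sample the 1-minute load average on a Unix-like host and apply cpu_overloaded()."""
    one_minute, _five, _fifteen = os.getloadavg()  # raises OSError if the value is unobtainable
    cores = os.cpu_count() or 1                    # cpu_count() may return None
    return one_minute, cores, cpu_overloaded(one_minute, cores)
```

`check_host()` returns the 1-minute load, the core count and whether the threshold was exceeded. You could run it periodically on the Bamboo server host to see whether load spikes line up with the agent disconnects.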
Thanks for that. We have 19 agents at the moment and have recently added a few, so it might very well be the Bamboo server host. I've disabled the timeout to see whether that fixes the problem, but moving to a faster server and/or reducing the number of agents will likely be the solution.
We're currently running 47 remote agents off a single Bamboo server with 20-25 running builds at any time. Our server is a several-year-old 1RU Dell (dual-core, 3GB RAM, a single 7200RPM drive) running Windows Server 2003 R2 x64. So nothing special.
All the agents are stable now, although we've had problems similar to yours in the past. Ours seemed to be related to disk IO. A couple of things have helped:
(1) Disabling the virus scanner on the Bamboo home directory (duh!)
(2) Defragmenting the drive and ensuring there is plenty of free space.
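The free-space part of that advice is easy to check automatically. The sketch below uses Python's standard `shutil.disk_usage()`. The 20% threshold and the Bamboo home path in the usage note are placeholder assumptions, not recommended values.

```python
import shutil


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free / usage.total


def has_plenty_of_free_space(path, min_fraction=0.2):
    """True if at least `min_fraction` of the filesystem is free (20% is an arbitrary example)."""
    return free_fraction(path) >= min_fraction
```

For example, `has_plenty_of_free_space("/path/to/bamboo-home")` would check the volume holding Bamboo home. That path is hypothetical, so substitute your own.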
If your server specs are comparable, you should have no problems running 19 agents.
I disabled six of the agents yesterday afternoon and the problem hasn't reappeared, so it does look like Bamboo server performance is the cause. We're running Bamboo on a VM on a Dell blade server, but we were going to migrate it to a more powerful machine anyway; this just gives us one more reason to do so. Thank you for your answers, they were really helpful.