We're running Bamboo OnDemand builds on Amazon EC2 machines with VPC. For some reason, the VPC network connection isn't 100% reliable, and so occasionally a given TCP connection attempt will fail.
In our VM startup script, we need to download and setup a bunch of stuff via the VPC connection. Most of the time, that works as expected. But occasionally, when a connection drops, the script fails. When this happens, the VM is not suitable for running builds.
But Bamboo OnDemand doesn't *know* that the VM isn't suitable for running builds, so it tries scheduling a build there anyway. Usually the build fails pretty fast, for a reason having nothing to do with the code that it's building. If there's a large queue of waiting builds, bad VMs will tend to churn through the queue, quickly failing all of the builds there.
Is there a way for the VM to send a signal back to Bamboo saying that the startup script failed and that the agent on that machine should be disabled?
If your script fails, just do:
rm -rf /opt/bamboo-elastic-agent
killall -9 java
That will prevent the agent from starting and will kill any started agent. Bamboo will notice that the agent went down and remove from the list eventually (no builds will be sent to the agent in any case).
Hello, Atlassian Community! My name is Dave Meyer and I'm a Principal Product Manager at Atlassian. I wanted to give this community a heads up about an upcoming Webinar that might be of interest...
Connect with like-minded Atlassian users at free events near you!Find an event
Connect with like-minded Atlassian users at free events near you!
Unfortunately there are no Community Events near you at the moment.Host an event
You're one step closer to meeting fellow Atlassian users at your local event. Learn more about Community Events