We're running Bamboo OnDemand builds on Amazon EC2 machines with VPC. For some reason, the VPC network connection isn't 100% reliable, and so occasionally a given TCP connection attempt will fail.
In our VM startup script, we need to download and set up a bunch of stuff via the VPC connection. Most of the time, that works as expected. But occasionally, when a connection drops, the script fails. When this happens, the VM is not suitable for running builds.
But Bamboo OnDemand doesn't *know* that the VM isn't suitable for running builds, so it tries scheduling a build there anyway. Usually the build fails pretty fast, for a reason having nothing to do with the code that it's building. If there's a large queue of waiting builds, bad VMs will tend to churn through the queue, quickly failing all of the builds there.
Is there a way for the VM to send a signal back to Bamboo saying that the startup script failed and that the agent on that machine should be disabled?
If your script fails, just do:
rm -rf /opt/bamboo-elastic-agent
killall -9 java
That will prevent the agent from starting and will kill any agent that has already started. Bamboo will notice that the agent went down and will eventually remove it from the list (no builds will be sent to the agent in any case).
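Since the failures come from occasional dropped connections, it may also help to retry each download a few times and only disable the agent once the retries run out. Here's a rough sketch of how that could look in a startup script. The helper names `retry` and `disable_agent`, the retry counts, and the download URL are all made up for illustration; only the two commands inside `disable_agent` come from the answer above.

```shell
#!/bin/sh
# Sketch only: retry() and disable_agent() are hypothetical helper names,
# not part of Bamboo or its elastic agent tooling.

# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Remove the agent install and kill any agent that has already started,
# using the two commands from the answer above.
disable_agent() {
    rm -rf /opt/bamboo-elastic-agent
    killall -9 java
}

# Example use in a startup script (the URL is a placeholder):
#   retry 5 10 curl -fsS -o /tmp/tools.tar.gz "https://example.invalid/tools.tar.gz" || {
#       disable_agent
#       exit 1
#   }
```

With something like this, a single dropped connection just costs a short delay, and the agent is only taken out when a step keeps failing.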