
Build Queue - refresh/restart builds?

Simon Hooper December 22, 2019

After restarting Bamboo, all of our local agents were offline. This left a number of builds queued with a status of "No agent can build this job".

Since restarting each agent, the builds are still sitting in the queue with the "no agents" error. New builds have since triggered, jumped ahead of the queued ones, and started running.

 

Is there a way to get Bamboo to re-evaluate the queue to kick off these builds?

I'm looking for a way to kick these off without going into each build manually and re-running it.

[Attached screenshot: image.png]

 

 

2 answers

1 accepted

1 vote
Answer accepted
Simon Hooper February 10, 2020

Is there a way to get bamboo to re-evaluate the queue to kick off these builds?

Unfortunately, once builds fall into this state, each one needs to be cancelled and restarted manually.
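For anyone who wants to script the cancel-and-restart step rather than click through each build, here is a minimal sketch. It assumes your Bamboo version exposes the REST queue resource (`GET /rest/api/latest/queue?expand=queuedBuilds` to list, `DELETE /rest/api/latest/queue/{resultKey}` to remove a queued build, `POST /rest/api/latest/queue/{planKey}` to queue a new one) and accepts basic authentication. The response field names, the server URL, and the credentials are assumptions for illustration, so check them against the REST documentation for your version before relying on this.

```python
import base64
import json
import urllib.request


def queue_url(base_url, key=None):
    """Build the queue resource URL; with no key, the listing URL."""
    url = base_url.rstrip("/") + "/rest/api/latest/queue"
    return url + "/" + key if key else url + "?expand=queuedBuilds"


def parse_queued_builds(payload):
    """Return (plan_key, result_key) pairs from a queue listing.

    Assumed response shape:
    {"queuedBuilds": {"queuedBuild": [{"planKey": ..., "buildResultKey": ...}]}}
    """
    builds = payload.get("queuedBuilds", {}).get("queuedBuild", [])
    return [(b["planKey"], b["buildResultKey"]) for b in builds]


def call(method, url, user, password):
    """Send one authenticated request and return the raw response body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=b"" if method == "POST" else None,  # empty body for POST
        method=method,
        headers={"Authorization": "Basic " + token,
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def requeue_all(base_url, user, password):
    """Cancel every queued build, then queue a fresh build of its plan."""
    listing = json.loads(call("GET", queue_url(base_url), user, password))
    for plan_key, result_key in parse_queued_builds(listing):
        call("DELETE", queue_url(base_url, result_key), user, password)
        call("POST", queue_url(base_url, plan_key), user, password)
```

A call such as `requeue_all("https://bamboo.example.com", "admin", "secret")` (hypothetical host and credentials) would process the whole queue, including builds that are waiting for legitimate reasons, so it is worth filtering the list first. A freshly queued build also runs against the current repository state rather than the revision that originally triggered it.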

2 votes
Daniel Santos
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
December 24, 2019

Hi @Simon Hooper

I'm assuming you are not running Bamboo 6.10.2 or newer, where agent disconnections caused by a server restart should be far less frequent thanks to [BAM-18608] "Increase the default time taken for the agents to reconnect in case the server is temporarily down".

I just ran a quick test and the best way to stop those builds is by restarting the server. A quick restart should not drop any agents even in your version.

I do recommend following the workaround suggested in the document below to increase the time your agents have to reconnect after a server restart. That should save you from this situation the next time you need to restart your server for maintenance. This is the document:

I hope that helps.

Simon Hooper January 15, 2020

Hi Dan, yes we are running 6.10.2.

 

I believe the Bamboo host server was restarted because an OS patch was applied. I'm not sure how long it was down, however. Once the agents came back online, the builds in the queue did not resume.

To get these builds to work for my team I had to cancel and restart each build manually. Not ideal.

 

I'm not sure that "Remote agent does not restart after server outage" (Atlassian Documentation) is related. The agents did come back online and they serviced subsequent builds fine. It was the builds already in the queue that never seemed to be evaluated again.

I would have assumed that once the agents reconnected they would proceed to work on the existing queue?

Daniel Santos
Atlassian Team
February 3, 2020

First of all, I'm sorry for the delay (vacation time).

@Simon Hooper, from what you say, I agree that the problem was not caused by the agents being offline, but there is a chance the scenario was triggered by no agents being online right after the Bamboo server came back.

I suggested that link as an attempt to reduce the chance of triggering this scenario.

I would have assumed that once the agents reconnected they would proceed to work on the existing queue?

Yes, we are on the same page. Unfortunately, the feature didn't work as we expected.

This type of problem is more complex to troubleshoot. We would need to check the logs from that event if you still have them (due to log rotation they could be lost already). 

Do you see any error messages in <Bamboo_Install>/logs/catalina.out (assuming you are running Bamboo on a Linux instance)?
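If the log is large, a short script can pull out the candidate lines. This is only a sketch: it assumes the log is plain text and that problem entries contain the level names `ERROR` or `FATAL`, which is common for Java application logs but not confirmed for this setup. The function name and the marker list are illustrative.

```python
from pathlib import Path


def find_error_lines(log_path, markers=("ERROR", "FATAL")):
    """Return (line_number, text) for every line containing a marker."""
    matches = []
    # errors="replace" keeps the scan going if the log has odd bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if any(marker in line for marker in markers):
                matches.append((number, line.rstrip("\n")))
    return matches
```

Pass it the path to the log file from the period around the restart, then read the lines it returns for anything that mentions the queue, agents, or repository connections.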

Simon Hooper February 10, 2020

We have Bamboo running on Windows.

I noted an error on one of the builds that failed to restart. It appears to be related to connecting to Git.

[Attached screenshot: image.png]

I'll check with the services team to see whether they also take down the Git server for patching at these times.

 

My suspicion is this is just a timing issue with when each server is taken offline for patching. 



Simon Hooper February 24, 2020

For everyone's benefit, we confirmed this was a timing issue: Bitbucket was taken down for maintenance after a build had triggered.
