Self hosted bitbucket runner 'docker command not found' suggested solution not working

Charlie Davies March 13, 2024

Hi,

I recently opened a question regarding our self-hosted Docker Bitbucket runners, which intermittently start failing with 'docker command not found'.

I was given this link as the solution.

Following the steps in the above link, I stopped the runner, deleted the empty docker folder, and restarted the runner. I then see the expected message in the logs:

Copying Docker cli to working directory.

However, when I run the pipeline again it still fails with 'docker command not found', so the suggested solution is not working for me.

The only solution I have found is to delete the runners altogether and re-add them through the workspace settings > add runner.

This solution is very slow and repetitive as I have to click through the UI for each runner.

The problem seems to occur randomly, so our nightly pipelines are not guaranteed to run, which causes issues for our development team when they start work in the morning.

Any help with this issue is greatly appreciated.

Many Thanks,

Charlie

1 answer

1 vote
Patrik S
Atlassian Team
March 14, 2024

Hello @Charlie Davies and thank you for reaching out to Community!

From the symptoms you describe, I'm afraid you are being affected by a known bug in the self-hosted runners where they sometimes fail to clear the docker mount directory, causing the error you are reporting.

Unfortunately, we don't currently have a permanent fix for this, and the following bug ticket was raised with our development team to report it:

I would suggest adding your vote there, since this helps both developers and product managers understand the impact. Also, make sure you add yourself as a watcher if you want to receive first-hand updates from that ticket. Please note that all bug fixes are implemented with this policy in mind.

As a workaround, one option you may want to try is to create a cron job that removes that folder once a day and restarts the runner. Something similar to:

0 0 * * * (docker stop runner-<uuid> && rm -rf /tmp/<runner-uuid>/docker && docker start runner-<uuid>) >/dev/null 2>&1

So instead of having to restart and remove the folder manually, the cron job would help to automate the process.
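If you manage several runners, the same stop, remove, and start sequence can be wrapped in a small script and called once per runner from cron. The following is only a minimal sketch under a few assumptions: the function name reset_runner and the DOCKER_CMD override are hypothetical, and the container name and docker folder path are passed in as arguments rather than hard-coded, so substitute your own values. It automates the manual steps only and does not fix the underlying bug.

```shell
#!/usr/bin/env bash
# Sketch: stop a runner container, remove its docker mount folder, start it again.
# Assumptions: the caller supplies the container name (e.g. runner-<uuid>) and the
# folder to remove (e.g. /tmp/<runner-uuid>/docker). reset_runner and DOCKER_CMD
# are hypothetical names used only for this example.

reset_runner() {
  local container="$1"
  local docker_dir="$2"
  local docker_cmd="${DOCKER_CMD:-docker}"

  # Refuse to run with missing arguments or an obviously dangerous path.
  if [ -z "$container" ] || [ -z "$docker_dir" ] || [ "$docker_dir" = "/" ]; then
    echo "usage: reset_runner <container> <docker-dir>" >&2
    return 2
  fi

  # A failed stop (for example, the container is already stopped) only warns,
  # so the runner is still started again at the end.
  "$docker_cmd" stop "$container" || echo "warning: could not stop $container" >&2

  # -f keeps rm from failing when the folder has already been cleared.
  rm -rf -- "$docker_dir"

  "$docker_cmd" start "$container"
}

# When the script is executed with two arguments, reset that single runner.
if [ "$#" -eq 2 ]; then
  reset_runner "$1" "$2"
fi
```

Assuming the script is saved somewhere such as /usr/local/bin/reset-runner.sh (a hypothetical path), the cron entry would then become one line per runner:

0 0 * * * /usr/local/bin/reset-runner.sh runner-<uuid> /tmp/<runner-uuid>/docker >/dev/null 2>&1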

Thank you, @Charlie Davies !

Patrik S

Charlie Davies March 15, 2024

Hi @Patrik S thank you for reaching out.

Unfortunately, the workaround listed does not work for us. When we follow the process it describes, we see the same "docker command not found" message and have to remove all of our current runners and then manually re-add them.

This process is taking time away from our engineers and the issue is being seen with a lot more frequency this year.

It is disappointing to see that the attached bug ticket does not have a higher priority, as Docker is a widely adopted technology.

For one of the leading CI platforms to have this issue with its self-hosted offering is not good enough.

I hope that this issue is sorted ASAP.

Kind Regards,

Charlie

DEPLOYMENT TYPE
CLOUD
PERMISSIONS LEVEL
Site Admin