Hi.
For a few weeks now, we have been repeatedly experiencing an error that prevents our self-hosted runners from running Bitbucket Cloud pipelines.
A runner may remain in an unhealthy state after executing a pipeline, leading to the systematic failure of every subsequent pipeline run on the faulty runner. The issue can only be resolved by restarting the VM or manually removing the offending file.
Here is the output of the pipeline:
Runner matching labels:
- linux
- fast
- self.hosted
Runner name: bitbucket-runner-fast-1
Runner UUID: {056b36f2-6db0-5784-b31e-543bb76093ca}
Runner labels: self.hosted, linux, fast
Runner version:
current: 3.1.0
latest: 3.1.0
mkfifo: /var/lib/bitbucket-pipelines-runner/056b36f2-6db0-5784-b31e-543bb76093ca/tmp/clone_result: File exists
Skipping cache upload for failed step
Searching for test report files in directories named [test-reports, TestResults, test-results, surefire-reports, failsafe-reports] down to a depth of 4
Finished scanning for test reports. Found 0 test report files.
Merged test suites, total number tests is 0, with 0 failures and 0 errors.
I've been trying to figure out how to reproduce the problem, but it's still not entirely clear. What I can tell is that, somehow, the "clone_result" file created by mkfifo isn't cleaned up, which causes subsequent runs to fail.
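For reference, the manual workaround is simply to remove that leftover named pipe on the runner host. Roughly (the path is taken from the output above, with our runner's UUID):

ls -l /var/lib/bitbucket-pipelines-runner/056b36f2-6db0-5784-b31e-543bb76093ca/tmp/clone_result
rm /var/lib/bitbucket-pipelines-runner/056b36f2-6db0-5784-b31e-543bb76093ca/tmp/clone_result

Once the file is removed, the runner is able to run builds again until the file is left behind once more.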
We checked that our configuration and VMs were correct; they had always worked well until now. We can't figure out what's causing the problem, and we're beginning to think it might be an issue on Bitbucket's side.
Any ideas please?
Hi Adrien and welcome to the community!
At the end of each step in a pipelines build, the runner will try to clean up build directories before running the next step. If the runner can’t perform this cleanup, it will enter an Unhealthy state.
We have the following knowledge base article that mentions possible causes and how to troubleshoot this:
Can you please check the Build teardown section of the last step of the build that ran on this runner right before the one whose output you shared? Please let me know if you see any errors related to deleting this specific file.
Based on the path of the file, it looks like you may have changed the runner's working directory. Can you please confirm if this is the case? If so, have you adjusted the command that starts the runner as per our documentation here?
Kind regards,
Theodora
Adrien,
I just wanted to add a clarification regarding this part of my previous answer:
Based on the path of the file, it looks like you may have changed the runner's working directory. Can you please confirm if this is the case? If so, have you adjusted the command that starts the runner as per our documentation here?
https://support.atlassian.com/bitbucket-cloud/docs/set-up-and-use-runners-for-linux/#Changing-the-working-directory-of-your-runner
If you use something like this as per our doc:
docker run [all existing parameters] -v /mydir:/mydir -e WORKING_DIRECTORY=/mydir
then the volume mount on both the host and the container needs to be equal to the value specified in the WORKING_DIRECTORY environment variable.
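For example, based on the path in the build output you shared, the host path, the container path, and the WORKING_DIRECTORY value would all need to be the same:

docker run [all existing parameters] -v /var/lib/bitbucket-pipelines-runner:/var/lib/bitbucket-pipelines-runner -e WORKING_DIRECTORY=/var/lib/bitbucket-pipelines-runner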
You can also share the command you use to start the runner (after masking the UUIDs and the OAuth ID and secret) so that we can check if it looks OK.
Thank you for the prompt reply.
Curiously, we haven't observed the problem again since last Tuesday, even though we didn't change our configuration. We continued to use our runners extensively under the same conditions that regularly triggered the problem last week.
We looked back at the "Build teardown" sections of the pipelines that ran before the one that failed last week, but we did not find anything meaningful. One runner started to fail after a pipeline was marked as "Halted" by Bitbucket, but it's hard to tell whether that's related, since the runner UUID of the "Halted" pipeline is not visible.
We are using GCP VMs to host and run the Bitbucket Docker runners. We set the "WORKING_DIRECTORY" environment variable to `/var/lib/bitbucket-pipelines-runner` and we pass "-v /var/lib/bitbucket-pipelines-runner:/var/lib/bitbucket-pipelines-runner" as expected.
The only "unconventional" thing we do is to make `/var/lib/bitbucket-pipelines-runner` a tmpfs. Here are the commands executed at boot of the VM:
mkdir -p /var/lib/bitbucket-pipelines-runner
mount -t tmpfs tmpfs /var/lib/bitbucket-pipelines-runner
docker system prune -af
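In case it's useful, these are the kinds of checks that can confirm the mount is in place and visible from the runner container (the container name below is a placeholder for our actual runner container):

mount | grep /var/lib/bitbucket-pipelines-runner
df -h /var/lib/bitbucket-pipelines-runner
docker exec <runner-container> ls /var/lib/bitbucket-pipelines-runner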
Additionally, I noticed we're using a non-tagged version of the Bitbucket runner image:
docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
It'll pull the "latest" tag, which is actually different from the "1" tag I've seen recommended for use.
Although we haven't encountered the problem again, we plan to take the following actions (sketched right after this list):
- Change the working directory to /tmp and stop mounting it as tmpfs
- Explicitly require the “:1” tag for the Bitbucket runner docker image
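Concretely, following the pattern from the documentation snippet above, only the relevant parameters of the start command would change, roughly like this:

docker run [all existing parameters] -v /tmp:/tmp -e WORKING_DIRECTORY=/tmp docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner:1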
Hi Adrien,
Thank you for the update.
Regarding the runner image, I suggest using the non-tagged version. We have also changed the preconfigured command that starts the runner (when you create a new runner from the UI) so that it doesn't use the “:1” tag. This tag points to an older version of the runner (1.581) while the latest version is 3.1.0 at the moment.
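For example, pulling the image without a tag (which is equivalent to pulling ":latest") gives you the current runner version:

docker pull docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner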
If you encounter this issue again, I suggest creating a ticket with our support team, where you can share logs safely (support tickets are not publicly visible) and we'll also be able to check the build logs. You can create a ticket via https://support.atlassian.com/contact/#/: in "What can we help you with?" select "Technical issues and bugs" and then Bitbucket Cloud as the product. When you are asked to provide the workspace URL, please make sure you enter the URL of a workspace that is on a paid billing plan to proceed with ticket creation.
Kind regards,
Theodora