Hi.
For a few weeks now, we have been repeatedly experiencing an error that prevents our self-hosted runners from running Bitbucket Cloud pipelines.
A runner may remain in an unhealthy state after executing a pipeline, leading to the systematic failure of every subsequent pipeline run on the faulty runner. The issue can only be resolved by restarting the VM or manually removing the offending file.
Here is the output of the pipeline:
Runner matching labels:
- linux
- fast
- self.hosted
Runner name: bitbucket-runner-fast-1
Runner UUID: {056b36f2-6db0-5784-b31e-543bb76093ca}
Runner labels: self.hosted, linux, fast
Runner version:
current: 3.1.0
latest: 3.1.0
mkfifo: /var/lib/bitbucket-pipelines-runner/056b36f2-6db0-5784-b31e-543bb76093ca/tmp/clone_result: File exists
Skipping cache upload for failed step
Searching for test report files in directories named [test-reports, TestResults, test-results, surefire-reports, failsafe-reports] down to a depth of 4
Finished scanning for test reports. Found 0 test report files.
Merged test suites, total number tests is 0, with 0 failures and 0 errors.
I've been trying to figure out how to reproduce the problem, but it's still not entirely clear. What I can tell is that, somehow, the "clone_result" file created by mkfifo isn't cleaned up, which causes subsequent runs to fail.
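For reference, the manual workaround is simply to remove that leftover named pipe on the runner host. Roughly (the path is taken from the output above, with our runner's UUID):

ls -l /var/lib/bitbucket-pipelines-runner/056b36f2-6db0-5784-b31e-543bb76093ca/tmp/clone_result
rm /var/lib/bitbucket-pipelines-runner/056b36f2-6db0-5784-b31e-543bb76093ca/tmp/clone_result

Once the file is removed, the runner is able to run builds again until the file is left behind once more.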
We checked that our configuration and VMs were correct; they had always worked well until now. We can't figure out what's causing the problem, and we're beginning to think it might be an issue on Bitbucket's side.
Any ideas please?
Hi Adrien and welcome to the community!
At the end of each step in a pipelines build, the runner will try to clean up build directories before running the next step. If the runner can’t perform this cleanup, it will enter an Unhealthy state.
We have the following knowledge base article that mentions possible causes and how to troubleshoot this:
Can you please check the Build teardown section of the last step of the build that ran on this runner right before the one whose output you shared? Please let me know if you see any errors related to deleting this specific file.
Based on the path of the file, it looks like you may have changed the runner's working directory. Can you please confirm if this is the case? If so, have you adjusted the command that starts the runner as per our documentation here?
Kind regards,
Theodora
Adrien,
I just wanted to add a clarification regarding this part of my previous answer:
Based on the path of the file, it looks like you may have changed the runner's working directory. Can you please confirm if this is the case? If so, have you adjusted the command that starts the runner as per our documentation here?
https://support.atlassian.com/bitbucket-cloud/docs/set-up-and-use-runners-for-linux/#Changing-the-working-directory-of-your-runner
If you use something like this as per our doc:
docker run [all existing parameters] -v /mydir:/mydir -e WORKING_DIRECTORY=/mydir
then the volume mount on both the host and the container needs to be equal to the value specified in the WORKING_DIRECTORY environment variable.
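For example, based on the path in the build output you shared, the host path, the container path, and the WORKING_DIRECTORY value would all need to be the same:

docker run [all existing parameters] -v /var/lib/bitbucket-pipelines-runner:/var/lib/bitbucket-pipelines-runner -e WORKING_DIRECTORY=/var/lib/bitbucket-pipelines-runner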
You can also share the command you use to start the runner (after masking the UUIDs and the OAuth ID and secret) so that we can check if it looks OK.
Thank you for the prompt reply.
Curiously, we haven't observed the problem again since last Tuesday, even though we didn't change our configuration. We continued to use our runners extensively under the same conditions that regularly triggered the problem last week.
We looked back at the "Build teardown" sections of the pipelines that ran before the one that failed last week, but we did not find anything meaningful. One runner started to fail after a pipeline was marked as "Halted" by Bitbucket, but it's hard to tell whether that's related, since the runner UUID of the "Halted" pipeline is not visible.
We are using GCP VMs to host and run the Bitbucket Docker runners. We set the "WORKING_DIRECTORY" environment variable to `/var/lib/bitbucket-pipelines-runner` and we pass "-v /var/lib/bitbucket-pipelines-runner:/var/lib/bitbucket-pipelines-runner" as expected.
The only "unconventional" thing we do is to make `/var/lib/bitbucket-pipelines-runner` a tmpfs. Here are the commands executed at boot of the VM:
mkdir -p /var/lib/bitbucket-pipelines-runner
mount -t tmpfs tmpfs /var/lib/bitbucket-pipelines-runner
docker system prune -af
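In case it's useful, these are the kinds of checks that can confirm the mount is in place and visible from the runner container (the container name below is a placeholder for our actual runner container):

mount | grep /var/lib/bitbucket-pipelines-runner
df -h /var/lib/bitbucket-pipelines-runner
docker exec <runner-container> ls /var/lib/bitbucket-pipelines-runner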
Additionally, I noticed we're using a non-tagged version of the Bitbucket runner image:
docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
It'll pull the "latest" tag, which is actually different from the "1" tag I've seen recommended for use.
Although we haven't encountered the problem again, we plan to take the following actions (sketched right after this list):
- Change the working directory to /tmp and stop mounting it as tmpfs
- Explicitly require the “:1” tag for the Bitbucket runner docker image
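Concretely, following the pattern from the documentation snippet above, only the relevant parameters of the start command would change, roughly like this:

docker run [all existing parameters] -v /tmp:/tmp -e WORKING_DIRECTORY=/tmp docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner:1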
Hi Adrien,
Thank you for the update.
Regarding the runner image, I suggest using the non-tagged version. We have also changed the preconfigured command that starts the runner (when you create a new runner from the UI) so that it doesn't use the “:1” tag. This tag points to an older version of the runner (1.581) while the latest version is 3.1.0 at the moment.
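For example, pulling the image without a tag (which is equivalent to pulling ":latest") gives you the current runner version:

docker pull docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner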
If you encounter this issue again, I suggest creating a ticket with our support team, where you can share logs safely (support tickets are not publicly visible) and we'll also be able to check the build logs. You can create a ticket via https://support.atlassian.com/contact/#/: in "What can we help you with?" select "Technical issues and bugs" and then Bitbucket Cloud as the product. When you are asked to provide the workspace URL, please make sure you enter the URL of a workspace that is on a paid billing plan to proceed with ticket creation.
Kind regards,
Theodora