Bitbucket self-hosted runner failing to start Docker

cmorris98 February 2, 2024

As of 6 pm yesterday (my time), all of our self-hosted Bitbucket runners were working fine, and that is when we had our last successful build. Since 9:30 am this morning, every build on the self-hosted runners has been failing. I am struggling to figure out what is happening. Here is what I can see so far.

Last successful build version details

Runner labels: self.hosted, linux, retail.science
Runner version:
current: 1.555
latest: 1.555

 


First failed build version details

Runner version:
current: 1.555
latest: 1.559
The version of this runner is outdated. Upgrade to the latest version (1.559).

So it seems a new runner version came out, but we were not using it yet. However, the build failed with the following error:
Service 'docker' exited with exit code: 1.
Here are the logs from the Docker tab:
Runner warnings:
docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-docker-daemon:v25.0.2-multiarch-prod-stable: Your kernel does not support memory swappiness capabilities or the cgroup is not mounted. Memory swappiness discarded.
cat: can't open '/proc/net/arp_tables_names': No such file or directory
iptables v1.8.10 (nf_tables)
time="2024-02-02T16:32:14.749486743Z" level=warning msg="Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network." host="tcp://0.0.0.0:2375"
time="2024-02-02T16:32:14.749614343Z" level=warning msg="Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out there!" host="tcp://0.0.0.0:2375"
time="2024-02-02T16:32:15.749805000Z" level=warning msg="Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message" host="tcp://0.0.0.0:2375"
time="2024-02-02T16:32:15.749876257Z" level=warning msg="Please consider generating tls certificates with client validation to prevent exposing unauthenticated root access to your network" host="tcp://0.0.0.0:2375"
time="2024-02-02T16:32:15.749903598Z" level=warning msg="You can override this by explicitly specifying '--tls=false' or '--tlsverify=false'" host="tcp://0.0.0.0:2375"
time="2024-02-02T16:32:15.749921619Z" level=warning msg="Support for listening on TCP without authentication or explicit intent to run without authentication will be removed in the next release" host="tcp://0.0.0.0:2375"
time="2024-02-02T16:32:30.800604163Z" level=warning msg="failed to load plugin io.containerd.internal.v1.opt" error="mkdir /opt/containerd: read-only file system"
time="2024-02-02T16:32:30.800906129Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
time="2024-02-02T16:32:30.808994889Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
failed to start daemon: Error initializing network controller: error creating default "bridge" network: Failed to Setup IP tables: Unable to enable NAT rule: (iptables failed: iptables --wait -t nat -I POSTROUTING -s 172.18.0.0/16 ! -o docker0 -j MASQUERADE: Warning: Extension MASQUERADE revision 0 not supported, missing kernel module?
iptables v1.8.10 (nf_tables): CHAIN_ADD failed (No such file or directory): chain POSTROUTING
(exit status 4))
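
For anyone triaging the same failure across several runners, here is a minimal sketch that checks a saved copy of the Docker tab output for the key lines in the log above. The function name, the dictionary, and its labels are hypothetical; the search strings are copied from this log, so other daemon versions may word them differently.

```python
# Substrings copied from the failing log above. The labels on the left are
# hypothetical names chosen for this sketch, not anything Docker emits.
SIGNATURES = {
    "daemon_failed": "failed to start daemon",
    "nat_rule_failed": "Unable to enable NAT rule",
    "masquerade_unsupported": "Extension MASQUERADE revision 0 not supported",
    "nf_tables_backend": "(nf_tables)",
}


def find_signatures(log_text):
    """Return the sorted labels of every signature found in log_text."""
    return sorted(
        label for label, needle in SIGNATURES.items() if needle in log_text
    )


# The last lines of the failing log, pasted as-is.
sample = '''iptables v1.8.10 (nf_tables)
failed to start daemon: Error initializing network controller: error creating default "bridge" network: Failed to Setup IP tables: Unable to enable NAT rule: (iptables failed: iptables --wait -t nat -I POSTROUTING -s 172.18.0.0/16 ! -o docker0 -j MASQUERADE: Warning: Extension MASQUERADE revision 0 not supported, missing kernel module?
iptables v1.8.10 (nf_tables): CHAIN_ADD failed (No such file or directory): chain POSTROUTING
(exit status 4))'''

print(find_signatures(sample))
```

If all four labels come back, the daemon stopped while adding its NAT rule through the nf_tables backend of iptables, which is what the log above shows. The TLS and containerd messages are only warnings.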

If I switch the build to a cloud-hosted runner, everything works except, of course, the part of the build that needs to access internal resources. I have tried many different versions of the runner, including the latest, but none of them work and I get the same error message every time. Can anyone point me in the right direction?

These runners are running on a Google Kubernetes Engine (GKE) cluster. We have three runners, and all three started failing with the same error. I even created an entirely new node pool for these runners, but got the same error. The GKE version is 1.29.0-gke.1381000.

I even tried moving the runners to a GKE cluster that is several versions older, but got the same error.

2 answers

1 accepted

Answer accepted
cmorris98 February 5, 2024
cmorris98 February 2, 2024

I want to add one more thing. The Docker logs for the successful run have the following at the top:

Runner warnings:
docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-docker-daemon:v20.10.24-multiarch-runc-patch-prod-stable: Your kernel does not support memory swappiness capabilities or the cgroup is not mounted. Memory swappiness discarded.

The failing builds have:

Runner warnings:
docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-docker-daemon:v25.0.2-multiarch-prod-stable: Your kernel does not support memory swappiness capabilities or the cgroup is not mounted. Memory swappiness discarded.
So it seems it tried to use a new version of the bitbucket-pipelines-docker-daemon image. How can I force it to use the old version?
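
To make the comparison repeatable, here is a rough sketch that pulls the daemon image version out of a "Runner warnings" line like the two quoted above. The regular expression and the function name are assumptions made for this example; they only rely on the image name and the `v<major>.<minor>.<patch>` tag prefix visible in those two lines.

```python
import re

# Captures the numeric version that follows "bitbucket-pipelines-docker-daemon:v".
# The pattern is an assumption based on the two tags seen in this thread.
DAEMON_RE = re.compile(r"bitbucket-pipelines-docker-daemon:v(\d+(?:\.\d+)*)")


def daemon_version(line):
    """Return the daemon image version as a tuple of ints, or None if absent."""
    match = DAEMON_RE.search(line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Warning lines from the successful and the failing build, shortened after the tag.
good = ("docker-public.packages.atlassian.com/sox/atlassian/"
        "bitbucket-pipelines-docker-daemon:v20.10.24-multiarch-runc-patch-prod-stable: ...")
bad = ("docker-public.packages.atlassian.com/sox/atlassian/"
       "bitbucket-pipelines-docker-daemon:v25.0.2-multiarch-prod-stable: ...")

print(daemon_version(good), daemon_version(bad))
if daemon_version(good) != daemon_version(bad):
    print("Daemon image changed between the two builds")
```

Comparing tuples rather than raw strings avoids ordering surprises such as "20.10.24" sorting after "20.9.0" as text.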
cmorris98 February 2, 2024

I feel like this has to be the issue. I am running docker:20.10.24-dind with version 1.552 of the runner:

docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner:1.552

So I am unclear as to why it is trying to use version 25 of bitbucket-pipelines-docker-daemon.
