Hi,
We are able to execute runners on EC2, but because the runner is scoped to a repository it is not a scalable model. Is there a plan to allow runners to be executed on a Kubernetes platform such as OpenShift, or on EKS Fargate?
Thank you
Quick Answer: Update the docker-in-docker image for the k8s job:
  - name: docker-in-docker
    image: docker:24.0.7-dind
Long answer:
I had grabbed the example YAMLs from:
* https://golesuite.com/en/blog/bitbukect-runners/
* https://support.atlassian.com/bitbucket-cloud/docs/deploying-the-docker-based-runner-on-kubernetes/
But I hit the error. After some searching I found this post: https://itgcommerce.com/how-to-run-self-hosted-bitbucket-pipelines-runners-in-kubernetes/, which basically says to upgrade the docker-in-docker image.
So I checked for the latest tags at: https://itgcommerce.com/how-to-run-self-hosted-bitbucket-pipelines-runners-in-kubernetes/
I updated the k8s job.yaml to 24.0.7-dind. Then I updated the job:
kubectl delete --namespace=bitbucket -f job.yaml # Your namespace may be different
kubectl apply --namespace=bitbucket -f job.yaml
So far, so good. 🤞
Maybe Atlassian could update the example at https://support.atlassian.com/bitbucket-cloud/docs/deploying-the-docker-based-runner-on-kubernetes/ with a newer tag for docker-in-docker?
I found this bug in containerd.
If you change the docker version from "20.10.5-dind" to "19.03.15-dind" it works.
I believe Fargate is using a version of containerd that isn't compatible with the latest docker versions (v20+); according to that ticket it's a bug in containerd.
Kind Regards,
Nathan Burrell
@Alex.Gilburd We have tested the pod spec only on a K8s cluster; we haven't tested it on AWS Fargate. Also, AWS Fargate doesn't support privileged containers, which the Bitbucket Pipelines Runner requires. I think you will have to run it directly on AWS EKS. Is there a reason for not running it directly on a K8s cluster?
@feithj I can see the following error in the logs (these logs are from the K8s cluster). I believe that is causing the container not to start; can you debug it at your end?
time="2021-03-23T20:22:57.558837721Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" expected="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" ref="unknown-sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" total=6949
Hi Alex,
You can run the current runner in Kubernetes by putting the runner container and a docker-in-docker container inside the pod spec, with the /var/lib/docker/containers, /var/run, and /tmp directories shared between the two containers to allow it to run containers.
This is, however, for a long-lived runner.
We are also looking into self-registering and auto-scaling runners in a future release.
Kind Regards,
Nathan Burrell
@Nitin Goyal You can start the runner on a Kubernetes cluster with the following template:
apiVersion: v1
kind: List
items:
  - apiVersion: v1
    kind: Secret
    metadata:
      name: runner-oauth-credentials
      # labels:
      #   accountUuid: # Add your account uuid to optionally allow finding the secret for an account
      #   repositoryUuid: # Add your repository uuid to optionally allow finding the secret for a repository
      #   runnerUuid: # Add your runner uuid to optionally allow finding the secret for a particular runner
    data:
      oauthClientId: # add your base64 encoded oauth client id here
      oauthClientSecret: # add your base64 encoded oauth client secret here
  - apiVersion: batch/v1
    kind: Job
    metadata:
      name: runner
    spec:
      template:
        # metadata:
        #   labels:
        #     accountUuid: # Add your account uuid to optionally allow finding the pods for an account
        #     repositoryUuid: # Add your repository uuid to optionally allow finding the pods for a repository
        #     runnerUuid: # Add your runner uuid to optionally allow finding the pods for a particular runner
        spec:
          containers:
            - name: runner
              image: docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
              env:
                - name: ACCOUNT_UUID
                  value: # Add your account uuid here
                - name: REPOSITORY_UUID
                  value: # Add your repository uuid here
                - name: RUNNER_UUID
                  value: # Add your runner uuid here
                - name: OAUTH_CLIENT_ID
                  valueFrom:
                    secretKeyRef:
                      name: runner-oauth-credentials
                      key: oauthClientId
                - name: OAUTH_CLIENT_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: runner-oauth-credentials
                      key: oauthClientSecret
                - name: WORKING_DIRECTORY
                  value: "/tmp"
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                  readOnly: true # the runner only needs to read these files, never write to them
                - name: var-run
                  mountPath: /var/run
            - name: docker-in-docker
              image: docker:20.10.5-dind
              securityContext:
                privileged: true # required to allow docker-in-docker to run; assumes the namespace you're applying this to has a pod security policy that allows privilege escalation
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                - name: var-run
                  mountPath: /var/run
          restartPolicy: OnFailure # this allows the runner to restart locally if it were to crash
          volumes:
            - name: tmp # required to share a working directory between docker-in-docker and the runner
            - name: docker-containers # required to share the containers directory between docker-in-docker and the runner
            - name: var-run # required to share the docker socket between docker-in-docker and the runner
      # backoffLimit: 6 # this is the default; it will retry up to 6 times with exponential backoff before it considers itself a failure
      # completions: 1 # this is the default; the job should ideally never complete, as the runner never shuts down successfully
      # parallelism: 1 # this is the default; there should only be one instance of this particular runner
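One detail that is easy to get wrong with the Secret above: the values under `data:` must be base64-encoded, and a stray trailing newline will silently corrupt the credential. A quick sketch (the client id shown is a hypothetical placeholder, not a real credential):

```shell
# Encode the oauth client id for the Secret's data field.
# echo -n (or printf '%s') avoids encoding a trailing newline.
echo -n 'my-oauth-client-id' | base64
# -> bXktb2F1dGgtY2xpZW50LWlk

# Sanity check: decode it back before pasting into the manifest.
echo -n 'bXktb2F1dGgtY2xpZW50LWlk' | base64 -d
# -> my-oauth-client-id
```

If the decoded value doesn't round-trip exactly, the runner will fail to authenticate with an OAuth error rather than anything pointing at the Secret.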
@Nitin Goyal Please let us know your feedback around running the runner in a K8s cluster.
@Justin Thomas you shared an example of how to run one runner in k8s, which means that the runner is limited by Docker-in-docker on its current k8s node.
But what about a scalable CI solution with more native integration with k8s?
1 - the ability to automatically scale node instances up and down depending on runner workload (to cover the case of many concurrent pipelines for the same runner)
(examples: https://docs.gitlab.com/runner/configuration/autoscale.html https://plugins.jenkins.io/ec2-fleet/ )
2 - it would be great to automatically obtain the "runner-oauth-credentials". Right now we need to manually create a separate secret config for each runner, in each env; it's absolutely not scalable.
Or any other solution (not limited to K8s) with features (1) and (2):
- maybe AWS Fargate
- or just EC2 runners
> We are looking into self registering and auto scaling runners in a future release aswell.
@atlassian thanks a lot for sharing this info! Do you have a ticket we can subscribe to or vote +1 for such features? :)
We used the above template to run on an AWS EKS Fargate node group.
Error:
time="2021-03-23T20:22:18.594751955Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2021-03-23T20:22:18.644290921Z" level=info msg="Loading containers: start."
time="2021-03-23T20:22:18.651362827Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge 172032 1 br_netfilter\nstp 16384 1 bridge\nllc 16384 2 bridge,stp\nipv6 528384 86 ip_vs,bridge,[permanent]\nip: can't find device 'br_netfilter'\nbr_netfilter 24576 0 \nbridge 172032 1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"
time="2021-03-23T20:22:18.813070340Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2021-03-23T20:22:19.011710378Z" level=info msg="Loading containers: done."
time="2021-03-23T20:22:19.034360362Z" level=info msg="Docker daemon" commit=363e9a8 graphdriver(s)=overlay2 version=20.10.5
time="2021-03-23T20:22:19.034453946Z" level=info msg="Daemon has completed initialization"
time="2021-03-23T20:22:19.055564837Z" level=info msg="API listen on /var/run/docker.sock"
time="2021-03-23T20:22:19.059723919Z" level=info msg="API listen on [::]:2376"
time="2021-03-23T20:22:57.486662442Z" level=warning msg="reference for unknown type: application/vnd.docker.distribution.manifest.v1+prettyjws" digest="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" remote="docker.io/google/pause:latest"
time="2021-03-23T20:22:57.558837721Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" expected="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" ref="unknown-sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" total=6949
time="2021-03-23T20:22:57.562631717Z" level=warning msg="Error persisting manifest" digest="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" error="error committing manifest to content store: commit failed: unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" remote="docker.io/google/pause:latest"
time="2021-03-23T20:22:57.562705076Z" level=warning msg="Image docker.io/google/pause:latest uses outdated schema1 manifest format. Please upgrade to a schema2 image for better future compatibility. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/"
time="2021-03-23T20:23:01.027090628Z" level=error msg="copy shim log" error="reading from a closed fifo"
time="2021-03-23T20:23:01.027509549Z" level=error msg="stream copy error: reading from a closed fifo"
time="2021-03-23T20:23:01.027943668Z" level=error msg="stream copy error: reading from a closed fifo"
time="2021-03-23T20:23:01.120519744Z" level=error msg="fdbdc22e716870ac5c92a1f0e0ca8e72102dc84fd77950126b893349803fa8e3 cleanup: failed to delete container from containerd: no such container"
time="2021-03-23T20:23:01.120558400Z" level=error msg="Handler for POST /containers/fdbdc22e716870ac5c92a1f0e0ca8e72102dc84fd77950126b893349803fa8e3/start returned error: io.containerd.runc.v2: failed to adjust OOM score for shim: set shim OOM score: write /proc/617/oom_score_adj: invalid argument\n: exit status 1: unknown"
@Justin Thomas this error blocks us from running on EKS. Even a simple echo command fails.
@Artsiom Zhurbila Thanks for the feedback.
> it means that the runner is limited by Docker-in-docker on its current k8s node
Does this limitation block you from using runners in a K8s cluster?
> 2 - it would be great to automatically obtain the "runner-oauth-credentials". Right now we need to manually create a separate secret config for each runner, in each env; it's absolutely not scalable.
Would the auto-scaling runner solve this problem or do you want to use the same runner secrets to start multiple runners?
> thanks a lot for sharing this info! Do you have a ticket we can subscribe to or vote +1 for such features? :)
We currently don't have a public ticket for the auto-scaling feature because the runner is not yet GA. Once it's publicly available, I will create a ticket and share it with you.
Thank you again for taking the time to provide feedback. This helps us prioritize and build the features.
> Does this limitation block you from using runners in a K8s cluster?
yes, we are planning to have CI workloads larger than 1 VM (k8s node)
@Artsiom Zhurbila You can run multiple runners on a single K8s node; they each have their own namespace and filesystem. Maybe I am not understanding your use case?
example:
1 VM = 32 GB RAM, 8 CPU
but we plan to run multiple jobs in parallel, which requires more than 32 GB / 8 CPU
we need automatic scale-up from 1 VM to several VMs
> Does this limitation block you from using runners in a K8s cluster?
This limitation blocks us. Since Kubernetes deprecated Docker, all cloud-managed Kubernetes services have migrated to containerd or CRI-O.