Executing Runners in Kubernetes environment

Hi,

We are able to run runners on EC2, but because a runner is scoped to a repository, this is not a scalable model. Is there a plan to allow runners to be executed on a Kubernetes platform such as OpenShift or EKS Fargate?

Thank you

4 answers

3 votes

Quick Answer: Update the docker in docker image for the k8s job.

- name: docker-in-docker
  image: docker:24.0.7-dind

Long answer:

I had grabbed the example yamls from:

* https://golesuite.com/en/blog/bitbukect-runners/
* https://support.atlassian.com/bitbucket-cloud/docs/deploying-the-docker-based-runner-on-kubernetes/

But I still had the error. After some searching I found this post, which basically says to upgrade the docker-in-docker image: https://itgcommerce.com/how-to-run-self-hosted-bitbucket-pipelines-runners-in-kubernetes/

So I checked for the latest tags at: https://itgcommerce.com/how-to-run-self-hosted-bitbucket-pipelines-runners-in-kubernetes/

I updated the image tag in the k8s job.yaml to 24.0.7-dind. Then I re-created the job:

kubectl delete --namespace=bitbucket -f job.yaml # Your namespace may be different
kubectl apply  --namespace=bitbucket -f job.yaml

So far, so good. 🤞

Maybe Atlassian could update the example at https://support.atlassian.com/bitbucket-cloud/docs/deploying-the-docker-based-runner-on-kubernetes/ with a newer tag for docker-in-docker?

0 votes

Hi @Alex.Gilburd

I found this bug in containerd

If you change the docker version from "20.10.5-dind" to "19.03.15-dind" it works.

I believe Fargate is using a version of containerd that isn't compatible with Docker v20+. According to that ticket, it's a bug in containerd.
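For reference, the change described above amounts to editing only the image tag of the docker-in-docker container. A minimal sketch of that container entry, assuming the rest of the Job template posted in this thread stays unchanged (the fragment belongs under the pod spec's `containers` list):

```yaml
# Fragment of the Job's pod spec; only the image tag differs from the template.
- name: docker-in-docker
  image: docker:19.03.15-dind # pinned to 19.03.15 instead of 20.10.5-dind
  securityContext:
    privileged: true # still required for docker-in-docker
  # volumeMounts are unchanged from the template and omitted here
```

After editing the tag, the Job has to be deleted and re-applied for the new image to be used.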

Kind Regards,

Nathan Burrell

0 votes

@Alex.Gilburd We have tested the pod spec only on a K8s cluster; we haven't tested it on AWS Fargate. Also, AWS Fargate doesn't support privileged containers, which the Bitbucket Pipelines runner requires. I think you will have to run it directly on AWS EKS. Is there a reason for not running it directly on a K8s cluster?

Justin - We put this on a node group inside of EKS/Fargate which allows us to run the container as privileged.

@feithj I can see the following error in the logs from the K8s cluster. I believe it is what prevents the container from starting. Can you debug it at your end?

time="2021-03-23T20:22:57.558837721Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" expected="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" ref="unknown-sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" total=6949

0 votes

Hi Alex,

You can run the current runner in Kubernetes by putting the runner container and a docker-in-docker container inside the same pod spec, with the /var/lib/docker/containers, /var/run and /tmp directories shared between the two containers so that the runner can run containers.

This is, however, for a long-lived runner.
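The layout described above can be sketched as a single pod with three shared volumes. This is only an illustration of the volume sharing, assuming `emptyDir` volumes and a hypothetical pod name; the runner's UUID and OAuth environment variables are left out, so the runner container will not register as written. The full template appears in a later reply.

```yaml
# Minimal sketch: a runner container and a docker-in-docker sidecar that
# share /tmp, /var/lib/docker/containers and /var/run.
# Runner credentials and UUID env vars are intentionally omitted.
apiVersion: v1
kind: Pod
metadata:
  name: runner-volume-sketch # hypothetical name
spec:
  containers:
    - name: runner
      image: docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: docker-containers
          mountPath: /var/lib/docker/containers
          readOnly: true # the runner only reads these files
        - name: var-run
          mountPath: /var/run
    - name: docker-in-docker
      image: docker:20.10.5-dind # tag from the template in this thread; other replies report needing a different tag
      securityContext:
        privileged: true # docker-in-docker needs a privileged container
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: docker-containers
          mountPath: /var/lib/docker/containers
        - name: var-run
          mountPath: /var/run
  volumes:
    - name: tmp # shared working directory
      emptyDir: {}
    - name: docker-containers # shared containers directory
      emptyDir: {}
    - name: var-run # shared docker socket
      emptyDir: {}
```

Because both containers mount the same `var-run` volume, the runner sees the Docker socket that the docker-in-docker daemon creates at /var/run/docker.sock.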

We are looking into self-registering and auto-scaling runners in a future release as well.

Kind Regards,

Nathan Burrell

@Atlassian can you share a sample working pod spec? It would be helpful for everyone.

@Nitin Goyal You can start the runner on a Kubernetes cluster with the following template:

apiVersion: v1
kind: List
items:
  - apiVersion: v1
    kind: Secret
    metadata:
      name: runner-oauth-credentials
#      labels:
#        accountUuid: # Add your account uuid to optionally allow finding the secret for an account
#        repositoryUuid: # Add your repository uuid to optionally allow finding the secret for a repository
#        runnerUuid: # Add your runner uuid to optionally allow finding the secret for a particular runner
    data:
      oauthClientId: # add your base64 encoded oauth client id here
      oauthClientSecret: # add your base64 encoded oauth client secret here
  - apiVersion: batch/v1
    kind: Job
    metadata:
      name: runner
    spec:
      template:
#        metadata:
#          labels:
#            accountUuid: # Add your account uuid to optionally allow finding the pods for an account
#            repositoryUuid: # Add your repository uuid to optionally allow finding the pods for a repository
#            runnerUuid: # Add your runner uuid to optionally allow finding the pods for a particular runner
        spec:
          containers:
            - name: runner
              image: docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
              env:
                - name: ACCOUNT_UUID
                  value: # Add your account uuid here
                - name: REPOSITORY_UUID
                  value: # Add your repository uuid here
                - name: RUNNER_UUID
                  value: # Add your runner uuid here
                - name: OAUTH_CLIENT_ID
                  valueFrom:
                    secretKeyRef:
                      name: runner-oauth-credentials
                      key: oauthClientId
                - name: OAUTH_CLIENT_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: runner-oauth-credentials
                      key: oauthClientSecret
                - name: WORKING_DIRECTORY
                  value: "/tmp"
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                  readOnly: true # the runner only needs to read these files never write to them
                - name: var-run
                  mountPath: /var/run
            - name: docker-in-docker
              image: docker:20.10.5-dind
              securityContext:
                privileged: true # required to allow docker in docker to run; assumes the namespace you're applying this to has a pod security policy that allows privilege escalation
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                - name: var-run
                  mountPath: /var/run
          restartPolicy: OnFailure # this allows the runner to restart locally if it was to crash
          volumes:
            - name: tmp # required to share a working directory between docker in docker and the runner
            - name: docker-containers # required to share the containers directory between docker in docker and the runner
            - name: var-run # required to share the docker socket between docker in docker and the runner
      # backoffLimit: 6 # this is the default: the job retries up to 6 times, with exponential backoff, before it is considered failed
      # completions: 1 # this is the default; the job should ideally never complete, as the runner never shuts down successfully
      # parallelism: 1 # this is the default; there should only be one instance of this particular runner
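The `data` fields of the Secret above expect base64-encoded values. As an alternative sketch, a Kubernetes Secret also accepts a `stringData` field with plain-text values, which the API server encodes when the object is written. The placeholder values below are hypothetical and have to be replaced with the OAuth client id and secret of your own runner:

```yaml
# Same Secret as in the template, written with stringData instead of data.
apiVersion: v1
kind: Secret
metadata:
  name: runner-oauth-credentials
type: Opaque
stringData:
  oauthClientId: "REPLACE_WITH_OAUTH_CLIENT_ID" # hypothetical placeholder
  oauthClientSecret: "REPLACE_WITH_OAUTH_CLIENT_SECRET" # hypothetical placeholder
```

The Job can keep referencing the same keys through `secretKeyRef`, since `stringData` entries are stored under `data` with the same key names.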

@Nitin Goyal Please let us know your feedback on running the runner in a K8s cluster.

@Justin Thomas you shared an example of how to run 1 runner in k8s. That means the runner is limited by the Docker-in-Docker of the current k8s node.

But what about a scalable CI solution with more native integration with k8s?
1 - the ability to automatically scale node instances up and down depending on runner workload (to cover the case of many concurrent pipelines on the same runner)
(examples: https://docs.gitlab.com/runner/configuration/autoscale.html https://plugins.jenkins.io/ec2-fleet/ )
2 - it would be great to get "runner-oauth-credentials" automatically. Right now we need to manually create a separate secret config for each runner, in each env, which is absolutely not scalable

Or any other solution (not limited to K8s) with features (1) and (2):
- maybe AWS Fargate
- or just EC2 runners

> We are looking into self registering and auto scaling runners in a future release as well.

@Atlassian thanks a lot for sharing this info! Do you have a ticket we can subscribe to or vote +1 on for such features? :)

We used the above template to run on an AWS EKS Fargate node group.

Error:

time="2021-03-23T20:22:18.594751955Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc

time="2021-03-23T20:22:18.644290921Z" level=info msg="Loading containers: start."

time="2021-03-23T20:22:18.651362827Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge 172032 1 br_netfilter\nstp 16384 1 bridge\nllc 16384 2 bridge,stp\nipv6 528384 86 ip_vs,bridge,[permanent]\nip: can't find device 'br_netfilter'\nbr_netfilter 24576 0 \nbridge 172032 1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"

time="2021-03-23T20:22:18.813070340Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"

time="2021-03-23T20:22:19.011710378Z" level=info msg="Loading containers: done."

time="2021-03-23T20:22:19.034360362Z" level=info msg="Docker daemon" commit=363e9a8 graphdriver(s)=overlay2 version=20.10.5

time="2021-03-23T20:22:19.034453946Z" level=info msg="Daemon has completed initialization"

time="2021-03-23T20:22:19.055564837Z" level=info msg="API listen on /var/run/docker.sock"

time="2021-03-23T20:22:19.059723919Z" level=info msg="API listen on [::]:2376"

time="2021-03-23T20:22:57.486662442Z" level=warning msg="reference for unknown type: application/vnd.docker.distribution.manifest.v1+prettyjws" digest="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" remote="docker.io/google/pause:latest"

time="2021-03-23T20:22:57.558837721Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" expected="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" ref="unknown-sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" total=6949

time="2021-03-23T20:22:57.562631717Z" level=warning msg="Error persisting manifest" digest="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" error="error committing manifest to content store: commit failed: unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" remote="docker.io/google/pause:latest"

time="2021-03-23T20:22:57.562705076Z" level=warning msg="Image docker.io/google/pause:latest uses outdated schema1 manifest format. Please upgrade to a schema2 image for better future compatibility. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/"

time="2021-03-23T20:23:01.027090628Z" level=error msg="copy shim log" error="reading from a closed fifo"

time="2021-03-23T20:23:01.027509549Z" level=error msg="stream copy error: reading from a closed fifo"

time="2021-03-23T20:23:01.027943668Z" level=error msg="stream copy error: reading from a closed fifo"

time="2021-03-23T20:23:01.120519744Z" level=error msg="fdbdc22e716870ac5c92a1f0e0ca8e72102dc84fd77950126b893349803fa8e3 cleanup: failed to delete container from containerd: no such container"

time="2021-03-23T20:23:01.120558400Z" level=error msg="Handler for POST /containers/fdbdc22e716870ac5c92a1f0e0ca8e72102dc84fd77950126b893349803fa8e3/start returned error: io.containerd.runc.v2: failed to adjust OOM score for shim: set shim OOM score: write /proc/617/oom_score_adj: invalid argument\n: exit status 1: unknown"

@Justin Thomas this error blocks us from running on EKS. Even a simple echo command fails.

@Artsiom Zhurbila Thanks for the feedback.

> It means that runner limited by Docker in docker of current k8s node

Does this limitation block you from using runners in a K8s cluster?

> 2 - it will be great to automatically get "runner-oauth-credentials". now we need manually create separate secret config for each runner, in each env, it's absolutely not scalable

Would the auto-scaling runner solve this problem or do you want to use the same runner secrets to start multiple runners?

> thnx a lot for sharing this info! Do you have the ticket we can subscribe or vote +1 for such features ? :)

We currently don't have a public ticket for the auto-scaling feature because the runner is not yet GA. Once it's publicly available, I will create a ticket and share it with you.

Thank you again for taking the time to provide feedback. This helps us prioritize and build the features.

> Does this limitation block you from using runners in a K8s cluster?

Yes, we are planning to have a CI workload that needs more than 1 VM (k8s node).

@Artsiom Zhurbila You can run multiple runners on a single K8s node; each has its own namespace and filesystem. Maybe I am not understanding your use case?

Example:
1 VM = 32 GB RAM, 8 CPU

But we plan to run multiple jobs in parallel, which together require more than 32 GB RAM and 8 CPU.
We need automatic scale-up from 1 VM to several VMs.

@Justin Thomas

> Does this limitation block you from using runners in a K8s cluster?

This limitation blocks us. Since Kubernetes deprecated Docker, all cloud-managed Kubernetes services have migrated to containerd or CRI-O.
