Executing Runners in Kubernetes environment

Hi,

We are able to execute runners on EC2, but because a runner is scoped to a repository, this is not a scalable model. Is there a plan to allow runners to be executed on a Kubernetes platform such as OpenShift or EKS Fargate?

Thank you 

3 answers

0 votes
lassian Atlassian Team Apr 05, 2021

Hi @Alex.Gilburd 

I found this bug in containerd

If you change the docker version from "20.10.5-dind" to "19.03.15-dind" it works.

I believe Fargate is using a version of containerd that isn't compatible with Docker v20+. According to that ticket, it's a bug in containerd.
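
As a minimal sketch of that workaround, only the image tag of the docker-in-docker container changes. Everything else here is assumed to stay as in the pod spec template shared in this thread.

```yaml
# Fragment of the Job's pod spec (spec.template.spec.containers)
- name: docker-in-docker
  image: docker:19.03.15-dind   # was docker:20.10.5-dind
  securityContext:
    privileged: true
  volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: docker-containers
      mountPath: /var/lib/docker/containers
    - name: var-run
      mountPath: /var/run
```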

Kind Regards,

Nathan Burrell

0 votes

@Alex.Gilburd We have tested the pod spec only on a K8s cluster; we haven't tested it on AWS Fargate. Also, AWS Fargate doesn't support privileged containers, which are required for the Bitbucket Pipelines Runner. I think you will have to run it directly on AWS EKS. Is there a reason for not running it directly on a K8s cluster?

Justin - We put this on a node group inside of EKS/Fargate which allows us to run the container as privileged. 

@feithj I can see the following error in the logs from the K8s cluster. I believe it is what prevents the container from starting. Can you debug it at your end?

time="2021-03-23T20:22:57.558837721Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" expected="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" ref="unknown-sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" total=6949
0 votes
lassian Atlassian Team Mar 10, 2021

Hi Alex,

You can run the current runner in Kubernetes by putting the runner container and a Docker-in-Docker container inside the pod spec, with the /var/lib/docker/containers, /var/run/ and /tmp directories shared between the two containers so that the runner can run containers.

This is, however, for a long-lived runner.

We are looking into self-registering and auto-scaling runners in a future release as well.

Kind Regards,

Nathan Burrell

@lassian Can you share a sample working pod spec? It would be helpful for everyone.

@Nitin Goyal You can start the runner on a Kubernetes cluster with the following template

apiVersion: v1
kind: List
items:
  - apiVersion: v1
    kind: Secret
    metadata:
      name: runner-oauth-credentials
#      labels:
#        accountUuid: # Add your account uuid to optionally allow finding the secret for an account
#        repositoryUuid: # Add your repository uuid to optionally allow finding the secret for a repository
#        runnerUuid: # Add your runner uuid to optionally allow finding the secret for a particular runner
    data:
      oauthClientId: # add your base64 encoded oauth client id here
      oauthClientSecret: # add your base64 encoded oauth client secret here
  - apiVersion: batch/v1
    kind: Job
    metadata:
      name: runner
    spec:
      template:
#        metadata:
#          labels:
#            accountUuid: # Add your account uuid to optionally allow finding the pods for an account
#            repositoryUuid: # Add your repository uuid to optionally allow finding the pods for a repository
#            runnerUuid: # Add your runner uuid to optionally allow finding the pods for a particular runner
        spec:
          containers:
            - name: runner
              image: docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
              env:
                - name: ACCOUNT_UUID
                  value: # Add your account uuid here
                - name: REPOSITORY_UUID
                  value: # Add your repository uuid here
                - name: RUNNER_UUID
                  value: # Add your runner uuid here
                - name: OAUTH_CLIENT_ID
                  valueFrom:
                    secretKeyRef:
                      name: runner-oauth-credentials
                      key: oauthClientId
                - name: OAUTH_CLIENT_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: runner-oauth-credentials
                      key: oauthClientSecret
                - name: WORKING_DIRECTORY
                  value: "/tmp"
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                  readOnly: true # the runner only needs to read these files, never write to them
                - name: var-run
                  mountPath: /var/run
            - name: docker-in-docker
              image: docker:20.10.5-dind
              securityContext:
                privileged: true # required to allow docker in docker to run; assumes the namespace you're applying this to has a pod security policy that allows privilege escalation
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                - name: var-run
                  mountPath: /var/run
          restartPolicy: OnFailure # this allows the runner to restart locally if it were to crash
          volumes:
            - name: tmp # required to share a working directory between docker in docker and the runner
            - name: docker-containers # required to share the containers directory between docker in docker and the runner
            - name: var-run # required to share the docker socket between docker in docker and the runner
#      backoffLimit: 6 # this is the default; it will retry up to 6 times, with an exponential backoff in between, before the job is considered failed
#      completions: 1 # this is the default; the job should ideally never complete, as the runner never shuts down successfully
#      parallelism: 1 # this is the default; there should only be one instance of this particular runner
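
If base64-encoding the credentials by hand is inconvenient, a Kubernetes Secret also accepts plain-text values under `stringData`, which the API server encodes when the object is written. A minimal sketch of the Secret from the template above in that form, where the `<...>` strings are hypothetical placeholders to replace with your own values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: runner-oauth-credentials
stringData:
  # Plain-text values; Kubernetes stores them base64-encoded under `data`
  oauthClientId: "<your-oauth-client-id>"
  oauthClientSecret: "<your-oauth-client-secret>"
```

The Job's `secretKeyRef` entries should not need to change, since the keys keep the same names.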

@Nitin Goyal Please let us know your feedback on running the runner in a K8s cluster.

@Justin Thomas You shared an example of how to run 1 runner in K8s, which means that the runner is limited by the Docker-in-Docker of the current K8s node.

But what about a scalable CI solution with more native integration with K8s?
1 - the ability to automatically scale node instances up and down depending on runner workload (to cover the case of many concurrent pipelines on the same runner)
(examples: https://docs.gitlab.com/runner/configuration/autoscale.html https://plugins.jenkins.io/ec2-fleet/ )
2 - it would be great to get "runner-oauth-credentials" automatically. Right now we need to manually create a separate secret config for each runner, in each env, which is absolutely not scalable.

Or any other solution (not limited to K8s) with features (1) and (2):
- maybe AWS Fargate
- or just EC2 runners


We are looking into self-registering and auto-scaling runners in a future release as well.

@lassian Thanks a lot for sharing this info! Do you have a ticket we can subscribe to or vote +1 on for such features? :)

We used the above template to run on an AWS EKS Fargate node group.

Error:

time="2021-03-23T20:22:18.594751955Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc

time="2021-03-23T20:22:18.644290921Z" level=info msg="Loading containers: start."

time="2021-03-23T20:22:18.651362827Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge                172032  1 br_netfilter\nstp                    16384  1 bridge\nllc                    16384  2 bridge,stp\nipv6                  528384 86 ip_vs,bridge,[permanent]\nip: can't find device 'br_netfilter'\nbr_netfilter           24576  0 \nbridge                172032  1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"

time="2021-03-23T20:22:18.813070340Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"

time="2021-03-23T20:22:19.011710378Z" level=info msg="Loading containers: done."

time="2021-03-23T20:22:19.034360362Z" level=info msg="Docker daemon" commit=363e9a8 graphdriver(s)=overlay2 version=20.10.5

time="2021-03-23T20:22:19.034453946Z" level=info msg="Daemon has completed initialization"

time="2021-03-23T20:22:19.055564837Z" level=info msg="API listen on /var/run/docker.sock"

time="2021-03-23T20:22:19.059723919Z" level=info msg="API listen on [::]:2376"

time="2021-03-23T20:22:57.486662442Z" level=warning msg="reference for unknown type: application/vnd.docker.distribution.manifest.v1+prettyjws" digest="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" remote="docker.io/google/pause:latest"

time="2021-03-23T20:22:57.558837721Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" expected="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" ref="unknown-sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" total=6949

time="2021-03-23T20:22:57.562631717Z" level=warning msg="Error persisting manifest" digest="sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c" error="error committing manifest to content store: commit failed: unexpected commit digest sha256:9735a647596859b4cb1f164d5f8f5f8ca4dead79d778825e974e8123a77a17e6, expected sha256:e8fc56926ac3d5705772f13befbaee3aa2fc6e9c52faee3d96b26612cd77556c: failed precondition" remote="docker.io/google/pause:latest"

time="2021-03-23T20:22:57.562705076Z" level=warning msg="Image docker.io/google/pause:latest uses outdated schema1 manifest format. Please upgrade to a schema2 image for better future compatibility. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/"

time="2021-03-23T20:23:01.027090628Z" level=error msg="copy shim log" error="reading from a closed fifo"

time="2021-03-23T20:23:01.027509549Z" level=error msg="stream copy error: reading from a closed fifo"

time="2021-03-23T20:23:01.027943668Z" level=error msg="stream copy error: reading from a closed fifo"

time="2021-03-23T20:23:01.120519744Z" level=error msg="fdbdc22e716870ac5c92a1f0e0ca8e72102dc84fd77950126b893349803fa8e3 cleanup: failed to delete container from containerd: no such container"

time="2021-03-23T20:23:01.120558400Z" level=error msg="Handler for POST /containers/fdbdc22e716870ac5c92a1f0e0ca8e72102dc84fd77950126b893349803fa8e3/start returned error: io.containerd.runc.v2: failed to adjust OOM score for shim: set shim OOM score: write /proc/617/oom_score_adj: invalid argument\n: exit status 1: unknown"

 

@Justin Thomas This error blocks us from running on EKS. Even a simple echo command fails.

@Artsiom Zhurbila Thanks for the feedback. 

It means that runner limited by Docker in docker of current k8s node

Does this limitation block you from using runners in a K8s cluster?

2 - it will be great to automatically get "runner-oauth-credentials". now we need manually create separate secret config for each runner, in each env, it's absolutely not scalable

Would the auto-scaling runner solve this problem or do you want to use the same runner secrets to start multiple runners?

thnx a lot for sharing this info! Do you have the ticket we can subscribe or vote +1 for such features ? :) 

We currently don't have a public ticket for the auto-scaling feature because the runner is not yet GA. Once it's publicly available, I will create a ticket and share it with you.

Thank you again for taking the time to provide feedback. This helps us prioritize and build the features.


Does this limitation block you from using runners in a K8s cluster?

Yes, we are planning to have a CI workload that needs more than 1 VM (K8s node).

@Artsiom Zhurbila You can run multiple runners on a single K8s node, they have their own namespace and filesystem. Maybe I am not understanding your use case?
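
A rough sketch of what a second runner on the same cluster could look like, assuming it reuses the template from this thread with its own Job name, Secret, and runner UUID. The names `runner-2` and `runner-2-oauth-credentials` are hypothetical, and the omitted fields are assumed to be unchanged from that template.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: runner-2   # hypothetical name; must differ from the first Job
spec:
  template:
    spec:
      containers:
        - name: runner
          image: docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner
          env:
            - name: RUNNER_UUID
              value: # Add the second runner's uuid here
            - name: OAUTH_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: runner-2-oauth-credentials   # hypothetical Secret for this runner
                  key: oauthClientId
            # ACCOUNT_UUID, REPOSITORY_UUID, OAUTH_CLIENT_SECRET and WORKING_DIRECTORY
            # are assumed to follow the template, as are the volumeMounts
        # the docker-in-docker container is assumed unchanged from the template
      restartPolicy: OnFailure
      # volumes are assumed unchanged from the template
```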

Example:
1 VM = 32 GB RAM, 8 CPU

But we plan to run multiple jobs in parallel, which require more than 32 GB RAM and 8 CPU.
We need automatic scale-up from 1 VM to several VMs.

@Justin Thomas 

 

Does this limitation block you from using runners in a K8s cluster?

 

This limitation blocks us. Since Kubernetes deprecated Docker, all cloud-managed Kubernetes services have migrated to containerd or CRI-O.
