Unable to docker build or run from some self hosted runners

Mark G
February 1, 2024

We have several self-hosted pipeline runners running on different machines. The pipeline uses the docker service. On one particular machine, the pipeline always fails at either `docker build` or `docker run` with this error:

```
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't get final child's PID from pipe: EOF: unknown.
```

Comparing the output in the "docker" tab of the web interface against the working runners shows this warning on the failing runner:

```
time="2024-02-01T09:36:18.350487729Z" level=warning msg="cleanup warnings time=\"2024-02-01T09:36:18Z\" level=info msg=\"starting signal loop\" namespace=moby pid=617 runtime=io.containerd.runc.v2\ntime=\"2024-02-01T09:36:18Z\" level=warning msg=\"failed to read init pid file\" error=\"open /run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a917cc6f193488c34285ffa4d0af0a8d30c9e1339145d7513a11a8dbb8f53a84/init.pid: no such file or directory\" runtime=io.containerd.runc.v2\n"
```

For `docker build`, the base image is pulled successfully and then the error above appears at the first `RUN` step. Other commands like `docker version` work fine, so this is not the "docker: Command not found" error seen when running docker commands in a self-hosted runner. I also tried stopping the runner container, deleting the `/tmp/<runner id>` directory, and restarting, but that didn't help.

All runners are using the latest version 1.555.

As a minimal failing example of our pipeline configuration:

```

image: alpine

# Perform some debugging on the different runners
definitions:
  script: &test-script
    - echo $PATH
    - ls -lhd /usr/bin/docker
    - ls -lh /usr/bin/docker
    - which docker
    - docker version
    - docker run --rm -t alpine sh -c 'echo "Hello from container!"'
    - mkdir docker_context
    - |
      echo 'FROM alpine
      # Arbitrary nonsense just to see if a container can be built
      RUN apk add --no-cache vim' > docker_context/Dockerfile
    - docker build -t test-container docker_context


pipelines:
  default:
    - parallel:
        - step:
            runs-on:
              - machine1
            name: ci-test-machine1
            script: *test-script
            services:
              - docker
        - step:
            runs-on:
              - machine2
            name: ci-test-machine2
            script: *test-script
            services:
              - docker
        - step:
            runs-on:
              - machine3
            name: ci-test-machine3
            script: *test-script
            services:
              - docker

```

The failing machine runs CentOS Linux 7 with Docker 20.10.14 (from the system package manager). One of the working machines runs openSUSE Leap 15.2 with Docker 20.10.9-ce.
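
Since the machines differ in OS and host Docker version, it may help to print a few host-level details from inside the step so the failing and working runners can be compared side by side. The following is only a sketch: the anchor name `host-diagnostics` is hypothetical, and it assumes `docker info` is permitted against the docker service used by the pipeline.

```yaml
definitions:
  script: &host-diagnostics
    # Kernel version of the runner host (containers share the host kernel)
    - uname -r
    # Full daemon details as reported by the docker service used in the step
    - docker info
    # Condensed one-line summary that is easy to compare across runners
    - docker info --format '{{.ServerVersion}} {{.KernelVersion}} {{.CgroupDriver}}'
```

Referencing it with `script: *host-diagnostics` in one step per runner, in the same way as `*test-script`, would show whether the CentOS 7 host differs from the working machines in kernel version or cgroup driver.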

Does anyone have any idea of how to get this runner working?

1 answer

1 vote
Theodora Boudale
Atlassian Team
February 2, 2024

Hi Mark and welcome to the community!

I would suggest checking the following issue:

Based on the replies, it seems like there may be different causes for this error. Different users reported resolving it in different ways.

You can try some of the solutions mentioned in the replies to see if any of them resolves the issue on your CentOS machine.

Kind regards,
Theodora

DEPLOYMENT TYPE
CLOUD
PERMISSIONS LEVEL
Product Admin