
Pipelines hangs when using docker USER instruction

parogers
I'm New Here
October 4, 2023

Yesterday, pipelines started failing for one of our repositories. Every run got stuck in the "Build setup" phase and eventually failed because the time limit was exceeded. This happened even for past commits that had previously passed.

Inspecting the pipeline run log, it looks like it actually completes the build setup phase and is about to move on to the next step (i.e., it successfully pulls down the docker image, checks out the source code, configures the environment, etc.).

After lots of investigating I discovered the problem seems to be connected to the docker "USER" instruction.

Our repo uses a custom docker image based on node:

FROM node:18.16.0
# etc....
USER node

If I comment out the "USER" line, the pipeline runs and passes. If I leave it in, the pipeline gets stuck at "Build setup" and eventually times out.

Posting this in case another team has run into the same problem. Anybody have some insight into this?

1 answer

0 votes
Patrik S
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
October 5, 2023

Hello @parogers, and thanks for reaching out to the Atlassian Community!

You should indeed be able to run the pipeline as a different user, as long as:

1. the image contains this user (or it was created during the docker build)

2. this user has a home directory inside the image

For the node user in the node:18.16.0 image, I confirmed that both requirements are met. I tried reproducing the error in my own pipeline, but the build completed successfully, so I wonder whether you might be building the custom image for an architecture other than AMD64, which could cause the pipeline to fail silently after the build setup.
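
Both requirements, and the image architecture, can be checked locally before pushing. The following is only a minimal sketch, not an official Atlassian procedure: image_arch, image_user and home_of are hypothetical helper names, and it assumes the docker CLI is installed and the image exists locally.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Docker or Bitbucket).

# Print the architecture recorded in a local image, e.g. amd64 or arm64.
image_arch() {
    docker image inspect --format '{{.Architecture}}' "$1"
}

# Print the user set by the Dockerfile USER instruction (empty means root).
image_user() {
    docker image inspect --format '{{.Config.User}}' "$1"
}

# Read passwd-format lines on stdin and print the home directory of user $1.
# Prints nothing if the user does not exist.
home_of() {
    awk -F: -v user="$1" '$1 == user { print $6 }'
}
```

For example, image_arch mydockerhubrepo/testnodeuser is expected to print amd64 for an image built with --platform linux/amd64, and piping the image's /etc/passwd into home_of (for instance via docker run --rm --entrypoint cat mydockerhubrepo/testnodeuser /etc/passwd | home_of node) prints the home directory only if the node user exists.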

These are the steps I used to run a pipeline successfully as the node user:

  1. Create a test Dockerfile based on node:18.16.0, changing the default user to node:
    FROM node:18.16.0

    USER node

    RUN echo "this is a test"
  2. Build this Dockerfile using docker buildx to make sure it targets the amd64 architecture, and push it to Docker Hub:
    docker buildx build --platform linux/amd64 --push -t mydockerhubrepo/testnodeuser .
  3. Use the custom image in the pipeline's yml file:
    image: mydockerhubrepo/testnodeuser

    pipelines:
      default:
        - step:
            name: Test step
            script:
              - id

With these steps, the id command printed the node user as expected, and the pipeline completed without errors.

Would it be possible for you to rebuild your custom image based on the instructions above and let us know if the error persists? 

If you have any questions, feel free to ask!

Thank you, @parogers !

Patrik S

parogers
October 6, 2023

Thanks Patrik, I think I've been able to narrow down the problem further.

Here's my Dockerfile:

FROM node:18.16.0

USER node

RUN echo "testing"

My docker build command (tag removed):

docker buildx build --platform linux/amd64 --tag=XYZ -f docker/pipelines/Dockerfile .

Here's my yml file (image name removed):

image:
  name: XYZ:latest
  username: $AZURECR_USER
  password: $AZURECR_PASSWORD

options:
  max-time: 20
pipelines:
  default:
    - step:
        name: "Build and test"
        caches:
          - node
        script:
          - id

With the above setup the pipeline hangs at the "Build setup" phase and eventually times out.

Now if I remove the "caches: node" config from the yml file the pipeline runs fine and passes. If I leave in the caches config but remove "USER node" it also runs fine.

It looks like we don't need the node caching anyways, so I'm removing it from the project. But maybe the problem is an interaction between caching and non-root users?

Patrik S
October 10, 2023

Hello @parogers ,

Thanks for sharing additional context.

Unfortunately, I was not able to reproduce the issue using the same Dockerfile and YML file you shared. As you mentioned, the cache is indeed extracted during the Build Setup phase as the root user, so some conflict there might be causing this, although that usually only affects file permissions.
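
One way to look for such a permissions conflict is to list files in the cached directory that the step's user does not own. This is only a sketch under assumptions: foreign_owned is a hypothetical helper, node_modules is assumed to be the directory restored by the node cache, and it can only help in a run that gets past Build setup.

```shell
#!/bin/sh
# Hypothetical helper: list entries under directory $1 that are not owned
# by the current user (for example, files extracted by root).
foreign_owned() {
    find "$1" ! -user "$(id -u)" -print
}
```

Calling foreign_owned node_modules early in the step's script would print any such files; no output means everything under that directory belongs to the current user.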

Since I'm not able to reproduce on my end, I would need to access your build in order to investigate if that is the case. I understand you found a workaround by disabling the cache for this particular build, but if you would like to proceed with the investigation of what is causing this issue, please let me know, so I can open an internal ticket for you.

Thank you, @parogers !

Patrik S

DEPLOYMENT TYPE
CLOUD
PERMISSIONS LEVEL
Site Admin