Docker Cache not working

Michael Maier
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
August 1, 2023

I have this bitbucket-pipelines.yml

image: atlassian/default-image

pipelines:
  custom:
    test-build-docker-image:
      - step:
          name: Build Docker image
          script:
            - |
              docker build -f - . <<EOF
              FROM busybox:1.36.1
              CMD echo just a test
              EOF
          caches:
            - docker
          services:
            - docker

definitions:
  services:
    docker:
      memory: 7168

options:
  docker: true # enabling docker daemon
  size: 2x

When I run the pipeline the second time, I expect the layers to be cached. However, the pipeline always builds from scratch (incl. pulling the busybox image).

Output of the first run:

Screenshot_run-1.png

Output of the second run:

Screenshot_run-2.png

PS: The editor of this form sucks. I can't paste code without losing indentation.

1 answer

Theodora Boudale
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
August 2, 2023

Hi Michael!

I ran a few builds using the same bitbucket-pipelines.yml that you shared here and the Docker cache is working as expected.

Looking at the output of the first run, in the Build teardown section, we can see that no cache was uploaded from this build because a Docker cache was already present in the repo.

Did you previously run a different docker build that generated a cache? Or do you have another pipeline on a different branch of the repo that generated a Docker cache?

Can you try clearing the cache (from the Pipelines page of the repo, option Caches at the top right corner) and running two more builds (one to generate the cache and the second to confirm if it's used)?

Kind regards,
Theodora

Michael Maier
I'm New Here
August 3, 2023

Yes, I ran a pipeline for a different Dockerfile before. Do I need to use different caches for each Dockerfile? What if the Dockerfile changes? Are new layers not added to the cache?

For example, if I build

FROM busybox:1.36.1
RUN echo "first"
RUN echo "second"
RUN echo "third"

twice, I expect all RUN layers to be cached in the second run. If I change "second" to "foo", I expect "first" to be cached, but not "third", since previous layer changed:

FROM busybox:1.36.1
RUN echo "first"
RUN echo "foo"
RUN echo "third"

After I build the changed Dockerfile (above), I expect the cache to be updated, i.e. if I build the changed Dockerfile another time, all layers should be cached.

Is this the case?

Theodora Boudale
Atlassian Team
August 4, 2023

Hi Michael,

If I change "second" to "foo", I expect "first" to be cached, but not "third", since previous layer changed

With the Docker cache definition you are using, this should work as you described. I tried it as well and "first" is using cache, while "foo" and "third" do not.
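The layer behaviour described here can be approximated with a small shell sketch. It is only an illustration, not how Docker implements its cache: it assumes one instruction per line and ignores things like COPY/ADD file checksums and build arguments. The function name `first_changed_line` is hypothetical.

```shell
# Print the 1-based line number of the first line in the new Dockerfile that
# differs from the old one, or 0 if both files are identical.
# Simplifying assumption: one instruction per line, so instructions before
# that line could be reused and everything from that line on is rebuilt.
first_changed_line() {
    old_file=$1
    new_file=$2
    awk -v old="$old_file" '
        {
            # Read the matching line of the old file and compare.
            has_old = (getline old_line < old) > 0
            if (!has_old || $0 != old_line) { print NR; found = 1; exit }
        }
        END {
            if (found) exit
            # New file ended: a leftover old line means an instruction was removed.
            if ((getline old_line < old) > 0) print NR + 1
            else print 0
        }' "$new_file"
}
```

For the two Dockerfiles in this thread it reports line 3, which matches the observation that "first" is cached while "foo" and "third" are rebuilt.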

However, the cache does not get updated by default when there are changes. For this to happen, we need to use caching with file-based cache keys.

For Docker, this means that we need to use a custom cache definition since the default one does not offer the path variable that is required.

If the Dockerfile is committed to the repo, you can use a yml file as follows:

image: atlassian/default-image:4

pipelines:
  custom:
    test-build-docker-image:
      - step:
          name: Build Docker image
          script:
            - docker load -i docker-cache/* || echo "No cache"
            - docker build .
            - mkdir -p docker-cache && docker save $(docker images -aq) -o docker-cache/cache.tar
          caches:
            - my-docker-cache
          services:
            - docker

definitions:
  services:
    docker:
      memory: 7168
  caches:
    my-docker-cache:
      key:
        files:
          - Dockerfile
      path: docker-cache

options:
  size: 2x

With a yml file like this, the cache will be updated every time there is a change in the Dockerfile that is committed to the repo.

Please keep in mind that in this case, the following will not work:

If I change "second" to "foo", I expect "first" to be cached, but not "third", since previous layer changed

If there is going to be a change in the Dockerfile, the build is not going to use the previous cache at all.
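The thread does not say how Bitbucket derives a cache key from the listed files, so the following is only a sketch of the general idea: a key computed from file contents changes whenever any key file changes, and a changed key means the old cache is not found. The function `cache_key` and the use of `cksum` are assumptions made for illustration.

```shell
# Hypothetical illustration: derive a key from the contents of the key files.
# This is NOT Bitbucket's actual algorithm, only the general principle.
cache_key() {
    # Concatenate all given files and reduce them to a single checksum.
    cat "$@" | cksum | cut -d ' ' -f 1
}

# Possible use (not run here):
#   cache_key Dockerfile
```

Building the same Dockerfile twice gives the same key, so the cache is found. Editing any line gives a different key, so the build starts without a cache and uploads a new one at the end.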

Please feel free to let me know if you have any questions.

Kind regards,
Theodora

maiermic
Contributor
August 4, 2023

Thanks, that should be more usable. However, it is a lot more hassle to configure. What about build arguments?

Theodora Boudale
Atlassian Team
August 7, 2023

Hi Michael,

If you use a build argument and you later change it in your bitbucket-pipelines.yml file, the cache will be used up until the layer where you define the argument.

However, a new cache is not going to be created at the end of the build if you have a file-based cache key with the Dockerfile, because no change will have happened in the Dockerfile. Adding the bitbucket-pipelines.yml file to the file-based cache is also not ideal, as that would generate a new cache for every single change in the bitbucket-pipelines.yml file that may not be related to the docker build.

I cannot think of a way to work around this. I will reach out to the development team to ask if there is a way that I am not aware of, and I'll let you know when I have an update.

Kind regards,
Theodora

Theodora Boudale
Atlassian Team
September 26, 2023

Hi,

I'm pretty sure I replied earlier to your last question about build args, but my reply does not appear here, perhaps because of a glitch. In any case, I am posting it again:

If the docker build command uses build args, you can do any of the following so that the cache gets updated when the arguments change:

1. You can put the build args in a file that you commit to the repo and read the contents of that file to construct the docker build command.

The file where you store the arguments can then be added to the cache key files in the yml, along with the Dockerfile. This way, changing this file will generate a new cache.

2. Another option is to put the Docker command in a shell script that you commit to the repo and execute during the build. Then, add the script file in the cache key files in the yml. You could also use a Makefile instead of a shell script.
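Option 1 could be sketched in shell roughly as follows. This assumes a hypothetical file named `build-args` with one `NAME=value` entry per line and values without whitespace or quotes. The function name `build_arg_flags` is also made up for this example.

```shell
# Turn a file of NAME=value lines into "--build-arg NAME=value" flags.
# Assumptions: blank lines and lines starting with # are skipped, and values
# contain no whitespace (the result is meant to be expanded unquoted).
build_arg_flags() {
    flags=""
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        flags="$flags --build-arg $line"
    done < "$1"
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Possible use in the pipeline script (not run here):
#   docker build $(build_arg_flags build-args) .
```

The `build-args` file would then be listed next to the Dockerfile under the cache's key files, so that changing an argument produces a new cache.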

Kind regards,
Theodora

maiermic
Contributor
September 26, 2023

Is it correct that I always need to load and save the cache manually (calling docker save/load), as soon as I use a docker cache with key files?

Theodora Boudale
Atlassian Team
September 26, 2023

Yes, this is correct, because you will no longer be using the predefined docker cache.

maiermic
Contributor
September 26, 2023

If the value of the build argument is passed as pipeline variable, but only contains predefined values, I could use different caches, but I still have to use a key file with the respective build argument values to invalidate the cache on change, right?

For example, I might have a build argument `ENVIRONMENT`, which might be `prod` or `dev`:

image: atlassian/default-image:4

pipelines:
  custom:
    test-build-docker-image:
      - variables:
          - name: ENVIRONMENT
      - step:
          name: Build Docker image
          script:
            - docker load -i docker-cache-$ENVIRONMENT/* || echo "No cache"
            - docker build --build-arg="ENVIRONMENT=$ENVIRONMENT" .
            - mkdir -p docker-cache-$ENVIRONMENT && docker save $(docker images -aq) -o docker-cache-$ENVIRONMENT/cache.tar
          caches:
            - docker-cache-prod
            - docker-cache-dev
          services:
            - docker

definitions:
  services:
    docker:
      memory: 7168
  caches:
    docker-cache-prod:
      key:
        files:
          - Dockerfile
          - build-args-prod
      path: docker-cache-prod
    docker-cache-dev:
      key:
        files:
          - Dockerfile
          - build-args-dev
      path: docker-cache-dev

where the file build-args-prod contains

ENVIRONMENT=prod

and build-args-dev contains

ENVIRONMENT=dev

Theodora Boudale
Atlassian Team
September 27, 2023

Hi,

I think that with this setup you don't need the files build-args-prod and build-args-dev.

The docker build command doesn't read the value for ENVIRONMENT from these files, but from the custom pipeline variable you have defined:

- variables:
  - name: ENVIRONMENT

The variable contains predefined values and you have defined two caches. I believe that you can remove these files from the caches' definition and also from your repo. If you want these caches to get updated when you change the Dockerfile, you will need to leave the Dockerfile as a key file.

In case the variable values ever change, you will just need to change the directory name for each cache. If a new environment is added and you want a separate cache for that, you can add a new cache definition.

If the values are predefined, you can also adjust the definition of the pipeline's variables to allow only these values:

- variables:
    - name: ENVIRONMENT
      default: "dev"
      allowed-values: # optionally restrict variable values
        - "prod"
        - "dev"
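As a complement, the step script itself could refuse unexpected values before they end up in a cache directory name. This is a minimal sketch. The function name `cache_dir_for` is hypothetical, and the directory names are the ones used earlier in this thread.

```shell
# Map an ENVIRONMENT value to its cache directory; fail on anything else.
cache_dir_for() {
    case $1 in
        prod|dev)
            printf 'docker-cache-%s\n' "$1"
            ;;
        *)
            echo "Unexpected ENVIRONMENT: '$1'" >&2
            return 1
            ;;
    esac
}

# Possible use in the pipeline script (not run here):
#   cache_dir=$(cache_dir_for "$ENVIRONMENT") || exit 1
#   docker load -i "$cache_dir"/* || echo "No cache"
```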

Kind regards,
Theodora

maiermic
Contributor
January 15, 2024

The documentation points out that

any cache which is older than 1 week will be cleared automatically and repopulated during the next build.

Does this apply to caches with file-based keys, too?

Is it possible to clear the cache if the base image in the Dockerfile changes without a change to the file, i.e. if the image of the given tag is overwritten in the Docker registry (e.g. hub.docker.com)? This is sometimes done for images that are based on a Linux distribution when the distribution is updated. Besides, if you use

FROM busybox:1.36

it may refer to busybox:1.36.0 on one day and busybox:1.36.1 on another day. Can you trigger a new build if you call

docker pull busybox:1.36

to maybe pull busybox:1.36.1 after

docker load -i docker-cache-$ENVIRONMENT/* || echo "No cache"

which still may contain busybox:1.36.0?

Theodora Boudale
Atlassian Team
January 16, 2024

Hi @maiermic,

File-based caches also get deleted a week after they are populated.

About your second question, we have an API endpoint to delete caches by name. You can use that during a Pipelines build to delete a cache.

The next question is how to figure out whether you need to delete a cache because the tag was overwritten. You could redirect the output of the following command to a file:

docker load -i docker-cache/* || echo "No cache"

You can also add the following command to your yml file after the docker build and redirect its output to a file:

docker image ls -aq

You can then parse the two files and see if there are any layers in the output of the second command that don't exist in the output of the first command. In that case, you could delete the cache.
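The comparison could look roughly like the sketch below. It assumes the `docker load` output contains lines of the form `Loaded image ID: sha256:<hex>` and that `docker image ls -aq` prints 12-character short IDs, one per line. Both formats should be checked against the Docker version in use. The function name `new_image_ids` and the file arguments are hypothetical.

```shell
# Print the short image IDs present after the build that were not among the
# images loaded from the cache. Any output suggests the cache is outdated.
#   $1: file with the output of "docker load -i docker-cache/*"
#   $2: file with the output of "docker image ls -aq" (run after the build)
new_image_ids() {
    loaded_file=$1
    after_file=$2
    awk -v after="$after_file" '
        # "Loaded image ID: sha256:<hex>" -> remember the 12-character short ID.
        sub(/^Loaded image ID: sha256:/, "") { seen[substr($0, 1, 12)] = 1 }
        END {
            # Report every ID from the second file that was not loaded.
            while ((getline id < after) > 0)
                if (id != "" && !(id in seen)) print id
        }' "$loaded_file"
}
```

If the function prints nothing, every image present after the build came from the cache. Otherwise the build produced or pulled something new, which could be the signal to delete the cache.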

Please keep in mind that if the cache is deleted via an API call in a certain build, no new cache will be uploaded at the end of the build (because the cache was present when the build started). So, a new cache will be generated in a subsequent build.

About your last question, if busybox:1.36 has new layers, the docker build command will pull these new layers and it won't use the cache. But the build won't upload a new cache if there was an existing one at the beginning of the build, since the Dockerfile hasn't changed.

Kind regards,
Theodora

maiermic
Contributor
January 17, 2024

Tags are not saved by

docker save $(docker images -aq) -o docker-cache/cache.tar

or at least not loaded by

docker load -i docker-cache/*

How can you fix that?

Theodora Boudale
Atlassian Team
January 19, 2024

Hi,

You can adjust the command as follows and the tags will be saved:

docker save $(docker images -aq) -o docker-cache/cache.tar <repo>:<tag>

Replace <repo>:<tag> with the actual values.
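Instead of typing each reference by hand, the list of tagged images could be generated. The sketch below assumes that `docker images --format '{{.Repository}}:{{.Tag}}'` prints one reference per line and shows untagged images as `<none>`. The helper name `tagged_refs` is hypothetical.

```shell
# Read "repository:tag" lines on stdin and keep only fully tagged references,
# dropping entries where either part is "<none>".
tagged_refs() {
    # "|| true" keeps the exit status at 0 when nothing is left to print.
    grep -v '<none>' || true
}

# Possible use in the pipeline script (not run here):
#   refs=$(docker images --format '{{.Repository}}:{{.Tag}}' | tagged_refs)
#   docker save $(docker images -aq) -o docker-cache/cache.tar $refs
```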

Kind regards,
Theodora

DEPLOYMENT TYPE
CLOUD
PERMISSIONS LEVEL
Product Admin