I have this bitbucket-pipelines.yml
image: atlassian/default-image

pipelines:
  custom:
    test-build-docker-image:
      - step:
          name: Build Docker image
          script:
            - |
              docker build -f - . <<EOF
              FROM busybox:1.36.1
              CMD echo just a test
              EOF
          caches:
            - docker
          services:
            - docker

definitions:
  services:
    docker:
      memory: 7168

options:
  docker: true # enabling docker daemon
  size: 2x
When I run the pipeline the second time, I expect the layers to be cached. However, the pipeline always builds from scratch (incl. pulling the busybox image).
Output of the first run: (omitted)
Output of the second run: (omitted)
PS: The editor of this forum sucks. I can't paste code without losing indentation.
Hi Michael!
I ran a few builds using the same bitbucket-pipelines.yml that you shared here and the Docker cache is working as expected.
Looking at the output of the first run, in the Build teardown section, we can see that no cache was uploaded from this build because a Docker cache was already present in the repo.
Did you previously run a different docker build that generated a cache? Or do you have another pipeline on a different branch of the repo that generated a Docker cache?
Can you try clearing the cache (from the Pipelines page of the repo, option Caches at the top right corner) and running two more builds (one to generate the cache and the second to confirm if it's used)?
Kind regards,
Theodora
Yes, I ran a pipeline for a different Dockerfile before. Do I need to use different caches for each Dockerfile? What if the Dockerfile changes? Are new layers not added to the cache?
For example, if I build
FROM busybox:1.36.1
RUN echo "first"
RUN echo "second"
RUN echo "third"
twice, I expect all RUN layers to be cached in the second run. If I change "second" to "foo", I expect "first" to be cached, but not "third", since the preceding layer changed:
FROM busybox:1.36.1
RUN echo "first"
RUN echo "foo"
RUN echo "third"
After I build the changed Dockerfile (above), I expect the cache to be updated, i.e. if I build the changed Dockerfile another time, all layers should be cached.
Is this the case?
Hi Michael,
If I change "second" to "foo", I expect "first" to be cached, but not "third", since previous layer changed
With the Docker cache definition you are using, this should work as you described. I tried it as well and "first" is using cache, while "foo" and "third" do not.
However, the cache does not get updated by default when there are changes. In order for this to happen, we need to use caching with file-based cache keys:
For Docker, this means that we need to use a custom cache definition since the default one does not offer the path variable that is required.
If the Dockerfile is committed to the repo, you can use a yml file as follows:
image: atlassian/default-image:4

pipelines:
  custom:
    test-build-docker-image:
      - step:
          name: Build Docker image
          script:
            - docker load -i docker-cache/* || echo "No cache"
            - docker build .
            - mkdir -p docker-cache && docker save $(docker images -aq) -o docker-cache/cache.tar
          caches:
            - my-docker-cache
          services:
            - docker

definitions:
  services:
    docker:
      memory: 7168
  caches:
    my-docker-cache:
      key:
        files:
          - Dockerfile
      path: docker-cache

options:
  size: 2x
With a yml file like this, the cache will be updated every time there is a change in the Dockerfile that is committed to the repo.
Please keep in mind that in this case, the following will not work:
If I change "second" to "foo", I expect "first" to be cached, but not "third", since previous layer changed
If there is going to be a change in the Dockerfile, the build is not going to use the previous cache at all.
Please feel free to let me know if you have any questions.
Kind regards,
Theodora
Thanks, that should be more usable. However, it is a lot more hassle to configure. What about build arguments?
Hi Michael,
If you use a build argument and you later change it in your bitbucket-pipelines.yml file, the cache will be used up until the layer where you define the argument.
However, a new cache is not going to be created at the end of the build if you have a file-based cache key with the Dockerfile, because no change will have happened in the Dockerfile. Adding the bitbucket-pipelines.yml file to the file-based cache key is also not ideal, as that would generate a new cache for every single change in bitbucket-pipelines.yml, even changes unrelated to the docker build.
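For illustration, a hypothetical Dockerfile (the APP_VERSION argument and the echo commands are only examples):

FROM busybox:1.36.1
RUN echo "this layer stays cached even if the build arg changes"
ARG APP_VERSION
RUN echo "building $APP_VERSION"
RUN echo "this layer is also rebuilt once the previous one is invalidated"

If the step runs docker build --build-arg APP_VERSION=2.0 . and you later change the value in the yml, the layers before the argument is consumed are taken from the cache, while the RUN commands that reference it (and everything after them) are rebuilt.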
I cannot think of a way to work around this. I will reach out to the development team to ask if there is a way that I am not aware of, I'll let you know when I have an update.
Kind regards,
Theodora
Hi,
I'm pretty sure I replied earlier to your last question regarding build args. I'm not sure what happened, perhaps there was a glitch and my reply wasn't posted. In any case, I am posting my reply again:
If the docker build command uses build args, you can do any of the following so that the cache gets updated when the arguments change:
1. You can put the build args in a file that you commit to the repo and read the contents of that file to construct the docker build command (a sketch follows after this list).
The file where you store the arguments can then be added to the cache key files in the yml, along with the Dockerfile. This way, changing this file will generate a new cache.
2. Another option is to put the Docker command in a shell script that you commit to the repo and execute during the build. Then, add the script file in the cache key files in the yml. You could also use a Makefile instead of a shell script.
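A minimal sketch of the first option, assuming the arguments live in a file named build-args.env that is committed to the repo and listed under the cache key files (the file name and argument names are only examples):

# build-args.env, committed to the repo
APP_VERSION=2.0
BASE_TAG=1.36.1

# in the step script: turn each KEY=VALUE line into a --build-arg flag
# (assumes the values contain no spaces)
- docker build $(sed 's/^/--build-arg /' build-args.env) .

Changing a value in build-args.env then changes the cache key, so a fresh cache is generated on the next build.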
Kind regards,
Theodora
Is it correct that I always need to load and save the cache manually (calling docker save/load) as soon as I use a Docker cache with key files?
Yes, this is correct, because you will no longer be using the predefined docker cache.
If the value of the build argument is passed as a pipeline variable that only takes predefined values, I could use different caches, but I still have to use a key file with the respective build argument values to invalidate the cache on change, right?
For example, I might have a build argument `ENVIRONMENT`, which might be `prod` or `dev`:
image: atlassian/default-image:4

pipelines:
  custom:
    test-build-docker-image:
      - variables:
          - name: ENVIRONMENT
      - step:
          name: Build Docker image
          script:
            - docker load -i docker-cache-$ENVIRONMENT/* || echo "No cache"
            - docker build --build-arg="ENVIRONMENT=$ENVIRONMENT" .
            - mkdir -p docker-cache-$ENVIRONMENT && docker save $(docker images -aq) -o docker-cache-$ENVIRONMENT/cache.tar
          caches:
            - docker-cache-prod
            - docker-cache-dev
          services:
            - docker

definitions:
  services:
    docker:
      memory: 7168
  caches:
    docker-cache-prod:
      key:
        files:
          - Dockerfile
          - build-args-prod
      path: docker-cache-prod
    docker-cache-dev:
      key:
        files:
          - Dockerfile
          - build-args-dev
      path: docker-cache-dev
where the file build-args-prod contains
ENVIRONMENT=prod
and build-args-dev contains
ENVIRONMENT=dev
Hi,
I think that with this setup you don't need the files build-args-prod and build-args-dev.
The docker build command doesn't read the value for ENVIRONMENT from these files, but from the custom pipeline variable you have defined:
- variables:
    - name: ENVIRONMENT
The variable contains predefined values and you have defined two caches. I believe that you can remove these files from the caches' definition and also from your repo. If you want these caches to get updated when you change the Dockerfile, you will need to leave the Dockerfile as a key file.
In case the variable values ever change, you will just need to change the directory name for each cache. If a new environment is added and you want a separate cache for that, you can add a new cache definition.
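In other words, the cache definitions from your yml could be reduced to something like this (a sketch keeping only the Dockerfile as the key file):

definitions:
  caches:
    docker-cache-prod:
      key:
        files:
          - Dockerfile
      path: docker-cache-prod
    docker-cache-dev:
      key:
        files:
          - Dockerfile
      path: docker-cache-dev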
If the values are predefined, you can also adjust the definition of the pipeline's variables to allow only these values:
- variables:
    - name: ENVIRONMENT
      default: "dev"
      allowed-values: # optionally restrict variable values
        - "prod"
        - "dev"
Kind regards,
Theodora
The documentation points out that
any cache which is older than 1 week will be cleared automatically and repopulated during the next build.
Does this apply to caches with file-based keys, too?
Is it possible to clear the cache if the base image in the Dockerfile changes without a change to the file, i.e. if the image of the given tag is overwritten in the Docker registry (e.g. hub.docker.com)? This is sometimes done for images that are based on a Linux distribution when the distribution receives an update. Besides, if you use
FROM busybox:1.36
it may refer to busybox:1.36.0 on one day and busybox:1.36.1 on another. Can you trigger a new build if you call
docker pull busybox:1.36
(to maybe pull busybox:1.36.1) after
docker load -i docker-cache-$ENVIRONMENT/* || echo "No cache"
which may still contain busybox:1.36.0?
Hi @maiermic,
File-based caches also get deleted a week after they are populated.
About your second question, we have an API endpoint to delete caches by name.
You can use that during a Pipelines build to delete a cache.
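As a rough sketch of such a call from a step script, assuming BB_USERNAME and BB_APP_PASSWORD are secured repository variables you have defined, and assuming the endpoint accepts the cache name as a query parameter (please double-check the exact path and parameters in the Bitbucket REST API documentation, they are an assumption here):

# delete the custom cache by name (endpoint details are an assumption, verify against the API docs)
- curl -X DELETE -u "$BB_USERNAME:$BB_APP_PASSWORD" "https://api.bitbucket.org/2.0/repositories/$BITBUCKET_WORKSPACE/$BITBUCKET_REPO_SLUG/pipelines-config/caches/?name=my-docker-cache"

BITBUCKET_WORKSPACE and BITBUCKET_REPO_SLUG are default variables that Pipelines provides in every build.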
The next question would be how to figure out whether you need to delete a cache because the tag was overwritten. You could redirect the output of the following command to a file
docker load -i docker-cache/* || echo "No cache"
You can also add the following command to your yml file after the docker build and redirect its output to a file
docker image ls -aq
You can then parse the two files and see if there are any layers in the output of the second command that don't exist in the output of the first command. In that case, you could delete the cache.
Please keep in mind that if the cache is deleted via an API call in a certain build, no new cache will be uploaded at the end of the build (because the cache was present when the build started). So, a new cache will be generated in a subsequent build.
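A rough sketch of that comparison in the step script (the file names are placeholders and the deletion step is only indicated):

- docker load -i docker-cache/* || echo "No cache"
- docker image ls -aq | sort > ids-from-cache.txt
- docker build .
- docker image ls -aq | sort > ids-after-build.txt
# IDs that exist only after the build mean new layers were pulled or built, so the cached archive is stale
- |
  if comm -13 ids-from-cache.txt ids-after-build.txt | grep -q .; then
    echo "New images detected, delete the cache via the API so it is repopulated in the next build"
  fi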
About your last question, if busybox:1.36 has new layers, the docker build command will pull these new layers and it won't use the cache. But the build won't upload a new cache if there was an existing one at the beginning of the build, since the Dockerfile hasn't changed.
Kind regards,
Theodora
Tags are not saved by
docker save $(docker images -aq) -o docker-cache/cache.tar
or at least not loaded by
docker load -i docker-cache/*
How can you fix that?
Hi,
You can adjust the command as follows and the tags will be saved:
docker save $(docker images -aq) -o docker-cache/cache.tar <repo>:<tag>
where <repo>:<tag> should be replaced with actual values.
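If you would rather not hard-code the repository and tag, a variation that saves every tagged image could look roughly like this (a sketch; it skips untagged intermediate images and assumes at least one tagged image exists):

docker save $(docker images --format '{{.Repository}}:{{.Tag}}' | grep -v '<none>') -o docker-cache/cache.tar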
Kind regards,
Theodora