For one of our services, ideally we'd like to run the entire build process in a Docker container (in a multi-stage build) so that our build environment is controlled and shared between development computers and Pipelines. But, part of the build depends on credentials for access to a private package repository.
If we run the build directly in Pipelines, then we can use secured Pipelines environment variables and presume that the credentials are handled securely since that's presumably what that feature is designed for.
However, if we run the build using Docker within Pipelines (i.e. if the Pipelines config includes a "docker build ..." command which runs the application build as part of a multi-stage Docker image build), then there will exist a cached Docker layer which contains the values of any build args. Now, build args are not supposed to be used for secrets, but disturbingly they still appear to be the easiest and most accessible way to pass arguments into a build, and I would guess a lot of people are actually using them this way. If cached images are stored only on customer-controlled build machines then this may be an acceptable risk.
What security can a user of Bitbucket Pipelines assume is applied to the actual Pipelines build environments and to caches (Docker or otherwise, but especially Docker) that are created as part of builds?
Stumbled upon this thread when trying to get Docker secrets working in Bitbucket Pipelines. In case anyone else ends up here in 2023 or later, Bitbucket Pipelines supports Docker secrets since 2022.
See: https://bitbucket.org/blog/announcing-support-for-docker-buildkit-in-bitbucket-pipelines
In my case I was using secrets in docker-compose.yml and I was getting the error:
Error response from daemon: invalid mount config for type "bind": bind source path does not exist: <path to file>
I enabled BuildKit with:
export DOCKER_BUILDKIT=1
and made sure the path to the file containing the secret was in the current directory.
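For anyone who wants to see the shape of this, here is a minimal sketch of a BuildKit secret mount used with plain 'docker build'. It is an illustration under assumptions, not the exact setup from this reply: the secret id repo_token, the variable REPO_TOKEN, the file names, the alpine base image and the my-service tag are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: repo_token, REPO_TOKEN, my-service and the file names are placeholders.

# Write a demo Dockerfile whose RUN step reads the secret from /run/secrets/<id>.
write_secret_dockerfile() {
  cat > "$1" <<'EOF'
FROM alpine:3.19
# The secret is mounted for this RUN step only; it is not stored in an image layer.
RUN --mount=type=secret,id=repo_token \
    test -s /run/secrets/repo_token && echo "credential available during build"
EOF
}

# Build with BuildKit, reading the secret from a file in the current directory.
build_with_secret() {
  export DOCKER_BUILDKIT=1
  write_secret_dockerfile ./Dockerfile.secret-demo
  printf '%s' "$REPO_TOKEN" > ./repo_token.txt
  docker build --secret id=repo_token,src=./repo_token.txt \
    -f ./Dockerfile.secret-demo -t my-service .
  status=$?
  rm -f ./repo_token.txt   # do not leave the credential on disk
  return "$status"
}
```

With REPO_TOKEN defined as a secured Pipelines variable, the step would call build_with_secret. The secret file sits in the build context directory here, so it should be kept out of any COPY instruction (for example via .dockerignore).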
Hi Jonathan,
Thanks for being diligent with your security practices. And sorry for the delayed response.
I'll go into more details, but the short answer is that your concern won't be an issue, assuming you don't add any secrets as part of your docker build process.
Pulling the Docker Build Image
When Pipelines runs your build inside the Docker container, the variables are passed in at runtime, similar to 'docker run -e MY_VAR=MY_VAL image/name'. (Pipelines runs on Kubernetes, so this isn't exactly how it works, but it is a close enough analogy.) This doesn't generate any caches containing secrets. We cache publicly available image layers for performance, but because we inject the variables at runtime they are not cached with the image layers.
NOTE: These are not user-configured caches, and thus do not have a 'caches' section in the bitbucket-pipelines.yml.
You can verify the secrets are not in the system caches by removing them from your build: they will no longer be present in your build container (when running 'env'), because we retrieve them directly from Pipelines storage.
Running Docker Build
Building a Docker container will interact with image layers in three ways.
When building a new image, the newly built image layers are built on top of the image specified in your 'FROM' command. The Pipelines build container is not included in this context. The only way you'll be able to add your secrets to your build is if you explicitly add them to an image layer, such as with an 'ENV' command.
The docker cache (the one defined in the 'caches' section) inside of your build does not contain any information about your pipelines build image. It only includes image layers pulled/built from the Docker CLI running inside of your pipeline build.
If you'd like to verify this behaviour, you can run 'docker run --entrypoint env image/name' on an image you've built, to verify the content of the environment variables.
The above holds for multi-staged builds as well.
Feel free to reply with more questions/clarifications.
Thanks,
Phil
Philip,
Thanks for the detailed explanation and breakdown of the process! Based on this information I think the build process that my team was thinking about may in fact have an issue. If images built via the Docker CLI while running inside the pipeline build are included in the user-defined Docker caches, then those images must not have secrets passed into 'docker build' via --build-arg (since such secrets are effectively env vars visible in the 'docker history' output of those images).
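The point about 'docker history' can be checked directly. Below is a rough sketch of how one might test a built image for a known secret value, covering both the layer history and the 'docker run --entrypoint env' check suggested earlier in the thread. The helper names contains_secret and check_image are hypothetical, and whether a build arg shows up in the history depends on it being in scope for a RUN step.

```shell
#!/bin/sh
# Sketch only: contains_secret and check_image are hypothetical helper names.

# Read text on stdin; succeed (exit 0) if the given value appears anywhere in it.
contains_secret() {
  # -F matches the value as a fixed string, -q suppresses output.
  grep -qF -- "$1"
}

# Check one image for a known secret value in its layer history and environment.
check_image() {
  image=$1
  secret=$2
  # Build args consumed by RUN steps are recorded in the "created by" column.
  if docker history --no-trunc --format '{{.CreatedBy}}' "$image" | contains_secret "$secret"; then
    echo "secret found in layer history of $image"
  fi
  # Values set with ENV are visible in the container environment.
  if docker run --rm --entrypoint env "$image" | contains_secret "$secret"; then
    echo "secret found in environment of $image"
  fi
}
```

For example, check_image my-service "$REPO_TOKEN" would print a line for each place the value is found and print nothing if it is not found. The image name and variable are placeholders.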
If all intermediate layers and stages are cached then this would result in cached secrets. However, if the Docker build process running within the pipeline is a staged build, and ONLY the final stage is cached, then perhaps this is OK with respect to the user defined docker cache?
Aside from the cache, the other issue would be Docker images with stored secrets simply existing within the storage volumes of the containers where the pipelines builds run. Are the build container volumes encrypted or cleared out after use?
Hi @Jonathan Little,
I received an email notification that you replied here, but I can't see it on this page. Can you try reposting your reply for me to see?
Thanks
@Philip Hodder I can see my response (beginning "Thanks for the detailed explanation") on this page in multiple browsers (even when not logged in) -- are you still not seeing it?
I can see it now. Probably a caching issue on my end.
I can double check with the team, but I'm pretty sure the user-defined Docker cache is naive and uploads all image layers it finds. This would include all stages in a multi-stage build (as it's beneficial from a caching perspective).
The volumes of the containers are ephemeral. They are created when each step begins, so you don't need to worry about the general volumes being persisted after your build step completes.
The only things we persist outside of a single build step are artifacts which we persist across steps in the same pipeline run, and caches which are persisted across pipeline runs.
Can I ask why you need to include secrets inside your Docker image? From my understanding, it's better practice not to include secrets inside your image, as the image layers are immutable and can leak the secrets in scenarios unrelated to this case. Instead you could pass in the secrets as environment variables at container runtime. But perhaps you have a specific use case that makes this necessary?
Agreed it is optimal to avoid including secrets in Docker images at any point. We're experimenting with doing some application builds during a staged image build process (rather than at runtime within a build environment image) and this requires using --build-arg for some secrets (e.g. private artifact repository credentials to pull dependencies).
The treatment of build args as build time environment variables (stored in image history) is a downside of this strategy, but one that might be permissible e.g. when running the builds on private/internal CI build agent machines. I'm less comfortable with it on cloud build agents such as with Pipelines, but figured it was worth asking about.
Sounds like you're confirming that such images (including their embedded secrets) will be cached by Pipelines. One last question that I'm not clear on yet: can you confirm whether this applies only when you have a 'docker' caches section in your Pipelines config? From your initial response it kind of sounds like if you don't have a docker caches section in the config, then only public images that are pulled will be cached, not images that result from a build. If this is the case, then it sounds like the only real concern (i.e. the only place such secrets would be stored) would be the storage underlying the ephemeral volumes used during the build process.
Perhaps this nuance is enough to turn me off from the whole idea but just interested in understanding.
Thanks!
Sorry for the delayed response.
Some other workarounds to not have the secret in your image layers
If you're able, you should perform the dependency downloads outside of your Docker build and COPY them into your build instead. That will prevent this issue entirely.
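As a rough sketch of that first workaround: the Pipelines step authenticates and downloads the dependencies itself, and the Dockerfile only copies the result. The thread does not say which package manager is involved, so the curl download, the variables REPO_URL, REPO_USER and REPO_PASSWORD, the vendor/ directory, the file names and the image tag are all placeholder assumptions.

```shell
#!/bin/sh
# Sketch only: REPO_URL, REPO_USER, REPO_PASSWORD and all file names are placeholders.

# Download dependencies in the step, using secured Pipelines variables.
fetch_dependencies() {
  mkdir -p vendor
  curl --fail --silent --show-error \
       --user "$REPO_USER:$REPO_PASSWORD" \
       --output vendor/deps.tar.gz \
       "$REPO_URL/deps.tar.gz"
}

# Write a demo Dockerfile that needs no credentials at all.
write_copy_dockerfile() {
  cat > "$1" <<'EOF'
FROM alpine:3.19
# Only the already-downloaded files enter the image; nothing carries a credential.
COPY vendor/ /app/vendor/
EOF
}

# Fetch first, then build without passing any secret to docker build.
build_image() {
  fetch_dependencies &&
    write_copy_dockerfile ./Dockerfile.copy-demo &&
    docker build -f ./Dockerfile.copy-demo -t my-service .
}
```

The trade-off is that the download no longer runs inside the controlled Docker build environment, which was part of the original motivation for building in Docker.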
You are also able to run commands that clean up the layers of the earlier stages as part of your build step, thus keeping only the resulting image and its layers. That should prevent the user-defined Docker cache from caching your secrets if they are in a separate build stage entirely. However, it does build up tribal knowledge: new team members may not be aware of why the layers are being deleted.
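A sketch of what such a cleanup might look like at the end of the build step. This assumes that earlier stages are left behind as untagged (dangling) images with the classic builder, or as build cache with BuildKit. Whether removing them is enough to keep them out of the user-defined Docker cache, and whether Pipelines permits both commands, has not been verified here. The function name and the DOCKER override are hypothetical.

```shell
#!/bin/sh
# Sketch only. DOCKER can be overridden (e.g. DOCKER=echo) to print the
# commands instead of running them.
DOCKER=${DOCKER:-docker}

cleanup_build_leftovers() {
  # Remove dangling images, i.e. untagged results such as earlier build stages.
  $DOCKER image prune --force
  # Remove the BuildKit build cache, if BuildKit was used for the build.
  $DOCKER builder prune --force
}
```

Calling cleanup_build_leftovers after 'docker build' (and after pushing the final image, if applicable) leaves tagged images in place. A short comment in the pipeline config explaining why the cleanup exists would help with the tribal-knowledge concern.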
If the above workarounds do not work for you, then below is the caching behaviour (as well as some other relevant information).
Caching
Yes, all built images (that are in the build container) are cached in Pipelines with the user-defined caches.
The other caching I mentioned is an intermediate Docker registry that only stores public images that are pulled.
i.e. "From your initial response it kind of sounds like if you don't have a docker caches section in the config, then only public images that are pulled will be cached, not images that result from a build."
This is correct.
Shared infrastructure
The other thing worth noting is that we run other customers' builds on shared infrastructure, so you do not have your own dedicated machine for running your pipelines.
The team takes the security of our build infrastructure extremely seriously. We follow security best practices for running containers, as well as our own learnings. We also have a bug bounty for security reports.
Other Risks
You seem well aware of the risks involved. But, I'll reiterate the other relevant risks for anyone that reads this post in the future:
So, I'll leave the decision to you. :) If you need some guidance on the other ideas I suggested, or you have any other questions, please ask.
Awesome, thanks @Philip Hodder for the additional info and for the risk summary. We are definitely aware that the infrastructure is shared and that is a tradeoff we're considering with respect to what processes we can run in Bitbucket Pipelines.
Your suggestion about cleaning out intermediate layers before the build step finishes is an interesting one that I will look into. It also sounds like simply not using a user-defined Docker cache may be an acceptable workaround in some situations.
Again I really appreciate the detailed responses here and look forward to continuing to use Bitbucket!
You're welcome! :)