Hi community,
we are currently in the midst of a phase of heavy pipeline performance improvements. During this process we came across several issues regarding caches:
There is also this open ticket dating back all the way to 2018, with engaged discussions up to last month, regarding cache refreshing.
In the discussion thread of said ticket, a fellow community member proposed a workaround: adding unique hashed endings for checksum tests on files that need caching (in his case, yarn lock files), thereby allowing for individual caching.
Leaning on this solution, we have implemented our own approach: using the branch's name and some scripting to generate a new bitbucket-pipelines.yaml on the fly upon commit, which lets us have branch-specific caches for pnpm and node_modules.
# bitbucket pipeline template
definitions:
  caches:
    pnpm-<branch-name>: $BITBUCKET_CLONE_DIR/.pnpm-store
    node-<branch-name>: node_modules
    # some-other-nested-node_modules-here
    # <branch-name> will be replaced by a hash of the branch's name and an internal prefix
Input is very welcome, thanks in advance!
Best regards
Deniz
Hi @Cengiz Deniz, thanks for reaching out to Atlassian Community!
Only caches under 1GB once compressed are saved. We have a feature request https://jira.atlassian.com/browse/BCLOUD-21484 to increase the cache limit.
More details about caching can be found at: https://support.atlassian.com/bitbucket-cloud/docs/cache-dependencies/#How-does-caching-work
You can use a combination of the list and delete cache API endpoints to list the caches and delete all of them from Pipelines (see the sketch after the links below).
Documentation:
https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pipelines/#api-repositories-workspace-repo-slug-pipelines-config-caches-get
https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pipelines/#api-repositories-workspace-repo-slug-pipelines-config-caches-cache-uuid-delete
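For example, a rough shell sketch combining the two endpoints could look like this; the workspace, repo slug and credentials are placeholders, and the list response is assumed to follow the usual paginated shape with a "values" array:
#!/usr/bin/env bash
# Sketch only: workspace, repo slug and credentials below are placeholders.
WORKSPACE="my-workspace"
REPO_SLUG="my-repo"
AUTH="bitbucket-username:app-password"
BASE="https://api.bitbucket.org/2.0/repositories/$WORKSPACE/$REPO_SLUG/pipelines-config/caches"

# List all caches and pull out their UUIDs (requires jq; assumes the paginated "values" response).
curl -sg -u "$AUTH" "$BASE/" | jq -r '.values[].uuid' |
while read -r CACHE_UUID; do
  # Delete each cache by UUID (-g keeps curl from interpreting the braces in the UUID).
  curl -sg -X DELETE -u "$AUTH" "$BASE/$CACHE_UUID"
done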
Let me know if this helps.
Regards,
Suhas
Regarding cache sizing: we are aware of the cache size limits during build teardown and compression. What we are interested in is whether there is a limit on the overall cache size per repository.
Going from the branch-specific caching we have now set up with our workaround, we would have about 0.5-0.7 GB of caches (17 in total, ranging from a few hundred kB up to 250 MB) per branch. However, there are multiple developers working on various branches, and various pipelines using caches are being run (e.g. for testing and before pull requests), so we would quickly accumulate multiples of these 0.5-0.7 GB of caches. So is there any limit? There is nothing in the documentation (or we simply didn't find it).
As for the API based approach for deleting caches: thanks for that hint, we'll have a look at that and fiddle around a bit :)
Thanks and best regards
Deniz
There is no restriction on the overall cache size per repository. You can create any number of branch-specific node_modules caches, but only caches under 1GB are compressed and saved.
Regards,
Suhas
Hi @Cengiz Deniz,
may I ask how you did this branch name replacement?
# <branch-name> will be replaced by a hash of the branches name and an internal prefix
Thanks!
Max
We used a simple shell script that checked for `<branch-name>` in a given file and used `sed` to replace it with an adjusted and hashed value.
Something along the lines of
REPLACEMENTS="s:<branch-name>:$BRANCH_NAME_HASHED:g"
sed "$REPLACEMENTS" "$TEMPLATE" >> "$OUTPUT"
We executed this script within a pre-commit hook. There was a lot of added conditional logic before and after this snippet, but this should give you an idea.
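For illustration, a stripped-down version of such a hook could look roughly like the following; the file names, the hash command and the "cache" prefix are placeholders rather than our actual internal values:
#!/usr/bin/env bash
# Hypothetical pre-commit hook sketch; TEMPLATE/OUTPUT paths and the prefix are placeholders.
TEMPLATE="bitbucket-pipelines.template.yaml"
OUTPUT="bitbucket-pipelines.yml"

BRANCH_NAME="$(git rev-parse --abbrev-ref HEAD)"
# Hash the branch name so the cache key stays short and free of special characters.
BRANCH_NAME_HASHED="cache-$(printf '%s' "$BRANCH_NAME" | sha1sum | cut -c1-12)"

REPLACEMENTS="s:<branch-name>:$BRANCH_NAME_HASHED:g"
sed "$REPLACEMENTS" "$TEMPLATE" > "$OUTPUT"

# Stage the regenerated pipeline file so it becomes part of the commit.
git add "$OUTPUT"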
Kind regards
Deniz
Hi @Cengiz Deniz,
That makes sense. Thanks for the reply! I was investigating this topic a little deeper.
We changed to this approach now, and it looks like it works very well. It creates a new cache version for the hash of the given files, so PRs and main do not share the cache if there is a difference, while several PRs can re-use the cache.
dependencies:
  key:
    files:
      - "**/package.json"
      - package-lock.json
      # Uncomment next line if you want to play with cache settings
      # - bitbucket-pipelines.yml
  path: node_modules
#!/bin/bash
DIR="node_modules"
# init
# look for empty dir
if [ -d "$DIR" ]; then
  if [ "$(ls -A "$DIR")" ]; then
    echo "$DIR exists and is not empty"
  else
    npm ci
  fi
else
  npm ci
fi
Wow, a millisecond-long install (even if it's "only" an install via `npm ci`) sounds really great. Love to see that you went through with it and got usable results from this experiment of ours :)
We actually moved away from this approach (it was only a prototype) because it changed the bitbucket-pipelines.yaml file with hashed values for cache names, which then got checked into dev (and would have made their way into master) when merging feature branches, and we did not want "weird" branch-specific cache names in our dev and production pipeline specifications.
I'd be really interested in checking out your approach in more detail. Granted, only if that's something you can and would share. This topic is still on my list, and performance boosts for pipelines are always a great thing. Maybe let's get in touch privately? Is it okay if I shoot you a message on LinkedIn?
Kind regards
Deniz