
Dynamic Branch-Specific Pipeline Cache and Cache Size Limits

Cengiz Deniz September 6, 2022

Hi community,

we are currently in the middle of a round of heavy pipeline performance improvements. Along the way we ran into several issues regarding caches:

  • Caches are repository-global, not branch-specific, meaning once you update a cache with, say, dependencies from a feature branch, you may cause bugs and/or issues in production-ready code that uses the same cache(s) and pipeline(s).
  • There is no out-of-the-box cache update on dependency changes. We are aware of this proposed solution for automatically refreshing caches upon dependency changes; however, it is still limited by point #1.

There is also this open ticket, dating back all the way to 2018 with engaged discussion up to last month, regarding cache refreshing.

In the discussion thread of said ticket, a fellow community member proposed a workaround: appending unique hashed suffixes, derived from checksums of the files that need caching (in his case yarn lock files), to cache names, thereby allowing for individual caches.

Leaning on this solution, we have implemented our own approach: we use the branch's name and some scripting to generate a new bitbucket-pipelines.yaml on the fly upon commit, which lets us have branch-specific caches for pnpm and node_modules.

# bitbucket pipeline template

definitions:
  caches:
    pnpm-<branch-name>: $BITBUCKET_CLONE_DIR/.pnpm-store
    node-<branch-name>: node_modules
    # some-other-nested-node_modules-here

# <branch-name> will be replaced by a hash of the branch's name and an internal prefix
During our tests this works as intended, and so far we have not faced any problems; however, we are generating about 0.5 GB of branch-specific caches. Thus, we are now facing the lingering questions:
  • Is there a MAXIMUM cache size per repository, and if so, what is it?
  • Is there a way to dynamically clear ALL caches present in a repository from a script/pipeline run, rather than using the caches popup and pressing the delete button for each cache?

Input is very welcome, thanks in advance!

Best regards
Deniz

2 answers

1 accepted

0 votes
Answer accepted
Suhas Sundararaju
Atlassian Team
September 6, 2022

Hi @Cengiz Deniz, thanks for reaching out to Atlassian Community!

Only caches under 1GB once compressed are saved. We have a feature request to increase the cache limit: https://jira.atlassian.com/browse/BCLOUD-21484

More details about caching can be found at: https://support.atlassian.com/bitbucket-cloud/docs/cache-dependencies/#How-does-caching-work

You can use a combination of the list and delete cache API endpoints to list the caches and delete all of them from a pipeline.

documentation:

https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pipelines/#api-repositories-workspace-repo-slug-pipelines-config-caches-get

https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pipelines/#api-repositories-workspace-repo-slug-pipelines-config-caches-cache-uuid-delete
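
For example, all caches could be cleared from a pipeline with a short script along these lines. This is a minimal sketch assuming curl and jq are available in the build image; $BB_USER and $BB_APP_PASSWORD are placeholder app-password credentials with the necessary repository permissions:

#!/bin/bash
# List all pipeline caches for the repository (first page of the paginated
# response; loop over the "next" link if you have many caches),
# then delete each one by UUID.
BASE="https://api.bitbucket.org/2.0/repositories/$BITBUCKET_WORKSPACE/$BITBUCKET_REPO_SLUG/pipelines-config/caches/"

curl -sg -u "$BB_USER:$BB_APP_PASSWORD" "$BASE" \
  | jq -r '.values[].uuid' \
  | while read -r uuid; do
      # -g stops curl from treating the braces in the UUID as glob patterns
      curl -sg -X DELETE -u "$BB_USER:$BB_APP_PASSWORD" "$BASE$uuid"
    done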

Let me know if this helps.

Regards,

Suhas

Cengiz Deniz September 7, 2022

Hi @Suhas Sundararaju 

regarding cache sizing: we are aware of the cache size limit applied during build teardown and compression. What we are interested in is whether there is a limit on the overall cache size per repository.

Going from the branch-specific caching we have now set up with our workaround, we would have about 0.5-0.7 GB of caches (17 in total, ranging from a few hundred kB up to 250 MB) per branch. However, there are multiple developers working on various branches, and various pipelines using caches are being run (e.g. for testing and before pull requests), so we would quickly accumulate multiples of these 0.5-0.7 GB cache sets. So is there any limit? There is nothing in the documentation (or we simply didn't find it).

As for the API based approach for deleting caches: thanks for that hint, we'll have a look at that and fiddle around a bit :)

Thanks and best regards
Deniz

Suhas Sundararaju
Atlassian Team
September 7, 2022

Hi @Cengiz Deniz 

There is no restriction on the overall cache size per repository; you can create any number of branch-specific node_modules caches. But, as above, only caches under 1GB once compressed are saved.

Regards,
Suhas

0 votes
Maximilian Beckenbach April 24, 2023

Hi @Cengiz Deniz ,

may I ask how you did this branch-name replacement?

# <branch-name> will be replaced by a hash of the branch's name and an internal prefix

 Thanks!

Max

Cengiz Deniz April 25, 2023

Hi @Maximilian Beckenbach 

we used a simple shell script that checked for `<branch-name>` in a given file and used `sed` to replace it with an adjusted and hashed value.

Something along the lines of

# sed expression: replace every <branch-name> token with the hashed branch name
REPLACEMENTS="s:<branch-name>:$BRANCH_NAME_HASHED:g"
# render the template into the generated pipelines file
sed "$REPLACEMENTS" "$TEMPLATE" >> "$OUTPUT"

We executed this script within a pre-commit hook. There was a lot of added conditional logic before and after this snippet, but this should give you an idea.
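
For illustration, the hashed value itself could be derived roughly like this (the prefix and hash length here are just examples, not our exact values):

# hypothetical derivation: hash the current branch name and prepend a prefix
BRANCH_NAME_HASHED="ourprefix-$(git rev-parse --abbrev-ref HEAD | sha256sum | cut -c1-12)"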

Kind regards
Deniz

Maximilian Beckenbach April 26, 2023

Hi @Cengiz Deniz,

That makes sense. Thanks for the reply! I was investigating this topic a little deeper.

We changed to this approach now and it looks like it works very well. It creates a new cache version keyed on a hash of the given files. So PRs and main do not share a cache if those files differ, while several PRs with identical files can re-use the same cache.

dependencies:
  key:
    files:
      - "**/package.json"
      - package-lock.json
      # Uncomment the next line if you want to play with cache settings
      # - bitbucket-pipelines.yml
  path: node_modules

Because we use "npm ci", it still deleted and reinstalled all the node modules in each step... So we combined this with a shell script that runs 'npm ci' only when the node_modules folder was not retrieved from cache or is empty. I assume the bash script could be improved, but it works :-). That script runs as the first one in each step. This way our 'npm ci' usually takes 0s.
#!/bin/bash

DIR="node_modules"

# Run 'npm ci' only when node_modules is missing or empty,
# i.e. when it was not restored from the cache.
if [ -d "$DIR" ]; then
  if [ "$(ls -A "$DIR")" ]; then
    echo "$DIR exists and is not empty"
  else
    npm ci
  fi
else
  npm ci
fi
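
In the pipeline definition, that looks roughly like this (the script path and follow-up commands are placeholders):

# illustrative step wiring; the script path is a placeholder
- step:
    caches:
      - dependencies
    script:
      - ./scripts/npm-ci-if-needed.sh
      - npm run build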

Kind regards
Max

Cengiz Deniz April 27, 2023

Hi @Maximilian Beckenbach 

wow, a millisecond-long install (even if it's "only" an install via `npm ci`) sounds really great. Love to see that you went through with it and got usable results from this experiment of ours :)

We actually moved away from this approach (it was only a prototype), as it changed the bitbucket-pipelines.yaml file with hashed values for cache names, which then got checked into dev (and would have made their way into master) when merging feature branches, and we did not want "weird" branch-specific cache names in our dev and production pipeline specifications.

I'd be really interested in checking out your approach in more detail. Granted, only if that's something you can and would share. This topic is still on my list, and performance boosts for pipelines are always a great thing. Maybe let's get in touch privately? Is it okay if I shoot you a message on LinkedIn?

Kind regards
Deniz
