
Dynamic Branch-Specific Pipeline Cache and Cache Size Limits

Cengiz Deniz September 6, 2022

Hi community,

we are currently in the middle of a round of heavy pipeline performance improvements. Along the way we ran into several issues regarding caches:

  • Caches are repository-global, not branch-specific, meaning once you update a cache with, say, dependencies from a feature branch, you may cause bugs and/or issues in production-ready code that uses the same cache(s) and pipeline(s).
  • There is no out-of-the-box cache update on dependency changes. We are aware of this proposed solution for automatically refreshing caches upon dependency changes; however, it is still limited by point #1.

There is also this open ticket, dating back all the way to 2018 with engaged discussion up to last month, regarding cache refreshing.

In the discussion thread of said ticket, a fellow community member proposed a workaround: appending unique hashed suffixes, derived from checksums of the files that need caching (in his case yarn lock files), to cache names, thereby allowing for individual caches.

Leaning on this solution, we have implemented our own approach: we use the branch's name and some scripting to generate a new bitbucket-pipelines.yaml on the fly upon commit, which lets us have branch-specific caches for pnpm and node_modules.

# bitbucket pipeline template

definitions:
  caches:
    pnpm-<branch-name>: $BITBUCKET_CLONE_DIR/.pnpm-store
    node-<branch-name>: node_modules
    # some-other-nested-node_modules-here

# <branch-name> will be replaced by a hash of the branch's name and an internal prefix
During our tests this works as intended, and so far we have not faced any problems; however, we are generating about 0.5 GB of branch-specific caches. Thus, we are now facing the lingering questions:
  • Is there a MAXIMUM cache size per repository, and if so, what is it?
  • Is there a way to dynamically clear ALL caches present in a repository from a script/pipeline run, rather than using the caches popup and pressing the delete button for each cache?

Input is very welcome, thanks in advance!

Best regards
Deniz

2 answers

1 accepted

0 votes
Answer accepted
Suhas Sundararaju
Atlassian Team
September 6, 2022

Hi @Cengiz Deniz, thanks for reaching out to Atlassian Community!

Only caches under 1GB once compressed are saved. We have a feature request to increase the cache limit: https://jira.atlassian.com/browse/BCLOUD-21484

More details about caching can be found at: https://support.atlassian.com/bitbucket-cloud/docs/cache-dependencies/#How-does-caching-work

You can use a combination of the list and delete cache API endpoints to list the caches and delete all of them from a pipeline.

documentation:

https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pipelines/#api-repositories-workspace-repo-slug-pipelines-config-caches-get

https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pipelines/#api-repositories-workspace-repo-slug-pipelines-config-caches-cache-uuid-delete
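
For example, all caches could be cleared from a pipeline with a short script along these lines. This is a minimal sketch assuming curl and jq are available in the build image; $BB_USER and $BB_APP_PASSWORD are placeholder app-password credentials with the necessary repository permissions:

#!/bin/bash
# List all pipeline caches for the repository (first page of the paginated
# response; loop over the "next" link if you have many caches),
# then delete each one by UUID.
BASE="https://api.bitbucket.org/2.0/repositories/$BITBUCKET_WORKSPACE/$BITBUCKET_REPO_SLUG/pipelines-config/caches/"

curl -sg -u "$BB_USER:$BB_APP_PASSWORD" "$BASE" \
  | jq -r '.values[].uuid' \
  | while read -r uuid; do
      # -g stops curl from treating the braces in the UUID as glob patterns
      curl -sg -X DELETE -u "$BB_USER:$BB_APP_PASSWORD" "$BASE$uuid"
    done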

Let me know if this helps.

Regards,

Suhas

Cengiz Deniz September 7, 2022

Hi @Suhas Sundararaju 

regarding cache sizing: we are aware of the cache size limit applied during build teardown and compression. What we are interested in is whether there is a limit on the overall cache size per repository.

Going from the branch-specific caching we have now set up with our workaround, we would have about 0.5-0.7 GB of caches (17 in total, ranging from a few hundred kB up to 250 MB) per branch. However, there are multiple developers working on various branches, and various pipelines using caches are being run (e.g. for testing and before pull requests), so we would quickly accumulate multiples of these 0.5-0.7 GB cache sets. So is there any limit? There is nothing in the documentation (or we simply didn't find it).

As for the API based approach for deleting caches: thanks for that hint, we'll have a look at that and fiddle around a bit :)

Thanks and best regards
Deniz

Suhas Sundararaju
Atlassian Team
September 7, 2022

Hi @Cengiz Deniz 

There is no restriction on the overall cache size per repository; you can create any number of branch-specific node_modules caches. But, as above, only caches under 1GB once compressed are saved.

Regards,
Suhas

0 votes
Maximilian Beckenbach April 24, 2023

Hi @Cengiz Deniz ,

may I ask how you did this branch-name replacement?

# <branch-name> will be replaced by a hash of the branch's name and an internal prefix

 Thanks!

Max

Cengiz Deniz April 25, 2023

Hi @Maximilian Beckenbach 

we used a simple shell script that checked for `<branch-name>` in a given file and used `sed` to replace it with an adjusted and hashed value.

Something along the lines of

# sed expression: replace every <branch-name> token with the hashed branch name
REPLACEMENTS="s:<branch-name>:$BRANCH_NAME_HASHED:g"
# render the template into the generated pipelines file
sed "$REPLACEMENTS" "$TEMPLATE" >> "$OUTPUT"

We executed this script within a pre-commit hook. There was a lot of added conditional logic before and after this snippet, but this should give you an idea.
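
For illustration, the hashed value itself could be derived roughly like this (the prefix and hash length here are just examples, not our exact values):

# hypothetical derivation: hash the current branch name and prepend a prefix
BRANCH_NAME_HASHED="ourprefix-$(git rev-parse --abbrev-ref HEAD | sha256sum | cut -c1-12)"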

Kind regards
Deniz

Maximilian Beckenbach April 26, 2023

Hi @Cengiz Deniz,

That makes sense. Thanks for the reply! I was investigating this topic a little deeper.

We changed to this approach now and it looks like it works very well. It creates a new cache version keyed on a hash of the given files. So PRs and main do not share a cache if those files differ, while several PRs with identical files can re-use the same cache.

dependencies:
  key:
    files:
      - "**/package.json"
      - package-lock.json
      # Uncomment the next line if you want to play with cache settings
      # - bitbucket-pipelines.yml
  path: node_modules

Because we use "npm ci", it still deleted and reinstalled all the node modules in each step... So we combined this with a shell script that runs 'npm ci' only when the node_modules folder was not retrieved from cache or is empty. I assume the bash script could be improved, but it works :-). That script runs as the first one in each step. This way our 'npm ci' usually takes 0s.
#!/bin/bash

DIR="node_modules"

# Run 'npm ci' only when node_modules is missing or empty,
# i.e. when it was not restored from the cache.
if [ -d "$DIR" ]; then
  if [ "$(ls -A "$DIR")" ]; then
    echo "$DIR exists and is not empty"
  else
    npm ci
  fi
else
  npm ci
fi
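
In the pipeline definition, that looks roughly like this (the script path and follow-up commands are placeholders):

# illustrative step wiring; the script path is a placeholder
- step:
    caches:
      - dependencies
    script:
      - ./scripts/npm-ci-if-needed.sh
      - npm run build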

Kind regards
Max

Cengiz Deniz April 27, 2023

Hi @Maximilian Beckenbach 

wow, a millisecond-long install (even if it's "only" an install via `npm ci`) sounds really great. Love to see that you went through with it and got usable results from this experiment of ours :)

We actually moved away from this approach (it was only a prototype), as it changed the bitbucket-pipelines.yaml file with hashed values for cache names, which then got checked into dev (and would have made their way into master) when merging feature branches, and we did not want "weird" branch-specific cache names in our dev and production pipeline specifications.

I'd be really interested in checking out your approach in more detail. Granted, only if that's something you can and would share. This topic is still on my list, and performance boosts for pipelines are always a great thing. Maybe let's get in touch privately? Is it okay if I shoot you a message on LinkedIn?

Kind regards
Deniz
