Stash and caching

JIRA Autobot November 21, 2013

Hi,

Can somebody explain what these two config options actually mean and how they work in practice? I mean, how does the user see this in use?

plugin.stash-scm-cache.refs.enabled=false
Controls whether ref advertisement operations are cached.

plugin.stash-scm-cache.upload-pack.enabled=true
Controls whether clone operations are cached.

I tried reading https://confluence.atlassian.com/display/STASH/Scaling+Stash+for+Continuous+Integration+performance but it doesn't say much about how the user uses this, or when and why you should do it.

I mean, why would you want to cache a clone? When would that be a good idea? In a fast-changing world, won't a cached clone expire on the next commit?

/donnib

Answer accepted
Michael Heemskerk
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
November 21, 2013

Hi donnib,

There are really two benefits of caching a clone. The main benefit is for CI servers that perform multiple clones for any given build. Examples are multi-step builds or dependent build plans. In these cases a single push to a repository will trigger multiple clones of the same repository. For instance, in our CI environment, a single change to the repository will trigger our main build, which has been split into 6 parallel build steps. When that main build goes green, a number of specialized builds are triggered (performance, multi-database, etc.). All in all, that single push can trigger up to 25 clones. Given that cloning a repository is a fairly CPU- and memory-heavy operation, caching the clone significantly reduces that cost.

The second benefit is that by streaming the clone to a cache, the git command can terminate as soon as the output has been written to disk (vs streamed over the network). Since the git process holds on to a big chunk of memory until the clone has been fully sent to the client, that can reduce the time that memory on the server is being claimed.

Your point about the cache expiring on the next commit is valid, but the CI-triggered clones typically happen in a fairly short time span (within 1m for parallel builds and, say, within 30m - 1h for dependent builds), so there is a good chance that the cache won't expire because of a push in that time frame.
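To make the effect concrete, here is a toy model in Python. It is only a sketch of the idea, not how Stash implements it: the `CloneCache` class, its method names and the repository name are all made up for illustration. It assumes that a push clears the cached clones for that repository and that every clone in between with the same request reuses the cached result.

```python
class CloneCache:
    """Toy model of a clone cache that is cleared when the repository is pushed to.

    Hypothetical illustration only; this is not Stash's actual implementation.
    """

    def __init__(self):
        self._packs = {}
        self.hits = 0
        self.misses = 0

    def clone(self, repo, request_key):
        """Serve a clone, generating the (expensive) pack only on a cache miss."""
        key = (repo, request_key)
        if key in self._packs:
            self.hits += 1
        else:
            self.misses += 1
            # Stand-in for the CPU- and memory-heavy pack generation.
            self._packs[key] = f"pack for {repo} ({request_key})"
        return self._packs[key]

    def push(self, repo):
        """A push changes the repository, so drop every cached clone for it."""
        self._packs = {k: v for k, v in self._packs.items() if k[0] != repo}


cache = CloneCache()

# One push triggers 25 CI clones with identical parameters.
for _ in range(25):
    cache.clone("main-repo", "depth=1")
print(cache.hits, cache.misses)  # 24 1

# The next push expires the cache, so the following clone is a miss again.
cache.push("main-repo")
cache.clone("main-repo", "depth=1")
print(cache.hits, cache.misses)  # 24 2
```

In this model the 25 clones cost one pack generation instead of 25, which is the saving described above.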

You can check whether you're benefitting from the cache or not by sending a REST request to

/rest/scm-cache/latest/caches

The output lists the number of cache hits and misses (overall and per repository).
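As a rough sketch, the request could be scripted in Python like this. Only the resource path comes from above. The base URL, the use of HTTP basic authentication and the helper names are assumptions for illustration, and the layout of the JSON response is not shown here, so `hit_ratio` takes plain numbers that you read out of the output yourself.

```python
import base64
import json
import urllib.request


def cache_stats_url(base_url):
    """Build the URL of the cache statistics resource for a Stash base URL."""
    return base_url.rstrip("/") + "/rest/scm-cache/latest/caches"


def fetch_cache_stats(base_url, user, password):
    """Fetch the statistics as parsed JSON (assumes basic auth is accepted)."""
    request = urllib.request.Request(cache_stats_url(base_url))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def hit_ratio(hits, misses):
    """Fraction of requests served from the cache; 0.0 when there were none."""
    total = hits + misses
    return hits / total if total else 0.0


# https://stash.example.com is a placeholder, not a real server.
print(cache_stats_url("https://stash.example.com/"))
```

A consistently low hit ratio for a repository would suggest that pushes are arriving faster than the CI clones can reuse the cache.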

Hope this helps,

Michael

JIRA Autobot November 21, 2013

@Michael thank you for the quick answer. Why is it that your build does a clone and not just a fetch?

How does the clone cache actually work? I mean, is the clone cache linked to a commit, so you have multiple clone caches for different commits?

Do you have any comment on the cache-refs? What are those for?

Michael Heemskerk
Atlassian Team
November 21, 2013

We have many different build agents that build many different projects. A fetch requires that there is an initial clone of the repository on that agent, which isn't always true. For simplicity the agents always perform a (shallow) clone.

The clone cache inspects the clone request and creates a cache key from the list of refs and other parameters that the git client requests. So, a 'git clone --depth 1 <repo-url>' would result in a different cache key than 'git clone <repo-url>', etc.
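A minimal sketch of that idea in Python, to show why the two commands end up with different keys. The function name, the choice of parameters and the use of SHA-256 are assumptions for illustration; the real key derivation in Stash is not spelled out here.

```python
import hashlib


def clone_cache_key(repo, wanted_refs, depth=None, capabilities=()):
    """Derive a cache key from the parameters of a clone request.

    Hypothetical illustration: the exact inputs Stash uses are not known here.
    """
    parts = [
        repo,
        ",".join(sorted(wanted_refs)),   # order of requested refs should not matter
        f"depth={depth}",                # None for a full clone, 1 for --depth 1
        ",".join(sorted(capabilities)),  # other options the client asked for
    ]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


full = clone_cache_key("proj/repo", ["refs/heads/master"])
shallow = clone_cache_key("proj/repo", ["refs/heads/master"], depth=1)

print(full == shallow)  # False: a shallow clone gets its own cache entry
print(shallow == clone_cache_key("proj/repo", ["refs/heads/master"], depth=1))  # True
```

Identical requests map to the same key, so build agents that all run the same clone command share one cache entry.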

The cache-refs cache the output of the 'ref advertisement' phase of the clone/fetch protocol, in which the server sends a list of refs (plus the commit hashes they point to) to the client. This call is typically not expensive, but if you have a build server that's configured to poll for changes frequently, it could still be beneficial to cache these refs as well.
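To give a feel for what a polling build server does with that list, here is a small Python sketch. It parses text in the format printed by 'git ls-remote' (one commit hash, a tab, then the ref name per line) and compares two polls. The helper names and the sample hashes are made up for illustration.

```python
def parse_ref_advertisement(ls_remote_output):
    """Turn 'git ls-remote' style output into a {ref name: commit hash} dict."""
    refs = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        commit_hash, ref_name = line.split("\t", 1)
        refs[ref_name] = commit_hash
    return refs


def changed_refs(previous, current):
    """Names of refs that were added, removed or moved between two polls."""
    names = previous.keys() | current.keys()
    return sorted(n for n in names if previous.get(n) != current.get(n))


# Made-up sample data standing in for two consecutive polls.
first_poll = parse_ref_advertisement(
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/feature\n"
)
second_poll = parse_ref_advertisement(
    "3333333333333333333333333333333333333333\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/feature\n"
)

print(changed_refs(first_poll, second_poll))  # ['refs/heads/master']
```

Most polls find nothing changed, which is why serving the same ref list from a cache can pay off when polling is frequent.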
