caches vs artifacts

kvernon-maxdigital August 10, 2018

I feel like the documentation on these two items is lacking. Equally, I'm not really seeing anything that tells me whether one or the other is working. I'm hoping that someone can help me understand this better. My goal is to break up our all-in-one step into multiple steps for clarity.

From this, I have two questions:

1. What's the difference between caches and artifacts?
2. Are they needed in each step?

To elaborate on the process: I've attempted to break up the build process from two steps (build, deploy, and test, then monitor test) into the following:

1. build

2. unit testing

3. deploy

4. integration testing

5. monitor testing

In my pipelines file, I have the following:

pipelines:
  branches:
    develop:
      - step:
          caches:
            - dotnetcore
          name: Build
          deployment: test
          script:
            - export ENVIRONMENT_VALUE=alpha
            - apt-get update && apt-get install -y zip
            - cd Source
            - dotnet restore
            - dotnet build --no-restore
          artifacts:
            - dist/**
      - step:
          name: Running Unit Tests
          script:
            - export ENVIRONMENT_VALUE=alpha
            - for i in $(find . -name "*.Tests" -type d ); do dotnet test $i; done;
          artifacts:
            - dist/**
      - step:
          name: Deploying to Alpha
          script:
            - export ENVIRONMENT_VALUE=alpha
            - cd dist
            - cd Source
            - cd TheAuthenticationServiceFolder
            - dotnet lambda deploy-serverless --template-parameters EnvironmentValue=$ENVIRONMENT_VALUE --configuration $CONFIGURATION --stack-name ${STACK_NAME}-${ENVIRONMENT_VALUE} --template serverless-lambda.template
      - step:
          name: Running Integration Tests
          script:
            - export INTEGRATION_FILE=$INTEGRATION_FILE_ALPHA
            - for i in $(find . -name "*.IntegrationTests" -type d ); do dotnet test $i; done;
      - step:
          caches:
            - node
          name: Running Monitor Tests
          script: # Modify the commands below to build your repository.
            - export ENVIRONMENT_VALUE=alpha
            - npm i -g gulp
            - npm i
            - gulp test:monitor

I've noticed some things in here I don't understand, which led to the questions above. Really, if I break these up, it becomes slow. Each step does a "Build Setup" where it appears to clone the repo again to get the latest code.

Also, through the flow, it does the following:

1. build

2. unit test

3. fails

Number 3, the deploy, fails because it cannot get into the folder TheAuthenticationServiceFolder, even though TheAuthenticationServiceFolder exists in the project. Because of this, I added the line above to cd into dist first. With that statement it always fails: it cannot find the folder dist.

This is where I'm getting antsy over needing better docs and help with understanding Pipelines.

 

The best I have is this write-up: https://confluence.atlassian.com/bitbucket/using-artifacts-in-steps-935389074.html. It states that artifacts only need to be declared in the step that produces them, but what about carrying them through multiple steps?

And to repeat: I really don't understand caches vs artifacts. What are the differences between caches and artifacts, and do they need to be declared in each step to push the artifacts on to the next step?

 

Thanks,

Kelly

 

3 answers

3 votes
Philip Hodder
Atlassian Team
August 13, 2018

Hi Kelly,

The main difference between a cache and an artifact is the following:

  • A cache is something you want to store across multiple pipeline runs. For example, installed dependencies. The idea is to download the dependencies in one pipeline run, and then reuse them in later runs (to reduce pipeline time spent downloading).
    • Caches should be used with the expectation that they may not be present (as they expire automatically after a week, and can be manually deleted too).
    • A cache needs to be defined in each step that you want to consume it in. That step will also create and upload a new cache if one does not exist.
  • An artifact is something that you want to persist across multiple steps in a single pipeline run. For example, if you've built and tested a binary in one step and you have deployment steps afterwards, then you're deploying the same object that you've already tested (rather than recreating it).
    • Artifacts are used with the expectation that they will always be present in subsequent steps.
    • Artifacts are also available to download via the UI (but will be deleted 7 days after the pipeline completes).
    • An artifact needs to be defined on the step that creates it. It is then propagated to all subsequent steps, without requiring any configuration on those steps.

Some examples

Cache

pipelines:
  default:
    - step:
        name: (Step 1) Build and test
        caches:
          - maven # A default cache (no definition required). Checks ~/.m2
        script:
          - mvn install # Download dependencies, build, and run tests.
    - step:
        name: (Step 2) Unrelated
        script:
          - echo "Hello world!"

Run one of this pipeline:

  • The first run of this pipeline will have no cache, so step 1 will not download one. It will download all of the maven dependencies and then build and run tests. After completing successfully, the step will upload the contents of ~/.m2 as the maven cache.
  • Step 2 has not configured any caches, and so will not download any.

Run two of this pipeline:

  • The second run of this pipeline does have a cache, so step 1 will download the maven cache. Now it doesn't need to download all the dependencies. It may still need to download new versions or new libraries, but it (hopefully) won't download the world, as a cache already exists. Step 1 will not upload a cache: a new cache is only created when one doesn't already exist (for example, after it has expired or been deleted).
  • Step 2 has not configured any caches, and so will not download any.
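
The maven cache above is one of the predefined caches, so no definition is required. If you need to cache a directory that has no predefined cache, you can declare a custom cache under definitions. A minimal sketch, where the cache name and path are only illustrative:

definitions:
  caches:
    bundler: vendor/bundle   # custom cache: maps a name to the directory to store

pipelines:
  default:
    - step:
        name: Build with a custom cache
        caches:
          - bundler                          # restores vendor/bundle from previous runs
        script:
          - bundle install --path vendor/bundle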

Artifact

pipelines:
  default:
    - step:
        name: (Step 1) Build and test
        artifacts:
          - target/**
        script:
          - mvn install # Download dependencies, build and run tests.
    - step:
        name: (Step 2) Deploy to production
        script:
          - ls -Ra target
          - ./deploy-to-production

All runs of this pipeline will have the same behaviour, so let's just look at one run.

  • Step 1 will build a binary that we can release. However, we've split our pipeline up into multiple steps. So we use an artifact to transfer this. In this case, I'll transfer my "target" directory to the next step. This will also transfer all files inside of it.
  • Step 2 will have a directory called "target" with all the files from the previous step. If we run "ls -Ra target" we can see the contents of the target directory to confirm it has been copied. Then we can run the deployment script.
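
If you need to pass along more than just the target directory, you can list several glob patterns under artifacts. A small sketch (the extra path is only an example):

artifacts:
  - target/**
  - reports/**   # e.g. test reports produced in the same step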

I'll answer the other part of your question in a separate response, as this reply is already quite large.

 

Feel free to ask for clarifications to help dilute this wall of text.

I'll pass this on to our main documentation author, as I've seen a few other people have similar confusion distinguishing the two.

Thanks,

Phil

0 votes
ViktorKonsta June 27, 2019

@Philip Hodder Hi. Thanks for the help. If I'm using Node.js and I cache the node_modules folder, how will it work if I add another package or update some versions in the next pull request? Will it invalidate the cache and download the new versions?

Or should I use artifacts for things like node modules?

Thanks

Philip Hodder
Atlassian Team
June 27, 2019

Hi @ViktorKonsta,

Yes, you should be using caches for this.

Caches automatically expire every week. NPM can detect when there's a version change (if I recall correctly), so it will download only the new changes instead of everything. However, the Pipelines caching system will not detect these changes, so you will initially restore the old dependencies from the cache until it expires.
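
For reference, a minimal sketch using the predefined node cache, which stores the node_modules directory (the script commands are just placeholders):

pipelines:
  default:
    - step:
        name: Install and test
        caches:
          - node        # predefined cache for ./node_modules
        script:
          - npm install # fast when most packages are already in the cache
          - npm test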

You can manually remove caches in the Pipelines list UI.

We have an open feature request to provide a smarter way of invalidating caches that you can follow.

Thanks,

Phil

0 votes
Philip Hodder
Atlassian Team
August 13, 2018

Another response to address:

Number 3, the deploy, fails because it cannot get into the folder:  TheAuthenticationServiceFolder.

What's the name of the directory that "TheAuthenticationServiceFolder" gets created in? Is it "dist", or something else? You should set the artifact to the directory name you want to copy. For example, if you want to copy the contents of the directory 'my-build', you would instead define it as:

- step:
    name: Create build
    artifacts:
      - my-build/**
    script:
      - ./write-files-to-my-build-directory
      - ls -aR my-build # View contents of my-build directory.
- step:
    name: View build
    script:
      - ls -aR my-build # The directory has been copied to this step with the same name and path.
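
Applied to the pipeline in your question, that means declaring an artifact path that the build step actually writes to. As a sketch only (the real output path depends on your project layout, so Source/** is just an assumption):

- step:
    name: Build
    caches:
      - dotnetcore
    script:
      - cd Source
      - dotnet restore
      - dotnet build --no-restore
    artifacts:
      - Source/**   # assumption: the build output ends up under Source, not dist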

Hopefully that helps you debug your other problem.

As an aside, your "Build" step should not be marked with "deployment: test", as that indicates a step is deploying to a test environment. You'll see deployment concurrency limitations with that enabled (which will get annoying for a step that isn't doing a deployment). If you've seen any steps get marked as "Paused", this will be why. Your "Deploying to Alpha" step, however, could be marked as a deployment step (one of: test, staging, or production).
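
For instance, a sketch of moving the deployment marker from the build step to the deploy step (step contents abbreviated from your question):

- step:
    name: Build                # no deployment keyword here
    caches:
      - dotnetcore
    script:
      - dotnet build --no-restore
- step:
    name: Deploying to Alpha
    deployment: test           # only this step is marked as deploying to the test environment
    script:
      - dotnet lambda deploy-serverless --template serverless-lambda.template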

Thanks,

Phil
