How to handle large number of git tags sensibly

Ian Boden January 5, 2017

I'm setting up a system where we have the following workflow:

1) User pushes change in git to gerrit with the JIRA issue number in the commit message

2) User submits change from gerrit

3) Build automatically kicked off from Jenkins if not already running, otherwise it's told to poll (so multiple commits can be collected into a single build if they are submitted whilst another build was in progress)

4) When a build starts, it creates a git tag and pushes it into gerrit. The tag is of the form $branch_$buildnumber; currently $buildnumber is just a date and time.

5) Overnight, the latest build is picked up for round 1 testing. If it passes, a tag is added (of the form $branch_$buildnumber_r1).

6) At some point during the day, a build that passed round 1 testing is picked up for round 2 testing. If it passes, a tag is added ($branch_$buildnumber_r2).

7) After a few weeks/months a build is released and is given a tag matching the release number, e.g. v1.0.0.1

So we basically have levels of tags depicting confidence levels in builds.

This all seems to work fine. The problem is that people want an easy way to see which builds a change went into. Looking at an issue shows which tags the commits are in, but there are just too many of them. Most of the time people only care about the first build that a change goes into, or the first one of each level.

A crude solution I've come up with is that when the build system adds a tag, it renames any tags at a lower level to have a prefix that JIRA then filters out (basically archiving the tag). So you could have the following list of tags:

b1_120202003629

b1_120101150059

b1_120101142405

b1_120101123002

b1_created

 

The overnight build picks up b1_120101150059 and it passes, so we have:

b1_120202003629

__b1_120101150059 , b1_120101150059_r1

__b1_120101142405

__b1_120101123002

b1_created

 

Now when looking at an issue it will only show the tags:

b1_120202003629

b1_120101150059_r1

b1_created
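
The archiving step described above could be sketched roughly as follows. Git has no tag rename command, so this recreates the tag under the new name and deletes the old one. The helper name archive_tag and the throwaway repository are assumptions for illustration, not part of the real setup.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: "archive" a tag by recreating it under a "__" prefix
# (the prefix used in the example above) and deleting the original name.
archive_tag() {
    old="$1"
    new="__$old"
    git tag "$new" "$old^{}"        # new lightweight tag on the same commit
    git tag -d "$old" >/dev/null    # drop the old name locally
    # On a shared server both changes would also need pushing, for example:
    #   git push origin "refs/tags/$new" ":refs/tags/$old"
}

# Throwaway repository so the sketch can be run safely.
demo=$(mktemp -d)
cd "$demo"
git init -q
git commit -q --allow-empty -m "first build"
git tag b1_120101123002
git tag b1_120101150059

archive_tag b1_120101123002

visible_tags=$(git tag --list 'b1_*')
archived_tags=$(git tag --list '__b1_*')
echo "visible:  $visible_tags"
echo "archived: $archived_tags"
```

Deleting a tag on the server does not remove it from clones that already fetched it, which is the second problem listed below.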

 

The problem with this is:

1) Even with cleaning up the tags there will still be a lot of active tags on the old issues (but maybe people care less about seeing the information quickly on an old issue, so having a large list isn't too much of a problem)

2) People will have the old tags locally, they can't push them to gerrit (as they don't have permissions for that) so they can't reincarnate a tag, but it still looks messy and could be confusing.

3) Is the huge number of tags going to cause performance problems at any point?

 

So is there some better way that I'm missing?

Currently using plugins:
Git Integration for JIRA
JIRA Gerrit plugin

2 answers

1 vote
crf
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
January 8, 2017

We use a similar system in house for tracking what CI a git commit has passed; however, we use git notes for this, not tags.  If you aren't familiar with git notes, they are stored in a separate directed acyclic graph (under refs/notes/) that is maintained in parallel with the branch heads whose commits they annotate.  You can commit notes against a particular commit, update them, remove them, query them, etc.  This is a natural way to store information that is associated with the commit without polluting the tags list.  I'm only going to give a very rough outline of what we do here, but it looks something like this:

 

target_branch="$1"  # What branch are we running CI on
build_id="$2"  # How does somebody find the build in question?
 
# Setting up the GIT_NOTES_REF variable makes these a little easier to work with
export GIT_NOTES_REF="refs/notes/ci/$target_branch"
 
# Notes aren't fetched by default; you have to explicitly ask for them
git fetch origin +"$GIT_NOTES_REF:$GIT_NOTES_REF" || echo 'No builds yet, apparently'
 
# It looks like 'git notes append' would be what you want, but for some silly
# reason they made it insert extra blank lines.  Annoying.
(
    # read the current notes, ignoring the error if there aren't any
    git notes show 2>/dev/null
 
    # Record that this build finished
    echo "$build_id"
) \
| grep . \
| git notes add -f -F - \
|| exit 1
 
# Actually push up the change.  Note that like any other git push, if it is
# not fast-forward, then it will fail.  You should probably have a loop that
# will retry a few times in case other builds are finishing at close to
# the same time.
git push origin "$GIT_NOTES_REF"
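
The retry loop suggested in the last comment might look something like the sketch below. It is not what we actually run: the helper name push_notes_with_retry, the scratch ref refs/notes/remote-ci, and the choice of the cat_sort_uniq merge strategy are all assumptions. The demo part simulates a second build agent by discarding the local notes ref before adding another note.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com
export GIT_NOTES_REF="refs/notes/ci/master"

# Hypothetical helper: push $GIT_NOTES_REF; if the push is rejected, merge in
# the notes from the remote and try again, up to $1 attempts (default 5).
push_notes_with_retry() {
    attempts="${1:-5}"
    try=1
    while [ "$try" -le "$attempts" ]; do
        if git push -q origin "$GIT_NOTES_REF" 2>/dev/null; then
            return 0
        fi
        # Fetch the remote notes into a scratch ref (assumed name) and merge.
        # cat_sort_uniq concatenates both notes, sorts lines, drops duplicates.
        git fetch -q origin "+$GIT_NOTES_REF:refs/notes/remote-ci"
        git notes merge -q -s cat_sort_uniq refs/notes/remote-ci
        try=$((try + 1))
        sleep 1
    done
    return 1
}

# Demo in throwaway repositories.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/agent"
cd "$work/agent"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "change under test"
git push -q origin HEAD:refs/heads/master

# The first build records its id and pushes the notes.
git notes add -m "build-1"
git push -q origin "$GIT_NOTES_REF"

# Simulate another agent that never saw build-1's note.
git update-ref -d "$GIT_NOTES_REF"
git notes add -m "build-2"

push_notes_with_retry
merged_note=$(git notes show HEAD)
echo "$merged_note"
```

Because cat_sort_uniq sorts the lines, the note no longer preserves the order in which builds finished; pick a different strategy if that order matters.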

 

We use this:

  1. To correlate when a commit has gone through all the required testing to be promoted to the next stage.
  2. To later hunt down how long CI is taking, how long it takes on average for a commit to be promoted, how long it takes to reach production, and other similar metrics.

This has the advantage of not polluting everybody's pulls with a bunch of tags they don't care about.  Since you have to explicitly ask for notes, you only fetch this information when you actually want it.
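
For the reading side, a minimal sketch of asking for the notes on a commit is shown below. The notes namespace ci/master, the commit message, and the build ids are made up for the demo; in a real clone you would first fetch the notes, for example with git fetch origin '+refs/notes/ci/*:refs/notes/ci/*'.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with one commit and two recorded builds.
demo=$(mktemp -d)
cd "$demo"
git init -q
git commit -q --allow-empty -m "PROJ-1 hypothetical fix"
git notes --ref=ci/master add -m "b1_120101150059" HEAD
git notes --ref=ci/master append -m "b1_120101150059_r1" HEAD

# 'git notes append' separates entries with a blank line, so filter those out.
builds=$(git notes --ref=ci/master show HEAD | grep .)
echo "$builds"

# The same notes can be shown inline in the log for that namespace.
git log -1 --notes=ci/master
```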

Ian Boden January 9, 2017

Is it possible to get JIRA to display git notes? I need the information in JIRA for people with no git knowledge (aka managers) to find the answers. 

Adam Wride
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
January 9, 2017

@Ian Boden - the Git Integration for JIRA add-on displays the content of git notes in the Git Commits issue tab. Git notes also get indexed and can be used to add a JIRA issue association after the original commit has been added.

Our documentation for Git Notes is here: 

https://bigbrassband.com/documentation.html#gitctrlvwr_issue_gn

In the example below - "GIT-488 Added note" is the Git Note content. 

[Screenshot: issue-git-commits-git-notes.png]

 

You may be wondering what the "jira" text (in the highlighted area of the example image) is for. This is the namespace where the note was added since:

"... you can only have one note per commit, Git allows you to have multiple namespaces for your notes. The default namespace is called 'commits', but you can change that."

From information about Git Notes: https://git-scm.com/blog/2010/08/25/notes.html

crf
Atlassian Team
January 9, 2017

@Ian Boden: As Adam's comment suggests, this is not something that JIRA does out of the box, but just about anything is possible with plugins, and you seem to have already installed one that can help with this.  I would recommend experimenting with it to see if it already shows what you want.  If it doesn't, the author may be open to recommendations on how to improve this.

Either way, attaching extra information like who has approved a commit or what testing it has gone through is exactly the job that "git notes" was invented to do, and I definitely think the solution will scale better than tags if you can find the right tools for working with them.

1 vote
Mark L. Smith
Rising Star
January 6, 2017

Hi Ian,

In a month, how many new tags do you think you add?

 

You say: Most of the time people only care about the first build that it goes into or the first one of each level.

I'm trying to put that in terms of sort order. Does that mean that people only want to see the oldest tags by default (where oldest is defined by looking at the commit each tag points at)?

Ian Boden January 6, 2017

A build plus verification and other stuff takes between 30 minutes and an hour. Because we are a worldwide team, it's unusual for the main stream to be idle apart from Sundays, so we are looking at around 144 levels a week on the main stream. Other streams will be lower, but some development streams have reduced verification, so they actually build much more frequently. Overall we are probably looking at 250 levels a week, so about 1000 a month. The main stream plus 3 or 4 others then have the overnight run, and then there is the next level of testing, so we would be looking at, say, 1200 tags a month.

 

I think in most cases people want to answer the questions:

1) What was the first level that this fix went into in branch X

2) Is this fix in a level that has passed test round 1/2 in branch X

3) What was the first release the fix is in

4) What revision did the fix go in in all the current versions
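
With the tag scheme as it stands, the first two questions could be answered from the command line along these lines. This is only a sketch: the helper name first_tag_containing is made up, and it assumes the build number is a fixed-width timestamp so that sorting tags by name is also chronological.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: print the first tag (in name order) that matches the
# glob in $2 and contains the commit in $1.
first_tag_containing() {
    git tag --list --contains "$1" --sort=refname "$2" | head -n 1
}

# Throwaway repository: three builds, the fix lands in the second one.
demo=$(mktemp -d)
cd "$demo"
git init -q
git commit -q --allow-empty -m "unrelated change"
git tag b1_120101123002
git commit -q --allow-empty -m "the fix"
fix=$(git rev-parse HEAD)
git tag b1_120101142405
git commit -q --allow-empty -m "later change"
git tag b1_120101150059
git tag b1_120101150059_r1

first_build=$(first_tag_containing "$fix" 'b1_[0-9]*')
first_r1=$(first_tag_containing "$fix" 'b1_*_r1')
echo "first build with the fix: $first_build"
echo "first round 1 pass:       $first_r1"
```

This does not help people who only look at JIRA, but it could feed whatever ends up updating the issue.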

 

All of those questions can be answered, but having over 10000 tags showing for an issue once it's been in the code for a year is going to be awful to look at.

 

The first two questions only really get asked very soon after the change has been submitted, as it's people wanting to know if a bug is fixed or is meant to be fixed.

 

The last two are generally asked when a customer hits a bug and we are trying to work out how long it has been broken for, or what code level they need to upgrade to in the hopes of avoiding hitting it again.

 

I'm currently investigating having Jenkins update the issue with a revision for the level. Then I could filter out all level tags from the view, which would greatly reduce things, but so far I'm not having any luck with that.

Then if I found another way of answering question 2, tags would only be created for delivering versions/revisions etc. to customers.

Even if I get it working it would be a little confusing, as we would be using revisions to track builds and tags to track releases, which seems backwards.

Mark L. Smith
Rising Star
January 6, 2017

How many tags do you have now? Are you just starting out with this or do you already have 10,000s of tags?

In my experience, while git can physically handle tens of thousands of branches and tags (or many more), you'll find that many utilities essentially become unusable.

Ian Boden January 9, 2017

This is a prototype; currently we don't use git. In the past 2 years we have created just over 12000 builds (and the frequency is increasing).
