Why does Bitbucket set gc.pruneexpire = never?

After 3 years, our repositories have a lot of dangling commits/blobs. It looks like gc never collects them.

So why do repositories have these settings in their config?

[gc]
	auto = 0
	pruneexpire = never

1 answer

1 accepted

5 votes
Bryan Turner Atlassian Team Nov 01, 2016

Alexey,

The repository in question has been forked. Auto GC and pruning are only disabled for repositories which have been forked, and they're disabled because forks use alternates. Repositories which have not been forked will not have either setting (you can create a new repository and easily verify this is true).

Using alternates and disabling pruning is a space tradeoff. Either every fork ends up using the full repository disk size (not on creation, since Git uses hard links, but the first GC in any fork breaks those links and the fork starts using the full disk size), or the forked repository carries some "unreachable" objects because a fork might be using them via an alternate. git gc in the forked repository doesn't know which objects its forks reference; it only knows which objects that repository itself references. So if pruning were enabled, it might prune objects referenced by forks. When that happens, the fork becomes corrupted and can no longer be used, which could result in lost work.
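To make the failure mode concrete, here is a toy model in Python. It is only a sketch and not how Git or Bitbucket Server is implemented: repositories are modeled as plain sets of object IDs, and the object names and the `prune` and `missing_objects` helpers are hypothetical.

```python
# Toy model: a repository is a set of stored objects plus the set of objects
# its own refs can reach. All names here are illustrative, not Git internals.

def prune(stored, reachable):
    """Return the objects left after dropping everything this repo can't reach."""
    return stored & reachable

def missing_objects(fork_stored, fork_reachable, parent_stored):
    """Objects the fork needs but can find neither locally nor in the parent."""
    return fork_reachable - (fork_stored | parent_stored)

# Objects physically stored in the forked (parent) repository.
parent_stored = {"c1", "c2", "c3", "c4"}
# The parent's own refs only reach c1..c3; c4 looks "unreachable" from here.
parent_reachable = {"c1", "c2", "c3"}

# A fork stores only its own new object and borrows the rest via an alternate.
fork_stored = {"f1"}
fork_reachable = {"c1", "c2", "c4", "f1"}  # the fork still uses c4

# Pruning disabled: the parent keeps everything it has stored.
before = missing_objects(fork_stored, fork_reachable, parent_stored)

# Pruning enabled: the parent drops c4 because it cannot see the fork's refs.
after = missing_objects(
    fork_stored, fork_reachable, prune(parent_stored, parent_reachable)
)

print(before)  # set()
print(after)   # {'c4'}
```

With pruning disabled the fork is missing nothing; once the parent prunes by its own reachability alone, the fork references an object that no longer exists anywhere it can look.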

As an example of this tradeoff in action, in the repository with Bitbucket Server's own source code we're carting around ~25MB in "unreachable" objects, compared to 270MB for everything the repository considers reachable. That repository has 77 forks. That means we're paying 25 megabytes to save over 20 gigabytes (270MB * 77 forks = 20,790MB). The entire hierarchy, without alternates, would require over 21GB of disk (for the common objects, not considering objects unique to any forks), compared to 300MB when using alternates. That's a massive disk space win.
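The arithmetic behind those figures can be checked with a few lines of Python. The variable names are mine, and reading "over 21GB" as the origin plus 77 full-size forks is my interpretation of the numbers above.

```python
# Figures quoted in the answer above, all in MB.
reachable_mb = 270    # objects the origin repository considers reachable
unreachable_mb = 25   # "unreachable" objects kept because pruning is off
forks = 77

# Space the forks avoid duplicating by borrowing objects via alternates.
saved_mb = reachable_mb * forks

# Whole hierarchy if the origin and every fork held a full copy.
without_alternates_mb = reachable_mb * (forks + 1)

# Whole hierarchy with one shared object store (common objects only).
with_alternates_mb = reachable_mb + unreachable_mb

print(saved_mb)               # 20790
print(without_alternates_mb)  # 21060
print(with_alternates_mb)     # 295
```

The 295MB result corresponds to the roughly 300MB quoted for the hierarchy when using alternates.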

We have an internal issue tracking implementing "hierarchical" garbage collection, which will ultimately allow pruning objects that aren't used by each repository by essentially "moving" those "unused" objects into the forks that actually use them. Any object no longer used anywhere in the fork hierarchy will be fully pruned. Getting that implemented and sufficiently tested to be confident in shipping it is slow work, and is ongoing.

Until then, your option is to find all the forks for that repository (the projects/KEY/repos/slug/related REST endpoint can help with this) and delete them. When the last fork is deleted, Bitbucket Server will automatically re-enable automatic GC and pruning for the repository, and subsequent GC will run like normal, eventually pruning your unreachable objects (which are then known to be unreachable because there aren't any forks that could be borrowing them anymore).
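A rough Python sketch of walking that endpoint follows. Several things here are assumptions rather than anything stated above: the `/rest/api/1.0` prefix, the `start` query parameter, the `values` / `isLastPage` / `nextPageStart` paging fields, and the example host name. `fetch_json` is a hypothetical caller-supplied function that performs an authenticated GET and returns the decoded JSON, so no particular HTTP client is implied. Check all of these against the REST documentation for your server version.

```python
from urllib.parse import quote, urlencode

API_PREFIX = "/rest/api/1.0"  # assumption: verify against your server's REST docs

def related_url(base_url, project_key, repo_slug, start=0):
    """Build the URL for the 'related' endpoint named in the answer."""
    path = f"{API_PREFIX}/projects/{quote(project_key)}/repos/{quote(repo_slug)}/related"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'start': start})}"

def list_related(fetch_json, base_url, project_key, repo_slug):
    """Collect every repository the endpoint returns, following pages.

    fetch_json is a hypothetical callable: it takes a URL, performs an
    authenticated GET, and returns the decoded JSON body as a dict.
    """
    repos, start = [], 0
    while True:
        page = fetch_json(related_url(base_url, project_key, repo_slug, start))
        repos.extend(page.get("values", []))
        # Assumed paging fields: stop on the last page or when no next start is given.
        if page.get("isLastPage", True) or page.get("nextPageStart") is None:
            return repos
        start = page["nextPageStart"]

# Example with a stand-in fetcher; a real one would make HTTP requests.
def fake_fetch(url):
    pages = {
        0: {"values": [{"slug": "fork-a"}], "isLastPage": False, "nextPageStart": 1},
        1: {"values": [{"slug": "fork-b"}], "isLastPage": True},
    }
    return pages[int(url.rsplit("start=", 1)[1])]

slugs = [r["slug"] for r in list_related(fake_fetch, "https://bitbucket.example.com", "KEY", "slug")]
print(slugs)  # ['fork-a', 'fork-b']
```

The list may include repositories that are not forks of the one you care about, so review it before deleting anything.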

Best regards,
Bryan Turner
Atlassian Bitbucket 
