Why does Bitbucket set gc.pruneexpire = never?

Alexey_Efimov
Rising Star
November 1, 2016

After three years, our repositories have a lot of dangling commits and blobs. It looks like gc never collects them.

So why do repositories have these settings in their config?

[gc]
	auto = 0
	pruneexpire = never

Answer accepted
Bryan Turner
Atlassian Team
November 1, 2016

Alexey,

The repository in question has been forked. Auto GC and pruning are only disabled for repositories which have been forked, and they're disabled because forks use alternates. Repositories which have not been forked will not have either setting (you can create a new repository and easily verify this is true).

Using alternates and disabling pruning is a disk space tradeoff. Either every fork uses the full repository disk size (not at creation, since Git uses hard links, but the first time GC runs in any fork those links are broken and the fork starts using the full size), or the forked repository carries some "unreachable" objects because a fork might be using them via an alternate. git gc in the forked repository doesn't know which objects its forks reference; it only knows which objects that repository itself references. So if pruning were enabled, it might prune objects referenced by forks. When that happens, the fork becomes corrupted and can no longer be used, which could result in lost work.
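To see which side of this relationship a given repository is on, you can look at its objects/info/alternates file, which is where Git records the object stores a repository borrows from, one path per line. Below is a minimal Python sketch; the function name read_alternates is only illustrative and is not something Bitbucket Server provides.

```python
from pathlib import Path


def read_alternates(git_dir):
    """Return the object directories this repository borrows from, if any.

    git_dir is the repository's Git directory: the repository root for a
    bare repository, or the .git directory of a working copy.
    """
    alternates_file = Path(git_dir) / "objects" / "info" / "alternates"
    if not alternates_file.is_file():
        # No alternates file: the repository only uses its own object store.
        return []
    lines = alternates_file.read_text().splitlines()
    # Each non-blank line is the path of an object directory to borrow from.
    return [line for line in lines if line.strip()]
```

The file only points one way. A repository that borrows objects lists where it borrows them from, but the repository being borrowed from has no matching record of its borrowers, which is why git gc there cannot tell which of its "unreachable" objects are still needed.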

As an example of this tradeoff in action, in the repository with Bitbucket Server's own source code we're carting around ~25MB in "unreachable" objects, compared to 270MB for everything the repository considers reachable. That repository has 77 forks. That means we're paying 25 megabytes to save over 20 gigabytes (270MB * 77 forks = 20,790MB). The entire hierarchy, without alternates, would require over 21GB of disk (for the common objects, not considering objects unique to any forks), compared to 300MB when using alternates. That's a massive disk space win.
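The arithmetic above can be reproduced with a few lines of Python. The three input figures are the ones quoted in this answer; the function name disk_usage_mb is only illustrative, and the script measures nothing itself.

```python
# Figures quoted in the answer above, in MB.
REACHABLE_MB = 270    # objects the parent repository considers reachable
UNREACHABLE_MB = 25   # "unreachable" objects kept because pruning is disabled
FORKS = 77


def disk_usage_mb(reachable_mb, unreachable_mb, forks):
    """Return (with_alternates, without_alternates) for the common objects, in MB."""
    # With alternates: one shared copy, plus the unreachable objects it must keep.
    with_alternates = reachable_mb + unreachable_mb
    # Without alternates: the parent and every fork each hold a full copy.
    without_alternates = reachable_mb * (forks + 1)
    return with_alternates, without_alternates


shared, duplicated = disk_usage_mb(REACHABLE_MB, UNREACHABLE_MB, FORKS)
print(shared)               # 295
print(duplicated)           # 21060
print(duplicated - shared)  # 20765
```

As in the text, this only counts the objects common to the whole hierarchy and ignores objects unique to individual forks.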

We have an internal issue tracking implementing "hierarchical" garbage collection, which will ultimately allow pruning objects that aren't used by each repository by essentially "moving" those "unused" objects into the forks that actually use them. Any object no longer used anywhere in the fork hierarchy will be fully pruned. Getting that implemented and sufficiently tested to be confident in shipping it is slow work, and is ongoing.

Until then, your option is to find all the forks for that repository (the projects/KEY/repos/slug/related REST endpoint can help with this) and delete them. When the last fork is deleted, Bitbucket Server will automatically re-enable automatic GC and pruning for the repository, and subsequent GC will run as normal, eventually pruning your unreachable objects (which are then known to be unreachable because there are no longer any forks that could be borrowing them).
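As a rough sketch of how the related endpoint could be walked from a script, the Python below builds the URL and follows the paging. Several things here are assumptions rather than anything stated above: the /rest/api/1.0 prefix, the start query parameter, and the values / isLastPage / nextPageStart fields of the paged response. Check them against the REST documentation for your Bitbucket Server version. The names related_url, collect_related and fetch_page are hypothetical; fetch_page stands for whatever authenticated HTTP client you already use, taking a URL and returning the decoded JSON.

```python
from urllib.parse import quote


def related_url(base_url, project_key, slug, start=0):
    """Build the URL of the 'related' endpoint (API prefix is an assumption)."""
    return (
        f"{base_url.rstrip('/')}/rest/api/1.0"
        f"/projects/{quote(project_key, safe='')}"
        f"/repos/{quote(slug, safe='')}/related?start={start}"
    )


def collect_related(fetch_page, base_url, project_key, slug):
    """Collect every repository the endpoint reports, following the paging.

    fetch_page is a caller-supplied function: URL -> decoded JSON dict.
    """
    repos, start = [], 0
    while True:
        page = fetch_page(related_url(base_url, project_key, slug, start))
        repos.extend(page.get("values", []))
        # Assumed paging fields; stop when the server reports the last page.
        if page.get("isLastPage", True):
            return repos
        start = page["nextPageStart"]
```

Review the returned list before deleting anything, since deleting a fork also deletes any work that exists only in that fork.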

Best regards,
Bryan Turner
Atlassian Bitbucket 
