Browsing deleted commits (i.e. rebased out of history) by SHA

Seth Carter February 2, 2018

Bitbucket Server (BBS) v4.8.3

When a commit is rebased out of a branch's history, it can still be browsed in the Bitbucket web UI using the following URL:

bbs-host/projects/[PROJECT]/repos/[REPO]/commits/[SHA]
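
To make the pattern concrete, here is a small Python sketch that builds such a URL. `commit_url` is a hypothetical helper, not part of any Bitbucket API, and `base_url` is assumed to hold the scheme, host and any context path of the instance; the values in the example are placeholders.

```python
from urllib.parse import quote


def commit_url(base_url: str, project: str, repo: str, sha: str) -> str:
    """Build the web UI URL for a single commit, following the pattern above.

    base_url (an assumption) is the scheme, host and any context path of the
    Bitbucket Server instance, e.g. "https://bitbucket.example.com".
    """
    # Quote each path segment so unusual project keys or repo slugs stay valid.
    parts = [quote(segment, safe="") for segment in (project, repo, sha)]
    return "{}/projects/{}/repos/{}/commits/{}".format(base_url.rstrip("/"), *parts)


print(commit_url("https://bitbucket.example.com", "PROJ", "my-repo", "0a1b2c3d"))
```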

The question is: how long might those "non-existent" commits continue to be served? Will they exist until GC cleans them out? Forever? I don't see a configuration property specific to this.

Use case: a pull request overview which shows rebase activity and provides links to commits no longer associated with the branch (or any branch). Can a user view deleted commits n years in the future when reviewing a pull request?

Close answers, but not quite definitive for this case:

https://jira.atlassian.com/browse/BSERV-4442

https://community.atlassian.com/t5/Bitbucket-questions/Why-Bitbucket-has-gc-pruneexpire-never/qaq-p/71622

Bryan Turner
Atlassian Team
February 2, 2018

@Seth Carter

Providing a definitive answer for this is somewhere between difficult and impossible. I'll qualify my answer where I can to try and make it clear where relevant edge cases exist.

In normal development there are any number of ways that commits become unreferenced. A couple of the more obvious ones:

  • Branches are deleted without ever being merged to another branch that still exists
  • Branches are rebased, replacing X original commit(s) with Y new ones

It doesn't matter too much how a commit becomes unreferenced; the end result is that it becomes available for pruning. But when, or even whether, it actually gets pruned once that's possible varies.

If the repository in question has any forks, as has been explained elsewhere, then the answer is simple: the unreferenced commits will (currently) never be pruned. A future version of the product will address that (I've invested a significant amount of effort laying groundwork for it), but we're not there yet. (And I'll note that it's not on the short-term roadmap, due to other, higher priority work.)

Given the behavior for forked repositories is straightforward, let's focus on the behavior for repositories which haven't been forked. To do that, I'll split my answer in half.

Bitbucket Server 5.1 and newer

In Bitbucket Server 5.1 we shipped changes which fully remove "git gc". It's never called in any repository the system manages. Instead, the system uses plumbing commands like "pack-objects", "pack-refs", "repack" and "prune" directly, as appropriate, to accomplish its goals. This makes GC processing significantly more stable, and often faster.
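
For readers unfamiliar with the plumbing involved, a rough sketch of a GC-like maintenance pass built from those commands might look like the following. The exact commands, flags and ordering Bitbucket Server uses aren't described here, so this sequence is an assumption modelled loosely on what "git gc" itself does; `maintenance_commands` and `run_maintenance` are hypothetical helpers.

```python
import subprocess
from typing import List


def maintenance_commands(expire_days: int) -> List[List[str]]:
    """Hypothetical GC-like sequence of plumbing commands (not Bitbucket's exact calls)."""
    return [
        # Pack all refs (branches, tags) into the packed-refs file.
        ["git", "pack-refs", "--all"],
        # Repack reachable objects into one pack (repack runs pack-objects internally).
        # With -d, -A turns unreachable objects from old packs into loose objects
        # instead of dropping them, so the prune step decides when they go.
        ["git", "repack", "-A", "-d"],
        # Delete unreachable loose objects older than the pruning interval.
        ["git", "prune", "--expire={}.days.ago".format(expire_days)],
    ]


def run_maintenance(repo_dir: str, expire_days: int) -> None:
    """Run each command in the repository directory, stopping on the first failure."""
    for cmd in maintenance_commands(expire_days):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

The point of splitting the work up this way is that the prune step's `--expire` value is an explicit, separately controlled parameter rather than whatever "git gc" defaults to.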

As part of those changes, the system no longer uses the default pruning interval (2 weeks). Instead, the pruning interval is now based on the configured timeout for a hosting process ("receive-pack" for pushing, "upload-pack" for fetching) or running "repack", whichever is longer, plus one day. This means, by default, Bitbucket Server 5.1+ prunes unreferenced commits after 2 days, down from 14.
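
The interval rule is simple arithmetic. A minimal sketch, assuming the longer of the two default timeouts is one day (which is what the 2-day default implies); the 2-hour repack timeout below is purely illustrative, and `prune_interval` is a hypothetical name:

```python
from datetime import timedelta


def prune_interval(hosting_timeout: timedelta, repack_timeout: timedelta) -> timedelta:
    """Pruning interval as described: the longer of the two timeouts, plus one day."""
    return max(hosting_timeout, repack_timeout) + timedelta(days=1)


# Illustrative values only: a 1-day hosting timeout yields the 2-day default.
print(prune_interval(timedelta(days=1), timedelta(hours=2)))  # 2 days, 0:00:00
```

Raising either timeout raises the pruning interval with it; for example, a 3-day hosting timeout would give a 4-day interval.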

Bitbucket Server 5.0 and older

In Bitbucket Server 5.0 and older, including all versions of Stash, the system relies on "git gc --auto" for repositories which haven't been forked, and on manually triggered "git gc" (using the same heuristics "--auto" uses) for repositories which have. The default pruning interval of 2 weeks is used.
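
For reference, the "--auto" heuristics come from Git itself, not Bitbucket. The sketch below approximates the checks Git makes with its documented defaults (gc.auto = 6700 loose objects, gc.autoPackLimit = 50 packs). It is a simplified model of Git's behavior, not Bitbucket's code, and newer Git versions add further checks; `needs_auto_gc` is a hypothetical name.

```python
def needs_auto_gc(loose_in_dir_17: int, pack_count: int,
                  gc_auto: int = 6700, gc_auto_pack_limit: int = 50) -> bool:
    """Approximate the checks "git gc --auto" makes before doing any work.

    Git samples a single fan-out directory (.git/objects/17) and extrapolates,
    so loose_in_dir_17 is the number of loose objects found there.
    pack_count should exclude packs that have a .keep file.
    """
    if gc_auto <= 0:
        return False  # gc.auto = 0 disables automatic GC entirely
    # Too many packs (a non-positive pack limit disables this check).
    if gc_auto_pack_limit > 0 and pack_count > gc_auto_pack_limit:
        return True
    # Too many loose objects: more than ceil(gc.auto / 256) in the sampled directory.
    threshold = -(-gc_auto // 256)
    return loose_in_dir_17 > threshold
```

With the defaults, a repository needs more than 27 loose objects in that one sampled directory (roughly 6,700 overall) or more than 50 packs before "--auto" does anything, which is why GC frequency varies so much between repositories.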

But...

Commits aren't quite as unreferenced as they might seem. This is related to the use case you're citing: a pull request overview which shows rebase activity and links to commits no longer associated with the branch (or any branch), where a user may want to view those "deleted" commits n years in the future.

The system has logic specifically to support this case. When a pull request is opened, and as that pull request is updated, the system maintains reflogs which protect every commit the pull request has ever referenced from garbage collection. Commits which are referenced by reflogs are still considered reachable, even if no branch or tag references them, and are not eligible for pruning. That means any commit a pull request has ever involved should always be available: today, tomorrow, next month, and 5 years in the future.
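
A toy model may help show why a reflog entry is enough to keep a rebased-away commit alive: reflog entries count as roots alongside branch and tag tips when deciding reachability. This is plain Python, not Git's implementation (real Git also treats HEAD, the index and other sources as roots), and the commit names and helpers are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def reachable(tips: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Every commit reachable from the given tips by following parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def prune_candidates(parents: Dict[str, List[str]], ref_tips: Iterable[str],
                     reflog_entries: Iterable[str]) -> Set[str]:
    """Commits no ref or reflog entry can reach: the only ones eligible for pruning."""
    protected = reachable(set(ref_tips) | set(reflog_entries), parents)
    return set(parents) - protected


# A feature branch A-B-C was rebased to A-B2-C2; the branch now points at C2.
history = {"A": [], "B": ["A"], "C": ["B"], "B2": ["A"], "C2": ["B2"]}
print(sorted(prune_candidates(history, ["C2"], [])))     # ['B', 'C']
print(sorted(prune_candidates(history, ["C2"], ["C"])))  # []
```

In the second call, a single reflog entry for the old tip C (as a pull request's reflog would hold) protects both C and its parent B.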

Except...

In early versions of Stash (pull requests shipped in 1.3.0, October 2012), the configuration applied to protect the reflogs used to guard pull request commits was incorrect. As a result, the reflogs could be pruned over time, removing "old" entries until, sometimes, none remained. This was fixed in Stash 2.1.0 (February 2013), but for some old pull requests the damage had been done. In the repository for Bitbucket Server's own code, which has the oldest pull requests anywhere, the first 700 or so (of over 11,000 now) had their reflogs completely pruned before the bug was discovered and fixed.

The system has been hardened, over the years, such that, if any of those "historical" commits do end up pruned, functionality "gracefully" degrades. On a pull request overview, for example, some outdated comments simply won't show the diff context from when they were added. All of the early Bitbucket Server pull requests that had reflogs pruned can still be viewed (the overview loads, the diff loads, etc.), but many of their outdated comments no longer have context and their old rescope activities no longer show specific commits which were added or removed.

Keep in mind that...

Pruning is part of GC, which runs based on heuristics. Some repositories produce "garbage" faster than others, and may end up performing GC every day. Repositories that accumulate "garbage" more slowly may go weeks, or even months, without collection. In such repositories, unreferenced commits may exist well past the pruning interval. As a result, it's best to consider the prune interval as a lower bound. Unreferenced commits will always be kept at least that long, but may be kept far longer.
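
The lower-bound point can be illustrated with a small simulation: an unreferenced commit disappears at the first GC run after it has aged past the interval, so the actual lifetime depends on how often GC runs. The helper name and the GC schedules are hypothetical, and the model simplifies by measuring age from the moment the commit became unreferenced; Git actually compares the loose object file's modification time.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def first_prune_opportunity(unreferenced_at: datetime, gc_runs: Iterable[datetime],
                            interval: timedelta) -> Optional[datetime]:
    """First GC run at which an unreferenced commit is old enough to be pruned.

    Simplification: age is measured from unreferenced_at, not from the
    object file's modification time as Git does.
    """
    for run in sorted(gc_runs):
        # "prune --expire" only removes objects strictly older than the cutoff.
        if run - interval > unreferenced_at:
            return run
    return None  # no qualifying GC run yet: the commit is still there


start = datetime(2018, 2, 2)
two_days = timedelta(days=2)
busy = [start + timedelta(days=d) for d in range(1, 8)]  # GC every day
quiet = [start + timedelta(days=1), start + timedelta(days=45)]
print(first_prune_opportunity(start, busy, two_days))   # 2018-02-05 00:00:00
print(first_prune_opportunity(start, quiet, two_days))  # 2018-03-19 00:00:00
```

With the same 2-day interval, the commit lasts 3 days in the busy repository and 45 days in the quiet one.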

 

So, to sum up:

  • If a repository has been forked, unreferenced commits are not pruned until all forks have been deleted
  • If a repository has not been forked
    • If the commit has been referenced by a pull request, it should never be pruned
    • If the commit wasn't referenced by a pull request, it's pruned after 2 (5.1+) or 14 (5.0-) days when GC runs
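
The summary can be written as a small decision function. This is a hypothetical sketch assuming default configuration; it returns the earliest point at which pruning becomes possible, not when it will happen, and it does not model administrator or add-on intervention.

```python
from typing import Optional, Tuple


def minimum_retention_days(forked: bool, referenced_by_pr: bool,
                           version: Tuple[int, ...]) -> Optional[int]:
    """Earliest point (in days) an unreferenced commit can be pruned, per the summary.

    None means "not pruned" (while forks exist, or ever for pull request commits).
    The value is a lower bound: pruning still waits for GC to actually run.
    """
    if forked:
        return None  # no pruning until every fork has been deleted
    if referenced_by_pr:
        return None  # pull request reflogs keep the commit reachable
    return 2 if version >= (5, 1) else 14


print(minimum_retention_days(False, False, (4, 8, 3)))  # 14
print(minimum_retention_days(False, False, (5, 7, 1)))  # 2
print(minimum_retention_days(False, True, (4, 8, 3)))   # None
```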

In general, then, for the use case you've described, the answer should be yes, at any point in the future a user should still be able to view "deleted" commits that were used in a pull request.

 

Wildcards

Everything I've written here is what Bitbucket Server does. But there are two things we simply can't control: Administrators, and add-ons.

While we recommend against it as strongly as possible, administrators can and do manually adjust configuration, and manually run processes. That can result in behavior which is different from what I've outlined above, and can have functional impacts throughout the system. We can't prevent administrators from "accidentally" pruning objects.

Similarly, add-ons are considered trusted code. Administrators signal that trust by installing the add-ons in the first place. Since add-ons can assemble and execute arbitrary Git commands, they can also trigger unexpected behavior as a result. (I'm not pointing the finger at any add-on, to be clear; I'm simply observing that they're something the Bitbucket Server team can't police.)

 

Hope this helps!
Bryan Turner
Atlassian Bitbucket

Julius Davies _bit-booster_com_
February 2, 2018

Amazing answer, @Bryan Turner.  Thanks.  I've been curious about this.

Seth Carter February 5, 2018

Thanks Bryan.  That was a comprehensive response. (and on a Friday afternoon too!  What kind of coffee are you drinking? I'll take a double.)

-- and thanks GSD for @ing Bryan, next best thing to knowing ...

Julius Davies _bit-booster_com_
February 2, 2018

Paging @Bryan Turner !!!   He'll know!   

I know Bitbucket Server's gc behaviour has evolved a bit over the last year, so the answer might be different for your version (4.8.3) compared to current production release (5.7.1).

I suspect you're right and the commits disappear after a few weeks via regular periodic housecleaning, but that's pure conjecture on my part.

I have used this to bring back commits and undo rebases/rewrites: open the deleted commit, tag it using Bitbucket's tag feature in the web UI, run "git fetch --tags" in my local clone, and then push the commit back to the branch to undo the rebase.

(By tagging the commit I make it visible to my own "git fetch" from my local clone.)
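
The local half of that recovery can be sketched as a pair of Git commands. `restore_commands` and `restore` are hypothetical helpers, and the remote, tag and branch names are placeholders. Using `--force-with-lease` is my own assumption rather than part of the steps above: because the branch was rebased, putting the old commit back rewrites history again, so a plain push would be rejected.

```python
import subprocess
from typing import List


def restore_commands(remote: str, tag: str, branch: str) -> List[List[str]]:
    """Commands to put a tagged, rebased-away commit back on a branch."""
    return [
        # Since Git 1.9, --tags fetches all tags in addition to the normal refspec,
        # so the remote-tracking branch is refreshed as well.
        ["git", "fetch", "--tags", remote],
        # Push the commit the tag points at onto the branch. --force-with-lease
        # refuses if the branch moved since the fetch above.
        ["git", "push", "--force-with-lease", remote,
         "{}^{{commit}}:refs/heads/{}".format(tag, branch)],
    ]


def restore(repo_dir: str, remote: str, tag: str, branch: str) -> None:
    """Run the commands in a local clone, stopping on the first failure."""
    for cmd in restore_commands(remote, tag, branch):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, `restore("/path/to/clone", "origin", "rescue-abc123", "feature")` would fetch the tag and reset `feature` on `origin` to the tagged commit.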
