Bitbucket Server: How to obtain real commit hashes from an effective diff hash?

When querying the API for commit comments, I sometimes get anchors that have a diffType of "EFFECTIVE" with a toHash that appears to be generated somehow.  Is there a way that I can use this data to get a list of commits used for this "effective" diff?

Here's an example from the "activities" endpoint:

{"fromHash"=>"87fabe1ef09821868e789b5bde5b5cfb20c901fa",
"toHash"=>"da2f8b463cd5b28854958dea27f8f5e71884f445",
"line"=>7,
"lineType"=>"ADDED",
"fileType"=>"TO",
"path"=>"Rakefile",
"diffType"=>"EFFECTIVE",
"orphaned"=>true}

 

1 answer

Hi @Synthead

Thanks for reaching out. I'm not 100% sure what information you are after. If you are looking for a list of commits in the pull request, then the pull request commits endpoint (or the PullRequestService.getCommits method in Java) could give you what you are after.

The EFFECTIVE diff type is the diff that pull requests display by default. To produce this diff, the fromHash is the HEAD of the PullRequest.getToRef. The toHash is calculated by merging the PullRequest.getFromRef and PullRequest.getToRef. In essence the effective diff shows users "how the target branch will change if the PR was merged as is". You can find more information about the diff in this post.

Hope this is the information you were looking for.

Regards

Juan Palacios

This EFFECTIVE diff type does seem to make it very difficult to use the API and tell what commit a comment is on since there doesn't seem to be a way for the caller to relate an effective hash to an existing hash in the repo (or a hash returned by the commits API). Even if the toHash is calculated by merging the fromRef into the toRef, it is a constantly changing value and even if I do merge a pull request through bitbucket UI this hash will change since the commit timestamp will change.

Is there any way through the API to convert an effective toHash to its corresponding commit toHash, or list all commits with their effective hashes? Or alternatively is there any way to list all comments with their anchors expressed as real commit hashes instead of effective hashes?

Juan Palacios Atlassian Team Sunday

Hi @Tyler Mann,

Thanks for your feedback. Allow me to break down your comment to more clearly address your concerns.

"This EFFECTIVE diff type does seem to make it very difficult to use the API and tell what commit a comment is on since there doesn't seem to be a way for the caller to relate an effective hash to an existing hash in the repo (or a hash returned by the commits API)"

Comments are not anchored at a commit. They are anchored at a diff. The hashes in the anchor tell us which diff the path and line refer to.

"Even if the toHash is calculated by merging the fromRef into the toRef, it is a constantly changing value and even if I do merge a pull request through bitbucket UI this hash will change since the commit timestamp will change."

That is correct. However, comment threads anchored at an effective diff are processed on every update to the source or the target branch to update their anchor to the new diff. We call this comment drifting. Whenever a comment thread can't be drifted (e.g.: an update has removed the anchor file/line from the diff), it is marked as outdated: it no longer shows up on the diff, but can still be seen in the activity stream.

"Is there any way through the API to convert an effective toHash to its corresponding commit toHash, or list all commits with their effective hashes? Or alternatively is there any way to list all comments with their anchors expressed as real commit hashes instead of effective hashes?"

There are three types of anchors available in the system.

  • EFFECTIVE: These anchors reference the effective diff described in my previous comment. The fromHash is the HEAD of the target branch and the toHash is the calculated merge commit hash. Whenever the pull request is rescoped (i.e.: the source or the target branch is updated) these anchors are processed so that we can update them to reference the new hashes (drifting their path and line if necessary). At any given time when retrieving these anchors you can be sure that the commits used to produce the merge hash are the HEADs of the pull request's source and target branch.
  • COMMIT: These anchors reference the diff between a commit and its first parent. They are used when looking at the diff for a commit either in the Commit page or when selecting a commit in the pull request diff drop down menu. These anchors are never drifted.
  • ITERATIVE: These anchors reference a diff for a commit range. The fromHash is an ancestor of the toHash. They are used in iterative review diffs. These diffs are produced when a reviewer comes back to a pull request they've marked as "Needs work" in the past after changes have been added to it. In this case we display for the reviewer a diff which shows them "what's new" by calculating a diff from the old HEAD of the source branch to the new one. These anchors are never drifted.
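
To make the three anchor types concrete, here is a minimal Python sketch of how a REST client might decide which real commits stand behind an anchor. It assumes the pull request JSON exposes fromRef.latestCommit and toRef.latestCommit, and that for COMMIT anchors the toHash is the commit itself. The function name describe_anchor is hypothetical and not part of any Bitbucket API.

```python
from typing import Optional


def describe_anchor(anchor: dict, pull_request: Optional[dict] = None) -> dict:
    """Report which real commits relate to a comment anchor (illustrative only)."""
    diff_type = anchor.get("diffType")

    if diff_type == "EFFECTIVE":
        # The toHash is a calculated merge, so fall back to the branch heads.
        # Orphaned (outdated) anchors are no longer drifted, so the current
        # heads say nothing reliable about them.
        if pull_request is None or anchor.get("orphaned"):
            return {"kind": "effective", "commits": None}
        target_head = pull_request["toRef"]["latestCommit"]  # assumed field name
        source_head = pull_request["fromRef"]["latestCommit"]  # assumed field name
        if anchor.get("fromHash") != target_head:
            # The anchor does not match the current target HEAD; treat as unknown.
            return {"kind": "effective", "commits": None}
        return {
            "kind": "effective",
            "commits": {"source_head": source_head, "target_head": target_head},
        }

    if diff_type == "COMMIT":
        # Assumption: toHash is the commit and the diff is against its first parent.
        return {"kind": "commit", "commits": {"commit": anchor["toHash"]}}

    if diff_type == "ITERATIVE":
        # fromHash is an ancestor of toHash; both are real commits.
        return {
            "kind": "iterative",
            "commits": {"from": anchor["fromHash"], "to": anchor["toHash"]},
        }

    raise ValueError(f"unknown diffType: {diff_type!r}")
```

An orphaned EFFECTIVE anchor, like the one in the original question, yields no commits in this sketch because the branch heads may have moved since it was last drifted.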

Hope this helps clarify how our Comment API works.

Regards

Juan Palacios

Hi Juan,

Thanks so much for the detailed explanation and quick response. This definitely makes sense to me and helps clarify how your API works. Unfortunately I still have the same issue, which is that I need to know what line a comment is on at a known commit hash/path (this is how Bitbucket Cloud works as well: it gives you a real commit hash to reference the comment's location). However, I think I have a hacky workaround for now which at least lets me convert the EFFECTIVE comments to the hash of the tip of the branch.

My workaround is to call something like the <pullRequest>/changes API to get the current toHash, which seems to be the same one used for EFFECTIVE comments that are still visible. Then I call the API to get the pull request details and take the FromRef.LatestCommit. Then I call the /changes API one more time to make sure that the toHash hasn't changed. If it hasn't, then EFFECTIVE comments that have the toHash seen from the /changes API can be mapped to the real SHA retrieved from the pull request's FromRef.LatestCommit.
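
The three calls can be sketched in Python roughly as follows. This is only an outline of the workaround as described: get_changes and get_pull_request are hypothetical callables that wrap the HTTP requests, and the field names toHash and fromRef.latestCommit are assumed from the responses mentioned in this thread.

```python
from typing import Callable, Optional, Tuple


def map_effective_hash(
    get_changes: Callable[[], dict],
    get_pull_request: Callable[[], dict],
) -> Optional[Tuple[str, str]]:
    """Return (effective_to_hash, source_head), or None if the PR moved mid-way."""
    # First call: the effective toHash currently used by the pull request diff.
    before = get_changes()["toHash"]
    # Second call: the real commit at the tip of the source branch.
    source_head = get_pull_request()["fromRef"]["latestCommit"]
    # Third call: confirm the pull request was not rescoped in between.
    after = get_changes()["toHash"]
    if before != after:
        return None  # the caller should retry
    return before, source_head
```

Any EFFECTIVE anchor whose toHash equals the first element of the result can then be treated as pointing at the second element. Orphaned anchors carry an older toHash and will not match.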

I am basically looking to see if there is any easier way to get this mapping, or an existing commit and path/line for a comment. Although the comments are on the diff as you mentioned, you likely also have this kind of mapping internally in order to do the drifting you mention.

Thanks so much for your help and quick response!

Tyler

Juan Palacios Atlassian Team Monday

Hi @Tyler Mann

Glad my comment helped clarify how the API works.

Regarding your concerns, would it be possible for you to provide a little bit of context? I think I'll be able to provide a better solution if I understand what it is you are building.

Cheers

Juan Palacios

Hi @Juan Palacios,

Yes, I am essentially building an integration that renders the diff of a pull request with comments overlaid and also posts comments back to the pull request. You can think of it as similar to the "Diff" tab for viewing a pull request on Bitbucket Server.

The modeling that we use references a comment's location using a commit hash, file path, and line number. If we have those 3 data points then we can tell where a comment was posted and do the "drifting" you mentioned ourselves using other git/commit data, since we can inspect the commit at the specific hash. This works for us for all other providers (GitHub, GitLab, and Bitbucket Cloud), as they all in some way reference a comment in a way that can be directly tied back to a commit hash, file path, and line number. The thing I am having trouble with is that the effective hashes are not something I can directly understand or calculate on my side and relate to any other git data/commits to tell where the comment should be located.

If there was some way to list the comment anchors with a commit hash that backs the EFFECTIVE diff and exists in the repo, that would be useful for me. For example, a query parameter such as `?diffType=COMMIT` that would translate these into anchors referencing commit hashes.

Or alternatively, just a way to list the effective diffs that have existed for this pull request, with both the effective hash and the head hashes that were used to compute them.

If it's not possible then that is okay. I do have the workaround I mentioned above, which works, but it is awkward to have to call 3+ APIs to get 1 piece of information, and it will only work for comments that are not orphaned yet.

Thanks!
Tyler

Juan Palacios Atlassian Team yesterday

Hi @Tyler Mann,

Thanks for providing the extra context. Let me see if I can provide some information to help you out.

"The modeling that we use references a comment's location using a commit hash, file path, and line number."

This doesn't seem like it will work for iterative diff comments where the fromHash in the diff can be any ancestor of the toHash.

"The thing I am having trouble with is that the effective hashes are not something I can directly understand or calculate on my side and relate to any other git data/commits to tell where the comment should be located."

Technically you should be able to change the refspec configuration in your local repository to fetch the pull request refs which would bring the effective diff objects into your local copy allowing you to work with the hash the same way you would any other commit. To do so you'll need to add the following:

fetch = +refs/pull-requests/*:refs/remotes/origin-pr/*

NOTE: I set the target to origin-pr to avoid overlapping with someone naming a branch with the "pull-requests/" prefix.
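
As a rough illustration of the refspec suggestion, the Python sketch below builds the git commands that add the extra fetch line and then list the parents of a fetched hash. It assumes the remote is named origin, that the effective toHash is a merge commit reachable from the fetched pull request refs, and that its parents are the branch heads. None of that is confirmed in this thread, and the helper names are hypothetical.

```python
import subprocess
from typing import List

# The refspec quoted above, unchanged.
PR_REFSPEC = "+refs/pull-requests/*:refs/remotes/origin-pr/*"


def pr_fetch_commands(remote: str = "origin") -> List[List[str]]:
    """Commands that register the extra refspec and then fetch."""
    return [
        ["git", "config", "--add", f"remote.{remote}.fetch", PR_REFSPEC],
        ["git", "fetch", remote],
    ]


def parents_command(commit_hash: str) -> List[str]:
    """Command that prints a commit hash followed by its parent hashes."""
    return ["git", "rev-list", "--parents", "-n", "1", commit_hash]


def parse_parents(rev_list_output: str) -> List[str]:
    """Drop the leading commit hash and keep only the parents."""
    return rev_list_output.split()[1:]


def run(command: List[str], repo_dir: str) -> str:
    """Run one git command inside repo_dir and return its stdout."""
    result = subprocess.run(
        command, cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout
```

After running the fetch commands in a clone, parse_parents(run(parents_command(to_hash), repo_dir)) would list the parents if the object was fetched. An orphaned anchor's toHash may no longer be reachable from any pull request ref, in which case git reports an error.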

Finally, please consider the following:

  • Effective diff merges are produced using the HEAD of the pull request's source and target branches, which you should be able to get from the comment itself. In the Java API, use comment.getThread().getCommentable() (the Commentable is either the CommitDiscussion or the PullRequest, and you can use commentable.accept(CommentableVisitor) to run type-specific logic). In the REST API, the getComments response has the pullRequest field.
  • In some extraordinary circumstances the system can fail to calculate an effective diff (e.g.: if we are out of disk space, git is unable to write the new objects). In these scenarios Bitbucket Server falls back to the common ancestor strategy: it calculates the merge-base between the branches and displays a diff from the merge-base to the HEAD of the source branch. When this happens we still drift the comments, so the hashes in a comment anchor may not be from an effective diff.
  • Effective diffs can produce conflicts. Bitbucket Server has some pretty intricate logic to deal with all possible conflicts. It means though, that in a content conflict for instance, the diff may include conflict markers.

Hope this helps you get started.

Cheers

Juan Palacios

Hi @Juan Palacios,

"The modeling that we use references a comment's location using a commit hash, file path, and line number."

"This doesn't seem like it will work for iterative diff comments where the fromHash in the diff can be any ancestor of the toHash."

This technique does actually work well in our use so far. Essentially we are tracking comments similar to the way git blame works. If you have one point of reference (a line/path/hash), then you can use blame to see where that line was added or edited and use that to drift the comment to other commits, regardless of whether the fromHash or toHash is changing.
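
A much simplified Python sketch of the idea: given the file contents at two commits, carry a line number from one to the other. It uses the standard difflib module as a stand-in. It is neither git blame nor Bitbucket Server's drifting logic, and drift_line is a hypothetical name.

```python
import difflib
from typing import List, Optional


def drift_line(
    old_lines: List[str], new_lines: List[str], old_line_no: int
) -> Optional[int]:
    """Map a 1-based line number in old_lines to new_lines.

    Returns None when the line was changed or removed, which is the case
    where a comment would have to be treated as outdated.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    index = old_line_no - 1
    for tag, a_start, a_end, b_start, _b_end in matcher.get_opcodes():
        # Only unchanged runs give a trustworthy position in the new file.
        if tag == "equal" and a_start <= index < a_end:
            return b_start + (index - a_start) + 1
    return None
```

With the file contents read at two commits (for example via git show <hash>:<path>), a comment on line 7 of the old version stays attached only if that line survives unchanged in the new version.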

"Technically you should be able to change the refspec configuration in your local repository to fetch the pull request refs which would bring the effective diff objects into your local copy allowing you to work with the hash the same way you would any other commit."

Awesome, thanks for this! I will definitely check it out.

"Effective diff merges are produced using the HEAD of the pull request's source and target branches which you should be able to get from the comment itself. In the Java API comment.getThread().getCommentable() (the Commentable is either the CommitDiscussion or the PullRequest and you can use the commentable.accept(CommentableVisitor) to run type specific logic). In the REST API the getComments response has the pullRequest field."

Ah thanks yes this could also work, we are using the REST API. Was detouring from using this getComment API at first because it seems to require a path to be specified which could make it need to be called many times even if there are no comments. Was instead using the getActivities API, but will keep it in mind as another option to play around with.

"In some extraordinary circumstances the system can fail to calculate an effective diff (e.g.: if we are out of disk space, git is unable to write the new objects). In these scenarios Bitbucket Server falls back to the common ancestor strategy: it calculates the merge-base between the branches and displays a diff from the merge-base to the HEAD of the source branch. When this happens we still drift the comments so the hashes in a comment anchor may not be from an effective diff."

"Effective diffs can produce conflicts. Bitbucket Server has some pretty intricate logic to deal with all possible conflicts. It means though, that in a content conflict for instance, the diff may include conflict markers."

Some other really great information here, will keep this in mind for testing.

Thanks for all the help here and walking me through all of this. I feel like I have a much better understanding of how things are working now.

Cheers,

Tyler

Juan Palacios Atlassian Team yesterday

Glad I could be of service @Tyler Mann!

I'd point out one more thing:

"Was detouring from using this getComment API at first because it seems to require a path"

If you use the getComments REST API you can get all comments for a pull request (in pages) and filter them by diff type (e.g.: EFFECTIVE if you don't want to work with COMMIT and ITERATIVE comments).

Good luck!

Juan

Thanks @Juan Palacios

"If you use the getComments REST API you can get all comments for a pull request (in pages) and filter them by diff type (e.g.: EFFECTIVE if you don't want to work with COMMIT and ITERATIVE comments)."

Yes, if I try to call this API without a `?path=` query parameter, then I get a validation error saying 'The path query parameter is required when retrieving comments.' It is a little surprising that the parameter is required, since the documentation doesn't suggest so. If you know of any way around this, it would be great to be able to page through all of the comments on the pull request. Calling this once for every file in the pull request seems somewhat tricky, which is why I have been using the /activities API and filtering to COMMENTED activities.
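
For reference, paging through the activities and keeping only comment anchors could look like the Python sketch below. It assumes the usual paged response shape (values, isLastPage, nextPageStart) and that COMMENTED activities carry comment and commentAnchor fields. fetch_page is a hypothetical callable that performs the HTTP request for a given start offset.

```python
from typing import Callable, Iterable, Iterator, Tuple


def iter_paged(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item from a paged endpoint, following nextPageStart."""
    start = 0
    while True:
        page = fetch_page(start)
        yield from page.get("values", [])
        # Stop when the server says so, or when the flag is missing.
        if page.get("isLastPage", True):
            return
        start = page["nextPageStart"]


def comment_anchors(
    activities: Iterable[dict], diff_type: str = "EFFECTIVE"
) -> Iterator[Tuple[dict, dict]]:
    """Yield (comment, anchor) pairs for anchored comments of one diff type."""
    for activity in activities:
        if activity.get("action") != "COMMENTED":
            continue
        # General pull request comments have no anchor and are skipped.
        anchor = activity.get("commentAnchor")
        if anchor and anchor.get("diffType") == diff_type:
            yield activity["comment"], anchor
```

This avoids the per-file `?path=` calls at the cost of reading every activity, including ones that are not comments.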
