Delete large files from git history without breaking Commit and PR links in Jira tickets

Piyush Nariani September 9, 2021

Scenario: I want to delete some large files from git history [specific commit #]

The files were added accidentally in a feature branch; I later deleted them and committed again in the same branch.
The feature branch was merged to master and has since been deleted. Multiple commits and PRs have been merged to master after that bad commit.

Because the files are still present in git history, they consume space in Bitbucket, and I am getting a warning about it.

I am looking for a way to delete those files from that commit.

The BFG tool does the job, but I have a few concerns:

1. At any given time, we have dozens of branches open and actively being worked on in this repo. Syncing up all work streams to merge and rebase at the same time will be extremely difficult.

2. There are Jira tickets with commits and PRs linked. Using BFG will rewrite history and create new commit hashes, in which case we would need to update those Jira tickets with the correct commit links.

3. We have multiple PRs merged and branches created after the bad commit was merged.

4. After the cleanup is done, is there a way to prevent anyone from pushing changes from an old clone of the repo?

Please suggest a better way to do this that addresses the points above.

@Theodora Boudale Do you have any tips for me?

Answer accepted
Theodora Boudale
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
September 13, 2021

Hi @Piyush Nariani ,

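Removing files from git history will indeed result in new commit hashes.

Apart from BFG, it is also possible to use the git filter-branch command, but both options change commit hashes. A commit hash is computed from the commit's content, including its file tree and its parent hashes, so every commit that contained the files, and every commit that descends from one, gets a new hash. I'm afraid it is not possible to rewrite history and keep the old commit hashes.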

1. At any given time, we have dozens of branches open and actively being worked on in this repo. Syncing up all work streams to merge and rebase at the same time will be extremely difficult.

If you decide to proceed with the history rewrite, my suggestion would be to schedule it for a time when no one is working on the repo: everyone has pushed all their changes to Bitbucket and has paused development until the process is complete.

There needs to be some planning and coordination to avoid any loss of work.

It is also important to take a backup of the repo (you can take a clone with the flag --mirror, separate from the mirror clone where you execute BFG), so you can restore the repo if needed.
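As a rough sketch of the two mirror clones mentioned above: the script below builds a small scratch repository to stand in for the Bitbucket repo (that part is purely illustrative, as are all directory and branch names), then takes one mirror clone as an untouched backup and a second one as the working copy for the cleanup. In practice REPO_URL would be your Bitbucket clone URL.

```shell
set -eu
work=$(mktemp -d)

# Scratch repository standing in for the Bitbucket remote (illustrative only).
git init -q "$work/origin"
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$work/origin" branch feature/demo

REPO_URL="$work/origin"   # in practice: your Bitbucket clone URL

# Backup: never modified, kept only so the repo can be restored if needed.
git clone -q --mirror "$REPO_URL" "$work/repo-backup.git"

# Working copy: the separate mirror clone where the cleanup tool is run.
git clone -q --mirror "$REPO_URL" "$work/repo-to-clean.git"

# A mirror clone carries every branch, not just the default one.
git -C "$work/repo-backup.git" for-each-ref --format='%(refname)' refs/heads
```

If the rewrite goes wrong, the backup can in principle be pushed back with git push --mirror, assuming force pushes to the affected branches are permitted on the Bitbucket side.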

2. There are Jira tickets with commits and PRs linked. Using BFG will rewrite history and create new commit hashes, in which case we would need to update those Jira tickets with the correct commit links.

If the only thing you do with the history rewrite is remove files from the repo history, the new commits will keep their messages, which include any references to Jira issues.

The new commits will then be displayed in Jira issues, as well as any PRs with commits that reference Jira issues, so you won't need to do any update on your side.

However, one issue here is that Jira will continue to reference the old commit ids as well. So, in a Jira ticket that is referenced by Bitbucket commits, you will see both the old commits (prior to the history rewrite) and the new commits (after the history rewrite). This is because commit ids are indexed in the Jira database.
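The "new hashes, same messages" behaviour can be checked on a throwaway repository. The sketch below uses git filter-branch rather than BFG, only because it ships with git; the file names and the Jira keys PROJ-1 and PROJ-2 are made up for the illustration.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip filter-branch's deprecation pause

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1 adds a large file by mistake; commit 2 deletes it again.
head -c 1048576 /dev/zero > big.bin
echo "hello" > app.txt
git add big.bin app.txt
git commit -q -m "PROJ-1 add feature"
git rm -q big.bin
git commit -q -m "PROJ-2 remove large file"

hashes_before=$(git log --format=%H)
subjects_before=$(git log --format=%s)

# Rewrite every ref so that big.bin never existed in any commit.
git filter-branch -f --index-filter \
    'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

hashes_after=$(git log --format=%H)
subjects_after=$(git log --format=%s)

echo "hashes before:";  echo "$hashes_before"
echo "hashes after:";   echo "$hashes_after"
echo "subjects after:"; echo "$subjects_after"
```

Both commits come out with different hashes, while the subjects, including the issue keys, are unchanged.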

Are you using Jira Cloud? If so, I would suggest creating a support ticket with the Jira Cloud team (you can ask one of your Jira admins to do that if you don't have admin access to the Jira Cloud instance) and asking whether it's possible to delete the indexed Bitbucket data from the Jira database, in case this helps solve the issue.

3. We have multiple PRs merged and branches created after the bad commit was merged.

Could you please clarify what your concern is regarding this?

4. After the cleanup is done, is there a way to prevent anyone from pushing changes from an old clone of the repo?

I'm afraid this is not possible. Users can still pull changes from the Bitbucket repo, merge them into their old clone, and then push the large commits back. To avoid this, everyone needs to be told to take a fresh clone once the history rewrite has been completed.
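One thing that may help with that communication is a small check that developers can run in their own clone before pushing. This is only a sketch and does not prevent anything on the Bitbucket side: it assumes you noted the hash of the bad commit before the rewrite, and contains_old_history is a hypothetical helper name.

```shell
# Succeeds (exit status 0) if any branch, tag, or remote-tracking ref in the
# clone still reaches the given pre-rewrite commit.
contains_old_history() {
    repo=$1
    old_commit=$2

    # If the object is not present at all, the clone never had the old history.
    git -C "$repo" cat-file -e "${old_commit}^{commit}" 2>/dev/null || return 1

    # List the refs whose history includes the old commit.
    refs=$(git -C "$repo" for-each-ref --contains "$old_commit" \
        --format='%(refname)')
    [ -n "$refs" ]
}
```

For example, contains_old_history . <old-commit-hash> && echo "stale clone, please re-clone" could be run from inside a working copy, with the placeholder replaced by the hash you recorded.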

If you have any other questions, please feel free to let me know.

Kind regards,
Theodora

Piyush Nariani September 13, 2021

Thanks @Theodora Boudale 
I have one more concern: if I have open PRs, do I need to merge and close them before running the BFG tool, or is it okay to run BFG without merging them?

Theodora Boudale
Atlassian Team
September 14, 2021

Hi @Piyush Nariani,

You don't have to merge the open PRs before you run BFG.

From what I've seen when I have used BFG to delete files from history, the branches remained intact. So, after pushing, the PRs were not affected and included the new commits.

This applies when BFG is run in a mirror clone of the repo, as described at https://rtyley.github.io/bfg-repo-cleaner/, so that all branches are included in that clone.
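For reference, the steps on that page look roughly like the sketch below. It is wrapped in a function so that nothing runs until you call it; the function name, the argument names, and the repo-to-clean.git directory are placeholders, and the size limit is whatever threshold suits your repo.

```shell
# Usage: clean_with_bfg <clone-url> <path-to-bfg.jar> <size-limit, e.g. 100M>
clean_with_bfg() {
    repo_url=$1
    bfg_jar=$2
    size_limit=$3
    target=repo-to-clean.git

    # Mirror clone, so that all branches and tags are rewritten together.
    git clone --mirror "$repo_url" "$target" || return 1

    # Remove every blob larger than the limit from history.
    java -jar "$bfg_jar" --strip-blobs-bigger-than "$size_limit" "$target" \
        || return 1

    # Drop the now-unreferenced objects locally, then push the rewritten refs.
    (
        cd "$target" &&
        git reflog expire --expire=now --all &&
        git gc --prune=now --aggressive &&
        git push
    )
}
```

To target specific files by name instead of by size, BFG also has a --delete-files option.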

Kind regards,
Theodora
