Delete large files from git history without breaking Commit and PR links in Jira tickets


Scenario: I want to delete some large files from Git history. They were all introduced in one specific commit.

The files were added accidentally in a feature branch. I later deleted them and committed again in the same branch.
The feature branch was then merged to master and has since been deleted. Multiple commits and PRs have been merged to master after that bad commit.

Because those files are still present in Git history, they consume space in Bitbucket, and I am getting a warning about the repository size.

I have been looking for a way to delete those files from that commit.

The BFG tool does the job, but I have a few concerns:

1. At any given time, we have dozens of branches open and actively being worked on in this repo. Syncing up all work streams to merge and rebase at the same time will be extremely difficult.

2. There are Jira tickets with commit hashes and PRs linked. If we use BFG, it will rewrite history and create new commit hashes, and we would then need to update those Jira tickets with the correct commit links.

3. We have multiple PRs that were merged, and branches that were created, after that bad commit was merged.

4. After the cleanup is done, is there a way to prevent anyone from pushing changes from an old clone of the repo?

Please suggest a better approach, if there is one, that addresses the points above.

@Theodora Boudale Do you have any tips for me?

Answer accepted

Hi @Piyush Nariani ,

Removing files from Git history will indeed result in new commit hashes.

Apart from BFG, it is also possible to use the git filter-branch command, but both options change the commit hashes. A commit hash is computed from the commit's content and its parent commits, so rewriting one commit changes the hash of every commit that descends from it. I'm afraid it is not possible to rewrite history and keep the old commit hashes.
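To see the effect without touching a real repository, here is a minimal sketch that builds a throwaway repo, rewrites it with git filter-branch, and compares the hashes. The file name big.bin and the issue key DEMO-1 are made up for illustration. filter-branch is used only because it ships with Git and is mentioned above; the Git documentation now steers people towards other tools for real rewrites.

```shell
#!/usr/bin/env bash
# Throwaway demo: rewrite a scratch repo and compare commit hashes before and after.
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: a large file is added by accident (DEMO-1 is a made-up issue key).
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m "DEMO-1 add feature (and, accidentally, big.bin)"

# Commit 2: the file is deleted again, but it still lives in history.
git rm -q big.bin
git commit -q -m "DEMO-1 remove big.bin"

old_head=$(git rev-parse HEAD)

# Rewrite every commit so that big.bin was never added.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

new_head=$(git rev-parse HEAD)
new_subject=$(git log -1 --format=%s)

echo "old HEAD: $old_head"
echo "new HEAD: $new_head"
echo "subject:  $new_subject"
```

The two HEAD values differ even though the second commit's own changes are the same, because its parent was rewritten. The commit message, including the issue key, is carried over unchanged.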

1. At any given time, we have dozens of branches open and actively being worked on in this repo. Syncing up all work streams to merge and rebase at the same time will be extremely difficult.

If you decide to proceed with the history rewrite, my suggestion would be to schedule it for a time when no one is working on the repo: everyone has pushed all their changes to Bitbucket and has paused development until the process is completed.

There needs to be some planning and coordination to avoid any loss of work.

It is also important to take a backup of the repo so that you can restore it if needed. You can do this by cloning with the --mirror flag. Keep this backup separate from the mirror clone in which you run BFG.
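A minimal sketch of such a backup is below. The helper name backup_mirror and the URL in the usage comment are placeholders, not anything Bitbucket provides.

```shell
#!/usr/bin/env bash
# Take a restorable backup before any history rewrite.
# backup_mirror is a hypothetical helper name; the URL in the usage line is a placeholder.

backup_mirror() {
  local source_url=$1 backup_dir=$2
  # --mirror copies every ref (all branches and tags), not just the default branch.
  git clone --quiet --mirror "$source_url" "$backup_dir" || return 1
  # Check object integrity so a corrupt backup is caught now, not during a restore.
  git --git-dir="$backup_dir" fsck --no-dangling >/dev/null 2>&1 || return 1
  echo "Backup written to $backup_dir"
}

# Usage (placeholder URL):
#   backup_mirror git@bitbucket.org:my-workspace/my-repo.git my-repo-backup.git
```

If the rewrite goes wrong, one way to restore would be to run git push --mirror from inside the backup directory, which overwrites the remote refs with the backed-up ones.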

2. There are Jira tickets with commit hashes and PRs linked. If we use BFG, it will rewrite history and create new commit hashes, and we would then need to update those Jira tickets with the correct commit links.

If the only thing the history rewrite does is remove files from the repo history, the new commits will keep their messages, which include any references to Jira issues.

The new commits will then be displayed in Jira issues, as well as any PRs with commits that reference Jira issues, so you won't need to do any update on your side.

However, one issue is that Jira will continue to reference the old commit ids as well. A Jira ticket that is referenced by Bitbucket commits will show both the old commits (from before the history rewrite) and the new commits (from after it). This is because commit ids are indexed in the Jira database.

Are you using Jira Cloud? If so, I would suggest creating a support ticket with the Jira Cloud team (you can ask one of your Jira admins to do that if you don't have admin access to the Jira Cloud instance) and asking whether it's possible to delete the indexed Bitbucket data from the Jira database, in case that helps solve the issue.

3. We have multiple PRs that were merged, and branches that were created, after that bad commit was merged.

Could you please clarify what your concern is here?

4. After the cleanup is done, is there a way to prevent anyone from pushing changes from an old clone of the repo?

I'm afraid this is not possible. Users can still pull changes from the Bitbucket repo, merge them into their old clone, and then push the large commits back. To avoid this, you need to communicate to everyone that they must take a fresh clone once the history rewrite has been completed.
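As part of that communication, it may help to give people a quick way to check whether their local clone still carries the old history. The sketch below is one possible check, not a Bitbucket feature: stale_branches is a hypothetical helper, and the hash you pass in is the pre-rewrite hash of the bad commit, which you would need to note down before the rewrite.

```shell
#!/usr/bin/env bash
# List local branches in a clone that still contain a given pre-rewrite commit.
# stale_branches is a hypothetical helper name.

stale_branches() {
  local repo_dir=$1 old_sha=$2
  # A fresh clone taken after the rewrite will not have the old commit object at all.
  git -C "$repo_dir" cat-file -e "${old_sha}^{commit}" 2>/dev/null || return 0
  # Otherwise, print every local branch whose history still includes it.
  git -C "$repo_dir" branch --contains "$old_sha" --format='%(refname:short)'
}

# Usage (placeholder path and hash):
#   stale_branches ~/code/my-repo <old-commit-hash>
```

Empty output means no local branch includes the old commit. Any branch that is listed should not be pushed; its work would need to be recreated on top of a fresh clone.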

If you have any other questions, please feel free to let me know.

Kind regards,
Theodora

Thanks @Theodora Boudale 
I have one more concern: if I have open PRs, do I need to merge and close them before running the BFG tool, or is it okay to run BFG without merging them?

Hi @Piyush Nariani,

You don't have to merge the open PRs before you run BFG.

From what I've seen when I have used BFG to delete files from history, the branches remained intact. After pushing, the PRs were not affected and included the new commits.

This applies when BFG is run in a mirror clone of the repo, as described at https://rtyley.github.io/bfg-repo-cleaner/, so that all branches are included in that clone.
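For reference, the mirror-clone workflow described on that page can be sketched roughly as follows. The helper name bfg_delete_file, the working directory name, and all arguments are placeholders. The sketch deliberately stops before the push so that the result can be reviewed first. Note that BFG by default leaves the contents of the latest commit untouched, which suits this case because the files were already deleted in a later commit.

```shell
#!/usr/bin/env bash
# Sketch of the BFG workflow in a mirror clone.
# bfg_delete_file is a hypothetical helper; repo URL, file name and jar path are placeholders.

bfg_delete_file() {
  if [ "$#" -ne 3 ]; then
    echo "usage: bfg_delete_file <repo-url> <file-name> <path-to-bfg.jar>" >&2
    return 2
  fi
  local repo_url=$1 file_name=$2 bfg_jar=$3
  local work_dir=work-mirror.git

  # A mirror clone contains every branch, so open PR branches are rewritten too.
  git clone --mirror "$repo_url" "$work_dir" || return 1

  # Remove every file with this name from all commits except the protected latest one.
  java -jar "$bfg_jar" --delete-files "$file_name" "$work_dir" || return 1

  # BFG only rewrites refs; expire the reflog and gc to actually drop the old objects.
  (
    cd "$work_dir" || exit 1
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
  ) || return 1

  echo "Review $work_dir, then push from inside it with: git push"
}

# Usage (placeholders):
#   bfg_delete_file git@bitbucket.org:my-workspace/my-repo.git big-file.zip ./bfg.jar
```

BFG also offers --strip-blobs-bigger-than (for example, 100M) if you would rather remove files by size than by name.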

Kind regards,
Theodora
