Large commit - unable to clean up

Kobus Myburgh August 17, 2022

Hi there,

This may not be directly related to Bitbucket; it could be a Git problem I am having, but I am going to ask anyway in the hope that someone can assist. I am asking on Stack Overflow as well, in case nobody here can answer.

I have three branches contributing to the issue I have at the moment:

  • master_revert (based off of `master_v9` before a major problem got introduced).
  • master_v9 (the branch with the major issue in).
  • master_changes (changes made since the revert on 29 July 2022, based off master_revert).

master_changes contains one very large commit (1000+ deleted files) and about 40 other smaller commits since 29 July 2022.

When I try to merge master_changes into master_v9, Bitbucket can't show me all the changed files, because there are too many. So here is what I did, and I think this is where things went wrong, although I don't know for sure:

  • Checked out master_v9
  • Cherry-picked the commit from master_changes that had the large number of deleted files.
  • There were conflicts on two files, which I manually resolved locally.
  • But then I accidentally did `git add .` and `git commit -m ...` and `git push` instead of `git cherry-pick --continue`
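
For reference, the usual sequence after a conflicted cherry-pick is to stage the resolved files and then run `git cherry-pick --continue`. The following is a minimal sketch in a throwaway repository: the file name app.txt, the commit messages and the demo identity are made up for illustration, and only the branch names come from the post. For a single picked commit, a plain `git commit` after staging is generally understood to conclude the cherry-pick as well, so that step alone is unlikely to be the cause, but treat that as an assumption to verify on your own Git version.

```shell
#!/bin/sh
# Sketch only: throwaway repo, hypothetical file names and messages.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"
git branch -m master_v9            # rename the initial branch

# A commit on master_changes that will conflict with master_v9.
git checkout -q -b master_changes
echo "changes side" > app.txt
git commit -q -am "edit on master_changes"
pick=$(git rev-parse HEAD)

git checkout -q master_v9
echo "v9 side" > app.txt
git commit -q -am "edit on master_v9"

# The cherry-pick stops on the conflict, so tolerate its non-zero exit.
git cherry-pick "$pick" >/dev/null 2>&1 || true

# Resolve by hand (here: overwrite the file), then stage only that file.
echo "resolved" > app.txt
git add app.txt

# Conclude the cherry-pick; GIT_EDITOR=true accepts the original message.
GIT_EDITOR=true git cherry-pick --continue >/dev/null

subject=$(git log -1 --format=%s)
echo "$subject"
```

Staging the resolved files by name rather than with `git add .` avoids sweeping unrelated working-tree changes into the picked commit.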

It did not give me any errors and I could happily complete it. My conflicts now appear resolved in Bitbucket (the code is in order), and in the Bitbucket Source view I no longer see the folder with over 1000 files that I deleted. So, to all indications, this seems to have worked correctly despite where I think I went wrong.

The problem now is that if I create a Pull Request in Bitbucket Cloud, the Pull Request screen STILL shows the files that have been deleted in master_v9 in the file list on the right side of the screen, preventing me from seeing all the files that were actually changed (not deleted).

I have tried to:

  • Reset head (hard) to before this was done, and then did a git push.
  • Repeated the process (but correctly this time around) and still the same problem.
  • Then, I reset head (hard) again to before I did this, simply deleted the files locally, and committed and pushed up again.

But unfortunately the issue remains.

Any idea if or how this can be fixed?

Thanks in advance,

Kobus

 

2 answers

0 votes
Theodora Boudale
Atlassian Team
August 23, 2022

Hi Kobus,

The Source page of the repo will show the files at the latest commit in the main branch of the repo. If the main branch is e.g. master, you'll need to switch from the dropdown to a different branch to see the source at the latest commit of that branch.

Another thing to note is what I mentioned about 'files at the latest commit'. If you had a certain file at an earlier commit, and then committed this file's deletion, the file will not show on the Source page. However, it is still part of the history at an earlier commit (since committing a file's deletion does not remove the file from the Git history).

The first thing to figure out is whether this folder is still part of the repo's history. If the repo doesn't have a very long or complicated history, you can check the commits on the Commits page of the repo, filtering to only the branch that you are using as the source branch in pull requests (a pull request diff shows the difference between the tip of the source branch and the commit from which it branched off the destination). Otherwise, you can search in a clone of the repo with `git log --all -- myFolder/my_file.txt`, where myFolder/my_file.txt is the path to one of the files in the large folder.
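
A minimal sketch of both checks in a throwaway repository, assuming (as described above) that the pull request diff compares the source tip with the commit where it branched off the destination. The file app.txt, the commit messages and the demo identity are hypothetical; the branch names and myFolder/my_file.txt follow the thread. The three-dot form of `git diff` compares against the merge base, so it is used here as a local approximation of the pull request file list.

```shell
#!/bin/sh
# Sketch only: throwaway repo with hypothetical content.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

mkdir myFolder
echo "data" > myFolder/my_file.txt
echo "v1" > app.txt
git add .
git commit -q -m "base with myFolder"
git branch -m master_v9

# Source branch: delete the folder, then make a real change.
git checkout -q -b master_changes
git rm -q -r myFolder
git commit -q -m "delete myFolder"
echo "v2" > app.txt
git commit -q -am "real change"

# Destination branch: the same deletion, committed separately.
git checkout -q master_v9
git rm -q -r myFolder
git commit -q -m "delete myFolder on master_v9"

# 1. Is the file still part of history on any ref?
history=$(git log --all --oneline -- myFolder/my_file.txt)

# 2. Merge base versus source tip (three dots).
pr_style=$(git diff --name-status master_v9...master_changes)

# 3. Destination tip versus source tip (two branch names).
tip_to_tip=$(git diff --name-status master_v9 master_changes)

printf 'history:\n%s\n\nthree-dot:\n%s\n\ntip to tip:\n%s\n' \
  "$history" "$pr_style" "$tip_to_tip"
```

In this sketch the three-dot diff still lists the deleted file even though the destination tip no longer contains it, because the merge base predates the deletion on both branches. If the same thing is happening in the real repo, that would be consistent with the long file list in the pull request, but this is a hypothesis to check rather than a diagnosis.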

If the folder with the large files is no longer part of the history, a garbage collection triggered from our side should remove any unreferenced commits. However, if the files show in a pull request diff, I suspect that the folder might still be part of the Git history.

If this is the case and if you want to remove it from history, you can try Rob's suggestion with BFG (you can use it with the argument --delete-folders which allows you to delete a certain folder). Please note that this is a tool that rewrites history, which means that the commit hashes will change if you use it. It would be good to communicate this to other users of the repo.

If you use BFG, I would suggest pushing your changes first in an empty Bitbucket Cloud repo to inspect the changes and see if the repo is in good shape (prior to pushing to your current repo).
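
The workflow could be sketched roughly as follows. This is an outline under stated assumptions, not a tested procedure for your repo: `BFG_JAR` is a hypothetical environment variable pointing at a downloaded BFG jar, the `source` directory is a stand-in for the real repository, and a local bare repo named scratch.git stands in for the empty Bitbucket Cloud repo. The BFG step is skipped when no jar is configured, so the script only exercises the mirror clone and push in that case.

```shell
#!/bin/sh
# Sketch only: stand-in repositories, hypothetical BFG_JAR variable.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the real repository.
git init -q source
(
  cd source
  git config user.email "demo@example.com"
  git config user.name "Demo User"
  mkdir myFolder
  echo "data" > myFolder/my_file.txt
  git add .
  git commit -q -m "add myFolder"
  git rm -q -r myFolder
  git commit -q -m "delete myFolder"
)

# 1. Work on a bare mirror clone, never on the only copy.
git clone -q --mirror source my-repo.git

# 2. Rewrite history with BFG if a jar is available, then expire
#    reflogs and garbage-collect locally as the BFG docs suggest.
if [ -n "${BFG_JAR:-}" ] && [ -f "$BFG_JAR" ]; then
  java -jar "$BFG_JAR" --delete-folders myFolder my-repo.git
  (
    cd my-repo.git
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive
  )
fi

# 3. Push everything to an empty scratch repo for inspection.
git init -q --bare scratch.git
( cd my-repo.git && git push -q --mirror ../scratch.git )

# 4. Inspect: after a successful BFG run this should print nothing.
remaining=$(git -C scratch.git log --all --oneline -- myFolder)
echo "commits still touching myFolder: ${remaining:-none}"
```

Pushing the mirror to a scratch repo first keeps the rewritten commit hashes away from the shared repo until the result has been checked.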

If you decide to push changes with BFG to the current repo, a garbage collection may be needed to reduce its size and remove unreferenced commits. You can either post here and I can run it for your repo, or, if the repo belongs to a workspace on a paid billing plan, you can create a support ticket via https://support.atlassian.com/contact/#/: in "What can we help you with?" select "Technical issues and bugs" and then Bitbucket Cloud as the product.

Kind regards,
Theodora

0 votes
Rob van der Lee August 22, 2022

It seems like you are in a tight spot. This is not an easy task, for sure. There is a tool for this. I've used it once and it worked for me, but it remains a risky tool. I recommend properly backing up everything before attempting to solve your issue.

Big Fudging Git:
https://rtyley.github.io/bfg-repo-cleaner/

I think you would need the first example:

bfg --delete-files id_{dsa,rsa} my-repo.git

I think you might have accidentally added a folder that wasn't supposed to end up in your repo. 1000 files seems like a lot; might it be something like a caching folder? At any rate, BFG should get you sorted if you haven't already solved your issue by now.

For future questions, it might be a good idea to post the SO link also so we can check if you've already solved it :)
