Can an admin please run garbage collection on my repo? I have removed some large files using BFG. I do not know how else to request this other than posting a question to the community. Is there a way to request this through the UI? Thanks.
Hello Susan, and welcome to the community!
I've run Git GC against your repository; however, its size is still 930 MB.
This means that additional cleanup might be necessary to free up some extra space. For this, you can follow the steps below in a local clone of the repository:
1. Identify the largest files in your repository by executing the following command:
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort -r --numeric-sort --key=2 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
2. Perform cleanup operations locally to reduce the size
3. Confirm the new size locally by running the following command inside the repo's folder:
git count-objects -vH
4. After confirming the size has reduced locally, push your changes to Bitbucket Cloud
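Steps 3 and 4 amount to a before/after comparison of the packed size. Here is a minimal sketch of that comparison, assuming a POSIX shell with git and awk available. The function names pack_size_kib and size_saved_kib are made up for illustration and are not part of git or Bitbucket:

```shell
# Print the packed size of a repository in KiB.
# `git count-objects -v` (without -H) reports size-pack as a plain KiB number.
# $1 is the path to the repository; defaults to the current directory.
pack_size_kib() {
    git -C "${1:-.}" count-objects -v | awk '/^size-pack:/ {print $2}'
}

# Print how many KiB were saved, given a "before" and an "after" reading.
size_saved_kib() {
    echo $(( $1 - $2 ))
}
```

To use it, store the output of pack_size_kib before the cleanup and again after it, then pass both numbers to size_saved_kib. A positive result means the local repository shrank and is ready to push.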
After the cleanup is executed, please let us know here, as we might need to run an additional garbage collection.
Let us know in case you have any questions.
Thank you, @Susan Begley!
Patrik S
Hi there,
I already ran the following after doing the BFG removals:
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push
When I look at the largest files in my repo on a fresh clone, none of the previously largest files are there any more (the ones I removed).
If I do a new clone with the --mirror flag, go into the folder and run "git count-objects -vH", I get:
count: 0
size: 0 bytes
in-pack: 216854
packs: 1
size-pack: 756.67 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
So why does Bitbucket think it's 930 MB?
Thanks for your help!
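One thing worth ruling out when comparing the two figures is units: git count-objects -vH reports MiB (binary), while the 930 MB figure may be in decimal megabytes. That is an assumption, since the thread does not say which unit Bitbucket uses. A small sketch of the conversion, with mib_to_mb as a made-up helper name:

```shell
# Convert mebibytes (MiB, 1024*1024 bytes) to decimal megabytes (MB, 10^6 bytes).
# $1 is the size in MiB; the result is printed with two decimals.
mib_to_mb() {
    awk -v mib="$1" 'BEGIN { printf "%.2f\n", mib * 1048576 / 1000000 }'
}
```

Running mib_to_mb 756.67 gives roughly 793 MB, so even under the decimal-unit assumption the conversion accounts for only part of the gap to 930 MB.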
Hey @Susan Begley,
Thank you for your confirmation.
This size discrepancy, even after cleaning up the repository and executing a GC on the server side, is likely related to large files that were previously pushed to pull requests in that repository.
To display diffs of pull requests after they are merged, Bitbucket Cloud must preserve the Git references of the commits that were part of those pull requests. In this case, even though you have cleaned those commits, Bitbucket still needs to retain those references to accurately show the pull request diff.
If you'd like to further reduce the repository size, one option is to delete the existing repository and push your cleaned-up repo to a new repository in your workspace. This new repository will not include the pull request history of the original one, and should report a size closer to the one you see locally.
Thank you, @Susan Begley!
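Copying the cleaned-up repository into a new, empty one can be sketched with a mirror clone followed by a mirror push. This is only an illustration under stated assumptions: mirror_copy is a made-up function name, both arguments are placeholders for your own source and destination URLs or paths, and the destination repository is assumed to exist already and be empty:

```shell
# Copy all branches and tags from one repository into another, empty one.
# $1: source URL or path (placeholder), $2: destination URL or path (placeholder).
mirror_copy() {
    src=$1
    dst=$2
    workdir=$(mktemp -d) || return 1
    # A mirror clone fetches every ref the server exposes to clients.
    git clone --quiet --mirror "$src" "$workdir/repo.git" &&
        git -C "$workdir/repo.git" push --quiet --mirror "$dst"
    status=$?
    # Remove the temporary clone whether or not the push succeeded.
    rm -rf "$workdir"
    return $status
}
```

It is safer to verify the new repository (branches, tags, a fresh clone) before deleting the original one, since pull request history is not carried over by this kind of copy.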
Patrik S
Are you saying there is no way to ever reduce the size of the repository? That even deleting the history of the files doesn't delete them entirely? I thought that there was a way to remove files permanently in case someone checked in sensitive data.
The files we deleted were never actually part of a pull request as they were added before we even started using branches and pull requests and never changed. Also, they are not text files. They should not be a part of the pull request diffs.
Is it possible for you to delete our pull request diffs?
Hey @Susan Begley,
The cleanup did reduce the actual size of your repository. The clone you measured is smaller because cloning does not include the pull request-protected references.
However, as Bitbucket has to store them on the backend in order to show the diffs, they might inflate the repository size on the Bitbucket side, depending on the amount/size of the files that were pushed to those pull requests. Even though those files no longer exist in the repo, their references are still preserved.
The fastest option to solve this, if you don't need to preserve the previously merged pull request data/diffs, is duplicating your repository and then deleting the original repository. The new repository won't have any pull requests, so it won't have any references other than the ones currently existing in the repo. This won't delete any data from the repository itself; only the pull request history/metadata that is available in the UI won't exist in the new repo.
However, if you would rather not lose the PRs, please let us know so we can create a support ticket for you and discuss other options.
Thank you, @Susan Begley!
Patrik S
Hi Patrik,
Thank you for your response. I still have questions:
1. My goal was to reduce the amount of storage that my repository takes up on my Bitbucket workspaces since there is now a 1 GB limit and we are approaching that limit. Is this impossible then since it does not seem possible to remove a large file completely?
2. If the files are not removed on the back-end from the diffs, what if they had contained sensitive information that we were trying to erase? Should it not be possible to permanently remove ALL references to a file, including pull-request diffs, without having to create a whole new repo?
3. As mentioned earlier, the large files that were removed were added to the repo long before we ever used branches/pull requests and were never edited. Thus they should never have been part of any pull-request diffs, and should not be taking up space in the workspace any more. Why are they still counting against our workspace quota?
4. I don't see how I can create a new repository and not go over the new 1GB limit without first deleting my existing repo, which I am not comfortable doing. Also, I do not want to lose the links between Jira and commits/pull-requests/etc that are in my current repo. This does not seem like an acceptable solution to the problem.
Thanks again,
Susan
Hey Susan,
I'd have to take a closer look at your repository to confirm whether it's indeed PR diffs that are contributing to the additional size or whether something else is inflating it.
So that I can open a support ticket for you and our team can assist, could you let us know which timezone you're in?
Hi Patrik,
I am in Eastern Standard Time (or Eastern Daylight Time at the moment).
Thanks,
Susan
Perfect, Susan!
I've opened an internal support ticket for you in the timezone you've specified.
You'll receive an email notification with the ticket link soon.
In case you don't receive it, please let me know, so I can share it here with you (it's only visible to you and Atlassian staff).
Thank you!
Patrik S