
Repository size on cleared repo

angralon September 13, 2015

After overwriting my repo, its size is far larger than it should be. Bitbucket tells me my repo is 1.2 GB, but if I clone it, I get a 380 MB folder.

 

Here is what I did beforehand to reduce my repo size (back then it was about 800 MB):

  • locally deleted my .git folder
  • added some entries to my .gitignore
  • $ git init
  • $ git add --all ./
  • $ git commit -m "initial commit"
  • $ git remote add origin [...]
  • $ git push --force -u origin master

Can someone explain why this leads to a repo size of 1.2 GB in Bitbucket, while cloning results in a 380 MB folder?

 

 

1 answer

1 vote
Steffen Opel _Utoolity_
Community Leader
September 13, 2015

Git uses garbage collection to manage its object storage; see e.g. Garbage Collecting and especially git-gc:

Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.


Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance. [emphasis mine]

Some git commands may automatically run git gc; see the --auto flag below for details. [...] 
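To make the quoted description more concrete, here is a minimal sketch of how an unreachable object left behind by `git add` can be spotted in a local repository. The temporary repository and the file names `kept.txt` and `scratch.txt` are made up for this demo, and it only illustrates local Git behaviour; it says nothing about how Bitbucket computes the size it displays.

```shell
#!/bin/sh
set -eu

# Throwaway repository; path and file names are hypothetical demo data.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'tracked' > kept.txt
git add kept.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m 'initial commit'

# Stage a file, then unstage it again. The blob written by `git add`
# stays in .git/objects even though nothing references it any more.
echo 'some content that is never committed' > scratch.txt
git add scratch.txt
blob=$(git rev-parse :scratch.txt)
git rm -q --cached scratch.txt

# List objects that no ref and no index entry points to.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$unreachable"

# Summarize how much space loose and packed objects occupy.
git count-objects -vH
```

The `git fsck` output should contain a line of the form `unreachable blob <hash>` for the unstaged file. Objects like this are what garbage collection eventually removes.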

As further outlined in the Configuration section, there are various related settings, and many of their grace periods default to several days or even months. All of these can be changed, of course, and in a hosted service like Bitbucket, running and configuring appropriate garbage collection (the activity emphasized above) is the responsibility of the provider. The size difference you are observing has been the subject of many issues in the Bitbucket tracker already. Eric van Zijst explains how and why the Bitbucket schedule for garbage collection is somewhat dynamic in his recent answer to Purge dangling objects in Git:

We originally ran a gc on every push. However that can be expensive for larger repos and so the frequency was tied to several repo properties, including push frequency and size.

Furthermore, we don't run git gc --prune=now and so a gc run does not necessarily remove (all) unreferenced objects. This is necessary as some of Bitbucket's internal processes rely on unreferenced objects having a grace period. [...]
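The grace period mentioned in the quote can be reproduced locally. The following is a sketch under the assumption of a default Git configuration (no custom `gc.pruneExpire`); the repository and file names are invented for the demo, and it does not model Bitbucket's internal processes.

```shell
#!/bin/sh
set -eu

# Throwaway repository; path and file names are hypothetical demo data.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'tracked' > kept.txt
git add kept.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m 'initial commit'

# Create an unreachable blob: stage a file, then unstage it.
echo 'never committed' > scratch.txt
git add scratch.txt
blob=$(git rev-parse :scratch.txt)
git rm -q --cached scratch.txt

# A plain gc only prunes unreachable objects older than gc.pruneExpire
# (two weeks by default), so the fresh blob is expected to survive.
git gc --quiet
if git cat-file -e "$blob" 2>/dev/null; then after_gc=kept; else after_gc=pruned; fi

# --prune=now drops the grace period, so the blob is removed right away.
git gc --quiet --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_prune=kept; else after_prune=pruned; fi

echo "after git gc:             $after_gc"
echo "after git gc --prune=now: $after_prune"
```

With default settings, the first check should report `kept` and the second `pruned`. This is why a repository whose history was replaced by a force push can keep reporting its old size until the unreferenced objects age out or are pruned explicitly.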

See Eric's answer for details on how to contact support, if this turns out to be an issue in your scenario.
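For a repository you control yourself, the expiry settings from the Configuration section of git-gc can be inspected and overridden per repository. This is only a sketch: the values below are arbitrary illustrations rather than recommendations, and none of it applies to a hosted repository, where the provider controls these settings.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the path is hypothetical demo data.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Illustrative values only. Per the git-gc documentation, the defaults are
# 2 weeks (gc.pruneExpire), 90 days (gc.reflogExpire) and
# 30 days (gc.reflogExpireUnreachable).
git config gc.pruneExpire "3.days.ago"
git config gc.reflogExpire "30.days.ago"
git config gc.reflogExpireUnreachable "7.days.ago"

# Show the gc-related settings stored in this repository's own config.
gc_settings=$(git config --local --get-regexp '^gc\.')
echo "$gc_settings"
```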
