Many Somewhat Large Binary Files; Need Special Handling?

We have a project with about 1,300 files, totalling about 1.61GB, with the largest file at 11MB. About 99% of these files are binary. We want to import this project into Stash/Git. Will a repository like this need a specialised tool such as Git Annex, Git BigFiles or Git Fat?

http://blogs.atlassian.com/2014/05/handle-big-repositories-git/

Accepted Answer

Hi Mark,

Not necessarily. Although your repository is on the larger side, you don't have to start looking into specialised tools just yet if the repository works well for you.

The size of the repository will affect a number of things:

  • Time to clone and bandwidth used (for a normal clone you will be transferring the full 1.6 GB; if you have a lot of builds that could easily add up). If you need the data anyway, it probably doesn't matter where you get it from (git vs. external storage as used by, for example, git annex).
  • Based on the above, the load on the system will be higher and you need sufficient resources for git to be able to support cloning the repository (see https://confluence.atlassian.com/display/STASH/Scaling+Stash for some background information on Stash's resource usage). As Stash will cache the packfile generated for a clone, you also need sufficient disk space on the system.
  • You may also have to adjust some timeout/buffer settings.
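To get a feel for how the clone traffic adds up, here is a rough back-of-the-envelope sketch in Python. The function name and the figure of 20 clones per day are made up for illustration, and it assumes every build does a full clone with no caching or shallow cloning.

```python
def clone_traffic_gb(repo_size_gb, clones_per_day, days=30):
    """Estimate data transferred by full clones over a period, in GB.

    Assumption: every clone transfers the whole repository
    (no shallow clones, no reference repos, no build-agent caching).
    """
    return repo_size_gb * clones_per_day * days


if __name__ == "__main__":
    # 1.6 GB repository, a hypothetical 20 build clones per day, over 30 days
    print(f"{clone_traffic_gb(1.6, 20):.0f} GB transferred")
```

If the number looks uncomfortable, reusing an existing clone on the build agents is usually the first thing to look at.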

Other things to think about:

  • What is the expected growth? Is the overall size likely to increase significantly or do you expect modest growth?

In general I wouldn't consider files up to 11MB to be too large to manage with git. It gets difficult when you talk about multiples of that though (e.g. over 50MB it becomes worth looking into alternatives IMHO).
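If you want to check whether anything in the working tree is near that range before importing, a small script like the one below can list the largest offenders. This is only a sketch: `files_over` is a made-up helper name, and the 50MB limit is just the rule of thumb mentioned above, not a hard git limit.

```python
import os


def files_over(root, limit_bytes):
    """Return (path, size) pairs for files under root larger than limit_bytes,
    biggest first. Any .git directory is skipped."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Broken symlink or unreadable file: ignore it
                continue
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    limit = 50 * 1024 * 1024  # the 50MB rule of thumb, as an assumption
    for path, size in files_over(".", limit):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```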

Have you tried importing it to see how it impacts your usage and the instance?

Cheers,

Stefan

One important thing to consider is how often these binary files change. Every time a binary changes, the repo size goes up by almost the full file size (since the diff is almost the entire file).

=> If they change often, you might want to consider putting the binaries into a separate repo where you can (more easily) wipe (some of) the history to reduce the repo size. (This can be mapped as a submodule into the main repo, if necessary.)
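As a rough illustration of the growth described above, the sketch below estimates how much history a frequently changing binary adds. The function name and the `delta_ratio` parameter are assumptions for illustration; a ratio of 1.0 models the worst case where each new version is stored at nearly its full size.

```python
def history_growth_mb(file_size_mb, changes, delta_ratio=1.0):
    """Estimate the extra repository size caused by repeated changes to a file.

    delta_ratio is the assumed fraction of the file size each new version
    adds to the repository (1.0 = nothing is saved by delta compression).
    """
    return file_size_mb * changes * delta_ratio


if __name__ == "__main__":
    # An 11MB binary changed 100 times, worst case
    print(f"{history_growth_mb(11, 100):.0f} MB of added history")
```

Real numbers depend on how well the particular file format compresses and deltas, so measuring a trial import is more reliable than this estimate.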
