Many Somewhat Large Binary Files; Need Special Handling?

We have a project with about 1,300 files totalling about 1.61 GB; the largest file is 11 MB. About 99% of these files are binary. We want to import this project into Stash/Git. Will a repository like this need a specialised tool such as Git Annex, Git BigFiles, or Git Fat?

http://blogs.atlassian.com/2014/05/handle-big-repositories-git/

1 answer

Hi Mark,

Not necessarily. Although your repository is on the larger side, you don't have to start looking into specialised tools just yet if the repository works well for you.

The size of the repository will affect a number of things:

  • Time to clone and bandwidth used. A normal clone transfers the full 1.6 GB, so if you have a lot of builds, that can easily add up. If you need the data anyway, it probably doesn't matter where you get it from (git vs. external storage as used by, for example, git annex).
  • Because of the above, the load on the system will be higher, and you need sufficient resources for git to support cloning the repository (see https://confluence.atlassian.com/display/STASH/Scaling+Stash for some background information on Stash's resource usage). As Stash caches the packfile generated for a clone, you also need sufficient space on the system.
  • You may also have to adjust some timeout/buffer settings.
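
To put a number on the first point, here is a rough back-of-the-envelope sketch in Python. The 1.6 GB figure comes from the question; the number of builds per day is purely an assumed value for illustration, and the calculation assumes that every build does a full clone with no caching or shallow cloning on the build side.

```python
# Rough estimate of the data transferred by full clones.
REPO_SIZE_GB = 1.6  # repository size from the question


def clone_traffic_gb(clones_per_day: int, days: int = 1,
                     repo_size_gb: float = REPO_SIZE_GB) -> float:
    """Total GB transferred if every clone fetches the full repository."""
    return clones_per_day * days * repo_size_gb


if __name__ == "__main__":
    builds_per_day = 50  # hypothetical value, not from the question
    print(f"{clone_traffic_gb(builds_per_day):.0f} GB per day")
    print(f"{clone_traffic_gb(builds_per_day, days=30):.0f} GB per 30 days")
```

Plugging in your own build frequency should give a quick idea of whether clone traffic is likely to matter for your instance.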

Other things to think about:

  • What is the expected growth? Is the overall size likely to increase significantly or do you expect modest growth?

In general I wouldn't consider files up to 11MB to be too large to manage with git. It gets difficult when you talk about multiples of that though (e.g. over 50MB it becomes worth looking into alternatives IMHO).
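
If you want to check where a project stands relative to that rough 50MB mark before importing it, a small script along these lines might help. It is only a sketch: the `scan` function and the threshold constant are names made up for this example, and the 50 MB default simply mirrors the rule of thumb above rather than any hard git limit.

```python
import os
import sys

THRESHOLD_MB = 50  # rule-of-thumb limit from the answer above; adjust to taste


def scan(root: str, threshold_mb: float = THRESHOLD_MB):
    """Return (file_count, total_bytes, large_files) for everything under root.

    large_files is a list of (path, size_in_bytes) tuples, largest first.
    """
    limit = threshold_mb * 1024 * 1024
    count, total, large = 0, 0, []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own metadata if the tree is already a repository.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symbolic links
            size = os.path.getsize(path)
            count += 1
            total += size
            if size > limit:
                large.append((path, size))
    large.sort(key=lambda item: item[1], reverse=True)
    return count, total, large


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total, large = scan(root)
    print(f"{count} files, {total / 1024**3:.2f} GiB in total")
    for path, size in large:
        print(f"{size / 1024**2:8.1f} MiB  {path}")
```

Run it with the project directory as the only argument; any files it lists are the ones worth a closer look.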

Have you tried importing it to see how it impacts your usage and the instance?

Cheers,

Stefan

One important thing to consider is how often these binary files change. Every time a binary changes, the repo size goes up by almost the full file size (binary content, especially if it is compressed, usually deltas poorly, so the stored diff is almost the entire file).

=> If they change often, you might want to consider putting the binaries into a separate repo where you can (more easily) wipe (some of) the history to reduce the repo size. (This can be mapped as a submodule into the main repo, if necessary.)
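
As a rough illustration of how quickly this adds up, here is a small Python sketch. The 11 MB file size comes from the question; the change frequency and the `stored_fraction` parameter are assumptions made for this example, with 1.0 standing for the worst case in which each new version is stored in full.

```python
def added_history_mb(file_size_mb: float, changes: int,
                     stored_fraction: float = 1.0) -> float:
    """Approximate MB added to the repository by repeatedly changing one file.

    stored_fraction is the share of the file that ends up stored per change;
    1.0 is the worst case described above (nothing saved by deltas).
    """
    return file_size_mb * changes * stored_fraction


if __name__ == "__main__":
    # Hypothetical: the largest file (11 MB) is modified once a week for a year.
    print(f"{added_history_mb(11, 52):.0f} MB of extra history")
```

Multiplying that by the number of files that change regularly gives a first guess at the expected growth.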
