Many Somewhat Large Binary Files; Need Special Handling?

Mark Tinsley August 14, 2014

We have a project with about 1,300 files totalling about 1.61 GB; the largest file is 11 MB, and about 99% of the files are binary. We want to import this project into Stash/Git. Will a repository like this need special handling, such as Git Annex, Git BigFiles, or Git Fat?

http://blogs.atlassian.com/2014/05/handle-big-repositories-git/

Answer accepted
Stefan Saasen
Atlassian Team
August 15, 2014

Hi Mark,

Not necessarily. Although your repository is on the larger side you don't have to start looking into specialised tools just yet if the repository works well for you.

The size of the repository will affect a number of things:

  • Time to clone and bandwidth used: a normal clone transfers the full 1.6 GB, and if you run a lot of builds that can easily add up. If you need the data, it probably doesn't matter where you get it from (git vs. external storage, as used in for example git annex).
  • Because of the above, the load on the system will be higher, and you need sufficient resources for git to support cloning the repository (see https://confluence.atlassian.com/display/STASH/Scaling+Stash for some background information on Stash's resource usage). As Stash caches the packfile generated for a clone, you also need sufficient disk space on the system.
  • You may also have to adjust some timeout/buffer settings.

Other things to think about:

  • What is the expected growth? Is the overall size likely to increase significantly or do you expect modest growth?

In general I wouldn't consider files up to 11MB to be too large to manage with git. It gets difficult when you talk about multiples of that though (e.g. over 50MB it becomes worth looking into alternatives IMHO).
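
To see where a project stands against that rough 50MB mark before importing, something like the following could be run over the working directory. This is only a minimal sketch: the function name `find_large_files` is hypothetical, and the default threshold is simply the 50MB figure mentioned above, not a limit enforced by git or Stash.

```python
import os

def find_large_files(root, threshold_bytes=50 * 1024 * 1024):
    """Return (path, size) pairs for files under root larger than
    threshold_bytes, sorted largest first."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; only count real files
            size = os.path.getsize(path)
            if size > threshold_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)

# Example usage (assumes the project is checked out in the current directory):
#   for path, size in find_large_files("."):
#       print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

If this returns nothing for your project, as it should with a largest file of 11MB, none of the files is in the range where I would start looking at alternatives.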

Have you tried importing it to see how it impacts your usage and the instance?

Cheers,

Stefan

Balázs Szakmáry
Rising Star
August 31, 2014

One important thing to consider is how often these binary files change. Every time a binary changes, the repo size goes up by almost the full file size (since the diff is almost the entire file).
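
As a rough illustration of that growth, here is a back-of-the-envelope sketch. It assumes the worst case, where every revision adds close to a full copy because delta and zlib compression gain little on the file; real repositories may do somewhat better. The function name and the revision count are made up for the example.

```python
def worst_case_growth_mb(file_size_mb, revisions):
    """Upper-bound estimate of the history added by one binary file,
    assuming each revision stores a near-complete new copy."""
    return file_size_mb * revisions

# Illustrative numbers only: an 11 MB file (the largest in the question)
# that is changed 20 times.
print(worst_case_growth_mb(11, 20))  # 220
```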

=> If they change often, you might want to consider putting the binaries into a separate repo where you can (more easily) wipe (some of) the history to reduce the repo size. (This can be mapped as a submodule into the main repo, if necessary.)
