On BitBucket Server:
Looking at the LFS storage under git-lfs/storage, I see that one of the directories is 52GB in size. I'm not sure if each directory under there corresponds to a single repository or if all repositories using LFS share directories.
At any rate, I'd like to figure out how much LFS storage space is being consumed by each repository where I've enabled LFS. Seems like a logical question any system admin would want to ask and yet the documentation seems to be completely silent on this.
Thanks for any insight!
The directory $Bitbucket_Home/shared/data/git-lfs/storage is used to store all repo LFS files, sorted based on git hashes, not sorted by repository.
The simplest way to identify the amount of LFS space used by any one repo is to fetch its LFS objects and then measure the resulting folder size. Run "git lfs fetch --all", as discussed at the end of "Fetching extra Git LFS history". Once the fetch completes, change to your .git/lfs/objects directory and run "du -hd 0" to get the total size of that directory and all of its subdirectories.
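For anyone who wants to script the measurement rather than run du by hand, here is a minimal Python sketch. It assumes you have already run "git lfs fetch --all" in the clone and that the objects live under .git/lfs/objects; the function names dir_size_bytes and human are made up for this example. Note that it sums apparent file sizes, so the result can differ slightly from du, which reports disk blocks.

```python
import os
import sys


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human(n):
    """Format a byte count using binary units."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Default to the LFS object directory of the clone in the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.join(".git", "lfs", "objects")
    print(f"{target}: {human(dir_size_bytes(target))}")
```

Run it from the root of the clone, or pass the path to the objects directory as the first argument.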
Hopefully this helps!
I am working with some of my colleagues to go over any other possibilities, however, this appears to be the only way so far. Having said that, the process can be scripted to allow you to "set it and forget it". This can be achieved by following the below high-level steps.
If I make a request to <server>/rest/git-lfs/storage/<proj>/<repo>/<fake oid>, I get an error message that includes the path where the file would have been stored if it existed. So if you iterate over all projects/repos, you can build up a mapping from the git repositories to the directories on the server in shared/data/git-lfs/storage.
At least in my case (BB Server 6.7.2), I got a perfect one-to-one mapping. So even though the documentation says that "all repositories share this object store", it appears that each repo gets its own directory. I guess there could be a collision, but that seems rare, and you could deal with that on a case-by-case basis. Or there is something I've missed.
Anyhow, it would be nice if Atlassian could tell us what the hashing function is or confirm my guess.
Or I am really missing something. As an example, I get 1.9GB when I pull down a repo and run git lfs fetch --all, but the server directory I think it corresponds to is 6.3GB. Bad or missing garbage collection, maybe. I don't think it's packing on the local side; the number of files in .git/lfs/objects matches the output of git lfs ls-files.
Just a little heads up, if the server is leaking paths on disk in error messages, that's a bug that's likely to get fixed. No error from the server should ever report a path on disk; it's a potential security issue. So that mechanism for finding the path on disk is likely to stop working at some point (likely quite soon).
That said, there's no need to hassle with fake REST requests you expect to fail. The layout of the LFS storage on the server is straightforward and built using repository hierarchy IDs. That means it is _not possible_ to get the usage size per repository--unless the repository is a top-level repository (i.e. not a fork) which has never been forked. All of the LFS objects for every repository in a hierarchy are stored together.
The repository hierarchy ID for any repository is readily available at `/rest/api/1.0/projects/<key>/repos/<slug>`. If you have access to the repository you can find its hierarchy ID, and if you have access to the server you can use that ID to find the LFS objects shared amongst every repository in that hierarchy.
If per-repository numbers that differentiate between forks are required, then the answer provided previously remains the closest way to approximate it.
Thanks, that was the insight I was looking for.
My case is likely somewhat unique, in that forks are seldom used in my organization. That explains why I got such a nice mapping.
I looked at `/rest/api/1.0/projects/<key>/repos/<slug>`, but I did not see a hierarchy ID. There is a numeric id field that indicates where the repo is stored under shared/data/repositories/, but I don't see a hash that indicates where the LFS files are stored under shared/data/git-lfs/storage/. Is that available somewhere else?
Also, FYI to all, there is no garbage collection of LFS right now. So the delta between the server and the "pull repo down and measure it" method could be a lot if you have a developer that is, um, let's say "prone to mistakes".
I'm not following you, Bryan. When I do a GET on /rest/api/1.0/projects/<key>/repos/<slug>, I get the repository ID as Charles mentioned, but I also don't see anything there that's a hierarchy ID. I don't see it mentioned in the documentation and I don't see it in the results when I try it against our BitBucket Data Center instance that's running the latest version.
It looks like the top-level directory for a storage hierarchy looks something like this: /shared/data/git-lfs/storage/022e25516213ddd4f082. That long hex string at the end is the hierarchy ID you referred to, I guess. On our system, I know we have 274 repositories with LFS enabled, but I only see 91 directories under git-lfs/storage, so I guess there are 91 separate hierarchies, with the rest being forks that are folded into these.
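If each top-level directory under git-lfs/storage really is one hierarchy, as it appears to be in this thread, then per-hierarchy usage can be tallied on the server with something like the sketch below. The names tree_size, sizes_per_hierarchy and report are invented for illustration, and the storage path has to be supplied by you. Since there is reportedly no LFS garbage collection, expect these numbers to be larger than what a fresh clone fetches.

```python
import os
import sys


def tree_size(path):
    """Recursively sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue
            if entry.is_dir():
                total += tree_size(entry.path)
            elif entry.is_file():
                total += entry.stat().st_size
    return total


def sizes_per_hierarchy(storage_root):
    """Map each immediate subdirectory name (assumed: a hierarchy ID) to its size in bytes."""
    result = {}
    with os.scandir(storage_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                result[entry.name] = tree_size(entry.path)
    return result


def report(sizes):
    """Return (hierarchy, bytes) pairs sorted from largest to smallest."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Pass the path to shared/data/git-lfs/storage under your Bitbucket home.
    for name, size in report(sizes_per_hierarchy(sys.argv[1])):
        print(f"{size:>15}  {name}")
```

This only gives totals per hierarchy. A repository and all of its forks share one directory, so the figure cannot be split between them from the disk layout alone.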
Could you please elaborate on the proper method of determining the hierarchy ID? I confirmed that I can currently see it in the error message that Charles mentioned, but I do *not* see it in the REST API output.
There's another way to find the necessary information, though. Once you have the repository ID, you can use that to navigate to the repository's directory on disk. For most repositories, that should contain a `repository-config` file, which will have a "hierarchy" value in it. That's the repository's hierarchy ID. (If the repository was created prior to Bitbucket Server 4.12 and hasn't been renamed or moved to a new project, it may not have a `repository-config`.)
Otherwise, the only other way to get the hierarchy ID is to check the `repository` table in the database.
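The `repository-config` lookup could be scripted roughly as follows. This is only a sketch under some assumptions that are not confirmed in this thread: that the repository directory is shared/data/repositories/<id> under the Bitbucket home (where <id> is the numeric `id` from the REST response), and that the file contains a git-config-style line of the form `hierarchy = <value>`. The helper names are hypothetical, so check them against your own installation before relying on the result.

```python
import os
import re

# Assumed line format: optional whitespace, "hierarchy = <alphanumeric value>".
_HIERARCHY_RE = re.compile(r'^\s*hierarchy\s*=\s*"?([0-9A-Za-z]+)"?\s*$')


def repository_config_path(bitbucket_home, repo_id):
    """Build the assumed path to a repository's repository-config file."""
    return os.path.join(
        bitbucket_home, "shared", "data", "repositories", str(repo_id), "repository-config"
    )


def read_hierarchy_id(config_path):
    """Return the hierarchy value from repository-config, or None if it is absent."""
    with open(config_path, encoding="utf-8") as fh:
        for line in fh:
            match = _HIERARCHY_RE.match(line)
            if match:
                return match.group(1)
    return None


def lfs_storage_dir(bitbucket_home, hierarchy_id):
    """Build the path to the LFS directory shared by every repository in the hierarchy."""
    return os.path.join(bitbucket_home, "shared", "data", "git-lfs", "storage", hierarchy_id)
```

A repository created before Bitbucket Server 4.12 that has never been renamed or moved may have no `repository-config` at all, in which case opening the file raises FileNotFoundError and the database is the remaining option.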
Again, apologies for the misinformation on it being in the REST payload--but it will be there in the future. (See BSERV-12174; I'll have that change up for review internally later today.)