On BitBucket Server:
Looking at the LFS storage under git-lfs/storage, I see that one of the directories is 52GB in size. I'm not sure if each directory under there corresponds to a single repository or if all repositories using LFS share directories.
At any rate, I'd like to figure out how much LFS storage space is being consumed by each repository where I've enabled LFS. Seems like a logical question any system admin would want to ask and yet the documentation seems to be completely silent on this.
Thanks for any insight!
Hello Dave,
The directory $Bitbucket_Home/shared/data/git-lfs/storage is used to store all repo LFS files, sorted based on git hashes, not sorted by repository.
The simplest way to identify the amount of LFS space used by any one repo would be to fetch the files and then look at the resulting folder size. This can be achieved with "git lfs fetch --all" as discussed at the end of Fetching extra Git LFS history. Once fetched, you can navigate to your .git/lfs/objects directory and run "du -hd 0" to find the total size of the current directory and all of its subdirectories.
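If you would rather collect the number from a script than read it off "du", here is a minimal Python sketch of the same measurement. The function name `directory_size_bytes` is made up for this example. Note that it sums file sizes, while "du" reports disk blocks used, so the two figures can differ slightly.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so nothing outside the tree gets counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

After a "git lfs fetch --all", calling `directory_size_bytes(os.path.join(".git", "lfs", "objects"))` from the root of the clone should give roughly the figure "du" reports.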
Hopefully this helps!
I appreciate the advice...truly. We have hundreds of repositories, though. The strategy of cloning all repositories and fetching the LFS contents doesn't really seem like a viable way for a large enterprise to manage space on BitBucket server. Is there really no other way?
Dave,
I am working with some of my colleagues to go over any other possibilities; however, this appears to be the only way so far. Having said that, the process can be scripted to allow you to "set it and forget it". This can be achieved by following the below high-level steps.
This solution is woefully inadequate for enterprise customers who have thousands of repositories. I'm going to mark it as the accepted answer, however, since it seems to be the only one available.
If I make a request to <server>/rest/git-lfs/storage/<proj>/<repo>/<fake oid>, I get an error message that includes the path to where the file would have been stored if it existed. So if you iterate over all projects/repos you can build up a mapping from the git repositories to the directories on the server in shared/data/git-lfs/storage.
At least in my case (BB Server 6.7.2), I got a perfect one-to-one mapping. So even though the documentation says that "all repositories share this object store", it appears that each repo gets its own directory. I guess there could be a collision, but that seems rare, and you could deal with that on a case-by-case basis. Or there is something I've missed.
Anyhow, it would be nice if Atlassian could tell us what the hashing function is or confirm my guess.
Or I am really missing something. As an example, I get 1.9GB when I pull down a repo and lfs fetch all. But the server directory I think it corresponds to is 6.3GB. Bad or missing garbage collection, maybe. I don't think it's packing on the local side; the number of files in .git/lfs/objects matches the output of git lfs ls-files.
Just a little heads up, if the server is leaking paths on disk in error messages, that's a bug that's likely to get fixed. No error from the server should ever report a path on disk; it's a potential security issue. So that mechanism for finding the path on disk is likely to stop working at some point (likely quite soon).
That said, there's no need to hassle with fake REST requests you expect to fail. The layout of the LFS storage on the server is straightforward and built using repository hierarchy IDs. That means it is _not possible_ to get the usage size per repository--unless the repository is a top-level repository (i.e. not a fork) which has never been forked. All of the LFS objects for every repository in a hierarchy are stored together.
The repository hierarchy ID for any repository is readily available at `/rest/api/1.0/projects/<key>/repos/<slug>`. If you have access to the repository you can find its hierarchy ID, and if you have access to the server you can use that ID to find the LFS objects shared amongst every repository in that hierarchy.
If per-repository numbers that differentiate between forks are required, then the answer provided previously remains the closest way to approximate it.
Best regards,
Bryan Turner
Atlassian Bitbucket
Thanks, that was the insight I was looking for.
My case is likely somewhat unique, in that forks are seldom used in my organization. That explains why I got such a nice mapping.
I looked at `/rest/api/1.0/projects/<key>/repos/<slug>`, but I did not see a hierarchy ID. There is a numeric id field that indicates where the repo is stored under shared/data/repositories/, but I don't see a hash that indicates where the LFS files are stored under shared/data/git-lfs/storage/. Is that available somewhere else?
Also, FYI to all, there is no garbage collection of LFS right now. So the delta between the server and the "pull repo down and measure it" method could be a lot if you have a developer that is, um, let's say "prone to mistakes".
I'm not following you, Bryan. When I do a GET on /rest/api/1.0/projects/<key>/repos/<slug>, I get the repository ID as Charles mentioned, but I also don't see anything there that's a hierarchy ID. I don't see it mentioned in the documentation and I don't see it in the results when I try it against our BitBucket Data Center instance that's running the latest version.
It looks like the top-level directory for a storage hierarchy looks something like this: /shared/data/git-lfs/storage/022e25516213ddd4f082. That long hex string at the end is the hierarchy ID you referred to, I guess. On our system, I know we have 274 repositories with LFS enabled, but I only see 91 directories under git-lfs/storage, so I guess there are 91 separate hierarchies, with the rest being forks that are folded into these.
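For anyone who wants per-hierarchy totals straight from the server's disk, here is a rough Python sketch. It assumes, as described above, that each immediate subdirectory of git-lfs/storage is one hierarchy. The function names `hierarchy_sizes` and `largest_first` are invented for this example, and the figures are per hierarchy, not per fork.

```python
import os


def hierarchy_sizes(storage_root):
    """Map each top-level directory name under storage_root to its size in bytes."""
    sizes = {}
    with os.scandir(storage_root) as entries:
        for entry in entries:
            # Only directories are treated as hierarchies (an assumption).
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            sizes[entry.name] = total
    return sizes


def largest_first(sizes):
    """Return (directory name, bytes) pairs sorted from largest to smallest."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Pointing `hierarchy_sizes` at the shared/data/git-lfs/storage directory and printing `largest_first(...)` would show which hierarchies take the most space.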
Could you please elaborate on the proper method of determining the hierarchy ID? I confirmed that I can currently see it in the error message that Charles mentioned, but I do *not* see it in the REST api output.
My apologies, @Dave Thomas and @Charles Pikscher. A bug I thought had been fixed apparently hasn't been, and so the hierarchy ID isn't in the REST response.
There's another way to find the necessary information, though. Once you have the repository ID, you can use that to navigate to the repository's directory on disk. For most repositories, that should contain a `repository-config` file, which will have a "hierarchy" value in it. That's the repository's hierarchy ID. (If the repository was created prior to Bitbucket Server 4.12 and hasn't been renamed or moved to a new project, it may not have a `repository-config`.)
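As a rough illustration, the lookup could be scripted along these lines in Python. Two things here are assumptions, not confirmed details: that the repository's directory is shared/data/repositories/<id> under the Bitbucket home directory, and that `repository-config` is a git-config-style text file containing a line of the form `hierarchy = <id>`. Check a real file before relying on the pattern. The function names are made up for this sketch.

```python
import os
import re

# Assumed line format: "hierarchy = <id>", optionally indented.
_HIERARCHY_LINE = re.compile(r"^\s*hierarchy\s*=\s*(\S+)\s*$", re.MULTILINE)


def hierarchy_from_config_text(text):
    """Return the hierarchy value found in the config text, or None."""
    match = _HIERARCHY_LINE.search(text)
    return match.group(1) if match else None


def read_hierarchy_id(bitbucket_home, repository_id):
    """Read the hierarchy ID for a numeric repository ID, or None if unavailable."""
    path = os.path.join(
        bitbucket_home, "shared", "data", "repositories",
        str(repository_id), "repository-config",
    )
    try:
        with open(path, encoding="utf-8") as handle:
            return hierarchy_from_config_text(handle.read())
    except FileNotFoundError:
        # Older repositories may not have a repository-config file.
        return None
```

A `None` result covers both a missing file and a file without a hierarchy line, so those repositories would still need the database lookup.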
Otherwise, the only other way to get the hierarchy ID is to check the `repository` table in the database.
Again, apologies for the misinformation on it being in the REST payload--but it will be there in the future. (See BSERV-12174; I'll have that change up for review internally later today.)
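Once the value does appear in the REST payload, a check along these lines could tell you whether your version exposes it. This is only a sketch: the field name `hierarchyId` is an assumption, as is the use of a bearer token for authentication. The helper names are made up for the example.

```python
import json
import urllib.parse
import urllib.request


def repo_url(base_url, project_key, repo_slug):
    """Build the repository REST URL mentioned earlier in this thread."""
    return "{}/rest/api/1.0/projects/{}/repos/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(project_key, safe=""),
        urllib.parse.quote(repo_slug, safe=""),
    )


def hierarchy_id_from_payload(payload):
    """Return the hierarchy ID from a parsed repository payload, or None."""
    # "hierarchyId" is an assumed field name; older versions omit it entirely.
    return payload.get("hierarchyId")


def fetch_hierarchy_id(base_url, project_key, repo_slug, token):
    """Fetch the repository payload and extract the hierarchy ID, if present."""
    request = urllib.request.Request(
        repo_url(base_url, project_key, repo_slug),
        # Bearer-token authentication is an assumption for this sketch.
        headers={"Authorization": "Bearer " + token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return hierarchy_id_from_payload(json.load(response))
```

If `fetch_hierarchy_id` returns `None`, the server version likely predates the fix, and the `repository-config` file or the database remain the places to look.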