Enforce Single File Size Limit with Script Runner for Bitbucket Server?

PeteToscano October 1, 2019

I'm using Bitbucket Server 6.6.2 with the latest Script Runner for Bitbucket Server. 

We had an issue with new users using a mis-configured (or not configured) git-lfs. It was not tracking the right file types and the users were accidentally submitting these large files (PDFs) directly to the repo, then things went haywire between some correctly configured clients and the not-correctly configured clients.

After cleaning everything up, we wanted to add a safeguard to keep these large PDF files from being submitted again. Since git-lfs replaces the real file contents with pointers to the git-lfs file store, the PDF files actually committed to our main git repo are not that big. Looking at the size of the pointer files, it looks like 200B is plenty large enough to allow the pointers through, but plenty small enough to disallow any real PDFs.
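
For reference, a quick way to sanity-check the 200B threshold is to build a pointer by hand and count its bytes. This is only a sketch: it assumes the standard three-line Git LFS pointer layout (version, oid, size), and the all-zero oid and the `pointer_size` helper name are made up for illustration.

```shell
# Hypothetical helper: byte count of a Git LFS pointer for a file of $1 bytes.
# Assumes the standard three-line pointer layout; the all-zero oid is a placeholder.
pointer_size() {
  oid="$(printf '%064d' 0)"   # 64 digits, same length as a real SHA-256 in hex
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$1" |
    wc -c | tr -d ' '
}

pointer_size 1234567   # pointer for a file of roughly 1.2 MB
```

Under those assumptions this prints 132. The only part of a pointer that varies in length is the number of digits in the size line, so pointers should stay well under 200B.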

Before we upgraded to Bitbucket Server 6+, we were using a plugin that did this perfectly. Unfortunately, this plugin is no longer under development, so we needed to find something else to do this job. Enter Script Runner for Bitbucket Server, which we were already using for another task. The "restrict file size" pre-commit hook seemed to fit the bill nicely. We told it to look for PDFs (with `pathsMatch('glob:**.pdf')`) and gave it a max size of 200B.

This works well if the commit consists of only PDFs. The problem is that if there's a mix of files in the commit, then the file size limit applies to all files in the commit, not just the ones matched with `pathsMatch()`. For example, a commit is rejected if someone checks in a 132B PDF and a 1K MD file. Right now, I have the users of this repo only committing PDFs by themselves and not with any other file types, but I'd like them not to have to worry about this.

The fixes I can think of are:

1. Update the condition in the restrict file size pre-commit hook somehow.
2. Write a custom script hook.
3. Not use SR for BS and write a shell script.

Options 1 and 2 are kind of iffy because I've never used Groovy before and I'm not exactly sure what information is being passed to the plugin. On the other hand, it's just another programming language, so it shouldn't be too hard, if I can figure out what goes where and what's exposed to the script. It seems that SR is so close to what I'm looking for that it shouldn't be too difficult to do, yet I'm hitting walls with everything I attempt.

Option 3 is appealing because I'm a shell scripter at heart ("When all you have is a hammer..."), but I'd rather not do hooks 50 different ways, so if I can keep everything contained within SR, all the better.

Anyone have advice for how to address this issue? 

Thanks,
Pete

2 answers

0 votes
PeteToscano October 14, 2019

FWIW, I hacked this script -- mostly the awk part where the large_files variable is defined -- so that it meets my needs.

# List objects new to this push, look up each one's size, keep only oversized *.pdf paths.
large_pdfs="$($GIT rev-list --objects "$target" --not --branches=\* --tags=\* |
  $GIT cat-file $'--batch-check=%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' |
  awk -F '\t' -v max_size="$MAX_SIZE_PDF" '{ if ($4 ~ /\.pdf$/ && $3 > max_size) { print } }' |
  cut -f 4-)"
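
To see what the awk and cut stages do on their own, here is a small sketch that feeds them canned lines in the same tab-separated layout (object name, type, size, path). The `filter_large_pdfs` name, the sample object names, and the sample paths are all made up; only the filter itself mirrors the snippet above.

```shell
MAX_SIZE_PDF=200   # bytes; big enough for git-lfs pointers, too small for real PDFs

# Hypothetical wrapper around the same awk + cut stages, reading from stdin.
filter_large_pdfs() {
  awk -F '\t' -v max_size="$MAX_SIZE_PDF" '$4 ~ /\.pdf$/ && $3 > max_size' | cut -f 4-
}

# Fake cat-file output: a pointer-sized PDF, a 1K Markdown file, and a real PDF.
sample="$(printf 'aaa\tblob\t132\tdocs/pointer.pdf\nbbb\tblob\t1024\tREADME.md\nccc\tblob\t48213\tdocs/real.pdf\n')"

large_pdfs="$(printf '%s\n' "$sample" | filter_large_pdfs)"
echo "$large_pdfs"
```

Only `docs/real.pdf` should come out: the 132B pointer is under the limit and the Markdown file is ignored regardless of its size, which is the per-file-type behaviour the original question was after.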

There's only one problem that we've found so far. If someone (with a broken or mis-configured git-lfs client) takes a PDF and adds it to the repo, then copies that PDF to a different filename that doesn't end with ".pdf" and comes before the PDF filename alphabetically, the PDF file will be allowed through. This is because (I believe) git will see that both files have the same hash, so they are the same object and that object is given the name that comes first alphabetically. This name is what's returned by git cat-file, so the object isn't caught by the awk filter that looks at the git cat-file output. 
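
A minimal way to reproduce this in a throwaway repo, assuming git is on the PATH (the file names and the demo identity are invented for the example):

```shell
# Demo of the duplicate-blob case in a scratch repo.
repo="$(mktemp -d)"
git -C "$repo" init -q
printf 'pretend this is a large PDF\n' > "$repo/report.pdf"
cp "$repo/report.pdf" "$repo/a-copy.bin"     # same content, so same blob id
git -C "$repo" add report.pdf a-copy.bin
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m 'same blob under two names'

blob="$(git -C "$repo" rev-parse HEAD:report.pdf)"
# rev-list --objects prints each object once, with the first path it was found under.
listed="$(git -C "$repo" rev-list --objects HEAD | grep "^$blob")"
echo "$listed"
```

The blob shows up on a single line, named `a-copy.bin`, so a filter that only looks for paths ending in ".pdf" never sees it.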

At least for the repo it's intended for, we think this is unlikely enough to happen. 

Still, I'd love to see other suggestions for improving things. I'm not thrilled about moving to production with this bug, but this might be another case where perfect is the enemy of good enough. 

Julius Davies _bit-booster_com_
Rising Star
October 19, 2019

Hi, Pete,

Control Freak now supports your use case (as of v2019.10.20):

[Screenshot: large-files.png]

In my opinion the bug you identified is unavoidable since it's coming from "git rev-list --objects" behaviour.

The only way I can think of to work around that bug is to run "git ls-tree -r HEAD" and look for all objectId matches. Control Freak won't do that, however, because "git ls-tree -r" is pretty expensive (from an I/O and RAM perspective) just to defend against this one bug.

(Also, the workaround fails if the maliciously renamed file arrives in an earlier commit, anyway!)
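
For anyone who wants to try the ls-tree lookup in their own hook, a rough sketch follows. The `paths_for_object` helper is hypothetical, the object ids in the sample are fake, and it assumes paths contain no tabs or newlines.

```shell
# Hypothetical helper: read `git ls-tree -r` output on stdin and print every
# path whose object id equals $1.
paths_for_object() {
  # Appending "" forces a string comparison, so ids are never compared as numbers.
  awk -F '\t' -v id="$1" '{ split($1, meta, " "); if ((meta[3] "") == (id "")) print $2 }'
}

# Canned input in ls-tree format: mode, type, object id, TAB, path.
sample="$(printf '100644 blob 1111\tREADME.md\n100644 blob 2222\ta-copy.bin\n100644 blob 2222\tdocs/report.pdf\n')"

matches="$(printf '%s\n' "$sample" | paths_for_object 2222)"
echo "$matches"
```

In a real hook the input would come from something like `git ls-tree -r HEAD | paths_for_object "$blob_id"`, and any match ending in ".pdf" would mean the oversized object is a PDF under another name. It still walks the whole tree for each check, which is the cost described above.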

PeteToscano October 22, 2019

Yeah, I think the chance of this happening without intending it to happen is slim-to-none, but it was a concern. Thanks for verifying.

0 votes
Julius Davies _bit-booster_com_
October 13, 2019

 

Paging Adaptavist!  @Reece Lander [ScriptRunner - The Adaptavist Group] 

Meanwhile, I plan to implement something for you in my free Control Freak plugin.  Stay tuned!

Julius Davies _bit-booster_com_
October 19, 2019

 

We've implemented this as of v2019.10.20 of Control Freak.
