
Speccing a Stash server

Andrew Milne April 28, 2013

I'm reaching the end of an evaluation period using Atlassian Stash. So far it has fared significantly better than the alternatives, so I'm starting to look into what sort of hardware we'd need to deploy it.

I'm not particularly familiar with the sort of criteria that need to be factored in, but I need to give some guidance to our IT department, and I'm hoping I can get some help here from other users and from any Atlassian employees kicking around (should I be contacting sales for this sort of help?).

I've been looking at the https://confluence.atlassian.com/display/STASH/Scaling+Stash page, and while it's got some useful guidelines it's not very specific.

Overview of what we're looking to deploy: We're looking at over a hundred developers. We'll be starting with a few Git repositories (20 or 30) and slowly migrating and adding new ones, which could eventually bring us to hundreds of repositories (potentially over 1000 if we did a full migration). The vast majority are under 10MB but a few go up to the 300-400MB range.

Disk space for the git repos is relatively easy to estimate based on average size per commit and the commit rate, but the rest is harder.
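As a rough illustration of the disk estimate described above, here is a minimal sketch. The function names, the headroom factor and every input figure are hypothetical placeholders, not measurements from this thread; substitute numbers from your own repositories.

```python
def estimate_repo_growth_mb(avg_commit_kb: float, commits_per_day: float, days: int) -> float:
    """Projected growth in MB from average commit size and commit rate."""
    return avg_commit_kb * commits_per_day * days / 1024


def estimate_total_disk_mb(initial_mb: float, avg_commit_kb: float,
                           commits_per_day: float, days: int,
                           headroom: float = 1.5) -> float:
    """Initial repository size plus projected growth, scaled by a safety factor.

    The headroom factor is an assumption to leave slack for things not
    modelled here (temporary files, backups, repacking).
    """
    growth = estimate_repo_growth_mb(avg_commit_kb, commits_per_day, days)
    return (initial_mb + growth) * headroom


if __name__ == "__main__":
    # Hypothetical inputs: 30 repositories of about 10 MB each,
    # 200 commits per day across all of them, 20 KB per commit, two years.
    total = estimate_total_disk_mb(initial_mb=30 * 10, avg_commit_kb=20,
                                   commits_per_day=200, days=730)
    print(f"Estimated disk needed: {total:.0f} MB")
```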

For memory and CPU I'm guessing the most important figure is how many concurrent clone operations we need to support and what distribution of sizes they are? Does anyone have any experience of how many concurrent operations you tend to get based on company size? I know how many developers we have but I've no idea how to translate that into activity rates on the server.

Any help would be greatly appreciated.

Thanks

2 answers

1 accepted

2 votes
Answer accepted
Stefan Saasen
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
April 29, 2013

Hi Andrew,

For memory and CPU I'm guessing the most important figure is how many concurrent clone operations we need to support and what distribution of sizes they are?

Yes, that is correct. Clones tend to be expensive in terms of both CPU and memory.

Does anyone have any experience of how many concurrent operations you tend to get based on company size?

Unfortunately the company size or the number of users using the system is not a good proxy for estimating the resource usage. Especially with heavy CI usage a system with just a handful of users could easily be under more load than a system with hundreds of users but barely any CI. It further depends on how CI is configured (i.e. whether build agents need a full clone vs. a shallow clone or just a fetch).

For heavy CI usage with a lot of clones, we recommend our caching plugin, which is documented here: https://confluence.atlassian.com/display/STASH/Scaling+Stash+for+Continuous+Integration+performance

To come back to your question and to give you some rough ideas around numbers we have one internal system with the following specs:

  • 4 hyper threaded cores (so cat /proc/cpuinfo | grep processor | wc -l reports 8 on that machine)
  • 12 GB of RAM
  • With the cache plugin enabled for both clone and ref advertisement operations

This machine handles roughly 3500 git operations per hour. Unfortunately the average is not very useful, because load peaks determine the overall performance: for the timeframe I'm looking at we saw a peak of 11000 git ops per hour, with the majority being ref advertisements. This is largely due to the large number of builds that run against this machine.
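To put the hourly figures above into perspective, a small sketch that converts them to per-second rates and compares peak to average. The two constants are the numbers quoted in this answer; the function names are illustrative only.

```python
AVERAGE_OPS_PER_HOUR = 3500   # average quoted above
PEAK_OPS_PER_HOUR = 11000     # peak quoted above


def per_second(ops_per_hour: float) -> float:
    """Convert an hourly operation count to an average per-second rate."""
    return ops_per_hour / 3600


def peak_to_average(peak: float, average: float) -> float:
    """How many times higher the peak load is than the average load."""
    return peak / average


if __name__ == "__main__":
    print(f"average: {per_second(AVERAGE_OPS_PER_HOUR):.2f} ops/s")
    print(f"peak:    {per_second(PEAK_OPS_PER_HOUR):.2f} ops/s")
    ratio = peak_to_average(PEAK_OPS_PER_HOUR, AVERAGE_OPS_PER_HOUR)
    print(f"peak is {ratio:.1f}x the average")
```

Sizing for the average alone would leave the machine short by roughly that ratio during a spike, which is why the peak matters more than the mean.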

The number of concurrent clone operations is roughly between 14 and 30. The number of concurrent fetch operations is around 40. For both, the variance is fairly high.

It's important to point out that the cache plugin is essential for this type of configuration to handle the load and the spikes we see.

Please don't take the absolute numbers too seriously. Usage profiles vary a lot between companies and teams, and even more once you add CI to the mix.

Hope this helps.

Cheers,

Stefan


Andrew Milne April 29, 2013

Thanks for that, some good information in there.

One thing in your answer confuses me though, Stefan: you're saying an 8 core machine (4 with hyperthreading) is handling 14 to 30 concurrent clone operations, but I thought the default was to allow only 1.5 * core count, so 12 on that machine. Are you saying that the throttle.resource.scm-hosting value can generally be increased to a higher multiple, or is that due to the cache plugin?

I'm not actually sure how to use http(s) as it requires entering the password every time. The credentials helper could help end users but I'm not sure how to use it for a build account running in the background as its only permanent storage option is plain text on Linux. Is there any way to get anonymous clone access for a specific user? The only mention I can see is an open ticket (STASH-2565). I can of course use SSH but if that's a performance issue...

Michael Heemskerk
Atlassian Team
April 29, 2013

The higher level of concurrency is due to the cache plugin. Any clone or ref advertisement that is served from the cache does not count against the throttle.resource.scm-hosting limits. Those clones are considered to be nearly 'free' in terms of memory and CPU usage.
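The arithmetic behind this can be sketched as follows. The 1.5 x core count default is taken from Andrew's comment above rather than verified against the Stash documentation, and the function names and the cache-hit counts are hypothetical.

```python
def default_scm_hosting_limit(logical_cores: int, multiplier: float = 1.5) -> int:
    """Throttle limit under the assumed default of 1.5 x logical core count."""
    return int(logical_cores * multiplier)


def throttled_operations(total_concurrent: int, served_from_cache: int) -> int:
    """Operations that count against the limit; cache hits are excluded."""
    return total_concurrent - served_from_cache


def fits_within_limit(total_concurrent: int, served_from_cache: int, limit: int) -> bool:
    """True if the non-cached operations stay within the throttle limit."""
    return throttled_operations(total_concurrent, served_from_cache) <= limit


if __name__ == "__main__":
    limit = default_scm_hosting_limit(8)  # 8 logical cores, as on the machine above
    print(f"assumed default limit: {limit}")
    # Hypothetical split: 30 concurrent clones, 20 of them served from the cache.
    print(fits_within_limit(total_concurrent=30, served_from_cache=20, limit=limit))
```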

With regard to HTTP(S) and authentication: we don't have a mechanism yet that would allow you to clone anonymously. STASH-2565 is the issue you'd want to watch for that one, and/or perhaps https://jira.atlassian.com/browse/STASH-2722. If you're using a build server (Bamboo, Jenkins, etc.) you can configure the credentials in the build server.

Switching to SSH is an option, but the performance will suffer a bit. Also note that if you're planning to use the cache plugin, it only supports HTTP(S) at the moment.

0 votes
Harry Chan
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
April 29, 2013

Hi, clones don't and shouldn't happen that often. If they do, things can get slower. Most of the time you'd pull and/or push. A clone only occurs when a new developer starts on a project or is moved onto one, etc. I don't see it being the norm.

Based on that, I'd go with an Intel Xeon E3 (that has 4 cores) and up to 32GB of RAM.

However, if you really do anticipate a lot of clones and the speed of this is vital, go with Intel Xeon E5 that has up to 8 cores per CPU and supports dual CPU.

Andrew Milne April 29, 2013

Thanks for that.

Quick question: Why so much RAM? My understanding is that Stash uses about a gig itself, plus 1.5 times the size of the repo for each clone. Even with the system overhead, 32GB seems a lot if you've got a maximum of 12 clones in operation at the same time.
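A back-of-the-envelope version of that estimate, as a minimal sketch. It assumes the rule of thumb stated above (about 1 GB for Stash itself plus 1.5 times the repository size per concurrent clone); the function name is illustrative, and OS overhead is not modelled.

```python
def estimate_clone_memory_gb(base_gb: float, repo_size_gb: float,
                             concurrent_clones: int,
                             per_clone_factor: float = 1.5) -> float:
    """Base application memory plus memory for all concurrent clones.

    per_clone_factor is the assumed multiple of the repository size
    that a single clone operation needs.
    """
    return base_gb + concurrent_clones * per_clone_factor * repo_size_gb


if __name__ == "__main__":
    # Worst case from this thread: 12 concurrent clones of a 400 MB repository.
    worst = estimate_clone_memory_gb(base_gb=1.0, repo_size_gb=0.4, concurrent_clones=12)
    # More typical case: 12 concurrent clones of a 10 MB repository.
    typical = estimate_clone_memory_gb(base_gb=1.0, repo_size_gb=0.01, concurrent_clones=12)
    print(f"worst case:   {worst:.1f} GB")
    print(f"typical case: {typical:.2f} GB")
```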

Harry Chan
Rising Star
April 29, 2013

Up to. You don't need the full amount. Just quoting what it can support in the future. You can probably get by with a lot less.
