Too many open files exception in Bamboo 3.1.4

My workplace is running a Bamboo 3.1.4 instance under CentOS 5. Recently, after over a year of fairly stable usage, Bamboo has started returning 500 errors and the logs are full of various "too many open files" exceptions. Here are the things I've tried so far:

Up the per-user file limits to 102400 for both hard and soft.

su - bamboo
ulimit -n

shows 102400, so it's in effect. `lsof -p <bamboo-jetty pid> | wc -l` shows between 900 and 1500 file descriptors open at any given time.
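As an alternative to `lsof`, the descriptor count can be read straight out of /proc (a minimal sketch; substitute the actual Bamboo Jetty pid for the example below):

```shell
# Count the open file descriptors of a process by listing /proc/<pid>/fd.
fd_count() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: FDs held by the current shell; use the Bamboo pid in practice.
fd_count $$
```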

Up the system-wide open file limit to 300000. Checking the contents of /proc/sys/fs/file-max shows that's also taken effect.
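Besides /proc/sys/fs/file-max, /proc/sys/fs/file-nr shows how many handles are actually allocated system-wide, which tells you how close the box really is to the cap:

```shell
# file-nr prints three numbers: allocated handles, free handles, and
# the system-wide maximum (the same value as file-max).
cat /proc/sys/fs/file-nr
```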

I've also seen the Bamboo error message in the UI mentioning that too many files were open, which recommends running a full reindex. I have done that as well, but we're still having the problem.

Anyone out there have any other ideas?

4 answers

Hi Patrick,

You could also check the ulimit value for the database user.

Best regards,
Lucas Timm

It's the same - the hard and soft limits are not per-user, but server-wide. In other words, we have

* soft nofile 102400
* hard nofile 102400

in our /etc/security/limits.conf file.

We ran into this exact issue in our system and spent a better part of 6 months going back and forth between Atlassian Support and the internet to try and figure out the problem before we found out the cause.

Our "Too Many Open Files" problem wasn't with Bamboo, it was with Crucible/Fisheye specifically, but given what we found the cause to be, I imagine it most likely revolves around the same issue.

Our problem was the Atlassian tools' interaction with Perforce. Perforce, at least for us, does not like having a lot of different users downloading repositories from it at the same time (4-5 simultaneous interactions). It causes Perforce to ignore a few of the requests from the Atlassian tools, which causes those tools to error out.

For us the problem was this: we had set up Fisheye with several "repositories" for different areas of the same Perforce server. We probably had 6-7 different scanning setups in Fisheye for the different areas in Perforce, and each one scanned every 2-3 minutes to see if there were changes in its area. We encountered the "Too Many Open Files" problem when several of those repositories re-scanned at the same time, and our only solution was to restart Fisheye. Once we figured that out, we changed Fisheye to have a single scanning repository for our entire Perforce server, and the problem hasn't shown up in several months. (It actually sped things up for us, too.)

But since you are working with Bamboo, I would imagine (especially if you are using Perforce) that you have several plans trying to access the same Perforce server at the same time.

We use Bamboo as well, and during our initial setup of it (which I helped do for our company) we found that if we needed to build 10 or so configurations of a single product, each in its own job executing simultaneously, then half the jobs would fail because of Perforce connection issues and we'd have to re-run them. What we've done to fix that is to add a "pre-build" stage that pulls the repository, zips it up, and then shares it with the other jobs that execute at the same time in the next stage.

This, though, is only feasible if your repository is less than a few hundred megabytes. If the repository you are building is several gigabytes, then sharing an artifact that large can definitely slow things down.

If you are in fact building a very large repository (several gigabytes), then another thing you can do (which would only mitigate the problem, not solve it) is to have your simultaneous jobs lag at different intervals using an inline script task. Simply sleep for 20-30 seconds or so to avoid pulling from Perforce at the same time...
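A rough sketch of that stagger idea as a first step in an inline script task (the helper name and the delay window are just illustrative):

```shell
# Pick a pseudo-random delay in [base, base+spread) seconds so that
# jobs launched at the same moment don't all hit Perforce at once.
stagger_delay() {
  echo $(( $1 + RANDOM % $2 ))
}

DELAY=$(stagger_delay 20 10)   # somewhere in 20-29 seconds
echo "staggering checkout by ${DELAY}s"
# sleep "$DELAY"
# p4 sync ...   # the plan's existing checkout step would follow
```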

... OR ...

You can simply do your builds in different stages so they don't execute at the same time.

I know this was a wall of information, but hopefully it's helpful.

After I was done typing my answer, I thought of something that might be worth trying as well...

The "Too Many Open Files" problem at its base simply means that there are too many open file descriptors. You are looking at it correctly with "ulimit -n", but what might have been the problem for us all along was not necessarily the Atlassian tools running out of file descriptors, but the Perforce server running out of file descriptors and reporting that back to the Atlassian tool...

If, based on my explanation, you think it could be your repository server, try checking "ulimit -n" on the repository server (Perforce, SVN, etc.), not just the Atlassian tool.

Yes, we use Perforce and we have way more than 10 or so plans accessing Perforce, possibly simultaneously. What's strange is that this hasn't been a problem until now - we've actually been running Bamboo in this setup for a couple of years, and it hasn't been touched since we upgraded to Bamboo 3.2.

All of the actual building happens on one of 3 remote agents, so that's the maximum number of simultaneous source checkouts, but there could certainly be many more connections from the Bamboo server itself as it validates client info and checks for changes.

Thanks for the info - I'll investigate that a bit more and see if I can narrow it down to Perforce.

They're on two different servers, one of which (the Perforce proxy that the Bamboo server is connecting to) is on a Windows machine... so no ulimit and no sharing of file descriptors. The exception is happening on the Bamboo server, which in turn does no builds - it delegates all builds to remote build agents (which are also Windows machines). The only thing on our CentOS server is the Bamboo server itself.


Could you open a support case and attach the list from lsof there? Or just paste the output here?

Here it is. Domain names for computers have been sanitized by request of others here. Nothing has been removed.

Please set up a job on a local agent with an Inline Script Task set to execute ulimit -a. Does it output the increased values you've mentioned?

Yes, it shows the increased values. Paste from the Bamboo logs:

08-Oct-2012 08:58:52	core file size          (blocks, -c) 0
08-Oct-2012 08:58:52	data seg size           (kbytes, -d) unlimited
08-Oct-2012 08:58:52	scheduling priority             (-e) 0
08-Oct-2012 08:58:52	file size               (blocks, -f) unlimited
08-Oct-2012 08:58:52	pending signals                 (-i) 72704
08-Oct-2012 08:58:52	max locked memory       (kbytes, -l) 32
08-Oct-2012 08:58:52	max memory size         (kbytes, -m) unlimited
08-Oct-2012 08:58:52	open files                      (-n) 102400
08-Oct-2012 08:58:52	pipe size            (512 bytes, -p) 8
08-Oct-2012 08:58:52	POSIX message queues     (bytes, -q) 819200
08-Oct-2012 08:58:52	real-time priority              (-r) 0
08-Oct-2012 08:58:52	stack size              (kbytes, -s) 10240
08-Oct-2012 08:58:52	cpu time               (seconds, -t) unlimited
08-Oct-2012 08:58:52	max user processes              (-u) 72704
08-Oct-2012 08:58:52	virtual memory          (kbytes, -v) unlimited
08-Oct-2012 08:58:52	file locks                      (-x) unlimited
08-Oct-2012 08:58:53	Finished task 'Ulimit'

Can you upload a log file from Bamboo showing the HTTP 500?

Curious what came of this, since we are seeing the same issue, but with writes to the Bamboo logs.

[lbfrg@vxpit-elforg02 ~]$ /usr/sbin/lsof -p 32583 | wc -l

617 /opt/lforge/atlassian_data/bamboo_home/xml-data/build-dir/logs/TEST-MIDWAY-CHEC/download-data/build_logs/TEST-MIDWAY-CHEC-1534.log (Too many open files)

[lbfrg@vxpit-elforg02 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 46669
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 80000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 46669
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Just to add some new info: running `lsof | wc -l` as root only shows ~4000 open file descriptors total at any given time.
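To see which processes are actually holding those descriptors, a quick /proc-based breakdown works even without lsof (a sketch; run as root to see every process):

```shell
# For every running process, print "<fd-count> <pid>", then show the
# ten processes holding the most file descriptors.
TOP=$(for p in /proc/[0-9]*; do
  echo "$(ls "$p/fd" 2>/dev/null | wc -l) ${p#/proc/}"
done | sort -rn | head)
echo "$TOP"
```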
