Too many open files exception in Bamboo 3.1.4

Patrick Aikens October 4, 2012

My workplace is running a Bamboo 3.1.4 instance under CentOS 5. Recently, after over a year of fairly stable usage, Bamboo has started returning 500 errors and the logs are full of various "too many open files" exceptions. Here are the things I've tried so far:

Upped the per-user open file limits to 102400 for both hard and soft.

su - bamboo
ulimit -n

shows 102400, so it's in effect. `lsof -p <bamboo-jetty pid> | wc -l` shows between 900 and 1500 file descriptors open at any given time.

Upped the system-wide open file limit to 300000. Checking /proc/sys/fs/file-max confirms that has taken effect as well.
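For reference, this is roughly how the limits were raised and verified (a sketch; adjust the user name and the PID placeholder to your own setup):

# /etc/security/limits.conf - per-user open file limits (hard and soft)
#   * soft nofile 102400
#   * hard nofile 102400

# System-wide limit, applied immediately and persisted across reboots
sysctl -w fs.file-max=300000
echo "fs.file-max = 300000" >> /etc/sysctl.conf

# Verify what is actually in effect
cat /proc/sys/fs/file-max                # system-wide limit
su - bamboo -c 'ulimit -n'               # per-user soft limit for the bamboo user
lsof -p <bamboo-jetty-pid> | wc -l       # descriptors held by the Bamboo process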

I've also seen the Bamboo error message in the UI mentioning that too many files were open, which recommends running a full reindex. I have done that as well, but we're still having the problem.

Anyone out there have any other ideas?

4 answers

0 votes
Patrick Aikens October 4, 2012

Just to add some new info, `lsof | wc -l` as root only shows ~4000 open file descriptors period at any given time.
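If it helps with diagnosis, grouping the lsof output shows which kinds of descriptors dominate (a rough sketch assuming lsof's default columns, where the fifth field is the descriptor TYPE and the last is the NAME):

# Count open descriptors by type (REG, sock, FIFO, ...) for the Bamboo process
lsof -p <bamboo-jetty-pid> | awk 'NR > 1 { print $5 }' | sort | uniq -c | sort -rn

# Or group by name/path to spot a leak against a specific file or directory
# (approximate: NAME fields containing spaces are reduced to their last word)
lsof -p <bamboo-jetty-pid> | awk 'NR > 1 { print $NF }' | sort | uniq -c | sort -rn | head -20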

0 votes
Przemek Bruski
Atlassian Team
October 4, 2012

Could you open a support case and attach the list from lsof there? Or just paste the output here?

Patrick Aikens October 4, 2012

Here it is. Hostnames have been sanitized at the request of others here, but nothing has been removed.

https://answers.atlassian.com/upfiles/lsof.txt

Przemek Bruski
Atlassian Team
October 6, 2012

Please set up a job on a local agent with an Inline Script Task set to execute ulimit -a. Does it output the increased values you've mentioned?
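For example, the Inline Script Task body can be as simple as this (the /proc check is optional and only helps on kernels that expose per-process limits):

#!/bin/sh
# Print the limits that the agent's shell actually inherits
ulimit -a
# Where available, also show this process's effective limits
if [ -r /proc/self/limits ]; then
    cat /proc/self/limits
fi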

Patrick Aikens October 7, 2012

Yes, it shows the increased values. Paste from the Bamboo logs:

08-Oct-2012 08:58:52	core file size          (blocks, -c) 0
08-Oct-2012 08:58:52	data seg size           (kbytes, -d) unlimited
08-Oct-2012 08:58:52	scheduling priority             (-e) 0
08-Oct-2012 08:58:52	file size               (blocks, -f) unlimited
08-Oct-2012 08:58:52	pending signals                 (-i) 72704
08-Oct-2012 08:58:52	max locked memory       (kbytes, -l) 32
08-Oct-2012 08:58:52	max memory size         (kbytes, -m) unlimited
08-Oct-2012 08:58:52	open files                      (-n) 102400
08-Oct-2012 08:58:52	pipe size            (512 bytes, -p) 8
08-Oct-2012 08:58:52	POSIX message queues     (bytes, -q) 819200
08-Oct-2012 08:58:52	real-time priority              (-r) 0
08-Oct-2012 08:58:52	stack size              (kbytes, -s) 10240
08-Oct-2012 08:58:52	cpu time               (seconds, -t) unlimited
08-Oct-2012 08:58:52	max user processes              (-u) 72704
08-Oct-2012 08:58:52	virtual memory          (kbytes, -v) unlimited
08-Oct-2012 08:58:52	file locks                      (-x) unlimited
08-Oct-2012 08:58:53	Finished task 'Ulimit'

Przemek Bruski
Atlassian Team
October 7, 2012

Can you upload a log file from Bamboo showing the HTTP 500?

EddieW
Rising Star
November 12, 2013

Curious what came of this, since we are seeing the same issue, but when writing to the Bamboo logs.

[lbfrg@vxpit-elforg02 ~]$ /usr/sbin/lsof -p 32583 | wc -l

617

java.io.FileNotFoundException: /opt/lforge/atlassian_data/bamboo_home/xml-data/build-dir/logs/TEST-MIDWAY-CHEC/download-data/build_logs/TEST-MIDWAY-CHEC-1534.log (Too many open files)

[lbfrg@vxpit-elforg02 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 46669
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 80000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 46669
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

0 votes
Cameron Ferguson
Rising Star
October 4, 2012

We ran into this exact issue in our system and spent the better part of six months going back and forth between Atlassian Support and the internet trying to figure out the problem before we found the cause.

Our "Too Many Open Files" problem wasn't with Bamboo, it was with Crucible/Fisheye specifically, but given what we eventually found the cause to be, I imagine it most likely revolves around the same issue.

Our problem was the Atlassian tools' interaction with Perforce. Perforce, at least for us, does not like having a lot of different users downloading repositories from it at the same time (4-5 simultaneous interactions). That causes Perforce to ignore a few of the requests from the Atlassian tools, which in turn causes the Atlassian tools to error out.

For us the problem was this: we had set up Fisheye with several "repositories" for different areas of the same Perforce server. We probably had 6-7 different scanning setups in Fisheye, each scanning every 2-3 minutes to see if there were changes in its area. We hit the "Too Many Open Files" problem when several of those repositories re-scanned at the same time, and our only fix was to restart Fisheye. Once we figured that out, we changed Fisheye to have a single scanning repository for the entire Perforce server, and the problem hasn't shown up in several months. (It actually sped things up for us, too.)

But since you are working with Bamboo, I would imagine (especially if you are using Perforce) that you have several plans trying to access the same Perforce server at the same time.

We use Bamboo as well, and during our initial setup, which I helped do for our company, we found that if we needed to build 10 or so configurations of a single product, each in its own job executing simultaneously, then half the jobs would fail because of Perforce connection issues and we'd have to re-run them. What we've done to fix that is to have a "pre-build" stage that pulls the repository, zips it up, and shares it as an artifact with the jobs that execute in the next stage.
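As a rough illustration, the "pre-build" job is essentially just the normal checkout plus an archive step, something like the script below (the archive name is a placeholder; the actual sharing is done by defining it as a shared artifact on this job and adding an artifact dependency in the downstream jobs):

#!/bin/sh
# Pre-build job: Bamboo has already checked the repository out into this
# working directory, so just archive it for sharing as an artifact.
set -e
zip -qr source-snapshot.zip . -x "*.zip"

# Downstream jobs then start with a script task that unpacks the shared
# artifact instead of doing their own Perforce checkout:
#   unzip -qo source-snapshot.zip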

This, though, is only feasible if your repository is less than a few hundred megs. If the repository you are building is several gigs, then sharing an artifact that large can definitely slow things down.

If you are in fact building a very large repository (several gigs), then another thing you can do (which would only mitigate the problem, not solve it) is to have your simultaneous jobs lag at different intervals with an inline script task. Simply sleep for 20-30 seconds or so to avoid pulling from Perforce at the same time...
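A minimal sketch of such an inline script (the OFFSET_SECONDS variable is purely illustrative; you could hard-code a different value per job or fall back to a random jitter):

#!/bin/bash
# Stagger this job's start so parallel jobs don't all hit Perforce at once.
OFFSET_SECONDS=${OFFSET_SECONDS:-$(( RANDOM % 30 ))}
echo "Sleeping ${OFFSET_SECONDS}s before checkout to stagger Perforce access"
sleep "${OFFSET_SECONDS}"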

... OR ...

You can simply do your builds in different stages so they don't execute at the same time.

I know this was a wall of information, but hopefully it's helpful.

Cameron Ferguson
Rising Star
October 4, 2012

After I was done typing my answer, I thought of something that might be worth trying as well...

The "Too Many Open Files" problem at its base simply means that there are too many open file descriptors. You are looking at it correctly with "ulimit -n", but what may have been the problem for us all along was not necessarily the Atlassian tools running out of file descriptors, but the Perforce server running out of file descriptors and reporting that back to the Atlassian tool...

If, based on my explanation, you think it could be your repository server, try checking "ulimit -n" on your repository server (Perforce, SVN, etc.), not just on the Atlassian tool.
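For instance, something along these lines run on the repository host itself (the hostname and account are examples only):

# On the Perforce/SVN server, check the limits for the account the service runs as
ssh p4service@perforce-host 'ulimit -n; cat /proc/sys/fs/file-max'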

Patrick Aikens October 4, 2012

Yes, we use Perforce and we have way more than 10 or so plans accessing Perforce, possibly simultaneously. What's strange is that this hasn't been a problem until now - we've actually been running Bamboo in this setup for a couple of years, and it hasn't been touched since we upgraded to Bamboo 3.2.

All of the actual building is done on one of 3 remote agents, so that's the maximum number of simultaneous source checkouts, but certainly there could be many more connections from the Bamboo server itself as it validates client info and checks for changes.

Thanks for the info - I'll investigate that a bit more and see if I can narrow it down to Perforce.

Patrick Aikens October 4, 2012

They're on two different servers, one of which (the Perforce proxy that the Bamboo server is connecting to) is on a Windows machine... so no ulimit and no sharing of file descriptors. The exception is happening on the Bamboo server, which in turn does no builds - it delegates all builds to remote build agents (which are also Windows machines). The only thing on our CentOS server is the Bamboo server itself.

0 votes
LucasA
Rising Star
October 4, 2012

Hi Patrick,

You can also check the ulimit value for the database user.
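For example (the postgres account is only an illustration; substitute whatever account your Bamboo database runs under):

su - postgres -c 'ulimit -n'    # postgres is an example; use your DB service account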

Best regards,
Lucas Timm

Patrick Aikens October 4, 2012

It's the same - the hard and soft limits are not per-user, but server-wide. In other words, we have

* soft nofile 102400
* hard nofile 102400

in our /etc/security/limits.conf file.
