Deployment triggers are sometimes fired after a long delay

Jensen Somers May 3, 2018

Since upgrading from Bamboo 5 to Bamboo 6, we have seen several issues with build and deployment triggers. Sometimes they hang, or they don't fire at all. Our plans and configurations have also grown in the meantime, so I'm not sure what the actual cause is.

I've listed my Bamboo system configuration below:

Operating system: Windows Server 2012 R2 6.3
Operating system architecture: amd64
Available processors: 8
Java version: 1.8.0_172
Total memory: 1008 MB (--> this seems off, the server has 16GB of RAM)

Version: 6.5.0
Build number: 60509
Build date: 4/20/18

git version: 2.17.0.windows.1

I cannot give exact numbers, but we have around 80 projects, each containing one to five build plans. Each build plan has at least one plan branch, and for at least half of them the number of active plan branches can go as high as five at the same time. Roughly 75% of all projects also have a deployment plan, each with three environments (DEV, QA, PROD). Every build plan uses Git polling (the default 180-second interval), and each DEV and QA deployment has a trigger that fires after a successful build of the corresponding plan branch.
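
To give a rough sense of the scale (a back-of-the-envelope estimate only; the figures below are approximations of the counts above, not real data):

    # Very rough estimate of how many plan branches get polled each cycle.
    # All counts are approximations taken from the description above.
    projects = 80
    plans_per_project = 3      # "one to five build plans", assume ~3 on average
    branches_per_plan = 3      # at least 1, up to ~5 for half of the plans

    polled_branches = projects * plans_per_project * branches_per_plan
    print(f"~{polled_branches} plan branches polled every 180 seconds")  # ~720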

Now, the last couple of months we have observed the following:

  • Sometimes a build hangs during a Git checkout. If I kill all the Git processes on the machine, the build detects this, retries, and succeeds. I've seen as many as 15 parallel "Git for Windows" processes running during such times.
  • Sometimes a deployment is not triggered immediately. We have seen it take as long as two hours before the trigger fires. The problem is that a developer might have triggered the deployment manually because Bamboo did not, and then suddenly, after business hours, the trigger goes off and the build is deployed again. Most of the time this is not a problem, but we have seen such deployments fail, stopping the deployed application and leaving it in an unusable state for our customers. If this happens during business hours we can of course fix it, but if the deployment goes off in the middle of the night, we can't.

We are running a single Bamboo instance, with 3 local agents and 1 remote agent. The remote agent is not configured with build capabilities, and can thus only be used for some deployment plans. 

I searched the bug tracker, but I could not find any similar issue.

Is it possible we have too many plans and configurations for a single instance? Or are the server resources insufficient? Is there additional information I can provide to help track down this issue?

1 answer

1 vote
Jeremy Owen
Atlassian Team
May 4, 2018

Hey Jensen,

From what you've described, the hanging and long-running Git processes seem to be the most likely culprit here.

To explain, Bamboo has a couple of thread pools for actions like change detection, branch detection and general plan execution steps.

When we see huge delays in triggering, usually it's because these thread pools are tied up waiting for something. Long running / hung Git processes are the main offenders.
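
As a toy illustration (this is not Bamboo's code, just a model of the queuing effect), once every worker in a fixed-size pool is stuck behind a hung Git process, anything else that gets queued, including your deployment triggers, has to wait for a slot to free up:

    # Toy model of a fixed-size worker pool being starved by hung "git" tasks.
    # Not Bamboo's implementation - purely an illustration of the queuing effect.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def hung_git_checkout(name):
        time.sleep(5)  # stand-in for a checkout that hangs for hours
        return f"{name} eventually finished"

    def fire_deployment_trigger(name):
        return f"{name} triggered"

    pool = ThreadPoolExecutor(max_workers=2)  # small pool, like a busy instance

    # Two hung checkouts occupy every worker...
    for i in range(2):
        pool.submit(hung_git_checkout, f"checkout-{i}")

    # ...so this trigger sits in the queue until a worker frees up (or is killed).
    start = time.time()
    trigger = pool.submit(fire_deployment_trigger, "deploy-DEV")
    print(trigger.result())
    print(f"trigger delayed by ~{time.time() - start:.0f}s")  # ~5s in this toy case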

My initial pointers for now:

  • Since you've got 8 processors, you can add the JVM argument below and restart Bamboo to increase the size of the plan execution thread pool:
    • -Dbamboo.plan.exe.threads=8
    • Configuring your system properties
    • This is somewhat of a band-aid to increase throughput; there's a chance the threads will get tied up even with 8.
  • If you're using the Enable Quiet Period setting under Advanced Options on any of your repositories, consider disabling it or making the quiet period much shorter. The quiet period will hold one of these threads for the duration of the configured wait, which can be problematic.
  • Think about the size of the repository being cloned in Bamboo and how long the clone should take under worst-case conditions, then adjust the Command Timeout Advanced Option on each repository to cover that. Bamboo will then terminate Git processes for that repository that run longer than this value. The default is 3 hours, which is very generous and can be reduced in most circumstances.
  • Take a look at Task Manager and examine the command line of the Git processes that have been running for a long time (there's a rough sketch after this list that can help dump them in bulk). Is it expected for that command, for that repository, to run that long? If it's an ls-remote command taking longer than 10 seconds (which is still pretty generous), it's best to try to understand why that operation is taking so long:
    • Network congestion related?
    • Is Git Credential Manager for Windows installed? I've seen it hang non-interactive processes for GitHub repositories when the password isn't supplied from Bamboo, because it opens an interactive window prompting for credentials that never gets serviced in an automated environment.
  • Excess repository polling can cause this too, but given your numbers I wasn't initially concerned by it. 180 seconds is a very sensible default.
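
On the "inspect the Git processes" point: if Task Manager gets unwieldy with that many processes, something along these lines will dump their command lines and ages in one go. It's only a rough sketch and it assumes the third-party psutil package is available on the server (pip install psutil):

    # Rough sketch: list git processes with their command line and age so that
    # long-running ls-remote / fetch / clone operations stand out.
    import time
    import psutil

    now = time.time()
    for proc in psutil.process_iter(["name", "cmdline", "create_time"]):
        try:
            name = (proc.info["name"] or "").lower()
            if name.startswith("git"):
                age = int(now - proc.info["create_time"])
                cmd = " ".join(proc.info["cmdline"] or [])
                print(f"pid={proc.pid} age={age}s cmd={cmd}")
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # the process exited or is not accessible; skip it

Anything that has been sitting there for many minutes is a good candidate to dig into (or for the Command Timeout above to reap).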

It can be a bit tricky to troubleshoot and we may need to open up a support ticket to help out further but take a look into some of those things and let us know how it goes. :)

Jensen Somers May 4, 2018

Hello Jeremy,

Thanks for the quick reply.

  • I will try increasing the number of threads and see if that helps.
  • I am not using the quiet period on any plans. I'll check them to be sure, but I don't think I enabled it anywhere.
  • Most of the repositories are relatively small (we have a lot of microservices that each do one simple thing). There are a few big ones, but I don't think they do a full clone each time. I will check to make sure this isn't causing any issues.
  • I think Git Credential Manager might be installed, but I have not seen a prompt when I log on to the VM. I'll check the next time I observe it hanging.

Jensen Somers May 15, 2018

Hello Jeremy,

Unfortunately, none of the changes to my Bamboo installation seem to have had an impact. We still observe issues with builds and deployments that are not triggered. I see no abnormal memory or CPU usage, but I do see a lot of "Git for Windows" processes active on the server, both git.exe and git-remote-https.exe. The count can go as high as 50!

If I kill most of the processes, Bamboo sometimes comes through and picks up a queued build; whether the build was triggered manually or automatically makes no difference. Sometimes I have to reboot the server before any new builds are picked up.

Is there any additional information I can provide?
