My team runs self-hosted Bitbucket and Bamboo instances on the latest versions (6.11, I think, and 6.9), connected to an LDAP cluster. As far as I know, the LDAP connection goes through a load balancer (Atlassian notes that this is not supported), but we do not suspect that as the cause yet.
The problem applies to both applications, but I will describe it based on the Bitbucket server.
What happened?
After two years our LDAP directory grew past 600 users this year, and suddenly a few users complained that their SSH access was rejected or that they could not log in through the Web UI:
git@xxx.xxx: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
and
2019-06-03 09:20:22,483 INFO [http-nio-7990-exec-41] @SAYPYHx560x5515238x0 x.x.x.x,127.0.0.1 "GET /login HTTP/1.1" c.a.s.i.a.DefaultRememberMeService Expired remember-me token detected for series '306ac0cd6e26eb759b833ad0e265ed74f4ddbd82' for user 'xxx' (used from 'x.x.x.x,127.0.0.1'). As a safety precaution, all (2) tokens from that series have been canceled.
We could sort out the first problem: the keys were properly in place and would work again after the user logged in through the Web UI. However, after some time the user would be logged out again (remember-me dropped) and git access would be denied.
I crawled the logs but found no errors or warnings regarding this issue. What I did notice was that even though we had more than 600 users in our LDAP, Bitbucket would only find 500 and sync those.
Furthermore, when looking at the database while one of our unlucky users signed in, I discovered that the user was newly created, then dropped after 5 minutes and put onto the graveyard (tombstone, I guess, is the correct word). Based on our logs, this has been happening since May 2019.
And this is very confusing. We checked the LDAP setup and the Atlassian configuration, but there were no limits (neither lookup nor caching) that would explain the issue. There were updates and server restarts (the log message is quite old, as you can see), but nothing seems to fix it permanently and I found no clue on the web.
Edit: I forgot to mention that while Bitbucket requires 1 sec for the LDAP sync, Bamboo takes 23 sec.
Hi @Kevin Katze,
Thank you for all the details you shared about this. They help us to better understand the issue you are facing there.
From what I can tell, it looks like your LDAP is restricting its responses to a maximum of 500 users. Bamboo and Bitbucket are configured by default to request all users at once, and LDAP servers usually limit this to 1000 (which does not seem to be your case).
When we face such a limitation, we often recommend enabling the Use Paged Results feature. It allows Bamboo and Bitbucket to keep requesting more objects until they are all covered. It is good practice to set the maximum page size in Bamboo and Bitbucket to a number below the limit, to make sure we will not face this issue again. In your case, you could set it to 400.
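To illustrate why paging helps, here is a minimal Python sketch that simulates the behaviour described above. It does not talk to a real LDAP server: the 500-entry cap, the 600-user directory, and all function names are assumptions made up for the illustration, and the model only covers a page size below the server limit.

```python
from typing import List, Optional, Tuple

# Assumed server-side cap, chosen to match the 500 users that were synced.
SERVER_SIZE_LIMIT = 500


def make_directory(n: int) -> List[str]:
    """Build a fake directory with n user names (hypothetical data)."""
    return [f"user{i:04d}" for i in range(n)]


def search_unpaged(directory: List[str]) -> List[str]:
    """One big request: the simulated server truncates at its size limit."""
    return directory[:SERVER_SIZE_LIMIT]


def search_page(
    directory: List[str], page_size: int, cookie: Optional[int]
) -> Tuple[List[str], Optional[int]]:
    """One paged request.

    The cookie is simply the offset to resume from. The returned cookie
    is None once the last page has been delivered.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0 if cookie is None else cookie
    end = min(start + page_size, len(directory))
    next_cookie = end if end < len(directory) else None
    return directory[start:end], next_cookie


def search_all_paged(directory: List[str], page_size: int) -> List[str]:
    """Keep requesting pages until the server reports no more results."""
    results: List[str] = []
    cookie: Optional[int] = None
    while True:
        entries, cookie = search_page(directory, page_size, cookie)
        results.extend(entries)
        if cookie is None:
            return results


directory = make_directory(600)
print(len(search_unpaged(directory)))         # 500: truncated by the cap
print(len(search_all_paged(directory, 400)))  # 600: two pages, 400 + 200
```

With a page size of 400, each individual response stays under the assumed cap, so the client ends up with every entry instead of a truncated list.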
Please give it a try if you didn't configure this before and let us know the results you get.
Hi Daniel,
Thank you for your response, and sorry for the late answer. My team will try that as a solution and see what we get.
Will try it first on Bamboo then on Bitbucket.
Update 1:
As of now the sync has been running for >1000sec and I cannot seem to stop it. There is nothing in the logs so far, and none of the missing users seem to have been imported yet.
Update 2:
Bamboo sync is now at >1700sec and I see the CPU spiking to >80% (8 cores) while all agents are idle. I don't want to jump to conclusions, but I can see in our monitoring that the sync and the spike happened at the same time.
Bitbucket was already paged but at 2000 results instead of 400.
Awesome! Keep us posted!
I don't know what to say, but in short: it did not work.
Our Bamboo test took 4400sec for the sync, but the result was the same as before (still missing users). It would spike to 98% CPU usage while syncing and push the RAM to 20GB. We could also see a spike in TCP connections at around 3200.
Bitbucket had to be restarted because of either Tomcat giving up or SQL exceptions, so syncing did not work at all.
I also noticed that the configuration we templated was not in effect on our test environment, so I'm going for a clean setup today to see if it works on a fresh installation. For now I also doubt that the read-only LDAP is properly set up.
Update: I scaled the page size from 400 back to 2000 on both systems. This seems to heal at least the performance impact I described earlier.
Hi @Kevin Katze
You shared that you had about 600 users, so the results you got after configuring paged results are really unexpected. I'm sorry that my suggestion got you into trouble.
Considering the instability you had, I would try to investigate which part of this is caused by the interaction with the LDAP server before attempting more adjustments in Bamboo and Bitbucket.
You could use the ldapsearch command to check how the server behaves when using paged results with a maximum of 400 objects per page.
The command would be something like:
ldapsearch -h <SERVER_ADDRESS> -p 389 \
-b "CN=users,DC=testserver,DC=local" \
-D "administrator@testserver.local" \
-E pr=400/noprompt \
-W "(&(objectCategory=Person)(sAMAccountName=*))" > full_ldif.txt
This is just to help us understand what could be causing the instability you faced when updating the configuration.
Let me know how it behaves.
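Once the command has written full_ldif.txt, a quick way to see whether all of the roughly 600 users came back is to count the entries in that file. The following Python sketch is only an illustration: the function names are made up, and it assumes standard LDIF output in which every entry starts with a "dn:" (or base64 "dn::") line at the beginning of a line.

```python
def count_ldif_entries(ldif_text: str) -> int:
    """Count entries by counting lines that begin with 'dn:'.

    Folded continuation lines start with a space and comment lines start
    with '#', so neither is counted. 'dn::' (base64 DN) also matches.
    """
    count = 0
    for line in ldif_text.splitlines():
        if line.lower().startswith("dn:"):
            count += 1
    return count


def count_ldif_file(path: str) -> int:
    """Read an LDIF file (for example full_ldif.txt) and count its entries."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_ldif_entries(handle.read())


# Tiny made-up sample, only to show the shape of the input.
sample = (
    "# extended LDIF\n"
    "dn: CN=alice,CN=users,DC=testserver,DC=local\n"
    "sAMAccountName: alice\n"
    "\n"
    "dn:: Q049Ym9iLENOPXVzZXJzLERDPXRlc3RzZXJ2ZXIsREM9bG9jYWw=\n"
    "sAMAccountName: bob\n"
)
print(count_ldif_entries(sample))
```

Calling count_ldif_file("full_ldif.txt") and comparing the result with the expected user count shows whether the paged search returned more than the 500 entries the applications currently see.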