Hello,
We are currently experiencing an issue where our self-hosted Bitbucket runners are no longer autoscaling correctly when pipeline steps run in parallel.
Our setup consists of multiple runners on a physical machine, and we are in the process of migrating to EKS-based runners on AWS. The autoscaler controller version remains unchanged:
bitbucketpipelines/runners-autoscaler:3.9.0

This setup was working correctly a few weeks ago. However, during the migration of some pipelines from the physical server, we noticed that parallel steps remain queued and only one step runs at a time.
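For context, the affected pipelines follow this general shape (step names, labels, and the sleep command are illustrative, not our real configuration):

```yaml
pipelines:
  default:
    - parallel:
        - step:
            name: Parallel step 1
            runs-on:
              - self.hosted
              - linux
            script:
              - sleep 120
        - step:
            name: Parallel step 2
            runs-on:
              - self.hosted
              - linux
            script:
              - sleep 120
```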
After observing this behavior, we switched back to the (previously disabled) runners on the physical server. To our surprise, we found that the runner version had been automatically updated to v5, and the same parallelism issue appeared there as well.
Given that:
the autoscaler controller version has not changed
no new autoscaler releases have been published
the issue started after the runners were upgraded to v5
we suspect either a regression or behavior change introduced in runner v5, or that an additional configuration change is now required.
Has anyone encountered a similar issue or knows if there are additional settings required for runner v5 to support parallel execution?
Thanks in advance for your help.
Hello there, thanks @Syahrul for your reply. Today I found the main issue.
The "Change limit" option for the concurrency limit wasn't in the UI the last time I checked. It has now been set (or reset) to a default of 2 parallel runs, which was the bottleneck I was perceiving as an issue with the new runners.
In case this happens to somebody else, check Workspace settings --> Runners --> Concurrency limit.
Welcome to the community.
I believe this is related to our recent change in Runner v5, which is currently mentioned and discussed in the following community article.
Announcing powerful upgrades & a new pricing model for self-hosted runners
You can consider increasing the slot limit or temporarily downgrading the runner to version 4.
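For the Kubernetes-based runners, the version is typically controlled by the runner image tag in the pod/job template the autoscaler creates runners from, rather than by a workspace setting. A rough sketch of the relevant fragment (the template layout and the exact 4.x tag here are assumptions; check the template your deployment actually uses):

```yaml
# Hypothetical fragment of the runner pod/job template used by the autoscaler -
# names and layout will differ per setup.
containers:
  - name: runner
    # Pin to a 4.x tag instead of following the latest major version.
    # The exact tag value below is an assumption - check the tags available to you.
    image: docker-public.packages.atlassian.com/sox/atlassian/bitbucket-pipelines-runner:4
```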
I hope this helps.
Regards,
Syahrul
Hey @Syahrul, how can I downgrade the K8s runners? I don't see the option in the workspace settings, nor in the YAML files of the autoscaler deployments.
I thought the problem was solved by increasing the concurrency limit; nevertheless, another strange thing is happening. The behavior is the following:
4 steps in parallel from the beginning, with a simple wait command.
1 runner is up
Start the pipeline:
1 step starts instantly (no problem there), 3 stay queued, and no other steps are running anywhere else.
1 new runner spawns but a second step does not start.
Once the first step finishes:
Steps 2 and 3 start, using the 2 runners I had up (one previously used by step 1, the other idle).
Step 4 remains queued; a 3rd runner spawns, but step 4 does not pick it up as expected.
This is the current misbehavior. The pipeline seems to look for available runners only when it starts or when a step finishes, which leads to some runners scaling up but sitting unused until another step completes.
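For reference, the test pipeline is essentially the following (simplified; the label list and sleep duration are illustrative):

```yaml
definitions:
  steps:
    - step: &wait-step
        name: Wait
        runs-on:
          - self.hosted
          - linux
        script:
          - sleep 300

pipelines:
  custom:
    parallel-repro:
      - parallel:
          - step: *wait-step
          - step: *wait-step
          - step: *wait-step
          - step: *wait-step
```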