Bitbucket runner - An error occurred whilst updating runner state to "ONLINE"

meliha.huskovic October 15, 2024

Hi,

I’m experiencing an issue where the runner on my CI machine stops sending the online status during test execution and appears to be ‘frozen’. After a while, when I press enter a few times, I get the following exception:

[2024-10-15 14:26:53,887] An error occurred whilst updating runner state to "ONLINE".
com.atlassian.pipelines.stargate.client.core.exceptions.StargateConflictException: Response Summary: HttpResponseSummary{httpStatusCode=409, httpStatusMessage=Conflict, bodyAsString={"key":"agent-service.runner.conflict","message":"Simultaneous state updates were attempted for runner with id: {5eca4cbe-c8c9-59d0-8d28-934bf3e8918b}","arguments":{}}}
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
    *__checkpoint ⇢ 409 from PUT https://api.atlassian.com/ex/bitbucket-pipelines/rest/internal/accounts/%7B371c7d36-842f-4518-a96b-3c2f87c1113b%7D/repositories/%7B1dd7083e-74ce-421b-821c-f3c8ffa96ea4%7D/runners/%7B5eca4cbe-c8c9-59d0-8d28-934bf3e8918b%7D/state [DefaultWebClient]
Original Stack Trace:
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:125)
        at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
        at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:106)
        at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122)
        at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:101)
        at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
        at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107)
        at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:299)
        at reactor.core.publisher.FluxFilterFuseable$FilterFuseableConditionalSubscriber.onNext(FluxFilterFuseable.java:337)
        at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1839)
        at reactor.core.publisher.MonoCollect$CollectSubscriber.onComplete(MonoCollect.java:160)
        at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:144)
        at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:103)
        at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
        at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:103)
        at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:144)
        at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:415)
        at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:439)
        at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:493)
        at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:789)
        at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:114)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
        at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
        at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1349)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1389)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

 

I’m using the latest runner version, 3.1.0, and I also tried version 3.0.0 with no change. The firewall is off, and the machine doesn’t have any network connection issues. Could you help me with this problem?

2 answers

0 votes
Theodora Boudale
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
October 16, 2024

Hi Meliha and welcome to the community!

I would like to ask for some additional info, so I can better help you:

1. What type of runner are you using? Linux Docker, Linux Shell, MacOS, or Windows runner?

2. Is the CI machine using a supported platform and does it meet the minimum requirements?

You can check the page below for supported platforms and right below that is a list of minimum requirements for each type of runner:

3. You mentioned that the issue occurs during test execution, so I assume this is while a build is running. Does the build appear stuck as well? If you check the build logs for the pipeline on the Bitbucket website, does the running command stop generating output?

4. Does the runner log show any additional exceptions before you press Enter and get the exception you posted here?

5. What happens if you don't press Enter? Do you get a different exception after a while?

6. Does this issue occur every time you run a Pipelines build?

7. Does it ever occur while there is no build running?

Kind regards,
Theodora

meliha.huskovic October 16, 2024

Hi, thanks a lot!

  1. I’m using the Windows runner (3.1.0).
  2. Yes, it meets the minimum requirements. The problem started 10 days ago, and before that, everything was running smoothly. I’ve been running the pipeline on the same machine for over a year.
  3. Yes, we have a build running followed by test execution, which takes about 1 hour and 20 minutes. The issue always occurs at the same time, approximately after 1 hour. In the build logs on Bitbucket’s website, the running command stops generating output when this problem with the runner happens. If I press enter a few times, the runner updates its status, and I can see the updated logs on Bitbucket’s website. I have a shorter pipeline that takes about 40 minutes, and it runs without any issues, so it seems the problem only occurs with longer builds.
  4. It doesn’t show any exceptions until I press enter a few times. It just stops receiving online status.
  5. If I don’t press anything, the window remains in the same state. After about 2 hours, the pipeline fails due to a timeout with the message “Step exceeded processing limits and has timed out.” The only way to get the runner online again is to press Enter or restart it.
  6. This happens every time I run the ‘longer build’ that takes approximately 1 hour and 20 minutes in my case.
  7. I haven’t noticed anything else.
Theodora Boudale
Atlassian Team
October 18, 2024

Hi Meliha,

Thank you for the info!

Based on the information you have provided, I suggest the following next steps:

1. Try running the longer build outside Pipelines on the same Windows server, to narrow down whether this is a Pipelines-specific issue. You can do this as follows:

  • In PowerShell, clone the repository
  • Inside the clone, check out the branch that the build is running for
  • Do a git reset to the commit of the last failed build
  • Then, run the commands of the Pipelines build

Does the command that gets stuck in Pipelines also get stuck here after 1 hour?
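The reproduction in step 1 could be scripted roughly as below. This is only a sketch in Python, not something from the reply itself: the function names (`git_steps`, `run_timed`, `reproduce`) are made up for illustration, `git reset --hard` is an assumption (the step only says "git reset"), and running each build command through `powershell -Command` assumes the pipeline script consists of PowerShell one-liners. Printing the elapsed time per command makes it easier to see whether anything stalls around the one-hour mark.

```python
import subprocess
import time


def git_steps(repo_url, branch, commit, clone_dir):
    """Return the clone / checkout / reset commands as (argv, cwd) pairs."""
    return [
        (["git", "clone", repo_url, clone_dir], None),
        (["git", "checkout", branch], clone_dir),
        # Assumption: a hard reset, so the working tree matches the failed build.
        (["git", "reset", "--hard", commit], clone_dir),
    ]


def run_timed(argv, cwd=None):
    """Run one command, let its output stream to the console, return (exit_code, seconds)."""
    start = time.monotonic()
    completed = subprocess.run(argv, cwd=cwd)
    return completed.returncode, time.monotonic() - start


def reproduce(repo_url, branch, commit, clone_dir, build_commands):
    """Run the git steps, then each build command, stopping at the first failure."""
    steps = git_steps(repo_url, branch, commit, clone_dir)
    # Assumption: build commands are PowerShell one-liners copied from the pipeline script.
    steps += [(["powershell", "-NoProfile", "-Command", cmd], clone_dir) for cmd in build_commands]
    for argv, cwd in steps:
        code, seconds = run_timed(argv, cwd=cwd)
        print(f"exit={code} after {seconds:.0f}s: {' '.join(argv)}")
        if code != 0:
            return code
    return 0


# Example call (all values are placeholders):
# reproduce("https://bitbucket.org/<workspace>/<repo>.git", "<branch>",
#           "<commit-of-last-failed-build>", "repro-clone", ["<build command>"])
```

If a command hangs here as well, the console simply shows no new "exit=..." line for it, which points away from the runner and towards the build itself.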

2. When the build runs in Pipelines, can you check CPU and memory usage at the time that the build is stuck? Are they close to the system's limit?

3. Is the Windows machine perhaps configured to go to sleep after some time?
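
For the CPU and memory check in step 2, one option is to leave a small sampler running next to the build. The sketch below is an illustration only: it assumes the third-party `psutil` package is installed on the Windows machine, the 90% threshold is an arbitrary choice, and `psutil_probe` and `sample_usage` are hypothetical helper names.

```python
import time


def psutil_probe():
    """Return (cpu_percent, memory_percent) using the third-party psutil package."""
    import psutil  # assumption: installed separately, e.g. with `pip install psutil`
    # cpu_percent(interval=1) blocks for one second and averages over that window.
    return psutil.cpu_percent(interval=1), psutil.virtual_memory().percent


def sample_usage(probe, samples, interval_seconds=0.0, threshold=90.0, sleep=time.sleep):
    """Collect `samples` readings and flag those at or above `threshold` percent."""
    readings = []
    for index in range(samples):
        cpu, mem = probe()
        readings.append({
            "cpu": cpu,
            "mem": mem,
            "near_limit": cpu >= threshold or mem >= threshold,
        })
        if index < samples - 1:
            sleep(interval_seconds)
    return readings


# Example: one reading per minute for two hours while the build runs.
# for reading in sample_usage(psutil_probe, samples=120, interval_seconds=60):
#     print(reading)
```

Readings flagged `near_limit` around the time the runner stops reporting would suggest the machine is running out of resources rather than the runner misbehaving.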

Kind regards,
Theodora

meliha.huskovic October 24, 2024

Hi! I wanted to let you know that the runner is working again! I upgraded from Windows 10 to Windows 11 and updated the runner once more. After that, it started functioning properly and hasn’t frozen since. Thanks for your assistance!

Theodora Boudale
Atlassian Team
October 25, 2024

That's good to hear, Meliha. Thank you for the update!

Please feel free to reach out if you ever need anything else!

0 votes
vikram
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
October 16, 2024

Hi @meliha.huskovic 

Welcome to Atlassian Community. 

A similar question is answered in the thread below; please have a look:

https://community.atlassian.com/t5/Bitbucket-questions/Self-hosted-runners-unable-to-update-status-to-ONLINE/qaq-p/2038793

Vikram P 

DEPLOYMENT TYPE
CLOUD