Come for the products,
stay for the community

The Atlassian Community can help you and your team get more value out of Atlassian products and practices.

Atlassian Community about banner
4,297,234
Community Members
 
Community Events
165
Community Groups

Runner is stuck on a step "runner in a non retryable state" error

Hello.

I have a new Issue this morning. My pipelines are not running on the web client. when I look in the runner the last error I have is the following.

 



[2021-04-26 15:35:38,650] Runner is in a non retryable state. (state: CompletedStepStateModel{result=SuccessfulResultForCompletedStepStateModel{}})
[2021-04-26 15:35:38,655] Step was already complete skipping.
[2021-04-26 15:35:38,711] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.step.execution-error', message='null', arguments={}})}.
[2021-04-26 15:35:39,020] Translating HTTP 400 BAD_REQUEST response into a com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException
[2021-04-26 15:35:39,097] An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=PASSED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ 400 from POST https://api.atlassian.com/ex/bitbucket-pipelines/rest/1.0/accounts/xxxxxxxxx/stepComplete [DefaultWebClient]
Stack trace:
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:119)
at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114).

 

Now even if I stop the web client pipeline and then I restart the docker container it goes back to this error. 

I also destroyed the container I started a new one. I this issue persists

5 comments

Nothing gets me out of this.

restarting the container gets back to this issue.
removing the container and creating a new one gets me back to this issue

This is the information of the step that is stuck.

ex.


Getting step StepId{accountUuid={b14cf156-b835-40a8-9fd4-eba8785bb22c}, repositoryUuid={dfc78678-e1b2-4eaf-835c-9adb5687c21b}, pipelineUuid={8d3fca5a-b8e2-447e-b085-25d32996db6f}, stepUuid={d2e3a735-68e4-4dcd-8e1a-24b23311e328}

Can anyone from the Atlassian team please help with the above? We have not been able to use the runner in the last 2 days

cc: @Justin Thomas 

lassian Atlassian Team Apr 27, 2021

Hi Jorge/Mohamed

Apologies for the delay, we have identified a bug that was causing this behaviour and deployed a fix to our backend services.

However the small number of runners that were affected by this are now in an invalid state.

Can you delete and recreate those runners?

That will fix these now corrupt runners.

Kind Regards,
Nathan Burrell

Like Jorge Cotelo likes this

Hi @lassian 

 

Thank you. I will do that now and let you know.

Hi @lassian 

I have stopped and remove the container, deleted the image and pulled the new one and as soon as I start it then it goes back to the same step and get stuck there:

[2021-04-28 13:57:54,883] Getting step StepId{accountUuid={b14cf156-b835-40a8-9fd4-eba8785bb22c}, repositoryUuid={dfc78678-e1b2-4eaf-835c-9adb5687c21b}, pipelineUuid={8d3fca5a-b8e2-447e-b085-25d32996db6f}, stepUuid={d2e3a735-68e4-4dcd-8e1a-24b23311e328}}.
[2021-04-28 13:57:54,899] Getting oauth token for step.
[2021-04-28 13:57:54,914] Getting environment variables for step.
[2021-04-28 13:57:55,628] Getting all artifacts for step.
[2021-04-28 13:57:55,994] Runner is in a non retryable state. (state: CompletedStepStateModel{result=SuccessfulResultForCompletedStepStateModel{}})
[2021-04-28 13:57:56,000] Step was already complete skipping.
[2021-04-28 13:57:56,021] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.step.execution-error', message='null', arguments={}})}.
[2021-04-28 13:57:56,697] Translating HTTP 400 BAD_REQUEST response into a com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException
[2021-04-28 13:57:56,722] An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=PASSED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ 400 from POST https://api.atlassian.com/ex/bitbucket-pipelines/rest/1.0/accounts/%7Bb14cf156-b835-40a8-9fd4-eba8785bb22c%7D/repositories/%7Bdfc78678-e1b2-4eaf-835c-9adb5687c21b%7D/pipelines/%7B8d3fca5a-b8e2-447e-b085-25d32996db6f%7D/steps/%7Bd2e3a735-68e4-4dcd-8e1a-24b23311e328%7D/stepComplete [DefaultWebClient]
Stack trace:
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:119)
at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:92)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)
at reactor.core.publisher.FluxContextStart$ContextStartSubscriber.onNext(FluxContextStart.java:96)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:287)
at reactor.core.publisher.FluxFilterFuseable$FilterFuseableConditionalSubscriber.onNext(FluxFilterFuseable.java:330)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
at reactor.core.publisher.MonoCollect$CollectSubscriber.onComplete(MonoCollect.java:152)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:252)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:374)
at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:373)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:429)
at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:655)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:96)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1526)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1275)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1322)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
[2021-04-28 13:57:56,728] Setting runner state to not executing step.
[2021-04-28 13:57:56,729] Waiting for next step.
[2021-04-28 13:57:56,803] Setting runner state to executing step.
[2021-04-28 13:58:22,524] Updating runner state to "ONLINE".

 As you can see in the logs it pick up that step and then it shows is back online but is not. when we start a new pipeline from the web client it does not work.

The only thing I havent try is to restart the EC2 instance the runner lives in but I guess that job is in a queue and it will keep coming back until we do some manual work?

regards

Hello @lassian , @Justin Thomas 

Any update on this?

We have tried everything from destroying the container to destroying the EC2 instance and This step is still stuck there.

The only thing I haven't tried is to delete the runner from bitbucket and creating a new one.

please advise on this issue

Thanks

@Jorge Cotelo Sorry for the late reply. The state of the runner is broken on the Bitbucket side, you will have to delete the existing runner and create a new one.

@Justin Thomas  Thank you for the reply. I deleted the runner and created the new one and everything is back.

There is still the Issue mentioned here: 
https://community.atlassian.com/t5/Bitbucket-Pipelines-Runners/Runner-stop-responding-when-idle-for-too-long/qaq-p/1662478

Any update on that?

 

Regards

@Jorge Cotelo The hung bug was a tough one, it's fixed now. Sorry for the trouble.

Comment

Log in or Sign up to comment
TAGS

Community Events

Connect with like-minded Atlassian users at free events near you!

Find an event

Connect with like-minded Atlassian users at free events near you!

Unfortunately there are no Community Events near you at the moment.

Host an event

You're one step closer to meeting fellow Atlassian users at your local event. Learn more about Community Events

Events near you