Runner is stuck on a step "runner in a non retryable state" error

Hello.

I have a new Issue this morning. My pipelines are not running on the web client. when I look in the runner the last error I have is the following.



[2021-04-26 15:35:38,650] Runner is in a non retryable state. (state: CompletedStepStateModel{result=SuccessfulResultForCompletedStepStateModel{}})
[2021-04-26 15:35:38,655] Step was already complete skipping.
[2021-04-26 15:35:38,711] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.step.execution-error', message='null', arguments={}})}.
[2021-04-26 15:35:39,020] Translating HTTP 400 BAD_REQUEST response into a com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException
[2021-04-26 15:35:39,097] An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=PASSED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ 400 from POST https://api.atlassian.com/ex/bitbucket-pipelines/rest/1.0/accounts/xxxxxxxxx/stepComplete [DefaultWebClient]
Stack trace:
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:119)
at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114).

Now even if I stop the web client pipeline and then I restart the docker container it goes back to this error.

I also destroyed the container I started a new one. I this issue persists

7 comments

Comment

Nothing gets me out of this.

restarting the container gets back to this issue.
removing the container and creating a new one gets me back to this issue

This is the information of the step that is stuck.

ex.


Getting step StepId{accountUuid={b14cf156-b835-40a8-9fd4-eba8785bb22c}, repositoryUuid={dfc78678-e1b2-4eaf-835c-9adb5687c21b}, pipelineUuid={8d3fca5a-b8e2-447e-b085-25d32996db6f}, stepUuid={d2e3a735-68e4-4dcd-8e1a-24b23311e328}

Can anyone from the Atlassian team please help with the above? We have not been able to use the runner in the last 2 days

cc: @Justin Thomas

Hi Jorge/Mohamed

Apologies for the delay, we have identified a bug that was causing this behaviour and deployed a fix to our backend services.

However the small number of runners that were affected by this are now in an invalid state.

Can you delete and recreate those runners?

That will fix these now corrupt runners.

Kind Regards,
Nathan Burrell

Like • Jorge Cotelo likes this

Hi @lassian

Thank you. I will do that now and let you know.

Hi @lassian

I have stopped and remove the container, deleted the image and pulled the new one and as soon as I start it then it goes back to the same step and get stuck there:

[2021-04-28 13:57:54,883] Getting step StepId{accountUuid={b14cf156-b835-40a8-9fd4-eba8785bb22c}, repositoryUuid={dfc78678-e1b2-4eaf-835c-9adb5687c21b}, pipelineUuid={8d3fca5a-b8e2-447e-b085-25d32996db6f}, stepUuid={d2e3a735-68e4-4dcd-8e1a-24b23311e328}}.
[2021-04-28 13:57:54,899] Getting oauth token for step.
[2021-04-28 13:57:54,914] Getting environment variables for step.
[2021-04-28 13:57:55,628] Getting all artifacts for step.
[2021-04-28 13:57:55,994] Runner is in a non retryable state. (state: CompletedStepStateModel{result=SuccessfulResultForCompletedStepStateModel{}})
[2021-04-28 13:57:56,000] Step was already complete skipping.
[2021-04-28 13:57:56,021] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.step.execution-error', message='null', arguments={}})}.
[2021-04-28 13:57:56,697] Translating HTTP 400 BAD_REQUEST response into a com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException
[2021-04-28 13:57:56,722] An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=PASSED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ 400 from POST https://api.atlassian.com/ex/bitbucket-pipelines/rest/1.0/accounts/%7Bb14cf156-b835-40a8-9fd4-eba8785bb22c%7D/repositories/%7Bdfc78678-e1b2-4eaf-835c-9adb5687c21b%7D/pipelines/%7B8d3fca5a-b8e2-447e-b085-25d32996db6f%7D/steps/%7Bd2e3a735-68e4-4dcd-8e1a-24b23311e328%7D/stepComplete [DefaultWebClient]
Stack trace:
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:119)
at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:92)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)
at reactor.core.publisher.FluxContextStart$ContextStartSubscriber.onNext(FluxContextStart.java:96)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:287)
at reactor.core.publisher.FluxFilterFuseable$FilterFuseableConditionalSubscriber.onNext(FluxFilterFuseable.java:330)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
at reactor.core.publisher.MonoCollect$CollectSubscriber.onComplete(MonoCollect.java:152)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:252)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:374)
at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:373)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:429)
at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:655)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:96)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1526)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1275)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1322)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
[2021-04-28 13:57:56,728] Setting runner state to not executing step.
[2021-04-28 13:57:56,729] Waiting for next step.
[2021-04-28 13:57:56,803] Setting runner state to executing step.
[2021-04-28 13:58:22,524] Updating runner state to "ONLINE".

As you can see in the logs it pick up that step and then it shows is back online but is not. when we start a new pipeline from the web client it does not work.

The only thing I havent try is to restart the EC2 instance the runner lives in but I guess that job is in a queue and it will keep coming back until we do some manual work?

regards

Hello @lassian , @Justin Thomas

Any update on this?

We have tried everything from destroying the container to destroying the EC2 instance and This step is still stuck there.

The only thing I haven't tried is to delete the runner from bitbucket and creating a new one.

please advise on this issue

Thanks

@Jorge Cotelo Sorry for the late reply. The state of the runner is broken on the Bitbucket side, you will have to delete the existing runner and create a new one.

@Justin Thomas Thank you for the reply. I deleted the runner and created the new one and everything is back.

There is still the Issue mentioned here:
https://community.atlassian.com/t5/Bitbucket-Pipelines-Runners/Runner-stop-responding-when-idle-for-too-long/qaq-p/1662478

Any update on that?

Regards

@Jorge Cotelo The hung bug was a tough one, it's fixed now. Sorry for the trouble.

Unfortunately bug is still around.

An error occurred whilst completing step.

com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=STOPPED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}

Like • aurelien.bettini likes this

Also got the same problem and I had to delete and recreate the runner in bitbucket settings.

Like • Mahmut Ali Ozkuran likes this

Yes nothing solves except recreating the runner.

Was this helpful?

Thanks!

Bitbucket Pipelines: Runners

Products

Community resources

Support

Top groups

Community resources

Support

Learn

Community resources

Support

Events

Community resources

Support

Get product advice from experts

Join a community group

Advance your career with learning paths

Earn badges and rewards

Connect and share ideas at events

Runner is stuck on a step "runner in a non retryable state" error

7 comments

Comment

Was this helpful?

Thanks!

TAGS

Atlassian Community Events

Ask a question

Start a discussion

Products

Community resources

Support

Top groups

Community resources

Support

Learn

Community resources

Support

Events

Community resources

Support

Get product advice from experts

Join a community group

Advance your career with learning paths

Earn badges and rewards

Connect and share ideas at events

Runner is stuck on a step "runner in a non retryable state" error

7 comments

Comment

Was this helpful?

Thanks!

TAGS

Atlassian Community Events