Hello.
I have a new Issue this morning. My pipelines are not running on the web client. when I look in the runner the last error I have is the following.
[2021-04-26 15:35:38,650] Runner is in a non retryable state. (state: CompletedStepStateModel{result=SuccessfulResultForCompletedStepStateModel{}})
[2021-04-26 15:35:38,655] Step was already complete skipping.
[2021-04-26 15:35:38,711] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.step.execution-error', message='null', arguments={}})}.
[2021-04-26 15:35:39,020] Translating HTTP 400 BAD_REQUEST response into a com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException
[2021-04-26 15:35:39,097] An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=PASSED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ 400 from POST https://api.atlassian.com/ex/bitbucket-pipelines/rest/1.0/accounts/xxxxxxxxx/stepComplete [DefaultWebClient]
Stack trace:
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:119)
at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114).
Now even if I stop the web client pipeline and then I restart the docker container it goes back to this error.
I also destroyed the container I started a new one. I this issue persists
Can anyone from the Atlassian team please help with the above? We have not been able to use the runner in the last 2 days
cc: @Justin Thomas
Hi Jorge/Mohamed
Apologies for the delay, we have identified a bug that was causing this behaviour and deployed a fix to our backend services.
However the small number of runners that were affected by this are now in an invalid state.
Can you delete and recreate those runners?
That will fix these now corrupt runners.
Kind Regards,
Nathan Burrell
Hi @lassian
I have stopped and remove the container, deleted the image and pulled the new one and as soon as I start it then it goes back to the same step and get stuck there:
[2021-04-28 13:57:54,883] Getting step StepId{accountUuid={b14cf156-b835-40a8-9fd4-eba8785bb22c}, repositoryUuid={dfc78678-e1b2-4eaf-835c-9adb5687c21b}, pipelineUuid={8d3fca5a-b8e2-447e-b085-25d32996db6f}, stepUuid={d2e3a735-68e4-4dcd-8e1a-24b23311e328}}.
[2021-04-28 13:57:54,899] Getting oauth token for step.
[2021-04-28 13:57:54,914] Getting environment variables for step.
[2021-04-28 13:57:55,628] Getting all artifacts for step.
[2021-04-28 13:57:55,994] Runner is in a non retryable state. (state: CompletedStepStateModel{result=SuccessfulResultForCompletedStepStateModel{}})
[2021-04-28 13:57:56,000] Step was already complete skipping.
[2021-04-28 13:57:56,021] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.step.execution-error', message='null', arguments={}})}.
[2021-04-28 13:57:56,697] Translating HTTP 400 BAD_REQUEST response into a com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException
[2021-04-28 13:57:56,722] An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=PASSED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ 400 from POST https://api.atlassian.com/ex/bitbucket-pipelines/rest/1.0/accounts/%7Bb14cf156-b835-40a8-9fd4-eba8785bb22c%7D/repositories/%7Bdfc78678-e1b2-4eaf-835c-9adb5687c21b%7D/pipelines/%7B8d3fca5a-b8e2-447e-b085-25d32996db6f%7D/steps/%7Bd2e3a735-68e4-4dcd-8e1a-24b23311e328%7D/stepComplete [DefaultWebClient]
Stack trace:
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.atlassian.bitbucketci.client.reactive.ResponseExceptionFactory$ConstructorInvoker.invokeConstructor(ResponseExceptionFactory.java:119)
at io.vavr.CheckedFunction1.lambda$unchecked$43b513dd$1(CheckedFunction1.java:220)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:92)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)
at reactor.core.publisher.FluxContextStart$ContextStartSubscriber.onNext(FluxContextStart.java:96)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:287)
at reactor.core.publisher.FluxFilterFuseable$FilterFuseableConditionalSubscriber.onNext(FluxFilterFuseable.java:330)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
at reactor.core.publisher.MonoCollect$CollectSubscriber.onComplete(MonoCollect.java:152)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:252)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:374)
at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:373)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:429)
at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:655)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:96)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1526)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1275)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1322)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
[2021-04-28 13:57:56,728] Setting runner state to not executing step.
[2021-04-28 13:57:56,729] Waiting for next step.
[2021-04-28 13:57:56,803] Setting runner state to executing step.
[2021-04-28 13:58:22,524] Updating runner state to "ONLINE".
As you can see in the logs it pick up that step and then it shows is back online but is not. when we start a new pipeline from the web client it does not work.
The only thing I havent try is to restart the EC2 instance the runner lives in but I guess that job is in a queue and it will keep coming back until we do some manual work?
regards
Hello @lassian , @Justin Thomas
Any update on this?
We have tried everything from destroying the container to destroying the EC2 instance and This step is still stuck there.
The only thing I haven't tried is to delete the runner from bitbucket and creating a new one.
please advise on this issue
Thanks
@Jorge Cotelo Sorry for the late reply. The state of the runner is broken on the Bitbucket side, you will have to delete the existing runner and create a new one.
@Justin Thomas Thank you for the reply. I deleted the runner and created the new one and everything is back.
There is still the Issue mentioned here:
https://community.atlassian.com/t5/Bitbucket-Pipelines-Runners/Runner-stop-responding-when-idle-for-too-long/qaq-p/1662478
Any update on that?
Regards
@Jorge Cotelo The hung bug was a tough one, it's fixed now. Sorry for the trouble.
Unfortunately bug is still around.
An error occurred whilst completing step.
com.atlassian.pipelines.stargate.client.core.exceptions.StargateBadRequestException: Response Summary: HttpResponseSummary{httpStatusCode=400, httpStatusMessage=Bad Request, bodyAsString={"error":{"message":"Bad request","detail":"Attempt to complete an already completed step (existing status=STOPPED, new status=ERROR, existing error=null, new error=runner.step.execution-error)","data":{"key":"result-service.step.invalid-state","arguments":{}}}}}
Also got the same problem and I had to delete and recreate the runner in bitbucket settings.