The Atlassian Community can help you and your team get more value out of Atlassian products and practices.
Context: during EAP, we use the runner only once a day. Once in a while the runner gets offline (running, but offline) for up to 4 hours for infrastructure maintenance.
My problem:
I've noticed that very frequently the runner stop responding to the cloud. On logs I have just "Waiting for next step", no errors. Once I restart the runner container, everything runs as it's has never been offline.
Anyone with similar issue?
Hi All,
Can you all please provide your runner logs when it has "hung" and what version of the runner you are using so we can investigate this.
Kind Regards,
Nathan Burrell
Hi, sorry for the late reply.
I've updated the runner to the latest version on friday (2021-04-16) and couldn't notice this issue on this version.
When it was happening there was no log output, just the normal logs ending with "Waiting for next step".
I'll monitor this runner closely this week and report if the problem shows up again.
Thank you.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Here is the logs from my Runner on v1.107 after it goes "Offline":
[2021-04-17 00:11:18,777] An error occurred whilst listening to RUNNER_UPDATED websocket events.
java.io.EOFException: null
at okio.RealBufferedSource.require(RealBufferedSource.java:65)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:78)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.RuntimeException: java.io.EOFException
at io.reactivex.internal.util.ExceptionHelper.wrapOrThrow(ExceptionHelper.java:46)
at io.reactivex.internal.observers.BlockingMultiObserver.blockingGet(BlockingMultiObserver.java:93)
at io.reactivex.Completable.blockingAwait(Completable.java:1227)
at com.atlassian.pipelines.runner.core.ApplicationImpl.main(ApplicationImpl.java:47)
Caused by: java.io.EOFException
at okio.RealBufferedSource.require(RealBufferedSource.java:65)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:78)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi @lassian
It just happened again, pipelines shows "Starting Agent" on web (longer than 1 hour now). The runner shows as ONLINE on admin page.
On the logs we get: a successful pipeline followed by hundreds of health checks.
Despite of not responding, the runner keeps reporting, logging and show on admin as ONLINE until now.
......
[2021-04-20 12:19:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:20:10,015] Updating runner state to "ONLINE".
[2021-04-20 12:20:35,310] Appending log line to main log.
[2021-04-20 12:20:38,309] Appending log line to main log.
[2021-04-20 12:20:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:20:45,309] Appending log line to main log.
[2021-04-20 12:20:46,315] Appending log line to main log.
[2021-04-20 12:21:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:21:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:22:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:22:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:22:53,308] Appending log line to main log.
[2021-04-20 12:22:56,919] Container has state (exitCode: Some(5), OOMKilled Some(false))
[2021-04-20 12:22:56,920] Removing container build
[2021-04-20 12:22:57,144] Not uploading caches. (numberOfCaches: 0, resultOrError: FAILED)
[2021-04-20 12:22:57,144] Not uploading artifacts. (numberOfArtifacts: 0, resultOrError: FAILED)
[2021-04-20 12:22:57,144] Updating step progress to PARSING_TEST_RESULTS.
[2021-04-20 12:22:57,615] Test report processing complete.
[2021-04-20 12:22:57,615] Removing container clone
[2021-04-20 12:22:58,309] Appending log line to main log.
[2021-04-20 12:22:58,409] Removing container clone
[2021-04-20 12:22:58,411] Removing container build
[2021-04-20 12:22:58,413] Updating step progress to COMPLETING_LOGS.
[2021-04-20 12:22:58,878] Shutting down log uploader.
[2021-04-20 12:22:58,878] Tearing down directories.
[2021-04-20 12:22:59,043] Cancelling timeout
[2021-04-20 12:22:59,044] Completing step with result Result{status=FAILED, error=None}.
[2021-04-20 12:22:59,530] Waiting for next step.
[2021-04-20 12:23:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:23:40,015] Updating runner state to "ONLINE".
[2021-04-20 12:24:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:24:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:25:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:25:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:26:10,015] Updating runner state to "ONLINE".
[2021-04-20 12:26:40,015] Updating runner state to "ONLINE".
[2021-04-20 12:27:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:27:40,015] Updating runner state to "ONLINE".
[2021-04-20 12:28:10,014] Updating runner state to "ONLINE".
[2021-04-20 12:28:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:29:10,015] Updating runner state to "ONLINE".
[2021-04-20 12:29:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:30:10,015] Updating runner state to "ONLINE".
[2021-04-20 12:30:40,014] Updating runner state to "ONLINE".
[2021-04-20 12:31:10,015] Updating runner state to "ONLINE".
.....
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
YESSS. I just got that issue as well. pipelines shows "Starting Agent" on web and the agent is showing online every 30 sec as normal
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
I have identified a bug with websockets above and retries and have released a fix, can you please pull the latest version of the runner (ensuring you have v1.110 or greater).
When you receive the error above it will than resubscribe to the websocket I found an edge case where it was not.
Can you update your runners and tell us if your still experiencing these issues with this version.
Kind Regards,
Nathan Burrell
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
I pulled the latest image. (v1.110) and started the runner again. After two builds I got the same issue @Paulo Antoniassi was getting without the error.
Basically the web shows starting agent but it stays there forever while the logs in the runner still shows [2021-04-21 13:54:00,057] Updating runner state to "ONLINE"
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Jorge,
Can you provide me the uuid of the step that was stuck at starting agent, and the uuid of the runner.
Kind Regards,
Nathan Burrell
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi @lassian
I am using a private repo so I don't think you will be able to have any visual on it. I don't know if @Paulo Antoniassi still have the issue.
After a few hours the pipeline times out
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
I've setup "dummy steps" on every commit to keep te runner alive so I don't have anything recent. Mine is a private repository too but I'm pretty sure Atlassian can get interesting logs even for private repos.
I guess the following one was stuck, but I'm not sure about that cause restarting the runner resumes the step as anything happened, I'm just guessing considering 12 minutes of "build setup".
Step b0394b49-4dc3-4cd9-9e0b-3eabc89350ee (#108)
Repository eabc3499-aff6-4a5e-a5ff-4058c6bc57a2 (b5s/sic718_atendimento)
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
@lassian ,
Just got an confirmed one, from logs:
{accountUuid={105381f7-a165-4737-899a-c14d91b3b768}, repositoryUuid={eabc3499-aff6-4a5e-a5ff-4058c6bc57a2}, pipelineUuid={a4e15047-11c4-444e-99f8-9ecd1f769d3d}, stepUuid={d130fdd6-26ab-464d-bd41-86a030539f2d}}
I had to restart the runner to resume the step.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
@Paulo Antoniassi, @lassian
Hello, Any update on this?
I can confirm this issue still happens. I created a few custom pipelines and run them on schedule every 1 hour. After approximately 10 hours the runner stops receiving request and then I have to restart it.
So is not "When idle for too long" it just stop responding and it still shows online on the web client and in the runner itself.
Regards
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Just wanted to circle back since my original comment.
Since updating to v1.110, my runner has consistently run according to schedule.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
I have just released a patch to the runner that should stop it from hanging.
Please upgrade the runner and ensure you are running 1.133 and confirm for me if this occurs again.
Kind Regards,
Nathan Burrell
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
Can you all please raise a support case and attach the runner logs at the time a step was scheduled onto it but the runner didnt pick it up so we can investigate, and what step was assigned to it (one that is not in queued in the UI).
Kind Regards,
Nathan
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Bump.
Same sort of thing is happening to me. But mine just hangs during checkout every time.
I posed an issue for it because it seems a little different (mine never runs):
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Un-bump.
My similar issue is gone. See the note on the above thread (started working in v1.110).
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hello,
I also experience an issue. But mine comes after an error:
An error occurred whilst listening to RUNNER_UPDATED websocket events.
java.io.EOFException: null
at okio.RealBufferedSource.require(RealBufferedSource.java:65)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:78)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.RuntimeException: java.io.EOFException
at io.reactivex.internal.util.ExceptionHelper.wrapOrThrow(ExceptionHelper.java:46)
at io.reactivex.internal.observers.BlockingMultiObserver.blockingGet(BlockingMultiObserver.java:93)
at io.reactivex.Completable.blockingAwait(Completable.java:1227)
at com.atlassian.pipelines.runner.core.ApplicationImpl.main(ApplicationImpl.java:47)
Caused by: java.io.EOFException
at okio.RealBufferedSource.require(RealBufferedSource.java:65)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:78)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I've experienced this issue myself. Currently have a single runner that's scheduled to run weekly Packer image builds. The runner is inactive the rest of the week and consistently fails to run my scheduled build until the runner is restarted.
The error I've seen when it fails to run is "ERROR Step exceeded processing limits and has timed out:.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Connect with like-minded Atlassian users at free events near you!
Find an eventConnect with like-minded Atlassian users at free events near you!
Unfortunately there are no Community Events near you at the moment.
Host an eventYou're one step closer to meeting fellow Atlassian users at your local event. Learn more about Community Events
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.