Since the last upgrade we have encountered a strange problem with Bitbucket, in both test and production. If I start both cluster nodes in an orderly manner, everything looks fine. But if I stop one of the nodes, the ExecutorService dies on the other node and features such as "Rolling Upgrades" stop working. Starting the stopped node again makes no difference.
We are running atlassian-bitbucket-9.4.3.
No users have complained, so everything else seems to work.
The log on the node that was not stopped says:
2025-02-18 08:51:29,119 INFO [hz.hazelcast.generic-operation.thread-1] c.h.i.p.impl.MigrationManager [172.20.91.103]:5701 [bvst01-git] [5.4.1] Shutdown request of Member [172.20.91.102]:5701 - e39fa5bb-72de-4c22-9832-6821e7382025 is handled
2025-02-18 08:51:29,122 INFO [hz.hazelcast.migration] c.h.i.p.impl.MigrationManager [172.20.91.103]:5701 [bvst01-git] [5.4.1] Repartitioning cluster data. Migration tasks count: 135
2025-02-18 08:51:29,336 INFO [hz.hazelcast.migration] c.h.i.p.impl.MigrationManager [172.20.91.103]:5701 [bvst01-git] [5.4.1] All migration tasks have been completed. (repartitionTime=Tue Feb 18 08:51:29 CET 2025, plannedMigrations=135, completedMigrations=135, remainingMigrations=0, totalCompletedMigrations=406)
2025-02-18 08:51:29,347 INFO [hz.hazelcast.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection [172.20.91.103]:5701 [bvst01-git] [5.4.1] Connection[id=1, /172.20.91.103:5701->/172.20.91.102:33949, qualifier=null, endpoint=[172.20.91.102]:5701, remoteUuid=e39fa5bb-72de-4c22-9832-6821e7382025, alive=false, connectionType=MEMBER, planeIndex=0] closed. Reason: Connection closed by the other side
2025-02-18 08:51:29,355 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Connecting to /172.20.91.102:5701, timeout: 10000, bind-any: true
2025-02-18 08:51:29,357 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Could not connect to: /172.20.91.102:5701. Reason: IOException[Connection refused to address /172.20.91.102:5701]
2025-02-18 08:51:29,458 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Connecting to /172.20.91.102:5701, timeout: 10000, bind-any: true
2025-02-18 08:51:29,458 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Could not connect to: /172.20.91.102:5701. Reason: IOException[Connection refused to address /172.20.91.102:5701]
2025-02-18 08:51:29,559 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Connecting to /172.20.91.102:5701, timeout: 10000, bind-any: true
2025-02-18 08:51:29,559 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Could not connect to: /172.20.91.102:5701. Reason: IOException[Connection refused to address /172.20.91.102:5701]
2025-02-18 08:51:29,660 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Connecting to /172.20.91.102:5701, timeout: 10000, bind-any: true
2025-02-18 08:51:29,661 INFO [hz.hazelcast.cached.thread-6] c.h.i.server.tcp.TcpServerConnector [172.20.91.103]:5701 [bvst01-git] [5.4.1] Could not connect to: /172.20.91.102:5701. Reason: IOException[Connection refused to address /172.20.91.102:5701]
2025-02-18 08:51:29,661 WARN [hz.hazelcast.cached.thread-6] c.h.i.s.t.TcpServerConnectionErrorHandler [172.20.91.103]:5701 [bvst01-git] [5.4.1] Removing connection to endpoint [172.20.91.102]:5701 Cause => java.io.IOException {Connection refused to address /172.20.91.102:5701}, Error-Count: 5
2025-02-18 08:51:29,661 INFO [hz.hazelcast.cached.thread-6] c.h.i.cluster.impl.MembershipManager [172.20.91.103]:5701 [bvst01-git] [5.4.1] Removing Member [172.20.91.102]:5701 - e39fa5bb-72de-4c22-9832-6821e7382025
2025-02-18 08:51:29,663 INFO [hz.hazelcast.migration] c.h.i.p.impl.MigrationManager [172.20.91.103]:5701 [bvst01-git] [5.4.1] Partition balance is ok, no need to repartition.
2025-02-18 08:51:29,665 INFO [hz.hazelcast.cached.thread-6] c.h.internal.cluster.ClusterService [172.20.91.103]:5701 [bvst01-git] [5.4.1] Members {size:1, ver:5} [ Member [172.20.91.103]:5701 - b3899afb-f3b1-4975-b692-02550b49e9fa this ]
2025-02-18 08:51:29,665 INFO [hz.hazelcast.event-1] c.a.s.i.c.HazelcastClusterService Node '/172.20.91.102:5701 (bvst01-git-03)' was REMOVED from the cluster. Updated cluster: [/172.20.91.103:5701 master this name='bvst01-git-02' uuid='b3899afb-f3b1-4975-b692-02550b49e9fa' vm-id='1f6b7b77-e249-4da0-b35e-938b0f85b924']
2025-02-18 08:51:29,666 INFO [hz.hazelcast.cached.thread-5] c.h.t.TransactionManagerService [172.20.91.103]:5701 [bvst01-git] [5.4.1] Committing/rolling-back live transactions of [172.20.91.102]:5701, UUID: e39fa5bb-72de-4c22-9832-6821e7382025
2025-02-18 08:51:32,769 ERROR [https-jsse-nio-8443-exec-7] admin @10Z2XKTx531x80x0 8hs6al 172.20.101.6,172.20.47.60 "GET /rest/zdu/cluster HTTP/1.1" c.a.b.i.r.e.UnhandledExceptionMapper Unhandled exception while processing REST request: "GET /rest/zdu/cluster HTTP/1.1"
java.util.concurrent.RejectedExecutionException: ExecutorService[bitbucket.core] is shutdown! In order to create a new ExecutorService with name 'bitbucket.core', you need to destroy current ExecutorService first!
    at com.hazelcast.executor.impl.ExecutorServiceProxy.checkNotShutdown(ExecutorServiceProxy.java:235)
    at com.hazelcast.executor.impl.ExecutorServiceProxy.submitToMember(ExecutorServiceProxy.java:407)
    at com.hazelcast.executor.impl.ExecutorServiceProxy.submitToMembers(ExecutorServiceProxy.java:455)
    at com.hazelcast.executor.impl.ExecutorServiceProxy.submitToAllMembers(ExecutorServiceProxy.java:463)
    at com.atlassian.stash.internal.cluster.HazelcastClusterService.getNodeStates(HazelcastClusterService.java:80)
    at jdk.internal.reflect.GeneratedMethodAccessor662.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at jdk.internal.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at jdk.proxy3/jdk.proxy3.$Proxy84.getNodeStates(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor662.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at com.atlassian.plugin.util.ContextClassLoaderSettingInvocationHandler.invoke(ContextClassLoaderSettingInvocationHandler.java:26)
    at jdk.proxy2/jdk.proxy2.$Proxy622.getNodeStates(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor662.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.eclipse.gemini.blueprint.service.importer.support.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:56)
    at org.eclipse.gemini.blueprint.service.importer.support.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:60)
    at org.eclipse.gemini.blueprint.service.util.internal.aop.ServiceTCCLInterceptor.invokeUnprivileged(ServiceTCCLInterceptor.java:70)
    at org.eclipse.gemini.blueprint.service.util.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:53)
    at org.eclipse.gemini.blueprint.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:57)
    at jdk.proxy11/jdk.proxy11.$Proxy858.getNodeStates(Unknown Source)
    at com.atlassian.zdu.bitbucket.impl.BitbucketClusterManagerAdapter.getNodes(BitbucketClusterManagerAdapter.java:27)
    at com.atlassian.zdu.NodeInfoAccessor.getNodes(NodeInfoAccessor.java:26)
    at com.atlassian.zdu.impl.ZduServiceImpl.getNodes(ZduServiceImpl.java:70)
    at com.atlassian.zdu.impl.ZduServiceImpl.getCluster(ZduServiceImpl.java:137)
    at com.atlassian.zdu.rest.ZduResource.getCluster(ZduResource.java:84)
    at jdk.internal.reflect.GeneratedMethodAccessor694.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:359)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:432)
    at com.atlassian.applinks.core.rest.context.ContextFilter.doFilter(ContextFilter.java:28)
    at com.atlassian.applinks.core.rest.context.ContextFilter.doFilter(ContextFilter.java:28)
    at com.atlassian.applinks.core.rest.context.ContextFilter.doFilter(ContextFilter.java:28)
    at com.atlassian.applinks.core.rest.context.ContextFilter.doFilter(ContextFilter.java:28)
    at com.atlassian.applinks.core.rest.context.ContextFilter.doFilter(ContextFilter.java:28)
    at com.atlassian.analytics.client.filter.UniversalAnalyticsFilter.doFilter(UniversalAnalyticsFilter.java:77)
    at com.atlassian.analytics.client.filter.AbstractHttpFilter.doFilter(AbstractHttpFilter.java:33)
    at com.atlassian.bitbucket.internal.xcode.web.XcodeUserAgentFilter.doFilter(XcodeUserAgentFilter.java:38)
    at com.atlassian.stash.internal.spring.lifecycle.LifecycleJohnsonServletFilterModuleContainerFilter.doFilter(LifecycleJohnsonServletFilterModuleContainerFilter.java:42)
    at com.atlassian.bitbucket.internal.ratelimit.servlet.filter.RateLimitFilter.doFilter(RateLimitFilter.java:75)
    at com.atlassian.theme.filter.DefaultRequestOverrideServletFilter.doFilter(DefaultRequestOverrideServletFilter.java:72)
    at com.atlassian.troubleshooting.thready.filter.AbstractThreadNamingFilter.doFilter(AbstractThreadNamingFilter.java:46)
    at com.atlassian.stash.internal.spring.lifecycle.LifecycleJohnsonServletFilterModuleContainerFilter.doFilter(LifecycleJohnsonServletFilterModuleContainerFilter.java:42)
    at com.atlassian.stash.internal.web.auth.AuthorizationFailureInterceptor.doFilterInternal(AuthorizationFailureInterceptor.java:39)
    at com.atlassian.stash.internal.spring.security.StashAuthenticationFilter.doFilter(StashAuthenticationFilter.java:86)
    at com.atlassian.stash.internal.web.auth.BeforeLoginPluginAuthenticationFilter.doInsideSpringSecurityChain(BeforeLoginPluginAuthenticationFilter.java:112)
    at com.atlassian.stash.internal.web.auth.BeforeLoginPluginAuthenticationFilter.doFilter(BeforeLoginPluginAuthenticationFilter.java:75)
    at com.atlassian.security.auth.trustedapps.filter.TrustedApplicationsFilter.doFilter(TrustedApplicationsFilter.java:94)
    at com.atlassian.oauth.serviceprovider.internal.servlet.OAuthFilter.doFilter(OAuthFilter.java:69)
    at com.atlassian.oauth2.provider.core.web.AccessTokenFilter.doFilter(AccessTokenFilter.java:88)
    at com.atlassian.stash.internal.spring.lifecycle.LifecycleJohnsonServletFilterModuleContainerFilter.doFilter(LifecycleJohnsonServletFilterModuleContainerFilter.java:42)
    at com.atlassian.plugins.authentication.sso.web.filter.loginform.DisableNativeLoginAuthFilter.doFilterInternal(DisableNativeLoginAuthFilter.java:73)
    at com.atlassian.plugins.authentication.sso.web.filter.AbstractJohnsonAwareFilter.doFilter(AbstractJohnsonAwareFilter.java:29)
    at com.atlassian.plugins.authentication.basicauth.filter.DisableBasicAuthFilter.doFilter(DisableBasicAuthFilter.java:79)
    at com.atlassian.jwt.internal.servlet.JwtAuthFilter.doFilter(JwtAuthFilter.java:39)
    at com.atlassian.analytics.client.filter.DefaultAnalyticsFilter.doFilter(DefaultAnalyticsFilter.java:28)
    at com.atlassian.analytics.client.filter.AbstractHttpFilter.doFilter(AbstractHttpFilter.java:33)
    at com.atlassian.troubleshooting.thready.filter.AbstractThreadNamingFilter.doFilter(AbstractThreadNamingFilter.java:46)
    at com.atlassian.stash.internal.spring.lifecycle.LifecycleJohnsonServletFilterModuleContainerFilter.doFilter(LifecycleJohnsonServletFilterModuleContainerFilter.java:42)
    at com.atlassian.stash.internal.web.auth.BeforeLoginPluginAuthenticationFilter.doBeforeBeforeLoginFilters(BeforeLoginPluginAuthenticationFilter.java:90)
    at com.atlassian.stash.internal.web.auth.BeforeLoginPluginAuthenticationFilter.doFilter(BeforeLoginPluginAuthenticationFilter.java:73)
    at com.atlassian.stash.internal.request.DefaultRequestManager.doAsRequest(DefaultRequestManager.java:85)
    at com.atlassian.stash.internal.hazelcast.ConfigurableWebFilter.doFilter(ConfigurableWebFilter.java:38)
    at java.base/java.lang.Thread.run(Thread.java:840)
    ... 257 frames trimmed
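
For anyone not familiar with the exception type: the trace shows java.util.concurrent.RejectedExecutionException being thrown when getNodeStates submits work to an executor that has already been shut down. The sketch below is only an analogy using a plain JDK executor; it is not Hazelcast's or Bitbucket's code, and the class and method names (ShutdownExecutorDemo, submitAfterShutdownIsRejected) are made up for illustration. It shows the general pattern that, once an executor is shut down, every later submit is rejected rather than queued, which would be consistent with the REST call failing on every request until something recreates the executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownExecutorDemo {

    /**
     * Shuts a plain JDK executor down and then tries to submit a task to it.
     * Returns true if the submission was rejected.
     * Hypothetical helper for illustration; not part of Bitbucket or Hazelcast.
     */
    static boolean submitAfterShutdownIsRejected() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // no new tasks are accepted from this point on
        try {
            // Stand-in for the "ask every member for its state" task in the trace.
            executor.submit(() -> "node state");
            return false;
        } catch (RejectedExecutionException e) {
            // Same exception type as in the Bitbucket log above.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + submitAfterShutdownIsRejected());
    }
}
```

Restarting the stopped node does not change the outcome in this analogy either, because the shut-down executor object on the surviving side is still the one being used.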