How does a Bamboo cluster work?

Nat Tsai July 12, 2022

I have two Bamboo servers in a cluster. When I start Bamboo Server1 first, everything is fine. But when I then start Bamboo Server2, it becomes Active instead of Standby, which is very odd, and the listener port also switches to Server2.

I expected that once Bamboo Server1 had started, Bamboo Server2 would stay on standby and become active only when Server1 goes down.

Maybe I have misconfigured something. Does anyone have an idea?

 

My bamboo-init.properties:

bamboo.home=/var/atlassian/application-data/bamboo/
bamboo.shared.homed=/var/atlassian/application-data/bamboo/shared

1 answer

Answer accepted
Eduardo Alvarenga
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
July 12, 2022

Hello @Nat Tsai

Indeed, the Bamboo Standby node is expected to take over as the Active node only after 5 minutes of missing heartbeats in the DB from the current Active node. As your shared filesystem configuration looks correct, I would advise you to check the shared filesystem for inconsistencies, such as the mounted nodes not seeing the same file contents at the same time.
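To make the expected behaviour concrete, here is a minimal sketch of the takeover rule described above. It is not Bamboo's actual implementation; the function and constant names are hypothetical, and only the 5-minute window comes from this reply.

```python
from datetime import datetime, timedelta

# Assumption: 5 minutes is the window quoted in the reply; the names are illustrative.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def standby_should_take_over(last_active_heartbeat: datetime,
                             now: datetime,
                             timeout: timedelta = HEARTBEAT_TIMEOUT) -> bool:
    """Return True only once the active node's last heartbeat is at least `timeout` old."""
    return now - last_active_heartbeat >= timeout


if __name__ == "__main__":
    now = datetime(2022, 7, 12, 21, 2, 28)
    # Active node was seen 30 seconds ago: the standby should keep waiting.
    print(standby_should_take_over(now - timedelta(seconds=30), now))
    # Active node has been silent for 6 minutes: the standby may take over.
    print(standby_should_take_over(now - timedelta(minutes=6), now))
```

Under this rule, a standby that becomes active within seconds of starting is not following the normal heartbeat path, which points to the two nodes not sharing the same state.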

I would also check the database address listed in <bamboo-home>/bamboo.cfg.xml and verify that both instances are connecting to the same DB.
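One way to compare the two nodes is to pull the JDBC URL out of each node's bamboo.cfg.xml and check that the values match. The sketch below assumes the URL is stored in a property element named hibernate.connection.url; verify that against your own file. The sample XML and host name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the JDBC URL lives in <property name="hibernate.connection.url">.
URL_PROPERTY = "hibernate.connection.url"


def jdbc_url(cfg_xml_text: str):
    """Return the JDBC URL found in the given bamboo.cfg.xml text, or None."""
    root = ET.fromstring(cfg_xml_text)
    for prop in root.iter("property"):
        if prop.get("name") == URL_PROPERTY:
            return (prop.text or "").strip()
    return None


def same_database(cfg_a: str, cfg_b: str) -> bool:
    """True when both configs declare a JDBC URL and the two URLs are identical."""
    url_a, url_b = jdbc_url(cfg_a), jdbc_url(cfg_b)
    return url_a is not None and url_a == url_b


if __name__ == "__main__":
    # Hypothetical sample content; in practice, read each node's bamboo.cfg.xml.
    sample = (
        "<application-configuration><properties>"
        '<property name="hibernate.connection.url">'
        "jdbc:postgresql://db.example.com:5432/bamboo"
        "</property></properties></application-configuration>"
    )
    print(jdbc_url(sample))
    print(same_database(sample, sample))
```

The comparison is a plain string match, so two URLs that resolve to the same server through different host names would still be reported as different.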

When the immediate takeover happens, do you see the "Active" node stopping/being killed?

You can check if you have any BAMBOO_HOME environment variables declared either globally or on <bamboo-install>/bin/setenv.sh (usually at the beginning of the file) that might be breaking the startup process.
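A quick way to run that check is to scan the script text for uncommented BAMBOO_HOME assignments. This is only a sketch: the helper name is hypothetical, and it recognises plain `BAMBOO_HOME=...` and `export BAMBOO_HOME=...` lines, not every shell construct.

```python
import os
import re

# Matches "BAMBOO_HOME=value" with an optional leading "export".
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?BAMBOO_HOME=(.*)$")


def find_bamboo_home_assignments(script_text: str):
    """Return (line_number, value) pairs for uncommented BAMBOO_HOME assignments."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines, including the shebang
        match = ASSIGNMENT.match(line)
        if match:
            hits.append((number, match.group(1).strip().strip("\"'")))
    return hits


if __name__ == "__main__":
    # Globally declared variable, if any, as seen by this process.
    print("environment:", os.environ.get("BAMBOO_HOME"))
    # Hypothetical script content; in practice, read <bamboo-install>/bin/setenv.sh.
    example = '#!/bin/sh\n# BAMBOO_HOME=/old/path\nexport BAMBOO_HOME="/opt/other"\n'
    print(find_bamboo_home_assignments(example))
```

Any hit whose value differs from the bamboo.home set in bamboo-init.properties is worth a closer look.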

If things are still not working properly, I advise you to open a support ticket at https://support.atlassian.com/contact so we can help you in detail.

Because Atlassian Community is a public forum, please do not share any details that may be confidential.

 

Best regards,

 

Eduardo Alvarenga
Atlassian Support APAC 

Nat Tsai July 12, 2022

@Eduardo Alvarenga 

Thanks for the reply.

When Bamboo Server2 takes over immediately, the Bamboo Server1 log shows some WARN messages:

2022-07-12 21:02:27,799 INFO [atlassian-scheduler-quartz2.local_Worker-1] [DbLimiterJobRunner] DbLimiterJobRunner Started
2022-07-12 21:02:27,802 INFO [atlassian-scheduler-quartz2.local_Worker-1] [DbLimiterJobRunner] DbLimiterJobRunner Finished
2022-07-12 21:02:28,131 INFO [ActiveMQ Journal Checkpoint Worker] [PageFile] Unexpected io error on pagefile write of 1 pages.
java.io.IOException: Stale file handle
at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:82)
at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:461)
at org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:401)
at org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1187)
at org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:608)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1795)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:1104)
at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:445)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-07-12 21:02:28,134 ERROR [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint failed
java.io.IOException: Stale file handle
at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:82)
at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:461)
at org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:401)
at org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1187)
at org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:608)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1795)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:1104)
at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:445)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-07-12 21:02:28,135 INFO [ActiveMQ Journal Checkpoint Worker] [DefaultIOExceptionHandler] Stopping BrokerService[bamboo] due to exception, java.io.IOException: Stale file handle
java.io.IOException: Stale file handle
at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:82)
at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:461)
at org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:401)
at org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1187)
at org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:608)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1795)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:1104)
at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:445)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-07-12 21:02:28,137 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [BrokerService] Apache ActiveMQ 5.16.3 (bamboo, ID:bamboo1-36725-1657620117267-0:1) is shutting down
2022-07-12 21:02:28,139 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [TransportConnector] Connector nio://bamboo1:54663?wireFormat.maxInactivityDuration=300000 stopped
2022-07-12 21:02:28,139 INFO [ActiveMQ Transport Server Thread Handler: nio://0.0.0.0:54663?wireFormat.maxInactivityDuration=300000] [TcpTransportServer] socketQueue interrupted - stopping
2022-07-12 21:02:28,139 INFO [ActiveMQ Transport Server Thread Handler: nio://0.0.0.0:54663?wireFormat.maxInactivityDuration=300000] [TransportConnector] Could not accept connection during shutdown : null (null)
2022-07-12 21:02:28,144 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [TransportConnector] Connector tcp://localhost:54665?wireFormat.maxInactivityDuration=300000 stopped
2022-07-12 21:02:28,144 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [TransportConnector] Connector ssl://bamboo1:54664?wireFormat.maxInactivityDuration=300000 stopped
2022-07-12 21:02:28,145 INFO [ActiveMQ Transport Server Thread Handler: ssl://0.0.0.0:54664?wireFormat.maxInactivityDuration=300000] [TcpTransportServer] socketQueue interrupted - stopping
2022-07-12 21:02:28,145 INFO [ActiveMQ Transport Server Thread Handler: ssl://0.0.0.0:54664?wireFormat.maxInactivityDuration=300000] [TransportConnector] Could not accept connection during shutdown : null (null)
2022-07-12 21:02:28,166 WARN [buildTailMessageListenerConnector-1] [FingerprintMatchingMessageListenerContainer] Setup of JMS message listener invoker failed for destination 'queue://com.atlassian.bamboo.buildTailQueue' - trying to recover. Cause: The Session is closed
2022-07-12 21:02:28,166 WARN [bambooHeartBeatMessageListenerConnector-1] [BambooDefaultMessageListenerContainer] Setup of JMS message listener invoker failed for destination 'queue://com.atlassian.bamboo.heartbeatQueue' - trying to recover. Cause: The Session is closed
2022-07-12 21:02:28,166 WARN [bambooAgentMessageListenerConnector-1] [FingerprintMatchingMessageListenerContainer] Setup of JMS message listener invoker failed for destination 'queue://com.atlassian.bamboo.serverQueue' - trying to recover. Cause: The Session is closed
2022-07-12 21:02:28,184 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [TransportConnector] Connector vm://bamboo stopped
2022-07-12 21:02:28,196 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [BrokerPluginSupport] Broker Plugin org.apache.activemq.broker.util.TimeStampingBrokerPlugin stopped
2022-07-12 21:02:28,197 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [PListStoreImpl] PListStore:[/var/atlassian/application-data/bamboo/shared/jms-store/bamboo/tmp_storage] stopped
2022-07-12 21:02:28,197 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [KahaDBStore] Stopping async queue tasks
2022-07-12 21:02:28,197 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [KahaDBStore] Stopping async topic tasks
2022-07-12 21:02:28,197 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [KahaDBStore] Stopped KahaDB
2022-07-12 21:02:28,197 ERROR [IOExceptionHandler: stopping BrokerService[bamboo]] [KahaDBStore] Could not stop service: KahaDB:[/var/atlassian/application-data/bamboo/shared/jms-store/bamboo/KahaDB]. Reason: java.lang.IllegalStateException: PageFile is not loaded
java.lang.IllegalStateException: PageFile is not loaded
at org.apache.activemq.store.kahadb.disk.page.PageFile.assertLoaded(PageFile.java:906)
at org.apache.activemq.store.kahadb.disk.page.PageFile.tx(PageFile.java:315)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1789)
at org.apache.activemq.store.kahadb.MessageDatabase.close(MessageDatabase.java:517)
at org.apache.activemq.store.kahadb.MessageDatabase.unload(MessageDatabase.java:556)
at org.apache.activemq.store.kahadb.MessageDatabase.doStop(MessageDatabase.java:314)
at org.apache.activemq.store.kahadb.KahaDBStore.doStop(KahaDBStore.java:311)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStop(KahaDBPersistenceAdapter.java:262)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.util.ServiceStopper.stop(ServiceStopper.java:41)
at org.apache.activemq.broker.BrokerService.stop(BrokerService.java:878)
at org.apache.activemq.util.DefaultIOExceptionHandler$2.run(DefaultIOExceptionHandler.java:188)
2022-07-12 21:02:28,203 ERROR [IOExceptionHandler: stopping BrokerService[bamboo]] [KahaDBPersistenceAdapter] Could not stop service: KahaDBPersistenceAdapter[/var/atlassian/application-data/bamboo/shared/jms-store/bamboo/KahaDB,Index:/var/atlassian/application-data/bamboo/shared/jms-store/bamboo/KahaDB]. Reason: java.lang.IllegalStateException: PageFile is not loaded
java.lang.IllegalStateException: PageFile is not loaded
at org.apache.activemq.store.kahadb.disk.page.PageFile.assertLoaded(PageFile.java:906)
at org.apache.activemq.store.kahadb.disk.page.PageFile.tx(PageFile.java:315)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1789)
at org.apache.activemq.store.kahadb.MessageDatabase.close(MessageDatabase.java:517)
at org.apache.activemq.store.kahadb.MessageDatabase.unload(MessageDatabase.java:556)
at org.apache.activemq.store.kahadb.MessageDatabase.doStop(MessageDatabase.java:314)
at org.apache.activemq.store.kahadb.KahaDBStore.doStop(KahaDBStore.java:311)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStop(KahaDBPersistenceAdapter.java:262)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.util.ServiceStopper.stop(ServiceStopper.java:41)
at org.apache.activemq.broker.BrokerService.stop(BrokerService.java:878)
at org.apache.activemq.util.DefaultIOExceptionHandler$2.run(DefaultIOExceptionHandler.java:188)
2022-07-12 21:02:28,203 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [BambooAmqClusterLocker] Bamboo amq cluster locker stopped
2022-07-12 21:02:28,203 ERROR [IOExceptionHandler: stopping BrokerService[bamboo]] [KahaDBPersistenceAdapter] Could not stop service: KahaDBPersistenceAdapter[/var/atlassian/application-data/bamboo/shared/jms-store/bamboo/KahaDB,Index:/var/atlassian/application-data/bamboo/shared/jms-store/bamboo/KahaDB]. Reason: java.lang.IllegalStateException: PageFile is not loaded
java.lang.IllegalStateException: PageFile is not loaded
at org.apache.activemq.store.kahadb.disk.page.PageFile.assertLoaded(PageFile.java:906)
at org.apache.activemq.store.kahadb.disk.page.PageFile.tx(PageFile.java:315)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1789)
at org.apache.activemq.store.kahadb.MessageDatabase.close(MessageDatabase.java:517)
at org.apache.activemq.store.kahadb.MessageDatabase.unload(MessageDatabase.java:556)
at org.apache.activemq.store.kahadb.MessageDatabase.doStop(MessageDatabase.java:314)
at org.apache.activemq.store.kahadb.KahaDBStore.doStop(KahaDBStore.java:311)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStop(KahaDBPersistenceAdapter.java:262)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.util.ServiceStopper.stop(ServiceStopper.java:41)
at org.apache.activemq.broker.BrokerService.stop(BrokerService.java:878)
at org.apache.activemq.util.DefaultIOExceptionHandler$2.run(DefaultIOExceptionHandler.java:188)
2022-07-12 21:02:28,211 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [BrokerService] Apache ActiveMQ 5.16.3 (bamboo, ID:bamboo1-36725-1657620117267-0:1) uptime 3 hours
2022-07-12 21:02:28,211 INFO [IOExceptionHandler: stopping BrokerService[bamboo]] [BrokerService] Apache ActiveMQ 5.16.3 (bamboo, ID:bamboo1-36725-1657620117267-0:1) is shutdown
2022-07-12 21:02:28,211 WARN [IOExceptionHandler: stopping BrokerService[bamboo]] [DefaultIOExceptionHandler] Failure occurred while stopping broker
java.lang.IllegalStateException: PageFile is not loaded
at org.apache.activemq.store.kahadb.disk.page.PageFile.assertLoaded(PageFile.java:906)
at org.apache.activemq.store.kahadb.disk.page.PageFile.tx(PageFile.java:315)
at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1789)
at org.apache.activemq.store.kahadb.MessageDatabase.close(MessageDatabase.java:517)
at org.apache.activemq.store.kahadb.MessageDatabase.unload(MessageDatabase.java:556)
at org.apache.activemq.store.kahadb.MessageDatabase.doStop(MessageDatabase.java:314)
at org.apache.activemq.store.kahadb.KahaDBStore.doStop(KahaDBStore.java:311)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStop(KahaDBPersistenceAdapter.java:262)
at org.apache.activemq.util.ServiceSupport.stop(ServiceSupport.java:71)
at org.apache.activemq.util.ServiceStopper.stop(ServiceStopper.java:41)
at org.apache.activemq.broker.BrokerService.stop(BrokerService.java:878)
at org.apache.activemq.util.DefaultIOExceptionHandler$2.run(DefaultIOExceptionHandler.java:188)
2022-07-12 21:02:28,221 INFO [bambooHeartBeatMessageListenerConnector-1] [BrokerService] Using Persistence Adapter: KahaDBPersistenceAdapter[/activemq-data/bamboo/KahaDB]
2022-07-12 21:02:28,224 INFO [bambooHeartBeatMessageListenerConnector-1] [SharedFileLocker] Database activemq-data/bamboo/KahaDB/lock is locked by another server. This broker is now in slave mode waiting a lock to be acquired
2022-07-12 21:02:30,276 INFO [scheduler_Worker-7] [PlanVcsRevisionHistoryCleanupScheduler] Starting Plan VCS Revision History Cleanup

Eduardo Alvarenga
Atlassian Team
July 12, 2022

Hello @Nat Tsai,

Unexpected io error on pagefile write of 1 pages.
java.io.IOException: Stale file handle 

The message is an indicator of a broken NFS/CIFS implementation.

Please check your NFS server and client mount options. This can happen when the NFS server reboots without the client unmounting the NFS volumes first.

Bamboo works on NFSv3. It was not tested on NFSv4 so your mileage may vary.
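"Stale file handle" is the operating system's ESTALE error. As a rough sketch, a node could probe a path on the shared mount and report whether the OS raises that error. The helper names are hypothetical, and a plain stat call may not reproduce the failure, since the log above shows it occurring during a file sync.

```python
import errno
import os


def is_stale_handle(exc: OSError) -> bool:
    """True when the OS error is ESTALE, reported as 'Stale file handle'."""
    return exc.errno == errno.ESTALE


def probe(path: str) -> str:
    """Stat a path and classify the result as 'ok', 'stale', or another error."""
    try:
        os.stat(path)
    except OSError as exc:
        if is_stale_handle(exc):
            return "stale"
        return f"error: {exc.strerror}"
    return "ok"


if __name__ == "__main__":
    # Shared home path taken from the original post; adjust for your node.
    print(probe("/var/atlassian/application-data/bamboo/shared"))
```

Running the probe on both nodes at the same time can show whether only one of them has lost its view of the share.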

Eduardo Alvarenga
Atlassian Support APAC 

Nat Tsai July 12, 2022

Hi @Eduardo Alvarenga

The NFS server is still available, and we use NFSv3.

My mount options are:

10.11.x.x:/datastore/bamboo /var/atlassian/application-data/bamboo/shared nfs rw,nfsvers=3,lookupcache=pos,noatime,intr,rsize=32768,wsize=32768,_netdev 0 0
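For comparing mounts across the two nodes, an fstab entry like the one above can be split into its fields and options. This is a small sketch with a hypothetical helper name; it handles the whitespace-separated fstab layout but not escaped spaces in paths.

```python
def parse_fstab_entry(line: str) -> dict:
    """Split one fstab line into device, mount point, type, and an options dict."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least: device mount_point fs_type options")
    device, mount_point, fs_type, raw_options = fields[:4]
    options = {}
    for item in raw_options.split(","):
        key, _, value = item.partition("=")
        # Flags without a value (rw, noatime, ...) are stored as True.
        options[key] = value if value else True
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": options,
    }


if __name__ == "__main__":
    entry = ("10.11.x.x:/datastore/bamboo "
             "/var/atlassian/application-data/bamboo/shared nfs "
             "rw,nfsvers=3,lookupcache=pos,noatime,intr,"
             "rsize=32768,wsize=32768,_netdev 0 0")
    parsed = parse_fstab_entry(entry)
    print(parsed["mount_point"], parsed["options"]["nfsvers"])
```

Parsing the entry from each node and comparing the resulting dictionaries makes any difference in NFS version or caching options easy to spot.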

Eduardo Alvarenga
Atlassian Team
July 12, 2022

Hi @Nat Tsai

Please reboot both Bamboo nodes and your NFS server and try again. If the issue still persists, kindly open a support ticket on https://support.atlassian.com/contact so we can help you in detail.

Make sure to reference this Community post so the assigned engineer can have knowledge of what's been evaluated so far.

Eduardo Alvarenga
Atlassian Support APAC 

Nat Tsai July 12, 2022

Hi @Eduardo Alvarenga ,

 

Thanks for your support, but the problem still exists after restarting NFS and Bamboo.

I'll open a support ticket.

 

Thanks a lot

Eduardo Alvarenga
Atlassian Team
July 12, 2022

Hey @Nat Tsai 

Thank you for the interaction on the ticket!

We have created a KB article that describes your issue with a solution:

Thanks a lot for helping us create better content for our customers!

Please make sure to mark this answer as accepted.

Regards,

Eduardo Alvarenga
Atlassian Support APAC
