ARTEMIS-4325 - Ability to failback after failover #4522
base: main
Conversation
This feature makes sense to me if you are not using HA (live/backup pairs). Have you already considered waiting for a topology update to know when a node is up?
I had the same thought about using the topology update to know when the "original" node came back. I think that would be simpler than manually checking, and it would save a thread as well.
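For illustration, a minimal sketch of the topology-listener approach suggested here, assuming it would be registered via `ServerLocator#addClusterTopologyListener` (part of the Artemis core client). The names `originalNodeId` and `triggerFailback()` are hypothetical and not part of this PR:

```java
import org.apache.activemq.artemis.api.core.client.ClusterTopologyListener;
import org.apache.activemq.artemis.api.core.client.TopologyMember;

public class FailbackTopologyListener implements ClusterTopologyListener {

   private final String originalNodeId; // nodeID recorded before failover

   public FailbackTopologyListener(String originalNodeId) {
      this.originalNodeId = originalNodeId;
   }

   @Override
   public void nodeUP(TopologyMember member, boolean last) {
      // Called whenever the cluster topology reports a node as up; if it
      // is the node we failed away from, failback can start.
      if (originalNodeId.equals(member.getNodeId())) {
         triggerFailback();
      }
   }

   @Override
   public void nodeDown(long eventUID, String nodeID) {
      // Not needed for failback.
   }

   private void triggerFailback() {
      // Hypothetical hook: reconnect sessions to the original node.
   }
}
```

Note that this keys on nodeID, which ties into the caveat raised below about the nodeID changing when a broker is replaced.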
Thanks for your feedback, much appreciated! In the case of failback between HA live/backup pairs, that should be covered already, since failback would only follow a failover, which would only happen if the retry logic failed first, i.e. both live and backup servers are unavailable. I think failing over (and back) between different live/backup pairs makes some sense in a scenario like this, as long as it doesn't happen internally within the groups. If not, would it make sense to add an additional check to inhibit failback between live/backup pairs?

I might have missed something with the topology listener, though. It would be nice to have this happen without having to "ping" the old node... but would this work if the nodeID of the broker changes (say the broker cluster is part of a blue/green deploy or similar, such that the FQDN/IP remains the same but the underlying broker is replaced by one started on a new journal)? I can give that a try regardless, but it might take some time due to vacation.

Br,
Isn't the use case in view here live-to-live failback after failover (implemented recently via ARTEMIS-4251)? I'd leave live/backup pairs out of this completely since there are already semantics in place for that use case.
Force-pushed the branch from 0dde73c to 8dba605
Hello, back again from vacation. I made some changes per our discussion and rebased against main. I'm still running the failback check in a separate thread, since I didn't want to potentially stall/lock the TopologyHandler, but now it doesn't start until the node should be available (or available soon). Let me know what you think and whether there are any other changes that should be made here.
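A minimal sketch of the deferred, single-threaded arrangement described here: the topology callback only hands the check off to a dedicated thread and returns immediately. `FailbackScheduler` and `attemptFailback` are hypothetical names for illustration, not the PR's actual classes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FailbackScheduler {

   // One daemon thread, so the topology handler is never blocked and
   // the thread does not keep the broker alive on shutdown.
   private final ExecutorService failbackExecutor =
         Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "failback-check");
            t.setDaemon(true);
            return t;
         });

   // Invoked from nodeUP() once the original node is reported up (or
   // expected soon); the reconnect attempt then runs asynchronously.
   public void scheduleFailbackCheck(Runnable attemptFailback) {
      failbackExecutor.submit(attemptFailback);
   }
}
```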
Opening for feedback, as this might not be the right solution. It does, however, demonstrate the functionality I'm after.
If the feature makes sense on a high level, please let me know if the implementation needs any changes or can be improved in some way.
Use case detailed in: https://issues.apache.org/jira/browse/ARTEMIS-4325