activate() and deactivate() need to be done in 2 phases #5

Open
belaban opened this issue Nov 15, 2023 · 0 comments
belaban commented Nov 15, 2023

When activate() is called on {A,B,C}, all members connect to the UpgradeServer, register their view and set active=true. This can cause the following problem:

  • A and B are done, active is true
  • C is delayed, active is still false and registerView() has not yet been called
  • C sends a message to B. This succeeds: since C's active==false, the message is sent via the JGroups stack (not via UPGRADE), and B receives it via the JGroups stack. However, because B is active, it sends its response to C via UPGRADE, and C will not receive it: C hasn't yet called registerView(), which is what enables the UpgradeServer to forward B's response to C.
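To make the race concrete, here is a minimal Java sketch under heavily simplified assumptions: RaceServer and RaceMember are hypothetical stand-ins for the UpgradeServer and a member (not the RollingUpgrades API), the JGroups stack is reduced to a direct method call, and the server simply drops messages addressed to members that have not called registerView().

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model of the activate() race;
// these classes are illustrative only and are not the UPGRADE API.
class RaceServer {
    // Members that have called registerView(); only they are reachable via UPGRADE.
    final Set<String> registered = new HashSet<>();

    boolean forward(RaceMember target, String msg) {
        if (!registered.contains(target.name))
            return false; // dropped: the server does not know the target yet
        target.inbox.add(msg + " [via UPGRADE]");
        return true;
    }
}

class RaceMember {
    final String name;
    final RaceServer server;
    boolean active; // true: send via UPGRADE, false: send via the JGroups stack
    final List<String> inbox = new ArrayList<>();

    RaceMember(String name, RaceServer server) {
        this.name = name;
        this.server = server;
    }

    void registerView() { server.registered.add(name); }

    // The transport is chosen by the sender's own active flag alone.
    boolean send(RaceMember target, String msg) {
        if (active)
            return server.forward(target, msg);
        target.inbox.add(msg + " [via JGroups]"); // stands in for the local JGroups stack
        return true;
    }
}

class ActivationRaceDemo {
    public static void main(String[] args) {
        RaceServer server = new RaceServer();
        RaceMember b = new RaceMember("B", server);
        RaceMember c = new RaceMember("C", server);
        b.registerView();
        b.active = true; // B has completed activate()
        // C is delayed: neither registered nor active yet
        boolean requestOk = c.send(b, "request");   // via JGroups: delivered
        boolean responseOk = b.send(c, "response"); // via UPGRADE: dropped
        System.out.println("request=" + requestOk + " response=" + responseOk);
    }
}
```

Running ActivationRaceDemo prints request=true response=false: C's request reaches B over JGroups, but B's response over UPGRADE is lost, which in practice surfaces as an RPC timeout.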

We therefore need to ensure that everyone is registered with the UpgradeServer before switching to UPGRADE:

  • In a first phase, registerView() in all members makes sure that everyone can send messages to, and receive messages from, the UpgradeServer.
  • Only when everyone has registered successfully can we switch to UPGRADE by setting active=true. If the first phase doesn't complete successfully, an exception is thrown and the second phase is not started, so the switch to UPGRADE is not made.
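A minimal Java sketch of these two phases, assuming a hypothetical UpgradeParticipant interface whose registerView() returns a CompletableFuture (the real registerView() signature may well differ): phase 2 only runs if every registration from phase 1 completed within the timeout.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical interface standing in for a member; not the real UPGRADE API.
interface UpgradeParticipant {
    // Phase 1: connect to the UpgradeServer and register the local view.
    CompletableFuture<Void> registerView();

    // Phase 2: switch sending from the JGroups stack to UPGRADE.
    void setActive(boolean active);
}

class TwoPhaseActivator {
    static void activate(List<UpgradeParticipant> members, long timeoutMs) throws Exception {
        // Phase 1: all members register in parallel; wait until every one has succeeded.
        CompletableFuture<?>[] registrations = members.stream()
            .map(UpgradeParticipant::registerView)
            .toArray(CompletableFuture[]::new);
        // get() throws ExecutionException if any registration failed and TimeoutException
        // if one hangs; either way phase 2 never starts and nobody switches to UPGRADE.
        CompletableFuture.allOf(registrations).get(timeoutMs, TimeUnit.MILLISECONDS);

        // Phase 2: everybody can already receive via the UpgradeServer, so members may
        // flip the flag one at a time, in any order, without further coordination.
        for (UpgradeParticipant m : members)
            m.setActive(true);
    }
}
```

CompletableFuture.allOf() lets the registrations run in parallel while keeping the all-or-nothing behavior: a single failed or hung registration makes get() throw before any member calls setActive(true).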

The second phase does not need to be synchronous: since everyone is connected to both the UpgradeServer and JGroups, a message can be sent via JGroups or via UPGRADE and will be received either way! For example, a member might not yet be active, so it sends a message via JGroups. The recipient receives it via JGroups, but might send the response via UPGRADE, as it is already active. The original sender then receives the response via UPGRADE, as it registered with the UpgradeServer in the first phase.
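The following Java sketch (hypothetical Node and Cluster classes, not the UPGRADE code) models why the second phase can be asynchronous: the transport is chosen by the sender's own active flag, and a receiver accepts messages from either transport, so once everyone is registered a half-activated cluster still delivers in both directions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of mixed-transport delivery; not the UPGRADE protocol code.
enum Transport { JGROUPS, UPGRADE }

class Node {
    final String name;
    volatile boolean active; // flipped independently by each member in phase 2
    final List<String> received = new ArrayList<>();

    Node(String name) { this.name = name; }

    // Only the sender's own flag decides the transport.
    Transport sendPath() { return active ? Transport.UPGRADE : Transport.JGROUPS; }
}

class Cluster {
    // Filled in phase 1: after it completes, every member is registered.
    final Set<String> registeredWithServer = new HashSet<>();

    boolean deliver(Node from, Node to, String msg) {
        Transport t = from.sendPath();
        if (t == Transport.UPGRADE && !registeredWithServer.contains(to.name))
            return false; // cannot happen once phase 1 has completed for all members
        to.received.add(msg + " via " + t); // receivers accept both transports
        return true;
    }
}

class MixedTransportDemo {
    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Node a = new Node("A");
        Node b = new Node("B");
        cluster.registeredWithServer.add("A"); // phase 1 done for both members
        cluster.registeredWithServer.add("B");
        b.active = true;                       // B has already switched, A has not
        cluster.deliver(a, b, "request");      // A -> B via JGROUPS
        cluster.deliver(b, a, "response");     // B -> A via UPGRADE
        System.out.println(b.received + " " + a.received);
    }
}
```

MixedTransportDemo prints [request via JGROUPS] [response via UPGRADE]: the request goes out over JGroups and the response comes back over UPGRADE, and both arrive.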

This issue does not cause incorrect behavior in Infinispan, as an RPC would simply time out (e.g. B's response to C in the above example). However, activating in two phases reduces the number of such failures, which is important when we do a rolling upgrade under heavy traffic.

The same is true for deactivate(): because it is not synchronous (i.e., not received by all members at the same time), the following can happen:

  • All members (A,B,C) are active
  • deactivate() is called
  • A is the coordinator of the global view and would install the new MergeView locally
  • However, C receives deactivate() first and disconnects
  • Because A is still active, it gets a new view {A,B} from the UpgradeServer!
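A minimal Java model of this race, assuming (for illustration only) that the UpgradeServer pushes a new global view to the remaining connected members whenever one of them disconnects; GlobalViewServer and ViewMember are hypothetical names, not classes from the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the deactivate() race; names are illustrative only.
class GlobalViewServer {
    private final Set<String> connected = new LinkedHashSet<>();
    private final List<ViewMember> members = new ArrayList<>();

    void connect(ViewMember m) {
        connected.add(m.name);
        members.add(m);
    }

    // When a member disconnects, the server computes a new global view
    // from the members still connected and pushes it to them.
    void disconnect(ViewMember m) {
        connected.remove(m.name);
        members.remove(m);
        List<String> newView = new ArrayList<>(connected);
        for (ViewMember other : members)
            other.onServerView(newView);
    }
}

class ViewMember {
    final String name;
    boolean active = true;
    List<String> installedView = List.of();

    ViewMember(String name) { this.name = name; }

    // An active member installs every view it gets from the UpgradeServer.
    void onServerView(List<String> view) {
        if (active)
            installedView = view;
    }
}

class DeactivateRaceDemo {
    public static void main(String[] args) {
        GlobalViewServer server = new GlobalViewServer();
        ViewMember a = new ViewMember("A");
        ViewMember b = new ViewMember("B");
        ViewMember c = new ViewMember("C");
        for (ViewMember m : List.of(a, b, c)) {
            server.connect(m);
            m.installedView = List.of("A", "B", "C");
        }
        // deactivate() reaches C first: C switches off and disconnects right away
        c.active = false;
        server.disconnect(c);
        // A is still active, so it installs {A,B} although C is still running
        System.out.println("A's view: " + a.installedView);
    }
}
```

DeactivateRaceDemo prints A's view: [A, B]: A drops C from its view only because C left the UpgradeServer before A stopped listening to it.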

-> We therefore have to activate/deactivate in 2 phases:

Solution for activate():

  • All members register with the UpgradeServer. From now on, they can receive messages either via the UpgradeServer or, as before, via the local channel
  • When this is done (and confirmed), all members set active to true
    ◦ Because this phase is not synchronous, some members might activate before others. However, this is not an issue, as members can temporarily receive messages through the local channel before switching to the UpgradeServer

Solution for deactivate():

  • All members set active to false. From then on, members send messages via the local channel, but are still able to receive messages via the UpgradeServer. However, view changes from the UpgradeServer are ignored.
  • When this is done, members disconnect from the UpgradeServer. TBD: we need to make sure that a member has no pending messages sent via the UpgradeServer. TBD: perhaps don't disconnect a member at all; when it is restarted without UPGRADE, the connection to the UpgradeServer will be torn down anyway
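A sketch of the proposed deactivate() in Java, under stated assumptions: DeactivatingMember and TwoPhaseDeactivator are hypothetical, and pendingViaServer is an assumed counter of messages sent via the UpgradeServer but not yet acknowledged, which is only one possible way to address the first TBD. Phase 1 switches every member to the local channel and makes it ignore views from the UpgradeServer; phase 2 disconnects only members with nothing in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the UPGRADE implementation. pendingViaServer is an
// assumed counter of messages sent via the UpgradeServer but not yet acknowledged.
class DeactivatingMember {
    final String name;
    volatile boolean active = true;
    volatile boolean connectedToServer = true;
    final AtomicInteger pendingViaServer = new AtomicInteger();
    volatile List<String> view;

    DeactivatingMember(String name, List<String> view) {
        this.name = name;
        this.view = view;
    }

    // Phase 1: send via the local channel from now on (completes immediately here).
    CompletableFuture<Void> deactivate() {
        active = false;
        return CompletableFuture.completedFuture(null);
    }

    String sendPath() { return active ? "UPGRADE" : "local"; }

    // Messages from the UpgradeServer are still received while connected...
    boolean acceptsServerMessages() { return connectedToServer; }

    // ...but views from the UpgradeServer are ignored once inactive.
    void onServerView(List<String> newView) {
        if (active)
            view = newView;
    }

    // Phase 2: disconnect only if nothing sent via UPGRADE is still in flight.
    boolean tryDisconnect() {
        if (pendingViaServer.get() > 0)
            return false;
        connectedToServer = false;
        return true;
    }
}

class TwoPhaseDeactivator {
    // Returns the members that could not disconnect yet (pending messages).
    static List<DeactivatingMember> deactivate(List<DeactivatingMember> members, long timeoutMs)
            throws Exception {
        // Phase 1: every member stops sending via UPGRADE; wait for all of them.
        CompletableFuture<?>[] acks = members.stream()
            .map(DeactivatingMember::deactivate)
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(acks).get(timeoutMs, TimeUnit.MILLISECONDS);

        // Phase 2: every member now ignores views pushed by the UpgradeServer,
        // so members can disconnect in any order.
        List<DeactivatingMember> stillConnected = new ArrayList<>();
        for (DeactivatingMember m : members)
            if (!m.tryDisconnect())
                stillConnected.add(m);
        return stillConnected;
    }
}
```

Because all members ignore server views before anyone disconnects, a view like {A,B} from the scenario above would no longer be installed. Members returned by deactivate() would need to be retried later, or, per the second TBD, simply left connected until they are restarted without UPGRADE.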
belaban self-assigned this Nov 15, 2023
belaban changed the title from "activate() needs to be done in2 phases" to "activate() and deactivate need to be done in 2 phases" Nov 28, 2023
belaban transferred this issue from jgroups-extras/RollingUpgrades Dec 4, 2023
belaban added this to the 1.1 milestone Dec 4, 2023