
Akka.Cluster.Tools: fix mutability and oldest state bugs with ClusterSingletonManager #7298

Conversation

@Aaronontheweb Aaronontheweb (Member) commented Jul 24, 2024

Changes

Looks like the list of available nodes was getting mangled - seems like a basic mutability issue inside the `ClusterSingletonManager`.

close #6973
close #7196

@Aaronontheweb Aaronontheweb (Member Author) left a comment

Detailed my changes

@@ -31,7 +31,7 @@ public class ClusterSingletonRestart2Spec : AkkaSpec
                 akka.loglevel = INFO
                 akka.actor.provider = "cluster"
                 akka.cluster.roles = [singleton]
-                #akka.cluster.auto-down-unreachable-after = 2s
+                akka.cluster.split-brain-resolver.stable-after = 2s
Aaronontheweb (Member Author):

Bug fixes for this spec - after doing a thorough review of some of the cases where we had a race condition, it all comes down to whether _sys1 was able to fully receive MemberRemoved or whether it became Unreachable first. Speeding up the downing provider's decision-making both ensures that this member gets removed quickly AND caps the number of retries during the HandOver process so the test can complete on time.

Aaronontheweb (Member Author):

I added nullability changes in here, cleaned up garbage XML-DOC comments, and did some minor reformatting - which is why this appears to be a "large" diff. I will comment on the real, substantive changes below.

-            public BecomingOldestData(List<UniqueAddress> previousOldest)
+            public ImmutableList<UniqueAddress> PreviousOldest { get; }
+
+            public BecomingOldestData(ImmutableList<UniqueAddress> previousOldest)
Aaronontheweb (Member Author):

Moved to using immutable lists everywhere for all states and messages. These are only sent when movement is occurring in the cluster, which is inherently low-volume, so there are no performance concerns.
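
For illustration, a minimal sketch of the pattern (not the exact Akka.NET source - the Remove helper is hypothetical and only shows how "updates" produce new instances instead of mutating shared state):

    using System.Collections.Immutable;
    using Akka.Cluster;

    internal sealed class BecomingOldestData
    {
        // Snapshot of the candidate oldest nodes; never mutated after construction.
        public ImmutableList<UniqueAddress> PreviousOldest { get; }

        public BecomingOldestData(ImmutableList<UniqueAddress> previousOldest)
        {
            PreviousOldest = previousOldest;
        }

        // Hypothetical helper: "removing" a member yields a brand-new state object,
        // so a list already handed to the FSM can never be changed underneath it.
        public BecomingOldestData Remove(UniqueAddress member)
            => new BecomingOldestData(PreviousOldest.Remove(member));
    }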

                {
                    self.Tell(new LeaseLost(reason));
                }).ContinueWith(r =>
                {
                    if (r.IsFaulted || r.IsCanceled)
                        return (object)new AcquireLeaseFailure(r.Exception);
                    return new AcquireLeaseResult(r.Result);
-               }).PipeTo(Self);
+               }).PipeTo(self);
Aaronontheweb (Member Author):

Fixed a potential safety issue here by using the captured `self` local inside the async continuation instead of the `Self` property, which should not be accessed from code running outside the actor's context.
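
As an aside, here's a small, self-contained sketch of the rule being applied (hypothetical actor and message types, not the actual ClusterSingletonManager code): `Self` must be copied into a local before wiring up any Task continuation, because the continuation runs on a thread-pool thread outside the actor's context.

    using System;
    using System.Threading.Tasks;
    using Akka.Actor;

    public sealed class LeaseResult
    {
        public LeaseResult(bool acquired) => Acquired = acquired;
        public bool Acquired { get; }
    }

    public sealed class LeaseFailure
    {
        public LeaseFailure(Exception cause) => Cause = cause;
        public Exception Cause { get; }
    }

    public sealed class LeaseClientActor : ReceiveActor
    {
        public LeaseClientActor(Func<Task<bool>> acquireLease)
        {
            Receive<string>(_ =>
            {
                var self = Self; // capture while still on the actor's own thread

                acquireLease()
                    .ContinueWith(t => t.IsFaulted || t.IsCanceled
                        ? (object)new LeaseFailure(t.Exception)
                        : new LeaseResult(t.Result))
                    .PipeTo(self); // use the captured local here, never the Self property
            });

            Receive<LeaseResult>(r => { /* lease acquired (or not) - act on it */ });
            Receive<LeaseFailure>(f => { /* lease acquisition failed - retry or give up */ });
        }
    }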

@@ -769,15 +723,20 @@ private void GetNextOldestChanged()
        private State<ClusterSingletonState, IClusterSingletonData> TryAcquireLease()
        {
            var self = Self;
-           lease.Acquire(reason =>
+
+           if (_lease == null)
Aaronontheweb (Member Author):

Added a null check here to appease nullability checks

{
return GoTo(ClusterSingletonState.End).Using(EndData.Instance);
return GoTo(ClusterSingletonState.Younger).Using(new YoungerData(ImmutableList<UniqueAddress>.Empty));
Aaronontheweb (Member Author):

No longer using null List<T> - using empty immutable lists instead.

@@ -911,24 +867,38 @@ private void InitializeFSM()
                    case OldestChangedBuffer.OldestChanged oldestChanged when e.StateData is YoungerData youngerData:
                    {
                        _oldestChangedReceived = true;
-                       if (oldestChanged.Oldest != null && oldestChanged.Oldest.Equals(_selfUniqueAddress))
+                       if (oldestChanged.NewOldest != null && oldestChanged.NewOldest.Equals(_selfUniqueAddress))
Aaronontheweb (Member Author):

This and a corresponding change in the OldestChangedBuffer are the two big fixes in this PR, aside from dealing with some of the minor mutability issues.

The Problem

When we receive a MemberRemoved for an older node that was previously hosting a cluster singleton, we intentionally delay processing of that MemberRemoved event via the DelayedMemberRemoved mechanism. This is designed to ensure that, in the event of a split brain where the SBR was responsible for removing that old node, the node still gets a chance to shut itself down and die cleanly before we take over. This is especially important for singletons using Akka.Persistence, so we don't end up with two competing actors using the same persistence id and database. So we still assume that node is the oldest for up to ~20s after its removal.

This is normally not an issue - when we attempt to send hand-overs to that node, they'll eventually time out and we'll just recreate the singleton, typically within 10-20 seconds.

However, this design creates a problem during rolling updates in a managed environment like Kubernetes, where restarts can happen very quickly.

Imagine this scenario:

  • Cluster of node1 (oldest), node2, node3 - all have the right role to host the singleton.
  • node1 is replaced first by K8s - it's terminated via CLR process exit and uses CoordinatedShutdown to exit as gracefully as it can.
  • node2 sees that node1 is exiting and does the correct thing: messages it for a hand-over, becomes the new singleton host, and is now the oldest node.
  • Kubernetes restarts node1, which joins the cluster under a new UniqueAddress but with the same Address as before.
  • Kubernetes then takes down node2, per the update process.
  • If all of the prior steps happen in under 20 seconds, node3 will be notified that it is becoming the oldest node and will start messaging node1 (which is ACTUALLY YOUNGER THAN IT) to do the hand-over. node1 will ACK that hand-over right away, and we'll have two separate cluster singletons active at the same time until node2 completely exits. This is one of the two bugs that caused #6973 (Akka.Cluster.Sharding: duplicate shards / entities); the member ordering problem fixed in #7297 (Akka.Cluster.Tools.Singleton / Akka.Cluster.Sharding: fix duplicate shards caused by incorrect ClusterSingletonManager HandOver) is the other.

So to fix this issue, we can either speed up member removals or do something even simpler: just include the address of the previous "oldest" node in the OldestChanged message and put it at the front of the list in our Younger and BecomingOldest states. That way, we're always messaging the most recent home of the singleton using live data from the OldestChangedBuffer. I've run this experiment thousands of times in our test lab and can confirm that it eliminated the issue completely.
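
A minimal sketch of that re-ordering rule in isolation (the helper name is hypothetical; only the rule itself comes from this PR):

    using System.Collections.Immutable;
    using Akka.Cluster;

    internal static class OldestOrdering
    {
        // Put the node that most recently hosted the singleton at the front of the
        // candidate list, so hand-over is always negotiated with it first.
        public static ImmutableList<UniqueAddress> WithPreviousOldestFirst(
            ImmutableList<UniqueAddress> candidates, UniqueAddress previousOldest)
        {
            if (previousOldest == null)
                return candidates; // nothing reported by the OldestChangedBuffer - keep existing order

            return candidates.Remove(previousOldest).Insert(0, previousOldest);
        }
    }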

            /// </summary>
-           public UniqueAddress Oldest { get; }
Aaronontheweb (Member Author):

The big fix - make sure that the OldestChanged message includes BOTH the "new" oldest node and the previous oldest node. This ensures that we don't have any data consistency problems in the event of a rolling update.
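
Sketch of the resulting message shape - only the NewOldest and PreviousOldest property names come from the diff; the constructor and the rest of the class are illustrative:

    using Akka.Cluster;

    public sealed class OldestChanged
    {
        public OldestChanged(UniqueAddress newOldest, UniqueAddress previousOldest)
        {
            NewOldest = newOldest;
            PreviousOldest = previousOldest;
        }

        // The member that is now the oldest eligible singleton host (may be null
        // if no eligible member remains).
        public UniqueAddress NewOldest { get; }

        // The member that most recently hosted the singleton, so the node that is
        // becoming oldest knows exactly whom to negotiate the hand-over with.
        public UniqueAddress PreviousOldest { get; }
    }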

-                           return GoTo(ClusterSingletonState.BecomingOldest).Using(new BecomingOldestData(youngerData.Oldest));
+                           // explicitly re-order the list to make sure that the oldest, as indicated to us by the OldestChangedBuffer,
+                           // is the first element - resolves bug https://github.com/akkadotnet/akka.net/issues/6973
+                           var newOldestState = oldestChanged.PreviousOldest switch
Aaronontheweb (Member Author):

Re-arrange the state we're going to retain so we make sure the PreviousOldest node, as reported by the OldestChangedBuffer, appears at the front of the list. This ensures that the BecomingOldest node is always performing the hand-off with the correct party in the event of a rolling update.
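
A hedged reconstruction of what that switch does (the diff above is only partially visible here; property names like oldestChanged.PreviousOldest and youngerData.Oldest are taken from the diff, while the exact branch bodies are an assumption):

    var newOldestState = oldestChanged.PreviousOldest switch
    {
        // No previous oldest reported - keep whatever ordering the Younger state had.
        null => new BecomingOldestData(youngerData.Oldest),

        // Previous oldest is known - move it to the front so the hand-over is
        // negotiated with the node that actually ran the singleton last.
        var prev => new BecomingOldestData(youngerData.Oldest.Remove(prev).Insert(0, prev))
    };

    return GoTo(ClusterSingletonState.BecomingOldest).Using(newOldestState);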

@Arkatufus Arkatufus (Contributor) left a comment

LGTM

@Aaronontheweb Aaronontheweb merged commit 7180810 into akkadotnet:dev Jul 25, 2024
12 checks passed
@Aaronontheweb Aaronontheweb deleted the fix-6973-early-ClusterSingletonManager-handoff branch July 25, 2024 18:29