Akka.Cluster.Tools: fix mutability and oldest state bugs with ClusterSingletonManager
#7298
Conversation
Looks like the list of available nodes was getting mangled - seems like a basic mutability issue inside the `ClusterSingletonManager`.
Detailed my changes
@@ -31,7 +31,7 @@ public class ClusterSingletonRestart2Spec : AkkaSpec
akka.loglevel = INFO
akka.actor.provider = "cluster"
akka.cluster.roles = [singleton]
#akka.cluster.auto-down-unreachable-after = 2s
akka.cluster.split-brain-resolver.stable-after = 2s
Bug fixes for this spec - after doing a thorough review of some of the cases where we had a race condition, it all comes down to whether or not `_sys1` was able to fully get `MemberRemoved` or whether it became `Unreachable` first. Speeding up the downing provider's decision-making process both ensures that this member gets removed quickly AND ensures that the number of retries during the `HandOver` process gets capped so the test can complete on time.
I added nullability changes in here, cleaned up garbage XML-DOC comments, and did some minor reformatting - which is why this appears to be a "large" diff. However, I will comment on the real and true changes below.
public BecomingOldestData(List<UniqueAddress> previousOldest)
public ImmutableList<UniqueAddress> PreviousOldest { get; }

public BecomingOldestData(ImmutableList<UniqueAddress> previousOldest)
Moved to using immutable lists (`ImmutableList<T>`) everywhere for all states and messages. These are only sent when movement is occurring in the cluster, which is inherently low-volume, so there are no performance concerns.
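To illustrate the pattern (a minimal sketch only - `BecomingOldestDataSketch` and its members are simplified stand-ins, not the actual Akka.Cluster.Tools types), the state-data classes now hold `ImmutableList<UniqueAddress>` and return new instances instead of mutating a shared list:

```csharp
using System.Collections.Immutable;
using Akka.Cluster; // UniqueAddress

// Simplified stand-in for the singleton FSM state data discussed above.
public sealed class BecomingOldestDataSketch
{
    // Immutable, so no callback or other actor can reorder or clear this list
    // out from under the FSM once the state object has been created.
    public ImmutableList<UniqueAddress> PreviousOldest { get; }

    public BecomingOldestDataSketch(ImmutableList<UniqueAddress> previousOldest)
    {
        PreviousOldest = previousOldest;
    }

    // A "mutation" produces a fresh state object rather than editing the old one.
    public BecomingOldestDataSketch Without(UniqueAddress removed)
        => new BecomingOldestDataSketch(PreviousOldest.Remove(removed));
}
```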
{
    self.Tell(new LeaseLost(reason));
}).ContinueWith(r =>
{
    if (r.IsFaulted || r.IsCanceled)
        return (object)new AcquireLeaseFailure(r.Exception);
    return new AcquireLeaseResult(r.Result);
}).PipeTo(Self);
}).PipeTo(self);
Fixed a potential safety issue here by consistently using the captured `self` closure variable, rather than the `Self` property, throughout the async chain.
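For context, here is a minimal sketch of the pattern (the actor, lease field, and message types below mirror the names in the diff hunks but are simplified stand-ins, not the production `ClusterSingletonManager` code): the actor reference is captured into a local before the async chain is built, and that captured `self` is used in the lease-lost callback, the continuation, and the `PipeTo` target:

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Coordination;

// Stand-in message types mirroring the names used in the diff above.
public sealed record LeaseLost(Exception Reason);
public sealed record AcquireLeaseFailure(Exception? Cause);
public sealed record AcquireLeaseResult(bool HoldingLease);

public class LeaseAcquirerSketch : ReceiveActor
{
    private readonly Lease _lease;

    public LeaseAcquirerSketch(Lease lease)
    {
        _lease = lease;
        Receive<AcquireLeaseResult>(_ => { /* proceed with starting the singleton */ });
        Receive<AcquireLeaseFailure>(_ => { /* retry or escalate */ });
        Receive<LeaseLost>(_ => { /* stop the singleton */ });
    }

    private void TryAcquireLease()
    {
        // Capture the actor reference while we're still on the actor's own
        // message-processing context; the continuation below may run on an
        // arbitrary thread-pool thread.
        var self = Self;

        _lease.Acquire(reason => self.Tell(new LeaseLost(reason)))
            .ContinueWith(r =>
            {
                if (r.IsFaulted || r.IsCanceled)
                    return (object)new AcquireLeaseFailure(r.Exception);
                return new AcquireLeaseResult(r.Result);
            })
            .PipeTo(self); // the captured local, used consistently throughout
    }
}
```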
@@ -769,15 +723,20 @@ private void GetNextOldestChanged()
private State<ClusterSingletonState, IClusterSingletonData> TryAcquireLease()
{
    var self = Self;
    lease.Acquire(reason =>

    if (_lease == null)
Added a `null` check here to appease the nullability checks.
{
    return GoTo(ClusterSingletonState.End).Using(EndData.Instance);
    return GoTo(ClusterSingletonState.Younger).Using(new YoungerData(ImmutableList<UniqueAddress>.Empty));
No longer using null `List<T>` - using empty immutable lists instead.
@@ -911,24 +867,38 @@ private void InitializeFSM()
case OldestChangedBuffer.OldestChanged oldestChanged when e.StateData is YoungerData youngerData:
{
    _oldestChangedReceived = true;
    if (oldestChanged.Oldest != null && oldestChanged.Oldest.Equals(_selfUniqueAddress))
    if (oldestChanged.NewOldest != null && oldestChanged.NewOldest.Equals(_selfUniqueAddress))
This and a corresponding change in the `OldestChangedBuffer` are the two big fixes in this PR, aside from dealing with some of the minor mutability issues.
The Problem
When we receive a `MemberRemoved` for an older node that was previously hosting a cluster singleton, we intentionally delay processing of that `MemberRemoved` event via the `DelayedMemberRemoved` mechanism. This is designed to ensure that, in the event of a split brain where the SBR was responsible for removing that old node, the node still gets a chance to shut itself down and die cleanly before we take over. This is especially important for singletons using Akka.Persistence, so we don't end up with two competing actors using the same persistence id and database. So we still assume that node is the oldest for up to ~20s after its removal.

This is normally not an issue - when we attempt to send hand-overs to that node, they'll time out eventually and we'll just recreate the singleton, typically within 10-20 seconds.

However, this design does create a problem during rolling updates in a managed environment like Kubernetes, where restarts can happen very quickly.
Imagine this scenario:

- Cluster of `node1` (oldest), `node2`, `node3` - all have the right `role` to host the singleton.
- `node1` is replaced first by K8s - it's terminated via CLR process exit and uses `CoordinatedShutdown` to try to exit as gracefully as it can.
- `node2` sees that `node1` is exiting and does the correct thing - messages it for a hand-over, becomes the new singleton host, and is now the oldest node.
- Kubernetes restarts `node1`, which joins the cluster under a new `UniqueAddress` but with the same `Address` as before.
- Kubernetes then takes down `node2` - per the update process.
- If all of the prior steps happen in under 20 seconds, `node3` will be notified that it is becoming the oldest node and will start messaging `node1`, which is ACTUALLY YOUNGER THAN IT, to do the hand-over. `node1` will ACK that hand-over right away and we'll have two separate cluster singletons active at the same time until `node2` completely exits.

This issue is one of the two bugs that caused Akka.Cluster.Sharding: duplicate shards / entities #6973 - the member ordering problem that we fixed in Akka.Cluster.Tools.Singleton / Akka.Cluster.Sharding: fix duplicate shards caused by incorrect `ClusterSingletonManager` `HandOver` #7297 being the other.
So to fix this issue, we can either speed up member removals or do something even simpler - just include the address of the previous "oldest" node in the `OldestChanged` message and put that one at the front of the list in our `Younger` and `BecomingOldest` states. That way, we're always messaging the most recent home of the singleton using live data from the `OldestChangedBuffer`. I've run this experiment thousands of times in our test lab and can confirm that it eliminated the issue completely.
/// </summary>
public UniqueAddress Oldest { get; }
The big fix - make sure that the `OldestChanged` message includes BOTH the "new" oldest node and the previous oldest node. This ensures that we don't have any data consistency problems in the event of a rolling update.
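As a rough sketch of what this looks like (the property names follow the diff hunks in this PR, but the class below is a simplified stand-in for `OldestChangedBuffer.OldestChanged`, not the real implementation):

```csharp
using Akka.Cluster;

// Simplified stand-in for OldestChangedBuffer.OldestChanged as described above.
public sealed class OldestChangedSketch
{
    // The member that is about to become the oldest (may be null if none qualifies).
    public UniqueAddress? NewOldest { get; }

    // The member that previously hosted the singleton - the node that the new
    // oldest should target first during the hand-over.
    public UniqueAddress? PreviousOldest { get; }

    public OldestChangedSketch(UniqueAddress? newOldest, UniqueAddress? previousOldest)
    {
        NewOldest = newOldest;
        PreviousOldest = previousOldest;
    }
}
```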
return GoTo(ClusterSingletonState.BecomingOldest).Using(new BecomingOldestData(youngerData.Oldest));
// explicitly re-order the list to make sure that the oldest, as indicated to us by the OldestChangedBuffer,
// is the first element - resolves bug https://github.com/akkadotnet/akka.net/issues/6973
var newOldestState = oldestChanged.PreviousOldest switch
Re-arrange the state we're going to retain so we make sure the `PreviousOldest` node, as reported by the `OldestChangedBuffer`, appears at the front of the list. This ensures that the `BecomingOldest` node is always performing the hand-off with the correct party in the event of a rolling update.
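A minimal sketch of that re-ordering, in the same switch-expression shape as the diff above (assuming the previously known oldest nodes are held as an `ImmutableList<UniqueAddress>`; the helper name and exact arms are illustrative, not the real code):

```csharp
using System.Collections.Immutable;
using Akka.Cluster;

public static class OldestReorderingSketch
{
    // Build the node list for the BecomingOldest state so that the previous
    // oldest reported by the OldestChangedBuffer is always the first hand-over target.
    public static ImmutableList<UniqueAddress> BuildPreviousOldest(
        UniqueAddress? previousOldest,
        ImmutableList<UniqueAddress> knownOldest)
    {
        return previousOldest switch
        {
            // Nothing reported - keep whatever we already knew.
            null => knownOldest,

            // Already in our list - move it to the front.
            { } po when knownOldest.Contains(po) =>
                knownOldest.Remove(po).Insert(0, po),

            // Not in our list yet - prepend it.
            { } po => knownOldest.Insert(0, po)
        };
    }
}
```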
LGTM
Changes
Looks like the list of available nodes was getting mangled - seems like a basic mutability issue inside the `ClusterSingletonManager`.

close #6973
close #7196
Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):