Split brain resolver #3180
@Horusiath the API approvals for Akka.Cluster are missing; I still need to do a review, but that's what's breaking the specs.
@Aaronontheweb this is still incomplete: tests are still missing, and I also need to refactor it a little. Right now the different strategies are represented as different actor types, but during development I realized that they can just be standard classes implementing one common interface. That will make them a lot easier to test.
Ready for review.
RunOn(() =>
{
    Cluster.RegisterOnMemberRemoved(() => downed = true);
}, minority);
Since I need to make sure that the nodes from the minority group are removed from the cluster, I found this to be the most reliable approach: I register a callback that sets a flag when the current node is removed from the cluster, and assert on that flag later on.
Spec looks good to me.
Approved, but I left a question in the comments that would be helpful to have answered.
When operating an Akka cluster you must consider how to handle [network partitions](https://en.wikipedia.org/wiki/Network_partition) (a.k.a. split brain scenarios) and machine crashes (including .NET CLR/Core and hardware failures). This is crucial for correct behavior if you use Cluster Singleton or Cluster Sharding, especially together with Akka Persistence.
> Note: while this feature is based on the [Lightbend Reactive Platform Split Brain Resolver](https://doc.akka.io/docs/akka/rp-16s01p02/scala/split-brain-resolver.html) feature description, its implementation is the result of a free contribution and interpretation by the Akka.NET team. The Lightbend team doesn't take any responsibility for its state or correctness.
IMHO, no need to even mention this. You had to write all of the source code from scratch yourself.
Of course he did. However, considering that Akka.NET is positioned as a port of the JVM version, it does not hurt to point out that this particular part could behave differently, by design.
@@ -231,6 +231,13 @@ namespace Akka.Cluster
        public Akka.Actor.Props DowningActorProps { get; }
        public System.TimeSpan DownRemovalMargin { get; }
    }
    public sealed class SplitBrainResolver : Akka.Cluster.IDowningProvider
Ah, OK, this is just a flavor of IDowningProvider. That makes sense.
actor.provider = cluster
cluster {
    down-removal-margin = 10s
    downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
If someone specifies a downing-provider-class and an active split-brain-resolver strategy, which one wins?
A downing provider is the actual actor class that hosts all the necessary logic, i.e. gathering information about the last known reachable and unreachable members, downing them, waiting for the cluster to become stable, etc.
A split brain resolver strategy is the actual decider: based on its own configuration and the last known state of the cluster, it decides which nodes should be downed.
For the split brain resolver to work you must provide both of them: SplitBrainResolver as the downing-provider-class, and a particular strategy implementation in split-brain-resolver.active-strategy.
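For illustration, a minimal configuration that sets both pieces might look like the fragment below. The keys `downing-provider-class` and `split-brain-resolver.active-strategy` come from this discussion; the value `keep-majority` is an assumption about how the KeepMajority strategy is named in HOCON, so check the final reference configuration before relying on it.

```hocon
akka {
  actor.provider = cluster
  cluster {
    # The downing provider hosts the resolver logic (tracking reachability, downing, waiting for stability).
    downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
    split-brain-resolver {
      # The strategy decides which side of a partition gets downed (value name assumed).
      active-strategy = keep-majority
    }
  }
}
```

Setting only the downing provider without a strategy (or the reverse) leaves one half of the mechanism missing, which is why both keys appear together here.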
One more question: are any of the split brain resolvers enabled by default?
@Aaronontheweb there are 4 of them, but you have to make an explicit choice of which one to use. By default
I assume you mean "there is no downing provider if no strategy is set"? That should default to the same behavior as it does currently, correct?
7e84cdd to 3be25d0
This is a WIP pull request implementing the Split Brain Resolver feature for clusters. It includes 4 different strategies:
Regarding MNTK specs, only KeepMajority has been covered by a full E2E integration test. However, the only difference between it and the other strategies in that respect is a single generic method, part of the ISplitBrainStrategy interface, which is tested thoroughly on its own.
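To illustrate the kind of decision such a strategy makes, here is a minimal, self-contained sketch of a keep-majority rule. It is not the PR's `ISplitBrainStrategy`: the names `KeepMajoritySketch` and `Decide` are hypothetical, members are reduced to plain address strings, and the tie-break (on equal sizes, the side containing the lowest address survives) follows the behavior described for Lightbend's keep-majority strategy, assumed here rather than taken from this PR. Ordinal string order is also a simplification of real address ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a keep-majority decision; not the actual Akka.NET API.
public static class KeepMajoritySketch
{
    // Returns the addresses that should be downed, given the members this node
    // can reach (including itself) and the members it considers unreachable.
    public static IReadOnlyCollection<string> Decide(
        IReadOnlyCollection<string> reachable,
        IReadOnlyCollection<string> unreachable)
    {
        // No unreachable members: no partition observed, nothing to down.
        if (unreachable.Count == 0)
            return Array.Empty<string>();

        // This side is the majority: down the other side.
        if (reachable.Count > unreachable.Count)
            return unreachable.ToArray();

        // This side is the minority: down ourselves.
        if (reachable.Count < unreachable.Count)
            return reachable.ToArray();

        // Equal sizes: keep the side that contains the lowest address (ordinal order).
        var lowest = reachable.Concat(unreachable)
            .OrderBy(address => address, StringComparer.Ordinal)
            .First();
        return reachable.Contains(lowest) ? unreachable.ToArray() : reachable.ToArray();
    }
}
```

On a five-node cluster split 3/2, each side runs the same rule against its own view: the three-node side downs the other two, while the two-node side downs itself, so both sides converge on the same surviving partition without further coordination.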