Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split brain resolver #3180

Merged
merged 9 commits into from
Nov 13, 2017
Merged

Conversation

Horusiath
Copy link
Contributor

@Horusiath Horusiath commented Oct 30, 2017

This is WIP pull request implementing Split Brain Resolver feature for clusters. There are 4 different strategies being part of it:

  • Static quorum
  • Keep majority
  • Keep oldest
  • Keep referee

Regarding MNTK specs, only KeepMajority has been implemented to make full E2E integration test. However to only difference from other strategies on that field is a single generic method - part of ISplitBrainStrategy interface, which is tested thoroughly on its own.

@Aaronontheweb
Copy link
Member

@Horusiath missing API approvals for Akka.Cluster; I still need to do a review, but that's what's throwing off the specs

@Horusiath
Copy link
Contributor Author

@Aaronontheweb this is still incomplete - tests are still missing. I also need to refactor this a little. Right now different strategies are represented as different actor types, but during development I've understood, that they can be just standard classes with one common interface. This will be a lot easier to test.

@Horusiath Horusiath removed the WIP label Nov 1, 2017
@Horusiath Horusiath changed the title [WIP] Split brain resolver Split brain resolver Nov 1, 2017
@Horusiath
Copy link
Contributor Author

Ready for review.

@Horusiath Horusiath added the ready label Nov 1, 2017
RunOn(() =>
{
Cluster.RegisterOnMemberRemoved(() => downed = true);
}, minority);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I need to make sure, that the nodes from minority group have to be removed from the cluster, I've found this way to be the most reliable - I simply register a flag setup on removing current node from the cluster, for which I make assertion later on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec looks good to me.

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but have a question in the comments that would be helpful to know


When operating an Akka cluster you must consider how to handle [network partitions](https://en.wikipedia.org/wiki/Network_partition) (a.k.a. split brain scenarios) and machine crashes (including .NET CLR/Core and hardware failures). This is crucial for correct behavior if you use Cluster Singleton or Cluster Sharding, especially together with Akka Persistence.

> Note: while this feature is based on [Lightbend Reactive Platform Split Brain Resolver](https://doc.akka.io/docs/akka/rp-16s01p02/scala/split-brain-resolver.html) feature description, its implementation is result of free contribution and interpretation of Akka.NET team. Lightbend team doesn't take any responsibility for the state and correctness of it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, no need to even mention this. You had to write all of the source code from scratch yourself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ofcourse he did. However considering akka.net is being positioned as a port of the JVM version. It does not hurt to point out that this particular part could behave differently, by design.

@@ -231,6 +231,13 @@ namespace Akka.Cluster
public Akka.Actor.Props DowningActorProps { get; }
public System.TimeSpan DownRemovalMargin { get; }
}
public sealed class SplitBrainResolver : Akka.Cluster.IDowningProvider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok - this is just a flavor of IDowningProvider. That makes sense.

RunOn(() =>
{
Cluster.RegisterOnMemberRemoved(() => downed = true);
}, minority);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec looks good to me.

actor.provider = cluster
cluster {
down-removal-margin = 10s
downing-provider-class = ""Akka.Cluster.SplitBrainResolver, Akka.Cluster""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone specifies a downing-provider-class and an active split-brain-resolver strategy, which one wins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A downing provider is an actual actor class that host all necessary logic i.e. gathering the information about the last known reachable and unreachable members, downing them, waiting for cluster to become stable etc.

A split brain resolver strategy is an actual decider that based on its own configuration and last known state of the cluster must make a decision, which nodes should be downed.

For a split brain to work you must provide both of them: SplitBrainResolver as a downing-provider-class and a particular implementation of split-brain-resolver.active-strategy.

@Aaronontheweb
Copy link
Member

One more question: are any of the split brain resolvers enabled by default?

@Horusiath
Copy link
Contributor Author

@Aaronontheweb there are 4 of them, but you have to make explicit choice which one to use. By default akka.cluster.split-brain-resolver.active-strategy is equal to off, which means, that it will break unless user will specify actual strategy.

@Aaronontheweb
Copy link
Member

which means, that it will break unless user will specify actual strategy.

I assume you mean "there is no downing provider if no strategy is set?" should default to the same behavior as it is currently. Correct?

@Horusiath Horusiath force-pushed the split-brain-resolvers branch from 7e84cdd to 3be25d0 Compare November 11, 2017 15:23
@Aaronontheweb Aaronontheweb merged commit dc946b4 into akkadotnet:dev Nov 13, 2017
@Aaronontheweb Aaronontheweb added this to the 1.3.3 milestone Nov 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants