Akka.Cluster: enable keep-majority default SBR #6628

Merged
merged 22 commits into from
Apr 5, 2023
Changes from 18 commits
Commits
9215e9b
enable `keep-majority` default SBR
Aaronontheweb Mar 30, 2023
fb42fda
Merge branch 'dev' into default-SBR
Aaronontheweb Mar 30, 2023
e9b08c2
added upgrade advisories to documentation and some spec / warning fixes
Aaronontheweb Apr 4, 2023
3782132
fixed typos
Aaronontheweb Apr 4, 2023
7ca8104
added documentation on how to disable the default downing provider
Aaronontheweb Apr 4, 2023
6e74d88
added API approvals
Aaronontheweb Apr 4, 2023
97eb1c2
disable SBR in MNTR
Aaronontheweb Apr 4, 2023
24d46b9
Merge branch 'dev' into default-SBR
Aaronontheweb Apr 4, 2023
248404f
Update MultiNodeClusterSpec.cs
Aaronontheweb Apr 4, 2023
9df4d23
fixed equality members on `InitJoin`
Aaronontheweb Apr 5, 2023
ed9918c
fix default auto-down-unreachable-after parse value
Aaronontheweb Apr 5, 2023
b6a4ab5
disable SBR in all clustering specs
Aaronontheweb Apr 5, 2023
3df700b
cleanup
Aaronontheweb Apr 5, 2023
2563099
reconfigured SBR for Akka.Cluster.Sharding specs
Aaronontheweb Apr 5, 2023
5ab3acc
fixed - had to adjust down-removal-margin
Aaronontheweb Apr 5, 2023
658ab28
Merge branch 'dev' into default-SBR
Aaronontheweb Apr 5, 2023
6ee93e5
fixed SBR issues with Akka.Cluster.Sharding MNTR
Aaronontheweb Apr 5, 2023
5337e46
Merge branch 'dev' into default-SBR
Arkatufus Apr 5, 2023
251fade
restored `auto-down-unreachable-after`
Aaronontheweb Apr 5, 2023
1705fc7
Merge branch 'dev' into default-SBR
Aaronontheweb Apr 5, 2023
7eedcdd
approave API changes
Aaronontheweb Apr 5, 2023
e45d60f
Merge branch 'default-SBR' of https://github.com/Aaronontheweb/akka.n…
Aaronontheweb Apr 5, 2023
18 changes: 17 additions & 1 deletion docs/articles/clustering/split-brain-resolver.md
Expand Up @@ -38,6 +38,19 @@ Keep in mind that split brain resolver will NOT work when `akka.cluster.auto-dow

Beginning in Akka.NET v1.4.16, the Akka.NET project has ported the original split brain resolver implementations from Lightbend as they are now open source. The following section of documentation describes how Akka.NET's hand-rolled split brain resolvers are implemented.

> [!IMPORTANT]
> As of Akka.NET v1.5.2, the `keep-majority` split brain resolution strategy is now enabled by default. This should be acceptable for the majority of Akka.Cluster users, but please read on.
Member Author

Tried to make it clear that there is now a default SBR enabled.


### Disabling the Default Downing Provider
Member Author

Add section explaining how to disable default downing provider.


To disable the default Akka.Cluster downing provider, simply configure the following in your HOCON:

```hocon
akka.cluster.downing-provider-class = ""
Contributor

Question: since we removed AutoDowning, what will happen if a user turns SBR off?

Member Author

Answered in chat - nothing. AutoDowning was never enabled by default and we strongly discouraged users from ever enabling it. If the default SBR is disabled Akka.Cluster behaves exactly like it does today: unreachable nodes stay unreachable until downed manually by pbm or some other process.

```

This will disable the split brain resolver / downing provider functionality altogether in Akka.NET. This was the default behavior for Akka.Cluster as of Akka.NET v1.5.1 and earlier.
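
For contrast, the default that v1.5.2 enables can also be written out explicitly. The following is a minimal sketch, not the project's reference configuration. The `downing-provider-class` value and the `stable-after` key appear in this PR's spec changes, and the 20s / 15s values mirror the defaults asserted in `ClusterConfigSpec`. The `active-strategy` and `down-all-when-unstable` key names do not appear in this diff and are assumed from the standard `akka.cluster.split-brain-resolver` section.

```hocon
akka.cluster {
  # explicit form of the downing provider that is now the default
  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"

  split-brain-resolver {
    # assumed key name: selects the resolution strategy
    active-strategy = keep-majority

    # how long membership must be stable before the resolver acts (asserted default: 20s)
    stable-after = 20s

    # assumed key name: "on" derives the timeout as 3/4 of stable-after (15s here)
    down-all-when-unstable = on
  }
}
```

Spelling the defaults out like this changes nothing at runtime; it may simply make the active downing behavior easier to spot during a configuration review.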

### Picking a Strategy

In order to enable an Akka.NET split brain resolver in your cluster (they are not enabled by default), you will want to update your `akka.cluster` HOCON configuration to the following:
Expand All @@ -59,7 +72,7 @@ This will cause the [`Akka.Cluster.SBR.SplitBrainResolverProvider`](xref:Akka.Cl
The following strategies are supported:

* `static-quorum`
* `keep-majority`
* `keep-majority` **(default)**
Member Author

Again, tried to make it clear what the defaults are here.

* `keep-oldest`
* `down-all`
* `lease-majority`
Expand Down Expand Up @@ -144,6 +157,9 @@ akka.cluster.split-brain-resolver {

#### Keep Majority

> [!NOTE]
> `keep-majority` is the default SBR strategy for Akka.Cluster as of Akka.NET v1.5.2+.

The `keep-majority` strategy will down this part of the cluster, which sees a lesser part of the whole cluster. This choice is made based on the latest known state of the cluster. When cluster will split into two equal parts, the one which contains the lowest address, will survive.

When to use it? When your cluster can grow or shrink very dynamically.
42 changes: 42 additions & 0 deletions docs/community/whats-new/akkadotnet-v1.5-upgrade-advisories.md
Expand Up @@ -11,6 +11,48 @@ This document contains specific upgrade suggestions, warnings, and notices that
<iframe width="560" height="315" src="https://www.youtube.com/embed/-UPestlIw4k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<!-- markdownlint-enable MD033 -->

## Upgrading to Akka.NET v1.5.2
Member Author

Upgrade advisories for this PR as well as #6389


Akka.NET v1.5.2 introduces two important behavioral changes:

* [Akka.Persistence: need to remove hard-coded Newtonsoft.Json `object` serializer](https://github.com/akkadotnet/akka.net/issues/6389)
* [Akka.Cluster: enable `keep-majority` as default Split Brain Resolver](https://github.com/akkadotnet/akka.net/pull/6628)

We meant to include both of these changes in Akka.NET v1.5.0 but simply ran out of time before making them into that release.

### Akka.Persistence Changes

The impact of [Akka.Persistence: need to remove hard-coded Newtonsoft.Json `object` serializer](https://github.com/akkadotnet/akka.net/issues/6389) is pretty minor: all versions of Akka.NET prior to 1.5.2 used Newtonsoft.Json as the `object` serializer for Akka.Persistence regardless of whether or not you [used a custom `object` serializer, such as Hyperion](xref:serialization#complex-object-serialization-using-hyperion).

Going forward your user-defined `object` serialization binding will now be respected by Akka.Persistence. Any old data previously saved using Newtonsoft.Json will continue to be recovered automatically by Newtonsoft.Json - it's only the serialization of new objects inserted after upgrading to v1.5.2 that will be affected.

If you _never changed your `object`_ serializer (most users don't) then this change doesn't affect you.

### Akka.Cluster Split Brain Resolver Changes

As of Akka.NET v1.5.2 we've now enabled the `keep-majority` [Split Brain Resolver](xref:split-brain-resolver) by default.

If you were already running with a custom SBR enabled, this change won't affect you.

If you weren't running with an SBR enabled, you should read the [Akka.Cluster Split Brain Resolver documentation](xref:split-brain-resolver).

Also worth noting: we've disabled the `akka.cluster.auto-down-unreachable-after` setting as it's always been a poor and shoddy way to manage network partitions inside Akka.Cluster. If you have that setting enabled it will be ignored and you'll see the following warning appear instead:

```shell
The `auto-down-unreachable-after` feature has been deprecated as of Akka.NET v1.5.2 and will be removed in a future version of Akka.NET.
The `keep-majority` split brain resolver will be used instead. See https://getakka.net/articles/cluster/split-brain-resolver.html for more details.
```
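
If an existing configuration still sets `auto-down-unreachable-after`, one way to avoid the warning is to drop the key (or set it to `off`) and rely on the default resolver. This is a hedged sketch only: the commented-out 10s value is a hypothetical prior setting, and the `stable-after` line is optional and shown with the default asserted in this PR's tests.

```hocon
akka.cluster {
  # before (now ignored, and the cause of the warning above):
  # auto-down-unreachable-after = 10s

  # after: leave the setting off so the default keep-majority resolver is used without a warning
  auto-down-unreachable-after = off

  # optional: time the membership must be stable before the resolver downs nodes (default 20s)
  split-brain-resolver.stable-after = 20s
}
```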

#### Disabling the Default Downing Provider

To disable the default Akka.Cluster downing provider, simply configure the following in your HOCON:
Member Author

Duplicated this section from the Split Brain Resolver page - explains how to disable the new default SBR, since I'm fairly certain we'll get a question about that in the future.


```hocon
akka.cluster.downing-provider-class = ""
```

This will disable the split brain resolver / downing provider functionality altogether in Akka.NET. This was the default behavior for Akka.Cluster as of Akka.NET v1.5.1 and earlier.

## Upgrading From Akka.NET v1.4 to v1.5

In case you need help upgrading:
Expand Up @@ -58,7 +58,9 @@ public ClusterShardingSpecConfig(
CommonConfig = ConfigurationFactory.ParseString($@"
akka.cluster.sharding.verbose-debug-logging = on
#akka.loggers = [""akka.testkit.SilenceAllTestEventListener""]

akka.cluster.downing-provider-class = ""Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster""
Member Author

This spec previously relied on auto-down-unreachable-after - in order to make it pass again we needed to:

  • Enable the SBR
  • Set the akka.cluster.split-brain-resolver.stable-after to the lowest possible value (must be > 0.0s)
  • Set the akka.cluster.down-removal-margin to the lowest possible value (must be > 0.0s)

akka.cluster.split-brain-resolver.stable-after = 1s
akka.cluster.down-removal-margin = 1s
akka.cluster.roles = [""backend""]
akka.cluster.distributed-data.gossip-interval = 1s
akka.persistence.journal.sqlite-shared.timeout = 10s #the original default, base test uses 5s
Expand Up @@ -58,7 +58,9 @@ protected MultiNodeClusterShardingConfig(
Common =
ConfigurationFactory.ParseString($@"
akka.actor.provider = ""cluster""
akka.cluster.auto-down-unreachable-after = 0s
akka.cluster.downing-provider-class = ""Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster""
akka.cluster.split-brain-resolver.stable-after = 1s
akka.cluster.down-removal-margin = 1s
akka.cluster.sharding.state-store-mode = ""{mode}""
akka.cluster.sharding.remember-entities = {rememberEntities.ToString().ToLowerInvariant()}
akka.cluster.sharding.remember-entities-store = ""{rememberEntitiesStore}""
Expand Up @@ -37,7 +37,7 @@ public void ClusterSingletonManagerSettings_must_have_default_config()
clusterSingletonManagerSettings.SingletonName.ShouldBe("singleton");
clusterSingletonManagerSettings.Role.ShouldBe(null);
clusterSingletonManagerSettings.HandOverRetryInterval.TotalSeconds.ShouldBe(1);
clusterSingletonManagerSettings.RemovalMargin.TotalSeconds.ShouldBe(0);
clusterSingletonManagerSettings.RemovalMargin.TotalSeconds.ShouldBe(20); // now 20 due to default SBR settings
Member Author

RemovalMargin now uses SBR default value of 20 seconds (this is where the value is computed from.)


var config = Sys.Settings.Config.GetConfig("akka.cluster.singleton");
Assert.False(config.IsNullOrEmpty());
Expand Up @@ -18,6 +18,7 @@
[assembly: System.Runtime.Versioning.TargetFrameworkAttribute(".NETCoreApp,Version=v6.0", FrameworkDisplayName=".NET 6.0")]
namespace Akka.Cluster
{
[System.ObsoleteAttribute("No longer used as of Akka.NET v1.5.2")]
Member Author

AutoDowning is now obsolete - it was a bad idea anyway.

public sealed class AutoDowning : Akka.Cluster.IDowningProvider
{
public AutoDowning(Akka.Actor.ActorSystem system, Akka.Cluster.Cluster cluster) { }
Expand Down Expand Up @@ -192,6 +193,8 @@ namespace Akka.Cluster
public ClusterSettings(Akka.Configuration.Config config, string systemName) { }
public bool AllowWeaklyUpMembers { get; }
public Akka.Util.AppVersion AppVersion { get; }
[System.ObsoleteAttribute("No longer used as of Akka.NET v1.5.2 - clustering defaults to using KeepMajority " +
"SBR instead")]
public System.Nullable<System.TimeSpan> AutoDownUnreachableAfter { get; }
public System.Type DowningProviderType { get; }
public Akka.Configuration.Config FailureDetectorConfig { get; }
Expand Up @@ -18,6 +18,7 @@
[assembly: System.Runtime.Versioning.TargetFrameworkAttribute(".NETStandard,Version=v2.0", FrameworkDisplayName=".NET Standard 2.0")]
namespace Akka.Cluster
{
[System.ObsoleteAttribute("No longer used as of Akka.NET v1.5.2")]
public sealed class AutoDowning : Akka.Cluster.IDowningProvider
{
public AutoDowning(Akka.Actor.ActorSystem system, Akka.Cluster.Cluster cluster) { }
Expand Down Expand Up @@ -192,6 +193,8 @@ namespace Akka.Cluster
public ClusterSettings(Akka.Configuration.Config config, string systemName) { }
public bool AllowWeaklyUpMembers { get; }
public Akka.Util.AppVersion AppVersion { get; }
[System.ObsoleteAttribute("No longer used as of Akka.NET v1.5.2 - clustering defaults to using KeepMajority " +
"SBR instead")]
public System.Nullable<System.TimeSpan> AutoDownUnreachableAfter { get; }
public System.Type DowningProviderType { get; }
public Akka.Configuration.Config FailureDetectorConfig { get; }
1 change: 1 addition & 0 deletions src/core/Akka.Cluster.TestKit/MultiNodeClusterSpec.cs
Expand Up @@ -59,6 +59,7 @@ public static Config ClusterConfig()
retry-interval = 200ms
waiting-for-state-timeout = 200ms
}
#downing-provider-class = """" # disable default SBR
}
akka.loglevel = INFO
akka.log-dead-letters = off
22 changes: 22 additions & 0 deletions src/core/Akka.Cluster.Tests/ClusterConfigSpec.cs
Expand Up @@ -8,6 +8,7 @@
using System;
using System.Collections.Immutable;
using Akka.Actor;
using Akka.Cluster.SBR;
using Akka.Configuration;
using Akka.Dispatch;
using Akka.Remote;
Expand Down Expand Up @@ -44,7 +45,9 @@ public void Clustering_must_be_able_to_parse_generic_cluster_config_elements()
settings.AllowWeaklyUpMembers.Should().BeTrue();
settings.WeaklyUpAfter.Should().Be(7.Seconds());
settings.PublishStatsInterval.Should().NotHaveValue();
#pragma warning disable CS0618
settings.AutoDownUnreachableAfter.Should().NotHaveValue();
#pragma warning restore CS0618
settings.DownRemovalMargin.Should().Be(TimeSpan.Zero);
settings.MinNrOfMembers.Should().Be(1);
settings.MinNrOfMembersOfRole.Should().Equal(ImmutableDictionary<string, int>.Empty);
Expand All @@ -71,6 +74,13 @@ public void Clustering_must_be_able_to_parse_generic_cluster_config_elements()
settings.VerboseHeartbeatLogging.Should().BeFalse();
settings.VerboseGossipReceivedLogging.Should().BeFalse();
settings.RunCoordinatedShutdownWhenDown.Should().BeTrue();

// downing provider settings
settings.DowningProviderType.Should().Be<SplitBrainResolverProvider>();
Member Author

Assert the new default SBR settings.

var sbrSettings = new SplitBrainResolverSettings(Sys.Settings.Config);
sbrSettings.DowningStableAfter.Should().Be(20.Seconds());
sbrSettings.DownAllWhenUnstable.Should().Be(15.Seconds()); // 3/4 OF DowningStableAfter
sbrSettings.DowningStrategy.Should().Be("keep-majority");
}

/// <summary>
Expand All @@ -83,5 +93,17 @@ public void Clustering_should_parse_nondefault_AppVersion()
var settings = new ClusterSettings(config.WithFallback(Sys.Settings.Config), Sys.Name);
settings.AppVersion.Should().Be(AppVersion.Zero);
}

/// <summary>
/// Validate that we can disable the default downing provider if needed
/// </summary>
[Fact]
public void Cluster_should_allow_disabling_of_default_DowningProvider()
Member Author

Validate that the new defaults can be disabled / overridden.

{
// configure HOCON to disable the default akka.cluster downing provider
Config config = "akka.cluster.downing-provider-class = \"\"";
var settings = new ClusterSettings(config.WithFallback(Sys.Settings.Config), Sys.Name);
settings.DowningProviderType.Should().Be<NoDowning>();
}
}
}
12 changes: 6 additions & 6 deletions src/core/Akka.Cluster.Tests/DowningProviderSpec.cs
Expand Up @@ -34,7 +34,7 @@ public Props DowningActorProps
}
}

class DummyDowningProvider : IDowningProvider
internal class DummyDowningProvider : IDowningProvider
{
public readonly AtomicBoolean ActorPropsAccessed = new AtomicBoolean(false);
public DummyDowningProvider(ActorSystem system, Cluster cluster)
Expand Down Expand Up @@ -69,21 +69,21 @@ public class DowningProviderSpec : AkkaSpec
");

[Fact]
public void Downing_provider_should_default_to_NoDowning()
public void Downing_provider_should_default_to_KeepMajority()
{
using (var system = ActorSystem.Create("default", BaseConfig))
{
Cluster.Get(system).DowningProvider.Should().BeOfType<NoDowning>();
Cluster.Get(system).DowningProvider.Should().BeOfType<Akka.Cluster.SBR.SplitBrainResolverProvider>();
}
}

[Fact]
public void Downing_provider_should_use_AutoDowning_if_auto_down_unreachable_after_is_configured()
public void Downing_provider_should_ignore_AutoDowning_if_auto_down_unreachable_after_is_configured()
{
var config = ConfigurationFactory.ParseString(@"akka.cluster.auto-down-unreachable-after=18s");
using (var system = ActorSystem.Create("auto-downing", config.WithFallback(BaseConfig)))
{
Cluster.Get(system).DowningProvider.Should().BeOfType<AutoDowning>();
Cluster.Get(system).DowningProvider.Should().BeOfType<Akka.Cluster.SBR.SplitBrainResolverProvider>();
}
}

Expand All @@ -97,7 +97,7 @@ public void Downing_provider_should_use_specified_downing_provider()
var downingProvider = Cluster.Get(system).DowningProvider;
downingProvider.Should().BeOfType<DummyDowningProvider>();
AwaitCondition(() =>
(downingProvider as DummyDowningProvider).ActorPropsAccessed.Value,
((DummyDowningProvider)downingProvider).ActorPropsAccessed.Value,
TimeSpan.FromSeconds(3));
}
}
3 changes: 3 additions & 0 deletions src/core/Akka.Cluster/AutoDown.cs
Expand Up @@ -270,6 +270,7 @@ private void Remove(UniqueAddress node)
/// <summary>
/// Used when no custom provider is configured and 'auto-down-unreachable-after' is enabled.
/// </summary>
[Obsolete("No longer used as of Akka.NET v1.5.2")]
public sealed class AutoDowning : IDowningProvider
{
private readonly ActorSystem _system;
Expand All @@ -296,7 +297,9 @@ public Props DowningActorProps
{
get
{
#pragma warning disable CS0618 // disable obsolete warning here because this entire class is obsolete
var autoDownUnreachableAfter = _cluster.Settings.AutoDownUnreachableAfter;
#pragma warning restore CS0618
if (!autoDownUnreachableAfter.HasValue)
throw new ConfigurationException("AutoDowning downing provider selected but 'akka.cluster.auto-down-unreachable-after' not set");

24 changes: 17 additions & 7 deletions src/core/Akka.Cluster/Cluster.cs
Expand Up @@ -67,13 +67,11 @@ static Cluster()
bool GetAssertInvariants()
{
var isOn = Environment.GetEnvironmentVariable("AKKA_CLUSTER_ASSERT")?.ToLowerInvariant();
switch (isOn)
return isOn switch
{
case "on":
return true;
default:
return false;
}
"on" => true,
_ => false
};
}

IsAssertInvariantsEnabled = GetAssertInvariants();
Expand Down Expand Up @@ -114,12 +112,24 @@ public Cluster(ActorSystemImpl system)
System = system;
Settings = new ClusterSettings(system.Settings.Config, system.Name);

if (!(system.Provider is IClusterActorRefProvider provider))
if (system.Provider is not IClusterActorRefProvider provider)
throw new ConfigurationException(
$"ActorSystem {system} needs to have a 'IClusterActorRefProvider' enabled in the configuration, currently uses {system.Provider.GetType().FullName}");
SelfUniqueAddress = new UniqueAddress(provider.Transport.DefaultAddress, AddressUidExtension.Uid(system));

_log = Logging.GetLogger(system, "Cluster");

// log a warning if the user has set auto-down-unreachable-after to any value other than "off"
// obsolete setting, so suppress obsolete warning
#pragma warning disable CS0618
if (Settings.AutoDownUnreachableAfter != null)
Member Author

If a user still has akka.cluster.auto-down-unreachable-after enabled, we raise a warning to let them know that this setting is now being ignored and keep-majority is going to be used instead.

#pragma warning restore CS0618
{
_log.Warning(
"The `auto-down-unreachable-after` feature has been deprecated as of Akka.NET v1.5.2 and will be removed in a future version of Akka.NET. " +
"The `keep-majority` split brain resolver will be used instead. See https://getakka.net/articles/cluster/split-brain-resolver.html for more details.");
}


CurrentInfoLogger = new InfoLogger(_log, Settings, SelfAddress);

12 changes: 11 additions & 1 deletion src/core/Akka.Cluster/ClusterDaemon.cs
Expand Up @@ -225,7 +225,7 @@ public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
return obj is Welcome && Equals((Welcome)obj);
return obj is Welcome welcome && Equals(welcome);
}

private bool Equals(Welcome other)
Expand Down Expand Up @@ -290,6 +290,16 @@ public override bool Equals(object obj)
{
return obj is InitJoin;
}

protected bool Equals(InitJoin other)
{
return true;
}

public override int GetHashCode()
{
return 1;
}
}

/// <inheritdoc cref="JoinSeenNode"/>