Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Akka.Cluster: enable keep-majority default SBR #6628

Merged
merged 22 commits into from
Apr 5, 2023

Conversation

Aaronontheweb
Copy link
Member

@Aaronontheweb Aaronontheweb commented Mar 30, 2023

Changes

Enables keep-majority as the default SBR and turns it on by default for Akka.Cluster. This was a planned change for Akka.NET v1.5.0 but we didn't implement it.

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

Enables `keep-majority` as the default SBR and turns it on by default for Akka.Cluster. This was a planned change for Akka.NET v1.5.0 but we didn't implement it.
@Aaronontheweb
Copy link
Member Author

I think I need to write some updates to the documentation and a release advisory for this change as well

@Aaronontheweb
Copy link
Member Author

Looks like I need to update the DowningProviderSpec as well

@Aaronontheweb Aaronontheweb changed the title enable keep-majority default SBR Akka.Cluster: enable keep-majority default SBR Apr 4, 2023
@Aaronontheweb Aaronontheweb marked this pull request as ready for review April 4, 2023 19:56
Copy link
Member Author

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed my changes

@@ -18,6 +18,7 @@
[assembly: System.Runtime.Versioning.TargetFrameworkAttribute(".NETCoreApp,Version=v6.0", FrameworkDisplayName=".NET 6.0")]
namespace Akka.Cluster
{
[System.ObsoleteAttribute("No longer used as of Akka.NET v1.5.2")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoDowning is now obsolete - it was a bad idea anyway.

@@ -37,7 +37,7 @@ public void ClusterSingletonManagerSettings_must_have_default_config()
clusterSingletonManagerSettings.SingletonName.ShouldBe("singleton");
clusterSingletonManagerSettings.Role.ShouldBe(null);
clusterSingletonManagerSettings.HandOverRetryInterval.TotalSeconds.ShouldBe(1);
clusterSingletonManagerSettings.RemovalMargin.TotalSeconds.ShouldBe(0);
clusterSingletonManagerSettings.RemovalMargin.TotalSeconds.ShouldBe(20); // now 20 due to default SBR settings
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemovalMargin now uses SBR default value of 20 seconds (this is where the value is computed from.)

@@ -11,6 +11,48 @@ This document contains specific upgrade suggestions, warnings, and notices that
<iframe width="560" height="315" src="https://www.youtube.com/embed/-UPestlIw4k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<!-- markdownlint-enable MD033 -->

## Upgrading to Akka.NET v1.5.2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrade advisories for this PR as well as #6389


#### Disabling the Default Downing Provider

To disable the default Akka.Cluster downing provider, simply configure the following in your HOCON:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated this section from the Split Brain Resolver page - explains how to disable the new default SBR, since I'm fairly certain we'll get a question about that in the future.

@@ -38,6 +38,19 @@ Keep in mind that split brain resolver will NOT work when `akka.cluster.auto-dow

Beginning in Akka.NET v1.4.16, the Akka.NET project has ported the original split brain resolver implementations from Lightbend as they are now open source. The following section of documentation describes how Akka.NET's hand-rolled split brain resolvers are implemented.

> [!IMPORTANT]
> As of Akka.NET v1.5.2, the `keep-majority` split brain resolution strategy is now enabled by default. This should be acceptable for the majority of Akka.Cluster users, but please read on.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to make it clear that there is now a default SBR enabled.

src/core/Akka.Cluster/ClusterDaemon.cs Outdated Show resolved Hide resolved
DowningProviderType = typeof(AutoDowning);
else
DowningProviderType = typeof(NoDowning);
DowningProviderType = !string.IsNullOrEmpty(downingProviderClassName) ? Type.GetType(downingProviderClassName, true) : typeof(NoDowning);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabled loading of AutoDowning here as it's no longer supported.

# formed in case of network partition.
# Disable with "off" or specify a duration to enable auto-down.
# If a downing-provider-class is configured this setting is ignored.
auto-down-unreachable-after = off
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed auto-down-unreachable-after setting from the base config.

@@ -63,7 +55,9 @@ akka {
# `Akka.Cluster.IDowningProvider` having two argument constructor:
# - argument 1: accepting an `ActorSystem`
# - argument 2: accepting an `Akka.Cluster.Cluster`
downing-provider-class = ""
downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the major change - the default SBR.

@@ -379,6 +379,7 @@ public static Config BaseConfig
}
}
}
cluster.downing-provider-class = """" #disable SBR by default
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compat for the MNTR - keep the SBR disabled by default. Most tests probably don't want the SBR running anyway unless they explicitly enable it.

AutoDownUnreachableAfter = clusterConfig.GetTimeSpanWithOffSwitch("auto-down-unreachable-after");
if (AutoDownUnreachableAfter == default(TimeSpan)) // need to restore correct defaults now that it's been deleted from HOCON file
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a work-around needed to restore the "correct" defaults for AutoDownUnreachableAfter now that it's no longer included in the default cluster.conf file.

@@ -58,7 +58,9 @@ public class ClusterShardingSpecConfig : MultiNodeClusterShardingConfig
CommonConfig = ConfigurationFactory.ParseString($@"
akka.cluster.sharding.verbose-debug-logging = on
#akka.loggers = [""akka.testkit.SilenceAllTestEventListener""]

akka.cluster.downing-provider-class = ""Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec previously relied on auto-down-unreachable-after - in order to make it pass again we needed to:

  • Enable the SBR
  • Set the akka.cluster.split-brain-resolver.stable-after to the lowest possible value (must be > 0.0s)
  • Set the akka.cluster.down-removal-margin to the lowest possible value (must be > 0.0s)

Copy link
Contributor

@Arkatufus Arkatufus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a question

To disable the default Akka.Cluster downing provider, simply configure the following in your HOCON:

```hocon
akka.cluster.downing-provider-class = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, since we removed AutoDowning, what will happen if a user turn SBR off?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered in chat - nothing. AutoDowning was never enabled by default and we strongly discouraged users from ever enabling it. If the default SBR is disabled Akka.Cluster behaves exactly like it does today: unreachable nodes stay unreachable until downed manually by pbm or some other process.

@Arkatufus Arkatufus enabled auto-merge (squash) April 5, 2023 16:49
@Aaronontheweb
Copy link
Member Author

Local test runner appears to be crashing on Windows - I'll re-run it once the rest of the PR completes testing.

Test Run Aborted with error System.Exception: One or more errors occurred.
 ---> System.Exception: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..

 ---> System.Exception: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
   at System.Net.Sockets.NetworkStream.ReadByte()
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---.

Do not press Update Branch

@Aaronontheweb Aaronontheweb merged commit e53c7b0 into akkadotnet:dev Apr 5, 2023
@Aaronontheweb Aaronontheweb deleted the default-SBR branch April 5, 2023 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants