Improve performance on IActorRef.Child API #5242

Merged
merged 20 commits into from
Sep 2, 2021

Aaronontheweb
Member

Builds upon #5241, but introduces some breaking API changes.

Performance numbers:

dev

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 64.69 ns | 0.145 ns | 0.129 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 652.15 ns | 9.237 ns | 8.641 ns | 0.0839 | 352 B |
| Resolve3DeepChildLocalActorRef | 311.74 ns | 4.924 ns | 4.605 ns | 0.0629 | 264 B |

This PR

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 47.40 ns | 0.018 ns | 0.014 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 398.67 ns | 6.259 ns | 5.855 ns | 0.0496 | 208 B |
| Resolve3DeepChildLocalActorRef | 227.38 ns | 2.343 ns | 1.830 ns | 0.0420 | 176 B |

@@ -1533,7 +1533,7 @@ namespace Akka.Actor
 public override Akka.Actor.ActorPath Path { get; }
 public override Akka.Actor.IActorRefProvider Provider { get; }
 public override Akka.Actor.ICell Underlying { get; }
-public override Akka.Actor.IActorRef GetChild(System.Collections.Generic.IEnumerable<string> name) { }
+public override Akka.Actor.IActorRef GetChild(System.Collections.Generic.IReadOnlyList<string> name) { }
Member Author

Source-compatible, but not necessarily a binary-compatible change.
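To illustrate the distinction, here is a minimal sketch that assumes a hypothetical `Resolver` class rather than the real Akka.NET types. Callers that already pass an array or a `List<string>` should recompile unchanged against an `IReadOnlyList<string>` parameter, while a caller holding a lazy `IEnumerable<string>` has to materialize it first. Assemblies compiled against the old `IEnumerable<string>` signature would need a rebuild, because the method they bound to no longer exists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the changed API; not the real Akka.NET type.
public static class Resolver
{
    // New-style signature: an indexable, countable list instead of IEnumerable<string>.
    public static string GetChild(IReadOnlyList<string> name)
    {
        return name.Count == 0 ? "<self>" : string.Join("/", name);
    }
}

public static class CompatibilityDemo
{
    // Arrays implement IReadOnlyList<T>, so this call site compiles unchanged.
    public static string FromArray() => Resolver.GetChild(new[] { "user", "a" });

    // List<T> implements IReadOnlyList<T> as well.
    public static string FromList() => Resolver.GetChild(new List<string> { "user", "a", "b" });

    public static string FromLazySequence()
    {
        IEnumerable<string> lazy = "user/a/b/c".Split('/').Where(s => s.Length > 0);
        // A lazy LINQ sequence is not an IReadOnlyList<string>,
        // so it must be materialized, which costs one extra allocation.
        return Resolver.GetChild(lazy.ToList());
    }
}
```

The `ToList()` call in `FromLazySequence` is the same kind of extra allocation that the later review comments on remote deployments point out.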

@Aaronontheweb
Member Author

Looks like this may have broken cluster formation, based on some of the testing I'm doing locally. I'll look into it once I get reports back from the test suite.

@Aaronontheweb
Member Author

Looks like there's some work still to be done with the RemoteDaemon and how it resolves remotely deployed actors...

@Aaronontheweb
Member Author

So once #5243 gets merged, new baseline numbers for dev:

New numbers

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 67.12 ns | 0.041 ns | 0.035 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 534.29 ns | 2.166 ns | 1.809 ns | 0.0324 | 136 B |
| Resolve3DeepChildLocalActorRef | 367.67 ns | 0.650 ns | 0.543 ns | 0.0095 | 40 B |

Numbers for this PR:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 49.00 ns | 0.150 ns | 0.141 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 382.39 ns | 2.349 ns | 2.197 ns | 0.0172 | 72 B |
| Resolve3DeepChildLocalActorRef | 362.73 ns | 1.708 ns | 1.514 ns | 0.0095 | 40 B |

Top-level actor resolves (i.e., ones that go through Akka.Cluster.Sharding) got a lot faster, but resolves beneath the top-level actors stayed pretty much the same.

@Aaronontheweb
Member Author

Aaronontheweb commented Sep 2, 2021

VirtualPathContainer resolve performance baseline

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 66.71 ns | 0.048 ns | 0.040 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 540.70 ns | 1.754 ns | 1.555 ns | 0.0324 | 136 B |
| Resolve3DeepChildLocalActorRef | 368.26 ns | 1.574 ns | 1.472 ns | 0.0095 | 40 B |
| ResolveVirtualPathContainer | 93.03 ns | 0.532 ns | 0.471 ns | 0.0305 | 128 B |

On this PR:

Forgot to add baseline numbers for this:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 49.81 ns | 0.171 ns | 0.160 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 372.47 ns | 2.948 ns | 2.758 ns | 0.0172 | 72 B |
| Resolve3DeepChildLocalActorRef | 398.80 ns | 2.803 ns | 2.485 ns | 0.0095 | 40 B |
| ResolveVirtualPathContainer | 94.21 ns | 0.885 ns | 0.828 ns | 0.0172 | 72 B |


public T this[int index]
{
get => _array.ElementAt(Offset + index);
Member

Since _array is an IReadOnlyList<T> here we may be better off using the Indexer on that interface rather than ElementAt.

  • If _array does not also implement IList<T> we will force an enumeration
  • .ElementAt adds additional null and type checks even in best case.

Member Author

Ah yes, that's left over from when I was using IReadOnlyCollection instead of IReadOnlyList; the former doesn't have an indexer. I'll update that.
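The difference between the two access paths can be sketched as follows. `CountingList<T>` and `IndexerDemo` are hypothetical names introduced only for illustration: the list implements `IReadOnlyList<T>` but not `IList<T>`, and counts how often it is enumerated. The interface indexer never touches the enumerator, whereas `Enumerable.ElementAt` may fall back to enumeration when none of its fast paths apply (the exact fast paths depend on the runtime version, so the sketch makes no claim about how many enumerations occur).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical list that implements IReadOnlyList<T> but not IList<T>,
// and counts how often an enumerator is requested.
public sealed class CountingList<T> : IReadOnlyList<T>
{
    private readonly T[] _items;

    public int EnumerationCount { get; private set; }

    public CountingList(params T[] items) => _items = items;

    public T this[int index] => _items[index];

    public int Count => _items.Length;

    public IEnumerator<T> GetEnumerator()
    {
        EnumerationCount++;
        return ((IEnumerable<T>)_items).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class IndexerDemo
{
    // Direct interface indexer: never requests an enumerator.
    public static T ViaIndexer<T>(IReadOnlyList<T> list, int offset, int index)
        => list[offset + index];

    // LINQ ElementAt: same result, but may enumerate the source.
    public static T ViaElementAt<T>(IReadOnlyList<T> list, int offset, int index)
        => list.ElementAt(offset + index);
}
```

Both helpers return the same element; inspecting `EnumerationCount` after each call shows whether the runtime in use had to enumerate.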

Contributor

@Arkatufus Arkatufus left a comment

LGTM

@Aaronontheweb
Member Author

Even better numbers after fixing the indexer operation

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 49.69 ns | 0.224 ns | 0.187 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 371.25 ns | 2.587 ns | 2.419 ns | 0.0172 | 72 B |
| Resolve3DeepChildLocalActorRef | 387.38 ns | 2.858 ns | 2.534 ns | 0.0095 | 40 B |
| ResolveVirtualPathContainer | 89.60 ns | 0.842 ns | 0.788 ns | 0.0172 | 72 B |

@Aaronontheweb
Member Author

Although IMHO those differences are probably in the margin of error

@Aaronontheweb
Member Author

I wonder if getting rid of these deeply nested TryGet(... out) calls would also help....

@Aaronontheweb
Member Author

Perf numbers from simplified branching:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 50.69 ns | 0.028 ns | 0.022 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 354.04 ns | 2.459 ns | 2.300 ns | 0.0172 | 72 B |
| Resolve3DeepChildLocalActorRef | 362.21 ns | 3.093 ns | 2.893 ns | 0.0095 | 40 B |
| ResolveVirtualPathContainer | 89.11 ns | 1.173 ns | 1.097 ns | 0.0172 | 72 B |

@Aaronontheweb
Member Author

Perf numbers from removing redundant Try(... out) methods

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---:|---:|---:|---:|---:|
| ResolveChild | 50.04 ns | 0.137 ns | 0.122 ns | - | - |
| Resolve3DeepChildRepointableActorRef | 345.53 ns | 1.424 ns | 1.112 ns | 0.0172 | 72 B |
| Resolve3DeepChildLocalActorRef | 328.63 ns | 1.803 ns | 1.598 ns | 0.0095 | 40 B |
| ResolveVirtualPathContainer | 88.01 ns | 1.020 ns | 0.905 ns | 0.0172 | 72 B |

@Aaronontheweb Aaronontheweb marked this pull request as ready for review September 2, 2021 16:34
@@ -93,7 +93,6 @@ namespace Akka.Actor
 public System.Collections.Generic.IEnumerable<Akka.Actor.IInternalActorRef> GetChildren() { }
 public static Akka.Actor.IActorRef GetCurrentSelfOrNoSender() { }
 public static Akka.Actor.IActorRef GetCurrentSenderOrNoSender() { }
-[System.ObsoleteAttribute("Use TryGetSingleChild [0.7.1]")]
Member Author

Removed TryGetSingleChild since no one was actually using its boolean functionality at all


-var child = GetChild(concatenatedChildNames);
+var child = GetChild(concatenatedChildNames.ToList());
Member Author

Extra allocation in remote deployments here.... might see if I can go about fixing that still....

@@ -313,7 +311,7 @@ public override IActorRef GetChild(IEnumerable<string> name)
 {
 if (uid != ActorCell.UndefinedUid && uid != child.Path.Uid)
 return Nobody.Instance;
-return n == 0 ? child : child.GetChild(name.TakeRight(n));
+return n == 0 ? child : child.GetChild(name.TakeRight(n).ToList());
Member Author

Another allocation in remote deployments

@@ -353,58 +353,37 @@ private bool TryGetChildRestartStatsByName(string name, out ChildRestartStats ch
 /// </summary>
 /// <param name="name">N/A</param>
 /// <returns>N/A</returns>
-[Obsolete("Use TryGetSingleChild [0.7.1]")]
 public IInternalActorRef GetSingleChild(string name)
Member Author

Half the perf optimizations happened here:

  • Simplified branching
  • Remove Try... out stuff since it was all functionally being converted back into ActorRefs.Nobody anyway.
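As a rough sketch of the second point, assuming hypothetical `Ref` and `Children` types in place of `IInternalActorRef`, `ActorRefs.Nobody`, and the real children container: when every caller discards the boolean and maps a miss back to a sentinel, the `Try... out` wrapper can collapse into a single lookup that returns the sentinel directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an actor reference with a "nobody" sentinel.
public sealed class Ref
{
    public static readonly Ref Nobody = new Ref("<nobody>");

    public string Name { get; }

    public Ref(string name) => Name = name;
}

// Hypothetical stand-in for a children container.
public sealed class Children
{
    private readonly Dictionary<string, Ref> _byName = new Dictionary<string, Ref>();

    public void Add(Ref child) => _byName[child.Name] = child;

    // Before: a Try/out method whose boolean result callers end up discarding.
    public bool TryGetSingleChild(string name, out Ref child)
    {
        if (_byName.TryGetValue(name, out child)) return true;
        child = Ref.Nobody;
        return false;
    }

    // After: one lookup, one branch, sentinel returned directly.
    public Ref GetSingleChild(string name) =>
        _byName.TryGetValue(name, out var child) ? child : Ref.Nobody;
}
```

Callers that only ever compared the result against the sentinel lose nothing, and the call chain has one less `out` parameter to thread through each level.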

/// <see cref="ArraySegment{T}"/> but for <see cref="IReadOnlyList{T}"/>
/// </summary>
/// <typeparam name="T"></typeparam>
internal struct ListSlice<T> : IList<T>, IReadOnlyList<T>
Member Author

Basically an ArraySegment that takes a type of IReadOnlyList<T>
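A simplified sketch of that idea follows. `ReadOnlySlice<T>` is a hypothetical name and is not the PR's `ListSlice<T>`, which also implements `IList<T>`; this version exposes only the read-only surface. It presents a window over an existing `IReadOnlyList<T>` through an offset and a count, so taking the tail of a path does not copy the elements.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified slice over an IReadOnlyList<T>; no elements are copied.
public readonly struct ReadOnlySlice<T> : IReadOnlyList<T>
{
    private readonly IReadOnlyList<T> _source;

    public int Offset { get; }

    public int Count { get; }

    public ReadOnlySlice(IReadOnlyList<T> source, int offset, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (offset < 0 || count < 0 || offset > source.Count - count)
            throw new ArgumentOutOfRangeException(nameof(offset));

        _source = source;
        Offset = offset;
        Count = count;
    }

    // Index relative to the start of the window, using the source's own indexer.
    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _source[Offset + index];
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (var i = 0; i < Count; i++)
            yield return _source[Offset + i];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

For example, `new ReadOnlySlice<string>(names, 1, names.Count - 1)` views everything after the first path element. Passing the struct through an `IReadOnlyList<T>` parameter boxes it, so this reduces the allocation to a small fixed-size object rather than eliminating it.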

Contributor

@Arkatufus Arkatufus left a comment

LGTM

@Aaronontheweb Aaronontheweb merged commit d25db5d into akkadotnet:dev Sep 2, 2021
@Aaronontheweb Aaronontheweb deleted the perf/improve-ResolveAPI branch September 2, 2021 17:22
@Aaronontheweb Aaronontheweb added this to the 1.4.25 milestone Sep 8, 2021

3 participants