experimenting with RemoteActorRefProvider address resolution performance #5228

Aaronontheweb
Member

Per @to11mtm's reporting in our Gitter chat room...

RemotePingPong

Before

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    16
ClockSpeed:                        0 MHZ
Actor Count:                       32
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      111

Num clients, Total [msg], Msgs/sec, Total [ms]
         1,  200000,     96154,    2080.77
         5, 1000000,    187442,    5335.46
        10, 2000000,    188307,   10621.24
        15, 3000000,    190767,   15726.12
        20, 4000000,    189988,   21054.78
        25, 5000000,    188694,   26498.56
        30, 6000000,    188543,   31823.73

After

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    16                             
ClockSpeed:                        0 MHZ                          
Actor Count:                       32                             
Messages sent/received per client: 200000  (2e5)                  
Is Server GC:                      True                           
Thread count:                      111                            
                                                                  
Num clients, Total [msg], Msgs/sec, Total [ms]                    
         1,  200000,    117717,    1699.19                        
         5, 1000000,    197278,    5069.24                        
        10, 2000000,    196677,   10169.66                        
        15, 3000000,    194075,   15458.38                        
        20, 4000000,    194657,   20549.84                        
        25, 5000000,    193641,   25821.94                        
        30, 6000000,    191712,   31297.06                        
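
To put the two RemotePingPong tables side by side, the relative change in the Msgs/sec column can be computed per row. This is a minimal sketch; the `Throughput` class and `PercentChange` method are hypothetical names introduced only for illustration and are not part of the benchmark tool.

```csharp
using System;

public static class Throughput
{
    // Relative change from a baseline value to a new value, in percent.
    // Positive means the new value is higher than the baseline.
    public static double PercentChange(double before, double after) =>
        (after - before) / before * 100.0;
}
```

Applied to the figures above, `Throughput.PercentChange(96154, 117717)` gives roughly 22% for the single-client row, and `Throughput.PercentChange(188543, 191712)` gives roughly 1.7% for the 30-client row.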

Sharding Performance

Before

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  Job-RAVSYY : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
Method | StateMode | MsgCount | Mean | Error | StdDev
-- | -- | -- | -- | -- | --
SingleRequestResponseToLocalEntity | Persistence | 10000 | 114.761 ms | 2.2856 ms | 4.2364 ms
StreamingToLocalEntity | Persistence | 10000 | 5.394 ms | 0.4011 ms | 1.1574 ms
SingleRequestResponseToRemoteEntity | Persistence | 10000 | 4,348.561 ms | 22.9590 ms | 21.4758 ms
StreamingToRemoteEntity | Persistence | 10000 | 505.702 ms | 9.7041 ms | 11.9176 ms
SingleRequestResponseToLocalEntity | DData | 10000 | 112.387 ms | 2.2285 ms | 3.0504 ms
StreamingToLocalEntity | DData | 10000 | 5.114 ms | 0.2683 ms | 0.7436 ms
SingleRequestResponseToRemoteEntity | DData | 10000 | 4,396.789 ms | 18.9920 ms | 17.7651 ms
StreamingToRemoteEntity | DData | 10000 | 512.592 ms | 10.1481 ms | 10.4213 ms

After

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
  [Host]     : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
  Job-HYNWDT : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
Method | StateMode | MsgCount | Mean | Error | StdDev
-- | -- | -- | -- | -- | --
SingleRequestResponseToLocalEntity | Persistence | 10000 | 129.718 ms | 3.2318 ms | 9.427 ms
StreamingToLocalEntity | Persistence | 10000 | 6.162 ms | 0.5868 ms | 1.721 ms
SingleRequestResponseToRemoteEntity | Persistence | 10000 | 4,305.384 ms | 34.0898 ms | 31.888 ms
StreamingToRemoteEntity | Persistence | 10000 | 511.976 ms | 10.0823 ms | 12.382 ms
SingleRequestResponseToLocalEntity | DData | 10000 | 117.223 ms | 2.3304 ms | 5.628 ms
StreamingToLocalEntity | DData | 10000 | 5.616 ms | 0.4011 ms | 1.164 ms
SingleRequestResponseToRemoteEntity | DData | 10000 | 4,519.704 ms | 81.3376 ms | 76.083 ms
StreamingToRemoteEntity | DData | 10000 | 506.318 ms | 5.9951 ms | 5.608 ms

@Aaronontheweb
Member Author

Method | StateMode | MsgCount | Mean | Error | StdDev
-- | -- | -- | -- | -- | --
SingleRequestResponseToRemoteEntity | DData | 10000 | 4,519.704 ms | 81.3376 ms | 76.083 ms

STDDEV is way up here - going to re-run it.

@Aaronontheweb
Member Author

@to11mtm should we just fix the == operator to do the right thing?

@to11mtm
Member

to11mtm commented Aug 26, 2021

> @to11mtm should we just fix the == operator to do the right thing?

That would probably be the better thing to do TBH.

@Zetanova
Contributor

Transport.Addresses.Any(a => a.Equals(address)) is almost certainly slower than a foreach in a hot path.
I don't know how often the Any will be executed, or whether the compiler does some magic with it.
If there is no magic, then the branch plus a foreach is faster than a LINQ call with a captured scope and no branch.

And maybe there is a fast check to return early?

public bool HasAddress(Address address)
{
    if (address.Equals(_local.RootPath.Address) || address.Equals(RootPath.Address))
        return true;
    // maybe there is a fast check to abort here

    foreach (var a in Transport.Addresses)
        if (a.Equals(address)) return true;

    return false;
}

@Zetanova
Contributor

The foreach is not good either.
It generates a try/finally block to dispose the enumerator, because Transport.Addresses is of type ISet and is enumerated through the interface.

The simple and obvious Contains method is the best fit:

public bool HasAddress(Address address)
{
    return address.Equals(_local.RootPath.Address) || address.Equals(RootPath.Address)
        || Transport.Addresses.Contains(address);
}

see: https://sharplab.io/#gist:b0a2cc23e7748f71577070b2748e462e
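
The difference between the LINQ form and the set lookup can be sketched in isolation. The example below is only an illustration under stated assumptions: `AddressBook` is a hypothetical stand-in for the provider, and plain strings stand in for `Address` values, so nothing here is the Akka.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the provider; strings replace Address values.
public sealed class AddressBook
{
    private readonly string _localAddress;
    private readonly ISet<string> _transportAddresses;

    public AddressBook(string localAddress, IEnumerable<string> transportAddresses)
    {
        _localAddress = localAddress;
        _transportAddresses = new HashSet<string>(transportAddresses);
    }

    // LINQ form: the lambda captures "address" in a closure, and Any walks
    // the set element by element through its enumerator.
    public bool HasAddressLinq(string address) =>
        address.Equals(_localAddress) || _transportAddresses.Any(a => a.Equals(address));

    // Set form: one Contains call on the set, no lambda and no enumerator.
    public bool HasAddress(string address) =>
        address.Equals(_localAddress) || _transportAddresses.Contains(address);
}
```

Both methods return the same answers; whether the set form is measurably faster in the real provider is something only a benchmark such as RemotePingPong can confirm.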

@Aaronontheweb
Member Author

Great suggestions @Zetanova - I'll update the PR with some numbers for those later this week.

@Aaronontheweb
Member Author

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    16
ClockSpeed:                        0 MHZ
Actor Count:                       32
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      111

Num clients, Total [msg], Msgs/sec, Total [ms]
         1,  200000,    106952,    1870.69
         5, 1000000,    197981,    5051.82
        10, 2000000,    198709,   10065.92
        15, 3000000,    197239,   15210.15
        20, 4000000,    196329,   20374.52
        25, 5000000,    196634,   25428.24
        30, 6000000,    195015,   30767.75        

Final numbers look good to me!

@Aaronontheweb Aaronontheweb added this to the 1.4.25 milestone Aug 31, 2021
@@ -256,7 +258,8 @@ public Address WithPort(int? port = null)
 /// <returns><c>true</c> if both addresses are not equal; otherwise <c>false</c></returns>
 public static bool operator !=(Address left, Address right)
 {
-    return !Equals(left, right);
+    return !ReferenceEquals(left, right) &&
+           (left?.Equals(right) ?? true);
Member

This looks incorrect. If left equals right we should return false. (I think the null coalesce is making it hard to infer here.)

Member Author

Yeah based on the test suite I probably did that wrong

Member Author

Ah, I thought I was !-ing the entire statement, not just the first clause.
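
The precedence problem discussed in this review thread can be reproduced outside the `Address` class. The sketch below is an illustration only: `Tag` and `Inequality` are hypothetical names, `NotEqualAsWritten` mirrors the shape of the expression in the diff, and `NotEqualNegated` is one possible way to negate a complete equality test, not necessarily what was merged.

```csharp
using System;

// Hypothetical minimal value type used only to exercise the two expressions.
public sealed class Tag : IEquatable<Tag>
{
    public int Value { get; }
    public Tag(int value) { Value = value; }

    public bool Equals(Tag other) => !(other is null) && Value == other.Value;
    public override bool Equals(object obj) => Equals(obj as Tag);
    public override int GetHashCode() => Value;
}

public static class Inequality
{
    // Same shape as the diff: "!" binds only to ReferenceEquals, so two
    // distinct instances with equal values are reported as "not equal".
    public static bool NotEqualAsWritten(Tag left, Tag right) =>
        !ReferenceEquals(left, right) && (left?.Equals(right) ?? true);

    // Negates a complete equality test: equal by reference, or equal by value.
    public static bool NotEqualNegated(Tag left, Tag right) =>
        !(ReferenceEquals(left, right) || (left?.Equals(right) ?? false));
}
```

With `a = new Tag(1)` and `b = new Tag(1)`, the as-written form reports the pair as not equal even though `a.Equals(b)` is true, which is the case the reviewer pointed out.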

@Zetanova
Contributor

Zetanova commented Aug 31, 2021

The equals itself is not optimal, the Port nullable can be better compared.

The EqualityComparer<Address>.Default.Equals(left, right) can be used,
it should have a better perf than a 2x branching with an own code (not sure)

See code at:
https://sharplab.io/#gist:f028291550b1c641019aca408efdbc3a

@Aaronontheweb
Member Author

> The equals itself is not optimal, the Port nullable can be better compared.
>
> The EqualityComparer<Address>.Default.Equals(left, right) can be used,
> it should have a better perf than a 2x branching with an own code (not sure)
>
> See code at:
> https://sharplab.io/#gist:f028291550b1c641019aca408efdbc3a

  public bool Equals(Address other)
    {
        return other != null
            && string.Equals(_host, other._host) 
            && string.Equals(_system, other._system) 
            && string.Equals(_protocol, other._protocol)
            && (_port ?? 0) == (other._port ?? 0);
    }

You want the port comparison to go first, as it's the fastest way to short-circuit the comparison for two values that are not equal. You also still want to check for reference equality in this method, in case it's called directly via the .Equals method.
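
A minimal sketch of an `Equals` that follows both suggestions (reference check, then port first) might look like the following. `Endpoint` is a hypothetical, simplified stand-in for `Address`, and the field names are copied from the snippet above; this is not the code that was merged.

```csharp
using System;

// Hypothetical, simplified stand-in for Address; not the Akka.NET class.
public sealed class Endpoint : IEquatable<Endpoint>
{
    private readonly string _protocol;
    private readonly string _system;
    private readonly string _host;
    private readonly int? _port;

    public Endpoint(string protocol, string system, string host = null, int? port = null)
    {
        _protocol = protocol;
        _system = system;
        _host = host;
        _port = port;
    }

    public bool Equals(Endpoint other)
    {
        // "is null" avoids going through any overloaded == / != operators.
        if (other is null) return false;

        // Same instance: no field comparisons needed.
        if (ReferenceEquals(this, other)) return true;

        // Cheapest check first: an integer compare can reject a mismatch
        // before any string comparison runs. A missing port is treated as 0,
        // as in the snippet above.
        return (_port ?? 0) == (other._port ?? 0)
            && string.Equals(_host, other._host)
            && string.Equals(_system, other._system)
            && string.Equals(_protocol, other._protocol);
    }

    public override bool Equals(object obj) => Equals(obj as Endpoint);

    // Must agree with Equals, so the port is normalized the same way.
    public override int GetHashCode()
    {
        unchecked
        {
            var hash = _port ?? 0;
            hash = (hash * 397) ^ (_host?.GetHashCode() ?? 0);
            hash = (hash * 397) ^ (_system?.GetHashCode() ?? 0);
            hash = (hash * 397) ^ (_protocol?.GetHashCode() ?? 0);
            return hash;
        }
    }
}
```

The `&&` chain short-circuits left to right, so two endpoints that differ only in port never reach the string comparisons.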

@Aaronontheweb
Member Author

Final numbers:

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    16
ClockSpeed:                        0 MHZ
Actor Count:                       32
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      112

Num clients, Total [msg], Msgs/sec, Total [ms]
         1,  200000,    108755,    1839.86
         5, 1000000,    200201,    4995.26
        10, 2000000,    199442,   10028.27
        15, 3000000,    198900,   15083.07
        20, 4000000,    199552,   20045.01
        25, 5000000,    198413,   25200.80
        30, 6000000,    195224,   30734.52

Looks to me like the technical errors have been resolved as well.

@to11mtm
Member

to11mtm commented Sep 3, 2021

> The equals itself is not optimal, the Port nullable can be better compared.
>
> The EqualityComparer<Address>.Default.Equals(left, right) can be used,
> it should have a better perf than a 2x branching with an own code (not sure)
>
> See code at:
> https://sharplab.io/#gist:f028291550b1c641019aca408efdbc3a

I'm not sure... the ASM generated looks too sparse for what we're doing here. I'm not sure if Sharplab is accurate here.

I would expect something like this would be the 'best' way to do ==:

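    public static bool operator ==(Address left, Address right)
    {
            // "is null" is used instead of != / == because, inside this operator,
            // those would call the overloaded Address operators again.
            if (!(left is null)) {
                return left.Equals(right);
            }
            return right is null;
    }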

This is based on how .Equals() itself is implemented in GenericEqualityComparer. It cuts out a branch, since our .Equals will do the correct thing if right is null, and the call will be devirtualized (whereas GenericEqualityComparer, AFAIK, makes virtual calls since it goes through the generic interface, unless there's some compiler magic I'm unaware of).

@Aaronontheweb
Member Author

Thanks @to11mtm - I'll make that change and then wrap this PR up.

@Aaronontheweb
Member Author

@Zetanova you've been a great help on this PR btw - really appreciate your great feedback.

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) September 6, 2021 22:59
@Zetanova
Contributor

Zetanova commented Sep 7, 2021

I looked at the whole class.

RootPath.Address is the same as _local.RootPath.Address,
so the equality check currently runs twice on the same address:

public bool HasAddress(Address address)
{
    return address.Equals(_local.RootPath.Address) || Transport.Addresses.Contains(address);
}

Also, the cache extension instances do not need to be volatile;
every memory barrier in a hot path should be avoided:

private ActorRefResolveThreadLocalCache _actorRefResolveThreadLocalCache;
private ActorPathThreadLocalCache _actorPathThreadLocalCache;

public virtual void Init(ActorSystemImpl system)
{
            _system = system;

            //moved up
            _actorRefResolveThreadLocalCache = ActorRefResolveThreadLocalCache.For(system);
            _actorPathThreadLocalCache = ActorPathThreadLocalCache.For(system);

            _local.Init(system);

            //rest of code
}

Also, ActorPath.TryParse is getting called multiple times (with and without cache).

I made a draft at #5273.
It is still WIP.

@Aaronontheweb
Member Author

> The RootPath.Address is the same as _local.RootPath.Address; currently the equals will perform twice on the same address

Not a huge deal from a perf point of view - ReferenceEquals call is about 1-2 ns.

@Zetanova
Contributor

Zetanova commented Sep 7, 2021

@Aaronontheweb Why is there an AddressCache that caches the Address keyed by the ActorPath string, and an ActorPathCache that does the same?
Both use the ActorPath string as the key and store nearly the same information.

@Zetanova
Contributor

Zetanova commented Sep 7, 2021

@Aaronontheweb I made some improvements; volatile should/can be removed from private Internals _internals; too.
In a hot path this makes a difference.

I also found a way to remove AddressCache and use the same ActorPathCache as RemoteActorRefProvider.

@Aaronontheweb Aaronontheweb merged commit 8f168fc into akkadotnet:dev Sep 8, 2021
@Aaronontheweb Aaronontheweb deleted the perf/RemoteActorRefProvider-Address-resolve branch September 8, 2021 13:23