
AkkaPduCodec performance fixes [remoting] #3299

Merged
merged 4 commits into akkadotnet:dev from akkapdu-codec-spec on Jan 27, 2018

Conversation

Aaronontheweb (Member)

Noticed some low-hanging fruit for performance in the AkkaPduCodec class while I was working on implementing a custom serializer for Akka.Remote today.

This is the class responsible for serializing 100% of message traffic that passes over each Akka.Remote connection.

@Aaronontheweb Aaronontheweb (Member Author) left a comment

Described changes

@@ -228,14 +228,19 @@ protected AkkaPduCodec(ActorSystem system)
/// <returns>TBD</returns>
public virtual ByteString EncodePdu(IAkkaPdu pdu)
{
ByteString finalBytes = null;
pdu.Match()
Aaronontheweb (Member Author):

Get rid of the old .Match extension method, which allocates. Replace it with a C# 7 pattern-matching switch statement, which does not allocate.
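For context, the old dispatch looked roughly like this (an illustrative sketch, not the exact original code; the Construct* helper and PDU property names approximate the codec's formatting methods). Each .With<T>() call takes a delegate, and the lambdas capture finalBytes, so every call to EncodePdu allocated closure and delegate objects on top of the matcher itself:

// Rough sketch of the old, allocating dispatch (illustrative only)
ByteString finalBytes = null;
pdu.Match()
    .With<Associate>(a => finalBytes = ConstructAssociate(a.Info))
    .With<Payload>(p => finalBytes = ConstructPayload(p.Bytes))
    .With<Heartbeat>(h => finalBytes = ConstructHeartbeat())
    .With<Disassociate>(d => finalBytes = ConstructDisassociate(d.Reason));
return finalBytes;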

return finalBytes;
switch (pdu)
{
case Payload p:
Aaronontheweb (Member Author):

Change the order in which we match message types in Akka.Remote serialization: Payload makes up the vast majority of network traffic, since it covers every message that isn't part of the Akka.Remote association protocol itself. Payload was previously handled second, so every call had one extra case to fall through.

Heartbeat is the second most frequently used message type, since these are emitted once per second.
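
Put together, a minimal sketch of the reordered, allocation-free dispatch looks like this (assumed shape; the Construct* helper and PDU property names approximate what the codec exposes, and the real case bodies differ):

public virtual ByteString EncodePdu(IAkkaPdu pdu)
{
    switch (pdu)
    {
        case Payload p:          // hot path: all user messages travel as Payload
            return ConstructPayload(p.Bytes);
        case Heartbeat _:        // second most common: emitted once per second
            return ConstructHeartbeat();
        case Associate a:        // association handshake messages are rare
            return ConstructAssociate(a.Info);
        case Disassociate d:
            return ConstructDisassociate(d.Reason);
        default:
            return null;         // unknown PDU types produce no bytes, as before
    }
}

No delegates, no closures, and the most common message type is matched first.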

* Since there's never any ActorSystem-specific information coded directly
* into the heartbeat messages themselves (i.e. no handshake info), there's no harm in caching the
* same heartbeat byte buffer and re-using it.
*/
Aaronontheweb (Member Author):

As explained in the comment above: Heartbeat messages don't contain any handshake information, so the same "heartbeat" message can be reused across many different associations and even many different ActorSystems without being created from scratch each time. We cache this value in a static mutable field on the serializer now. We could probably make this immutable...

* into the heartbeat messages themselves (i.e. no handshake info), there's no harm in caching the
* same heartbeat byte buffer and re-using it.
*/
private static readonly ByteString HeartbeatPdu = ConstructControlMessagePdu(CommandType.Heartbeat);
Aaronontheweb (Member Author):

Was able to make this static and immutable as well.
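
A minimal sketch of the caching pattern, assuming ConstructHeartbeat now simply returns the shared buffer (the actual protobuf control-message construction lives in ConstructControlMessagePdu, which the later commits make static):

private static readonly ByteString HeartbeatPdu = ConstructControlMessagePdu(CommandType.Heartbeat);

public override ByteString ConstructHeartbeat()
{
    // Built once per process; the same bytes are valid for every association
    // and every ActorSystem, so nothing is allocated per heartbeat sent.
    return HeartbeatPdu;
}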

@Aaronontheweb (Member Author)

I'll let the NBench stuff do its thing here, but I'm also going to run a PingPong benchmark locally for comparison. These optimizations are all fairly "micro," but they should reduce allocations per remote message sent by at least one.

@Aaronontheweb (Member Author)

Bearing in mind that these values aren't scientific, because I ran them on my personal computer, which has tons of stuff running in the background:

Before

λ dotnet run .\RemotePingPong --framework netcoreapp1.1
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 20000 (2e4)
Is Server GC: True

Num clients, Total [msg], Msgs/sec, Total [ms]
1, 20000, 60423, 331.72
5, 100000, 63573, 1573.50
10, 200000, 64957, 3079.72
15, 300000, 64103, 4680.37
20, 400000, 64382, 6213.41
25, 500000, 63630, 7858.75
30, 600000, 63379, 9467.96
Done..

λ dotnet run .\RemotePingPong --framework net461
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 20000 (2e4)
Is Server GC: True

Num clients, Total [msg], Msgs/sec, Total [ms]
1, 20000, 60607, 330.69
5, 100000, 71995, 1389.54
10, 200000, 70722, 2828.55
15, 300000, 73530, 4080.33
20, 400000, 69554, 5751.79
25, 500000, 70902, 7052.97
30, 600000, 69590, 8622.68
Done..

After

λ dotnet run .\RemotePingPong --framework netcoreapp1.1
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 20000 (2e4)
Is Server GC: True

Num clients, Total [msg], Msgs/sec, Total [ms]
1, 20000, 57804, 346.07
5, 100000, 63654, 1571.40
10, 200000, 64600, 3096.74
15, 300000, 62138, 4828.39
20, 400000, 63715, 6278.23
25, 500000, 64070, 7804.18
30, 600000, 62559, 9591.81
Done.

λ dotnet run .\RemotePingPong --framework net461
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 20000 (2e4)
Is Server GC: True

Num clients, Total [msg], Msgs/sec, Total [ms]
1, 20000, 64517, 310.58
5, 100000, 73153, 1367.91
10, 200000, 74627, 2680.02
15, 300000, 73350, 4090.98
20, 400000, 72993, 5480.39
25, 500000, 70732, 7069.06
30, 600000, 71600, 8380.05
Done..

IMHO, it looks like the change made a material difference on .NET 4.6.1 but none at all on .NET Core 1.1. I'll wait and see what the build server reports from NBench, though.

@Horusiath Horusiath self-requested a review January 27, 2018 15:49
@Horusiath Horusiath (Contributor) left a comment

Changes look good. The benchmark results you've shown look disturbing though. No performance improvement?

@Aaronontheweb (Member Author) commented Jan 27, 2018

@Horusiath looks like there was one on .NET 4.6.1 but not on .NET Core 1.1. Guess in this instance the number of objects being allocated wasn't enough to make a material difference to the frequency at which the garbage collector runs. In other words, there are bigger bottlenecks / bigger allocators than the AkkaPduCodec.

@Aaronontheweb Aaronontheweb merged commit 9631e03 into akkadotnet:dev Jan 27, 2018
@Aaronontheweb Aaronontheweb deleted the akkapdu-codec-spec branch January 27, 2018 16:36
@Aaronontheweb Aaronontheweb added this to the 1.3.4 milestone Feb 1, 2018
Aaronontheweb added a commit that referenced this pull request Feb 1, 2018
* AkkaPduCodec performance fixes

* made HeartbeatPdu immutable

* made all internal formatting methods static
Aaronontheweb added a commit that referenced this pull request Feb 19, 2018
* AkkaPduCodec performance fixes

* made HeartbeatPdu immutable

* made all internal formatting methods static