Akka.Cluster: improve gossip serialization performance #7281

Merged

Aaronontheweb
Member

@Aaronontheweb Aaronontheweb commented Jul 10, 2024

Changes

Performance golf: an attempt to improve serialization performance for the Gossip data structures.

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

Latest dev Benchmarks


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4529/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.303
  [Host]     : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2


| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 392.8 ns | 7.83 ns | 21.84 ns | 0.0534 | 224 B |
| Deserialize_Heartbeat | 914.4 ns | 18.04 ns | 40.34 ns | 0.1640 | 688 B |
| Serialize_HeartbeatRsp | 492.4 ns | 9.44 ns | 24.69 ns | 0.0629 | 264 B |
| Deserialize_HeartbeatRsp | 1,149.8 ns | 22.99 ns | 55.51 ns | 0.1907 | 800 B |
| Serialize_GossipEnvelope | 27,185.0 ns | 675.36 ns | 1,980.72 ns | 1.7700 | 7496 B |
| Deserialize_GossipEnvelope | 51,976.9 ns | 1,186.86 ns | 3,499.47 ns | 3.9673 | 16624 B |
| Serialize_GossipStatus | 3,124.2 ns | 71.97 ns | 211.07 ns | 0.3929 | 1648 B |
| Deserialize_GossipStatus | 8,466.2 ns | 196.12 ns | 559.54 ns | 0.9766 | 4096 B |
| Serialize_Welcome | 42,868.1 ns | 989.00 ns | 2,869.27 ns | 2.2583 | 9592 B |
| Deserialize_Welcome | 76,814.4 ns | 1,800.46 ns | 5,280.43 ns | 4.8828 | 20480 B |

@Aaronontheweb
Member Author

First improvement: several of the aggregations are now rolled up into a single `foreach` loop instead of three separate LINQ invocations iterating over the same collection:


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4529/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.303
  [Host]     : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2


| Method | Mean | Error | StdDev | Median | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 338.0 ns | 6.74 ns | 12.83 ns | 334.9 ns | 0.0534 | 224 B |
| Deserialize_Heartbeat | 794.0 ns | 15.77 ns | 33.27 ns | 789.0 ns | 0.1640 | 688 B |
| Serialize_HeartbeatRsp | 417.0 ns | 8.32 ns | 13.67 ns | 415.8 ns | 0.0629 | 264 B |
| Deserialize_HeartbeatRsp | 976.7 ns | 19.40 ns | 47.96 ns | 975.0 ns | 0.1907 | 800 B |
| Serialize_GossipEnvelope | 16,183.2 ns | 288.39 ns | 534.56 ns | 16,142.3 ns | 1.7090 | 7232 B |
| Deserialize_GossipEnvelope | 36,860.0 ns | 886.79 ns | 2,614.73 ns | 36,627.2 ns | 3.9063 | 16368 B |
| Serialize_GossipStatus | 2,478.6 ns | 49.36 ns | 121.08 ns | 2,476.6 ns | 0.3929 | 1648 B |
| Deserialize_GossipStatus | 7,194.2 ns | 201.86 ns | 592.03 ns | 7,188.1 ns | 0.9766 | 4096 B |
| Serialize_Welcome | 26,747.4 ns | 527.24 ns | 1,283.37 ns | 26,500.5 ns | 2.2278 | 9328 B |
| Deserialize_Welcome | 54,725.1 ns | 1,177.71 ns | 3,472.51 ns | 53,607.7 ns | 4.6387 | 20216 B |
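
For readers unfamiliar with the pattern, the single-loop rollup described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `MemberSketch`, `GossipAggregation`, and the three aggregated fields are hypothetical stand-ins, and the real serializer works on Akka.Cluster types that are not shown in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a cluster member; NOT the real Akka.Cluster type.
public sealed record MemberSketch(string Address, IReadOnlyList<string> Roles, string AppVersion);

public static class GossipAggregation
{
    // Before: three LINQ queries, each enumerating `members` from the start.
    public static (HashSet<string> Addresses, HashSet<string> Roles, HashSet<string> Versions)
        ThreePasses(IReadOnlyCollection<MemberSketch> members)
    {
        var addresses = members.Select(m => m.Address).ToHashSet();
        var roles = members.SelectMany(m => m.Roles).ToHashSet();
        var versions = members.Select(m => m.AppVersion).ToHashSet();
        return (addresses, roles, versions);
    }

    // After: one foreach fills all three sets in a single enumeration,
    // avoiding the LINQ iterator objects that the three queries create.
    public static (HashSet<string> Addresses, HashSet<string> Roles, HashSet<string> Versions)
        SinglePass(IReadOnlyCollection<MemberSketch> members)
    {
        var addresses = new HashSet<string>();
        var roles = new HashSet<string>();
        var versions = new HashSet<string>();

        foreach (var m in members)
        {
            addresses.Add(m.Address);
            versions.Add(m.AppVersion);
            foreach (var role in m.Roles)
                roles.Add(role);
        }

        return (addresses, roles, versions);
    }
}
```

Both methods produce the same sets; the second simply walks the member collection once instead of three times, which matters most on the larger `GossipEnvelope` and `Welcome` payloads.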

@Aaronontheweb
Member Author

I'm going to collect some binary dumps of the cluster Gossip data so we can just preload that. Having a better sense of absolute values will be helpful.

@Aaronontheweb
Member Author

Updated stats thanks to @Arkatufus's changes:


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4651/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.303
  [Host]     : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2


| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 278.1 ns | 4.00 ns | 3.74 ns | 0.0534 | 224 B |
| Deserialize_Heartbeat | 327.2 ns | 5.34 ns | 4.99 ns | 0.1106 | 464 B |
| Serialize_HeartbeatRsp | 339.1 ns | 4.72 ns | 4.42 ns | 0.0629 | 264 B |
| Deserialize_HeartbeatRsp | 377.6 ns | 6.51 ns | 5.77 ns | 0.1278 | 536 B |
| Serialize_GossipEnvelope | 12,640.7 ns | 112.64 ns | 99.85 ns | 1.7242 | 7232 B |
| Deserialize_GossipEnvelope | 9,289.6 ns | 118.33 ns | 98.81 ns | 2.1820 | 9136 B |
| Serialize_GossipStatus | 1,973.6 ns | 23.70 ns | 21.01 ns | 0.3929 | 1648 B |
| Deserialize_GossipStatus | 3,009.2 ns | 30.65 ns | 28.67 ns | 0.5836 | 2448 B |
| Serialize_Welcome | 21,170.3 ns | 184.95 ns | 154.44 ns | 2.2278 | 9328 B |
| Deserialize_Welcome | 12,853.6 ns | 99.75 ns | 93.31 ns | 2.5940 | 10888 B |

@Aaronontheweb
Member Author

My most recent commit broke roles serialization - still working on that

@Aaronontheweb
Member Author

| Method | Mean | Error | StdDev | Median | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 132.0 ns | 2.61 ns | 6.01 ns | 129.6 ns | 0.0236 | - | 224 B |
| Deserialize_Heartbeat | 200.8 ns | 4.04 ns | 6.17 ns | 200.3 ns | 0.0491 | - | 464 B |
| Serialize_HeartbeatRsp | 163.2 ns | 3.32 ns | 6.47 ns | 163.7 ns | 0.0279 | - | 264 B |
| Deserialize_HeartbeatRsp | 221.7 ns | 4.45 ns | 6.65 ns | 220.3 ns | 0.0567 | - | 536 B |
| Serialize_GossipEnvelope | 5,953.5 ns | 118.32 ns | 213.36 ns | 5,904.4 ns | 0.7553 | - | 7144 B |
| Deserialize_GossipEnvelope | 4,907.1 ns | 85.34 ns | 116.81 ns | 4,866.8 ns | 0.9689 | 0.0153 | 9136 B |
| Serialize_GossipStatus | 1,083.4 ns | 21.59 ns | 22.17 ns | 1,086.6 ns | 0.1736 | - | 1648 B |
| Deserialize_GossipStatus | 1,560.8 ns | 29.72 ns | 26.35 ns | 1,568.8 ns | 0.2594 | - | 2448 B |
| Serialize_Welcome | 9,562.1 ns | 188.59 ns | 309.85 ns | 9,520.2 ns | 0.9766 | 0.0153 | 9200 B |
| Deserialize_Welcome | 6,832.7 ns | 134.10 ns | 169.59 ns | 6,783.4 ns | 1.1444 | 0.0153 | 10888 B |

This is from our test lab machine. I'll need to set a perf baseline here for comparison, but memory usage is down somewhat.

@Aaronontheweb
Member Author

Original dev benchmarks for our Test Lab unit:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 127.6 ns | 2.60 ns | 5.36 ns | 0.0236 | - | 224 B |
| Deserialize_Heartbeat | 191.9 ns | 3.91 ns | 7.62 ns | 0.0491 | - | 464 B |
| Serialize_HeartbeatRsp | 158.0 ns | 3.19 ns | 5.06 ns | 0.0279 | - | 264 B |
| Deserialize_HeartbeatRsp | 227.1 ns | 4.59 ns | 11.67 ns | 0.0567 | - | 536 B |
| Serialize_GossipEnvelope | 7,845.8 ns | 151.72 ns | 245.01 ns | 0.7935 | 0.0076 | 7496 B |
| Deserialize_GossipEnvelope | 4,998.2 ns | 99.58 ns | 179.55 ns | 0.9689 | 0.0153 | 9136 B |
| Serialize_GossipStatus | 1,053.3 ns | 20.82 ns | 32.42 ns | 0.1736 | - | 1648 B |
| Deserialize_GossipStatus | 1,540.8 ns | 30.16 ns | 38.14 ns | 0.2594 | - | 2448 B |
| Serialize_Welcome | 11,639.9 ns | 230.33 ns | 391.12 ns | 1.0071 | 0.0153 | 9592 B |
| Deserialize_Welcome | 6,871.3 ns | 134.25 ns | 183.77 ns | 1.1520 | 0.0229 | 10888 B |

@Aaronontheweb
Member Author

BenchmarkDotNet v0.13.12, Pop!_OS 22.04 LTS
13th Gen Intel Core i7-1360P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 8.0.105
[Host] : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev | Median | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 130.7 ns | 2.67 ns | 5.14 ns | 129.7 ns | 0.0236 | - | 224 B |
| Deserialize_Heartbeat | 194.4 ns | 3.88 ns | 6.38 ns | 193.1 ns | 0.0491 | - | 464 B |
| Serialize_HeartbeatRsp | 159.5 ns | 3.23 ns | 6.15 ns | 158.9 ns | 0.0279 | - | 264 B |
| Deserialize_HeartbeatRsp | 224.1 ns | 4.46 ns | 8.38 ns | 221.2 ns | 0.0567 | - | 536 B |
| Serialize_GossipEnvelope | 5,728.8 ns | 114.39 ns | 200.35 ns | 5,707.1 ns | 0.7477 | - | 7048 B |
| Deserialize_GossipEnvelope | 4,962.2 ns | 98.46 ns | 205.51 ns | 4,938.9 ns | 0.9689 | 0.0153 | 9136 B |
| Serialize_GossipStatus | 1,050.5 ns | 18.96 ns | 20.28 ns | 1,047.8 ns | 0.1736 | - | 1648 B |
| Deserialize_GossipStatus | 1,532.8 ns | 29.03 ns | 39.74 ns | 1,531.5 ns | 0.2594 | - | 2448 B |
| Serialize_Welcome | 9,099.1 ns | 181.61 ns | 455.62 ns | 8,936.8 ns | 0.9460 | - | 9032 B |
| Deserialize_Welcome | 6,898.0 ns | 135.02 ns | 160.73 ns | 6,883.2 ns | 1.1520 | 0.0229 | 10888 B |

@Aaronontheweb
Member Author

Latest:

// * Summary *

BenchmarkDotNet v0.13.12, Pop!_OS 22.04 LTS
13th Gen Intel Core i7-1360P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 8.0.105
[Host] : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 127.1 ns | 2.60 ns | 5.48 ns | 0.0236 | - | 224 B |
| Deserialize_Heartbeat | 196.9 ns | 3.92 ns | 4.35 ns | 0.0491 | - | 464 B |
| Serialize_HeartbeatRsp | 160.4 ns | 3.23 ns | 6.08 ns | 0.0279 | - | 264 B |
| Deserialize_HeartbeatRsp | 223.4 ns | 4.38 ns | 5.85 ns | 0.0567 | - | 536 B |
| Serialize_GossipEnvelope | 5,629.1 ns | 108.37 ns | 165.49 ns | 0.7401 | - | 7016 B |
| Deserialize_GossipEnvelope | 4,953.6 ns | 97.68 ns | 178.62 ns | 0.9689 | 0.0153 | 9136 B |
| Serialize_GossipStatus | 1,072.8 ns | 21.34 ns | 35.07 ns | 0.1736 | - | 1648 B |
| Deserialize_GossipStatus | 1,553.1 ns | 30.38 ns | 50.75 ns | 0.2594 | - | 2448 B |
| Serialize_Welcome | 8,930.0 ns | 178.42 ns | 312.48 ns | 0.9460 | 0.0153 | 9000 B |
| Deserialize_Welcome | 6,886.5 ns | 136.00 ns | 186.15 ns | 1.1520 | 0.0229 | 10888 B |

@Aaronontheweb
Member Author

I don't think there's much more we can do to improve the performance of serializing Gossip. That's probably the end of the line for these optimizations.

@Aaronontheweb
Member Author

Final numbers:

// * Summary *

BenchmarkDotNet v0.13.12, Pop!_OS 22.04 LTS
13th Gen Intel Core i7-1360P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 8.0.105
[Host] : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 127.7 ns | 2.51 ns | 2.99 ns | 0.0236 | - | 224 B |
| Deserialize_Heartbeat | 196.5 ns | 3.92 ns | 6.86 ns | 0.0491 | - | 464 B |
| Serialize_HeartbeatRsp | 161.8 ns | 3.29 ns | 4.72 ns | 0.0279 | - | 264 B |
| Deserialize_HeartbeatRsp | 223.4 ns | 4.43 ns | 4.93 ns | 0.0567 | - | 536 B |
| Serialize_GossipEnvelope | 5,699.7 ns | 113.09 ns | 138.88 ns | 0.7401 | - | 7016 B |
| Deserialize_GossipEnvelope | 4,974.3 ns | 98.05 ns | 211.06 ns | 0.9537 | 0.0153 | 8992 B |
| Serialize_GossipStatus | 1,065.2 ns | 20.61 ns | 28.21 ns | 0.1736 | - | 1648 B |
| Deserialize_GossipStatus | 1,582.0 ns | 31.27 ns | 45.84 ns | 0.2594 | - | 2448 B |
| Serialize_Welcome | 8,990.1 ns | 175.70 ns | 234.55 ns | 0.9460 | 0.0153 | 9000 B |
| Deserialize_Welcome | 6,885.5 ns | 137.78 ns | 233.96 ns | 1.1368 | 0.0229 | 10720 B |

@Aaronontheweb Aaronontheweb added this to the 1.5.27 milestone Jul 16, 2024
@Aaronontheweb Aaronontheweb marked this pull request as ready for review July 16, 2024 19:02
@Aaronontheweb
Member Author

Results on my original machine:

// * Summary *

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4651/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.303
[Host] : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|
| Serialize_Heartbeat | 272.4 ns | 5.50 ns | 9.04 ns | 0.0534 | 224 B |
| Deserialize_Heartbeat | 348.1 ns | 6.88 ns | 6.76 ns | 0.1106 | 464 B |
| Serialize_HeartbeatRsp | 351.9 ns | 6.27 ns | 6.43 ns | 0.0629 | 264 B |
| Deserialize_HeartbeatRsp | 408.6 ns | 8.81 ns | 25.27 ns | 0.1278 | 536 B |
| Serialize_GossipEnvelope | 12,711.6 ns | 251.05 ns | 383.37 ns | 1.6632 | 7016 B |
| Deserialize_GossipEnvelope | 9,450.1 ns | 161.05 ns | 150.64 ns | 2.1362 | 8992 B |
| Serialize_GossipStatus | 2,094.7 ns | 40.46 ns | 51.17 ns | 0.3929 | 1648 B |
| Deserialize_GossipStatus | 3,126.9 ns | 53.93 ns | 50.45 ns | 0.5836 | 2448 B |
| Serialize_Welcome | 20,213.8 ns | 398.49 ns | 489.38 ns | 2.1362 | 9000 B |
| Deserialize_Welcome | 12,822.8 ns | 239.58 ns | 212.38 ns | 2.5482 | 10720 B |
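
As a rough worked comparison on this machine (AMD Ryzen 7 1700), take the `Serialize_GossipEnvelope` row from the "Latest dev Benchmarks" table at the top of the thread (27,185.0 ns, 7496 B) and from the table above (12,711.6 ns, 7016 B). The relative reductions in mean time and allocated bytes are then:

$$
\frac{27{,}185.0 - 12{,}711.6}{27{,}185.0} \approx 0.532
\qquad
\frac{7496 - 7016}{7496} \approx 0.064
$$

That is roughly 53% less time and 6% fewer bytes per serialized envelope. These are separate runs with sizeable error bars (the baseline StdDev is about 1,981 ns), and the later runs also reflect @Arkatufus's changes mentioned earlier, so the figures are approximate and not attributable to any single commit.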

Contributor

@Arkatufus Arkatufus left a comment

LGTM

@Arkatufus Arkatufus enabled auto-merge (squash) July 17, 2024 21:56
@Arkatufus Arkatufus merged commit 5176dfb into akkadotnet:dev Jul 17, 2024
12 checks passed
@Aaronontheweb Aaronontheweb deleted the perf-fix-cluster-serialization branch July 18, 2024 12:36