Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve MurmurHash string hash memory footprint #5028

Merged

Conversation

Arkatufus
Copy link
Contributor

@Arkatufus Arkatufus commented May 21, 2021

Closes #5023

Edit by @Aaronontheweb : Removed the reference to #5022 since that has other issues still affecting it

@@ -202,20 +202,20 @@ public static int StringHash(string s)
{
unchecked
{
var sChar = s.ToCharArray();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the biggest offending code. MurmurHash is called once for every DataEnvelope that are going to be sent as a gossip. For a system that are trying to gossip thousands of gossips, this can add up quite significantly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Murmur_string_hash 60.87 ns 1.240 ns 2.389 ns 0.0229 - - 96 B
Jenkins_string_hash 221.14 ns 0.767 ns 0.680 ns 0.0153 - - 64 B
Murmur_binary_hash 166.72 ns 2.177 ns 2.036 ns - - - -
Jenkins_binary_hash 370.40 ns 0.709 ns 0.554 ns - - - -

Copy link
Member

@Aaronontheweb Aaronontheweb May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19041.985 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.203
  [Host]     : .NET Core 3.1.15 (CoreCLR 4.700.21.21202, CoreFX 4.700.21.21402), X64 RyuJIT
  DefaultJob : .NET Core 3.1.15 (CoreCLR 4.700.21.21202, CoreFX 4.700.21.21402), X64 RyuJIT

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Murmur_string_hash 40.39 ns 0.082 ns 0.077 ns - - - -
Jenkins_string_hash 218.54 ns 0.467 ns 0.390 ns 0.0153 - - 64 B
Murmur_binary_hash 166.06 ns 2.224 ns 2.080 ns - - - -
Jenkins_binary_hash 372.00 ns 2.132 ns 1.664 ns - - - -

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is definitely a big improvement.

@@ -336,7 +378,7 @@ public MultiVersionVector(ImmutableDictionary<UniqueAddress, long> nodeVersions)

public override bool IsEmpty => Versions.IsEmpty;
public override int Count => Versions.Count;
public override IEnumerator<KeyValuePair<UniqueAddress, long>> VersionEnumerator => Versions.GetEnumerator();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second memory offender. Calling ImmutableDictionary GetEnumerator seemed to create a shallow copy of some of its internal data structure, consuming memory everytime it is called.

Fix by converting this to a Tuple IEnumerator and our own much simpler IEnumerator implementation.

@Arkatufus
Copy link
Contributor Author

There is a third memory offender, the serialization byte array allocation, but I have yet found a good way to fix this.

@Arkatufus Arkatufus marked this pull request as draft May 21, 2021 17:00
@Aaronontheweb
Copy link
Member

@Arkatufus can you do a before and after snapshot on this?

@Arkatufus
Copy link
Contributor Author

Would a dotMemory screenshot be enough?

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the breaking API change trade-offs?

@@ -40,7 +40,7 @@ public int Compare(Address x, Address y)
if (result != 0) return result;
result = string.CompareOrdinal(x.System, y.System);
if (result != 0) return result;
result = string.CompareOrdinal(x.Host ?? "", y.Host ?? "");
result = string.CompareOrdinal(x.Host ?? string.Empty, y.Host ?? string.Empty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -257,7 +257,7 @@ public SingleVersionVector(UniqueAddress node, long version)

public override bool IsEmpty => false;
public override int Count => 1;
public override IEnumerator<KeyValuePair<UniqueAddress, long>> VersionEnumerator => new Enumerator(Node, Version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking API change or is this a subtype of an internal type above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, both of these are breaking changes.

I thought about doing this in #4990 - but I decided to avoid the breaking API change by exposing an internal API that should be used for this.

I don't think the risks are that high of this breaking change having a significant negative impact as DData is either primarily used directly or inside of cluster sharding and these APIs aren't called directly... but it's a judgement call.

@Arkatufus
Copy link
Contributor Author

Well, this is highly annoying. I can't reproduce the bug using the original reproduction code anymore, memory consuption stays between 100-200 Mb no matter what I do.

@Aaronontheweb
Copy link
Member

Aaronontheweb commented May 21, 2021 via email

@Arkatufus
Copy link
Contributor Author

Yes, its still occuring, the console is scrolling endlessly, but the memory total didn't go above 200 Mb

@Aaronontheweb
Copy link
Member

@Arkatufus let's get this PR cleaned up for merge so we can benefit from the improved memory footprint. We can still work on tracking down the gossip issues in another PR.

# Conflicts:
#	src/contrib/cluster/Akka.DistributedData/VersionVector.cs
@Arkatufus Arkatufus marked this pull request as ready for review May 24, 2021 15:06
@Arkatufus Arkatufus changed the title Resolve Akka.DistributedData DurableDataEnvelope memory leak Improve MurmurHash string hash memory footprint May 24, 2021
Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) May 24, 2021 16:14
@Aaronontheweb Aaronontheweb added this to the 1.4.21 milestone May 24, 2021
@Aaronontheweb Aaronontheweb merged commit 3f8fa29 into akkadotnet:dev May 24, 2021
This was referenced Jun 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants