Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Performance optimize Ask #4965

Closed

Conversation

Aaronontheweb
Copy link
Member

No description provided.

@Aaronontheweb
Copy link
Member Author

Going to use the baseline data from #4962 to measure the impact on Ask itself, even though there's ActorSelection overhead in it.

@Aaronontheweb
Copy link
Member Author

Aaronontheweb commented Apr 20, 2021

Baseline numbers on dev branch

No idea why these throughput numbers are lower than the gains I posted for ActorSelection - probably has to do with my PC workload in the background

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.928 (2004/?/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT

Method Iterations Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RequestResponseActorSelection 10000 98.446 ms 1.5898 ms 1.4871 ms 5000.0000 - - 20.2 MB
CreateActorSelection 10000 5.397 ms 0.0447 ms 0.0418 ms 953.1250 - - 3.81 MB

@Aaronontheweb
Copy link
Member Author

Got a cleaner baseline this time

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.928 (2004/?/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT

Method Iterations Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
RequestResponseActorSelection 10000 83.867 ms 1.4599 ms 2.8474 ms 82.746 ms 5000.0000 - - 20.2 MB
CreateActorSelection 10000 5.407 ms 0.1040 ms 0.1068 ms 5.365 ms 953.1250 - - 3.81 MB

@Aaronontheweb
Copy link
Member Author

Aaronontheweb commented Apr 20, 2021

Introduced object pooling for StringBuilder, inspired by what @to11mtm did on #4929

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.928 (2004/?/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT

Method Iterations Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RequestResponseActorSelection 10000 84.063 ms 1.5885 ms 1.4859 ms 4857.1429 - - 19.21 MB
CreateActorSelection 10000 5.318 ms 0.0632 ms 0.0560 ms 953.1250 - - 3.81 MB

Throughput seems pretty much the same, but definitely a drop in Gen 1 allocation and total memory.

@Aaronontheweb
Copy link
Member Author

Interestingly, this change will also affect how quickly actors can spawn since the same method I'm pooling StringBuilders on is used for generating random new actor names. I'll give that a spin too.

internal static class PooledObject
{
public static readonly ObjectPool<StringBuilder> StringBuilderPool =
new DefaultObjectPoolProvider().CreateStringBuilderPool(512, 2048);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting to re-use this pool elsewhere? These sizes may be a bit much otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's several other places where we can use it, but the sizes I chose are arbitrary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly curious if there was significant overhead in accessing and returning a pooled StringBuilder vs Gen 0-ing one.

@Aaronontheweb Aaronontheweb changed the title Performance optimize Ask [WIP] Performance optimize Ask Apr 20, 2021
next = next >> 6;
encodedBytes[writeIndex]= Base64Chars[index];
next = next >> 6;
writeIndex++;
} while (next != 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on akka.net internals but here are my two cents

You can always unroll the loop instead of looping since this is a fixed-size data type it will increase performance at least fourfold
Also, the prefix is preventing it from doing micro-optimization so you can split methods (with any/with just 1/without prefix) to receive performance optimization case by case basis, you also can skip stackalloc in this way (stackalloc benefits from less allocation but you have to copy the memory block anyways, so one less, much better performance)

just a side note, I don't think this is important but I think we may have endianness issue here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm up for trying all of the above.

just a side note, I don't think this is important but I think we may have endianness issue here

ah, you mean I'm doing it big endian? Whoops.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, you mean I'm doing it big endian? Whoops.

Meh, I'm not saying you're doing anything wrong and also I don't think anyone will try to bitconvert long to base64 and do whatever you're doing here themselves in between different architecture systems/cluster nodes, I'm saying just because I have zero knowledge regarding internals. so, it's a just side note.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're just using this to help generate random actor names, which is something we have to do for the temporary 1-off actors that get used to power Ask and GracefulStop - when those actor paths get translated over the network it's all done as a string so it's probably of little consequence, but I still don't like doing things wrong :p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I don't think it does matter at all. you're doing it just fine 👍

Copy link

@christallire christallire Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm doing this for fun and.. I got quite confusing result while validating results

`BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.202
[Host] : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

Method Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
Benchmark 28.82 ns 0.262 ns 0.245 ns 1.00 0.0057 - - 48 B
BenchmarkOptimization1 12.68 ns 0.207 ns 0.184 ns 0.44 0.0057 - - 48 B

and Here's my code

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static unsafe string Base64EncodeOptimization1(long value)
        {
            // disable range checking
            fixed (char* base64 = &base64Table[0])
            {
                Span<char> encodedBytes = stackalloc char[11];

                var next = value;
                encodedBytes[0] = base64[(int) ((next >> (6 * 0)) & 63)];
                encodedBytes[1] = base64[(int) ((next >> (6 * 1)) & 63)];
                encodedBytes[2] = base64[(int) ((next >> (6 * 2)) & 63)];
                encodedBytes[3] = base64[(int) ((next >> (6 * 3)) & 63)];
                encodedBytes[4] = base64[(int) ((next >> (6 * 4)) & 63)];
                encodedBytes[5] = base64[(int) ((next >> (6 * 5)) & 63)];
                encodedBytes[6] = base64[(int) ((next >> (6 * 6)) & 63)];
                encodedBytes[7] = base64[(int) ((next >> (6 * 7)) & 63)];
                encodedBytes[8] = base64[(int) ((next >> (6 * 8)) & 63)];
                encodedBytes[9] = base64[(int) ((next >> (6 * 9)) & 63)];
                encodedBytes[10] = base64[(int) ((next >> (6 * 10)) & 63)];

                return new string(encodedBytes);
            }
        }

Here's the catch:

  1. This is actually base63 (no '='), I think this is intended it but made me scratch my head
  2. and original code loses all top bits if all zero (edge case: 0), well, if it's intended, okay.
  3. but the chance of hash conflict is higher than I thought because of 1) and 2)
  4. I think my micro-optimization is pointless unless you use this in a tight loop (those two are only nano-second differences..)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chris-sung - sorry I missed this!

This is actually base63 (no '='), I think this is intended it but made me scratch my head

That's correct - believe we can't use that due to Akka.NET Uri encoding restrictions. This function is primarily used to generate random actor names.

and original code loses all top bits if all zero (edge case: 0), well, if it's intended, okay.

That should also be fine.

I think my micro-optimization is pointless unless you use this in a tight loop (those two are only nano-second differences..)

We don't use this in a tight loop - most of the benefit from this function comes from eliminating the stringbuilder allocation and reducing GC overhead. This function typically gets invoked when we're generating actor names, so the area where you'll see the biggest performance benefit is when a large number of Ask<T> operations are being created all at once.

Maybe we should still incorporate this in v1.4.21 though?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation. what do you mean by "incorporate this" exactly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean use it instead of my current implementation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants