Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guid.CreateVersion7() could be faster #106377

Open
yaakov-h opened this issue Aug 14, 2024 · 12 comments
Open

Guid.CreateVersion7() could be faster #106377

yaakov-h opened this issue Aug 14, 2024 · 12 comments
Labels
area-System.Runtime in-pr There is an active PR which will close this issue when it is merged needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration tenet-performance Performance related issue
Milestone

Comments

@yaakov-h
Copy link
Member

Description

I got nerd-sniped on Mastodon a while back into build a highly performant UUIDv7 implementation for .NET.

I haven't published it as a library yet or anything like that.

I just saw that .NET 9 Preview 7 adds Guid.CreateVersion7() which does the same thing, but when I benchmarked it the runtime version came out as almost 3x as slow as my version.

The problem is my version depends on being smart with RandomNumberGenerator.Fill, and based on this comment it seems that random number generation is not available in this part of the runtime and what is available is slower

However I think there may be an oversight here.

RandomNumberGenerator.Fill is significantly faster on macOS, but about 20% slower on Windows. I suspect this is due to the split between CoCreateGuid() on Windows vs Interop.GetCryptographicallySecureRandomBytes() on Unix.

I'll post additional benchmarks below.

Would you be open to me doing platform-specific implementations to speed up Guid.CreateVersion7() (and if I can get benchmarks to back it up, maybe Guid.NewGuid() too?) on non-Windows platforms?

@yaakov-h yaakov-h added the tenet-performance Performance related issue label Aug 14, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 14, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 14, 2024
@yaakov-h
Copy link
Member Author

macOS, M2 Pro (ARM64), .NET 9 Preview 7:

// * Summary *

BenchmarkDotNet v0.13.12, macOS Sonoma 14.6 (23G80) [Darwin 23.6.0]
Apple M2 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]   : .NET 9.0.0 (9.0.24.40507), Arm64 RyuJIT AdvSIMD
  .NET 9.0 : .NET 9.0.0 (9.0.24.40507), Arm64 RyuJIT AdvSIMD

Job=.NET 9.0  Runtime=.NET 9.0  

| Method  | Mean      | Error    | StdDev   | Ratio | Allocated | Alloc Ratio |
|-------- |----------:|---------:|---------:|------:|----------:|------------:|
| runtime | 257.25 ns | 0.566 ns | 0.472 ns |  1.00 |         - |          NA |
| faster  |  93.55 ns | 0.265 ns | 0.248 ns |  0.36 |         - |          NA |

// * Hints *
Outliers
  Benchmarks.runtime: .NET 9.0 -> 2 outliers were removed (261.13 ns, 261.29 ns)

@yaakov-h
Copy link
Member Author

yaakov-h commented Aug 14, 2024

Windows, Intel Core i9-12900H (x64) .NET 9 Preview 7:

// * Summary *

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
12th Gen Intel Core i9-12900H, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]   : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2

Job=.NET 9.0  Runtime=.NET 9.0

| Method  | Mean     | Error    | StdDev   | Ratio | RatioSD | Allocated | Alloc Ratio |
|-------- |---------:|---------:|---------:|------:|--------:|----------:|------------:|
| runtime | 61.35 ns | 1.254 ns | 1.394 ns |  1.00 |    0.00 |         - |          NA |
| faster  | 75.07 ns | 1.499 ns | 1.949 ns |  1.23 |    0.05 |         - |          NA |

Here my "faster" version is actually slower, so I can see how the existing decision would have been made to use NewGuid() instead of RNG.

@stephentoub
Copy link
Member

stephentoub commented Aug 14, 2024

I don't think changing CreateVersion7 here is appropriate. The only extra overhead using NewGuid has is setting (and requesting from the rng) a few extra bytes that CreateVersion7 will then overwrite, but otherwise it needs cryptographically-secure random bytes, and it gets them from a native method. If you're really seeing RandomNumberGenerator.Fill being 3x faster than NewGuid on macOS, then I suspect that means the CryptoNative_GetRandomBytes function is ~3x faster than the SystemNative_GetCryptographicallySecureRandomBytes function on macOS, and that's the gap we should be looking to address.

@jkotas
Copy link
Member

jkotas commented Aug 14, 2024

CryptoNative_GetRandomBytes function is ~3x faster than the SystemNative_GetCryptographicallySecureRandomBytes function on macOS,

We should be calling AppleCryptoNative_GetRandomBytes on macOS. Both AppleCryptoNative_GetRandomBytes and SystemNative_GetCryptographicallySecureRandomBytes should call the same CCRandomGenerateBytes macOS API:

@stephentoub
Copy link
Member

Thanks. If they're already calling the same function, then @yaakov-h more analysis will be needed to understand the cited differences before a PR would be accepted.

What does "my version depends on being smart with RandomNumberGenerator.Fill" mean?

@yaakov-h
Copy link
Member Author

yaakov-h commented Aug 14, 2024

I suspect that means the CryptoNative_GetRandomBytes function is ~3x faster than the SystemNative_GetCryptographicallySecureRandomBytes function on macOS, and that's the gap we should be looking to address.

yes and no, because that comes back to:

my version depends on being smart with RandomNumberGenerator.Fill

CreateVersion7() appears to create a new UUIDv4, and then overwrites some of those bytes and bits to turn it into a UUIDv7.

In the version marked faster in the above benchmarks, I skip the first 6 bytes when doing the RNG rather than RNG'ing them and overwriting it later, as these will be the timestamp.

Span<byte> value = _interior; // InlineArray(16) struct
RandomNumberGenerator.Fill(value.Slice(6));

Generating fewer random bytes helps speed things up a little bit.

@stephentoub
Copy link
Member

Generating fewer random bytes helps speed things up a little bit.

But generating 6 fewer random bytes when it's already generating over a hundred should not make a 3x difference. Can you expand on that?

@yaakov-h
Copy link
Member Author

A hundred? A GUID is only 16 bytes.

Calling RandomNumberGenerator.Fill(16 bytes) is slower than calling RandomNumberGenerator.Fill(10 bytes).

When Guid.CreateVersion7() runs on UNIX platforms, it fills 16 bytes and then throws away the first 6.

Somre more of the performance difference may be lost due to additional things in the runtime like checking for a negative DateTimeOffset, I would have to check that.

@vcsjones vcsjones added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 14, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@stephentoub
Copy link
Member

A hundred? A GUID is only 16 bytes.

Sorry, that was supposed to be ten; typing late at night. I'd expect generating ~35% fewer bytes would make it ~35% faster, but your numbers are showing ~2x as much gain. That's what's I'm not clear about.

@yaakov-h
Copy link
Member Author

I'm not entirely sure either. Looking over the differences as well as the iteration history and Mastodon conversation history of making the other version faster:

  • The runtime sets V4 version and variant, and later overwrites these with V7 version and variant. That's some more wasted CPU time.
  • Setting the date-time bytes could probably be done faster with BinaryPrimitives.WriteInt64BigEndian (if available in this part of the runtime) or Unsafe.As<long> rather than dealing with _a and _b individually.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 16, 2024
@yaakov-h
Copy link
Member Author

It doesn't seem like I even need most of the little optimisations in the other implementation.

Just skipping the parts of v4 generation that get replaced by the v7 generation is enough to achieve the speedup.

PR in #106525 with benchmarks.

@jeffhandley jeffhandley added this to the Future milestone Aug 23, 2024
@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Runtime in-pr There is an active PR which will close this issue when it is merged needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants