-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizations for Ascii.Equals and Ascii.EqualsIgnoreCase #85926
Conversation
=> left.Length == right.Length && SequenceEqualIgnoreCase(right, left); | ||
|
||
private static bool SequenceEqualIgnoreCase<TLeft, TRight>(ReadOnlySpan<TLeft> left, ReadOnlySpan<TRight> right) | ||
private interface ILoader<TLeft, TRight> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better name is welcomed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is fine 👍
=> left.Length == right.Length | ||
&& Equals<ushort, ushort, PlainLoader<ushort>>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(left)), ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(right)), (uint)right.Length); | ||
|
||
private static bool Equals<TLeft, TRight, TLoader>(ref TLeft left, ref TRight right, nuint length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I validated that produced machine-code for x64 looks good.
(Don't have an ARM machine to look at that code)
…t(Avx.IsSupported) can be removed This caused a build-failure that should get fixed by this change.
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsAdresses the open points / comments from #84886 (cf. #84886 (comment))
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks great, thank you for your contribution @gfoidl !
Could you please run the benchmarks from dotnet/performance#3016 and share the results?
I would like to merge dotnet/performance#3016 first and then wait 1-2 days and merge this PR (to gather some data in the reporting system and show nice boost!)
Vector128<TRight> loweringMask = typeof(TRight) == typeof(byte) | ||
? Vector128.Create((byte)0x20).As<byte, TRight>() | ||
: Vector128.Create((ushort)0x20).As<ushort, TRight>(); | ||
|
||
Vector128<TRight> vecA = typeof(TRight) == typeof(byte) | ||
? Vector128.Create((byte)'a').As<byte, TRight>() | ||
: Vector128.Create((ushort)'a').As<ushort, TRight>(); | ||
|
||
Vector128<TRight> vecZMinusA = typeof(TRight) == typeof(byte) | ||
? Vector128.Create((byte)('z' - 'a')).As<byte, TRight>() | ||
: Vector128.Create((ushort)('z' - 'a')).As<ushort, TRight>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be simplified?
Vector128<TRight> loweringMask = typeof(TRight) == typeof(byte) | |
? Vector128.Create((byte)0x20).As<byte, TRight>() | |
: Vector128.Create((ushort)0x20).As<ushort, TRight>(); | |
Vector128<TRight> vecA = typeof(TRight) == typeof(byte) | |
? Vector128.Create((byte)'a').As<byte, TRight>() | |
: Vector128.Create((ushort)'a').As<ushort, TRight>(); | |
Vector128<TRight> vecZMinusA = typeof(TRight) == typeof(byte) | |
? Vector128.Create((byte)('z' - 'a')).As<byte, TRight>() | |
: Vector128.Create((ushort)('z' - 'a')).As<ushort, TRight>(); | |
Vector128<TRight> loweringMask = Vector128.Create((TRight)(object)0x20); | |
Vector128<TRight> vecA = Vector128.Create((TRight)(object)'a'); | |
Vector128<TRight> vecZMinusA = Vector128.Create((TRight)(object)('z' - 'a')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 👍🏻
Won't work this way though, as the constants are either int
or char
, so an InvalidCastException
would happen.
But it works with TRight.CreateTruncating
.
{ | ||
uint valueA = uint.CreateTruncating(left[i]); | ||
uint valueB = uint.CreateTruncating(right[i]); | ||
private struct PlainLoader<T> : ILoader<T, T> where T : unmanaged, INumberBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: it could be readonly?
private struct PlainLoader<T> : ILoader<T, T> where T : unmanaged, INumberBase<T> | |
private readonly struct PlainLoader<T> : ILoader<T, T> where T : unmanaged, INumberBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it could. There are only static
methods, thus I omited the readonly
modifier.
Do we have any preference here? Some other structs, that implement static abstract interfaces, which I found in a quick search use it, so to be consistent it should be applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we mostly use readonly
on such types
return | ||
Avx.IsSupported ? Avx.TestZ(vector.AsByte(), Vector256.Create((byte)0x80)) : | ||
vector.AsByte().ExtractMostSignificantBits() == 0; | ||
} | ||
else | ||
{ | ||
return | ||
Avx.IsSupported ? Avx.TestZ(vector.AsUInt16(), Vector256.Create((ushort)0xFF80)) : | ||
(vector.AsUInt16() & Vector256.Create((ushort)0xFF80)) == Vector256<ushort>.Zero; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subjective: the previous style was easier to read (at least to me)
return | |
Avx.IsSupported ? Avx.TestZ(vector.AsByte(), Vector256.Create((byte)0x80)) : | |
vector.AsByte().ExtractMostSignificantBits() == 0; | |
} | |
else | |
{ | |
return | |
Avx.IsSupported ? Avx.TestZ(vector.AsUInt16(), Vector256.Create((ushort)0xFF80)) : | |
(vector.AsUInt16() & Vector256.Create((ushort)0xFF80)) == Vector256<ushort>.Zero; | |
return Avx.IsSupported | |
? Avx.TestZ(vector.AsByte(), Vector256.Create((byte)0x80)) | |
: vector.AsByte().ExtractMostSignificantBits() == 0; | |
} | |
else | |
{ | |
return Avx.IsSupported | |
? Avx.TestZ(vector.AsUInt16(), Vector256.Create((ushort)0xFF80)) | |
: (vector.AsUInt16() & Vector256.Create((ushort)0xFF80)) == Vector256<ushort>.Zero; |
thank you for adding the support for case when Avx
is not supported!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method could be annotated as [BypassReadyToRun]
instead.
The only case when you will hit the non-Avx fallback is when R2R compiled this method without Avx support (no new platforms).
I don't know what the preference is between having duplicate code just to handle the R2R case vs. the startup cost of jiting the method to T0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easier to read (at least to me)
Same here. But a few lines above that styles is used to, and for consistency I used that one here too.
This method could be annotated as
[BypassReadyToRun]
instead.
That attribute is new to me. Thanks for the hint.
I don't know what's better here either, any guidance is welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidwrighton what would you say the preference is between adding code just to handle the R2R fallback vs. disabling R2R?
In this case at least the fallback is trivial.
private static bool Equals<TLeft, TRight, TLoader>(ref TLeft left, ref TRight right, nuint length) | ||
where TLeft : unmanaged, INumberBase<TLeft> | ||
where TRight : unmanaged, INumberBase<TRight> | ||
where TLoader : ILoader<TLeft, TRight> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we restrict it to struct
?
where TLoader : ILoader<TLeft, TRight> | |
where TLoader : struct, ILoader<TLeft, TRight> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it makes sense to prevent perf-bugs very early due the struct-constraint.
=> left.Length == right.Length && SequenceEqualIgnoreCase(right, left); | ||
|
||
private static bool SequenceEqualIgnoreCase<TLeft, TRight>(ReadOnlySpan<TLeft> left, ReadOnlySpan<TRight> right) | ||
private interface ILoader<TLeft, TRight> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is fine 👍
This comment was marked as outdated.
This comment was marked as outdated.
Machine infoBenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 10 (10.0.19045.2846)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=8.0.100-preview.5.23260.3
[Host] : .NET 8.0.0 (8.0.23.25213), X64 RyuJIT AVX2
main : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
pr : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1
(removed unrelated rows from the table and renamed the job for easier interpretation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you very much @gfoidl !
@gfoidl thanks to help of @cincuranet in dotnet/performance#3016 (comment) I was able to get the charts that show improvements from your contribution: x64 x64 arm64 arm64 |
🎉 thanks for the info! |
Adresses the open points / comments from #84886 (cf. #84886 (comment))
Vector64
hw-accelerated intrinsics (instead of sw-fallback), cf. Ascii.Equals and Ascii.EqualsIgnoreCase #84886 (comment), Ascii.Equals and Ascii.EqualsIgnoreCase #84886 (comment)(byte, char)
and(byte, byte)
/(char, char)
variants