Optimizations for Ascii.Equals and Ascii.EqualsIgnoreCase #85926

Merged
merged 6 commits into from
May 11, 2023

@gfoidl gfoidl commented May 8, 2023

Addresses the open points / comments from #84886 (cf. #84886 (comment))

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label May 8, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 8, 2023
=> left.Length == right.Length && SequenceEqualIgnoreCase(right, left);

private static bool SequenceEqualIgnoreCase<TLeft, TRight>(ReadOnlySpan<TLeft> left, ReadOnlySpan<TRight> right)
private interface ILoader<TLeft, TRight>

Member Author

A better name is welcomed...

Member

This name is fine 👍

=> left.Length == right.Length
&& Equals<ushort, ushort, PlainLoader<ushort>>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(left)), ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(right)), (uint)right.Length);

private static bool Equals<TLeft, TRight, TLoader>(ref TLeft left, ref TRight right, nuint length)

Member Author

I validated that the machine code produced for x64 looks good.
(I don't have an ARM machine to check the code there.)

…t(Avx.IsSupported) can be removed

This caused a build-failure that should get fixed by this change.
@adamsitnik adamsitnik added area-System.Text.Encoding tenet-performance Performance related issue and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 9, 2023
ghost commented May 9, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Member

@adamsitnik adamsitnik left a comment

Overall it looks great, thank you for your contribution @gfoidl !

Could you please run the benchmarks from dotnet/performance#3016 and share the results?

I would like to merge dotnet/performance#3016 first and then wait 1-2 days and merge this PR (to gather some data in the reporting system and show nice boost!)

Comment on lines 219 to 229
Vector128<TRight> loweringMask = typeof(TRight) == typeof(byte)
    ? Vector128.Create((byte)0x20).As<byte, TRight>()
    : Vector128.Create((ushort)0x20).As<ushort, TRight>();

Vector128<TRight> vecA = typeof(TRight) == typeof(byte)
    ? Vector128.Create((byte)'a').As<byte, TRight>()
    : Vector128.Create((ushort)'a').As<ushort, TRight>();

Vector128<TRight> vecZMinusA = typeof(TRight) == typeof(byte)
    ? Vector128.Create((byte)('z' - 'a')).As<byte, TRight>()
    : Vector128.Create((ushort)('z' - 'a')).As<ushort, TRight>();

Member

Can this be simplified?

Suggested change
- Vector128<TRight> loweringMask = typeof(TRight) == typeof(byte)
-     ? Vector128.Create((byte)0x20).As<byte, TRight>()
-     : Vector128.Create((ushort)0x20).As<ushort, TRight>();
- Vector128<TRight> vecA = typeof(TRight) == typeof(byte)
-     ? Vector128.Create((byte)'a').As<byte, TRight>()
-     : Vector128.Create((ushort)'a').As<ushort, TRight>();
- Vector128<TRight> vecZMinusA = typeof(TRight) == typeof(byte)
-     ? Vector128.Create((byte)('z' - 'a')).As<byte, TRight>()
-     : Vector128.Create((ushort)('z' - 'a')).As<ushort, TRight>();
+ Vector128<TRight> loweringMask = Vector128.Create((TRight)(object)0x20);
+ Vector128<TRight> vecA = Vector128.Create((TRight)(object)'a');
+ Vector128<TRight> vecZMinusA = Vector128.Create((TRight)(object)('z' - 'a'));

Member Author

Good idea 👍🏻
It won't work this way though: the constants are either int or char, so unboxing them as TRight would throw an InvalidCastException.
But it works with TRight.CreateTruncating.
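
For readers following along, here is a minimal sketch of the difference, assuming .NET 7 or later with generic math. The class and method names (`BroadcastDemo`, `ViaUnboxing`, `ViaCreateTruncating`) are hypothetical and not part of the PR; only `INumberBase<T>.CreateTruncating` and the generic `Vector128.Create` are existing APIs.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

public static class BroadcastDemo
{
    // Hypothetical helper mirroring the first suggestion: box, then unbox as T.
    // Unboxing requires the exact boxed type, so a boxed int cannot be unboxed as byte or ushort.
    public static T ViaUnboxing<T>(int value) => (T)(object)value;

    // Hypothetical helper mirroring the follow-up: a real numeric conversion via generic math.
    public static Vector128<T> ViaCreateTruncating<T>(int value)
        where T : unmanaged, INumberBase<T>
        => Vector128.Create(T.CreateTruncating(value));

    public static void Main()
    {
        try
        {
            ViaUnboxing<byte>(0x20);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Unboxing a boxed int as byte throws InvalidCastException.");
        }

        Vector128<byte> loweringMask = ViaCreateTruncating<byte>(0x20);
        Vector128<ushort> vecA = ViaCreateTruncating<ushort>('a');

        Console.WriteLine(loweringMask.GetElement(0)); // 32
        Console.WriteLine(vecA.GetElement(0));         // 97
    }
}
```

The conversion works for both `byte` and `ushort` because `CreateTruncating` converts the value numerically instead of reinterpreting a boxed object.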

{
uint valueA = uint.CreateTruncating(left[i]);
uint valueB = uint.CreateTruncating(right[i]);
private struct PlainLoader<T> : ILoader<T, T> where T : unmanaged, INumberBase<T>

Member

minor nit: it could be readonly?

Suggested change
- private struct PlainLoader<T> : ILoader<T, T> where T : unmanaged, INumberBase<T>
+ private readonly struct PlainLoader<T> : ILoader<T, T> where T : unmanaged, INumberBase<T>

Member Author

Yes, it could. There are only static methods, so I omitted the readonly modifier.
Do we have a preference here? A quick search shows that some other structs implementing static abstract interfaces use it, so for consistency it should be applied.

Member

I believe we mostly use readonly on such types

Comment on lines +1488 to +1496
    return
        Avx.IsSupported ? Avx.TestZ(vector.AsByte(), Vector256.Create((byte)0x80)) :
        vector.AsByte().ExtractMostSignificantBits() == 0;
}
else
{
    return
        Avx.IsSupported ? Avx.TestZ(vector.AsUInt16(), Vector256.Create((ushort)0xFF80)) :
        (vector.AsUInt16() & Vector256.Create((ushort)0xFF80)) == Vector256<ushort>.Zero;

Member

subjective: the previous style was easier to read (at least to me)

Suggested change
-     return
-         Avx.IsSupported ? Avx.TestZ(vector.AsByte(), Vector256.Create((byte)0x80)) :
-         vector.AsByte().ExtractMostSignificantBits() == 0;
- }
- else
- {
-     return
-         Avx.IsSupported ? Avx.TestZ(vector.AsUInt16(), Vector256.Create((ushort)0xFF80)) :
-         (vector.AsUInt16() & Vector256.Create((ushort)0xFF80)) == Vector256<ushort>.Zero;
+     return Avx.IsSupported
+         ? Avx.TestZ(vector.AsByte(), Vector256.Create((byte)0x80))
+         : vector.AsByte().ExtractMostSignificantBits() == 0;
+ }
+ else
+ {
+     return Avx.IsSupported
+         ? Avx.TestZ(vector.AsUInt16(), Vector256.Create((ushort)0xFF80))
+         : (vector.AsUInt16() & Vector256.Create((ushort)0xFF80)) == Vector256<ushort>.Zero;

Thank you for adding support for the case when Avx is not supported!

Member

@MihaZupan MihaZupan May 10, 2023

This method could be annotated as [BypassReadyToRun] instead.
The only case when you will hit the non-Avx fallback is when R2R compiled this method without Avx support (no new platforms).

I don't know what the preference is between having duplicate code just to handle the R2R case vs. the startup cost of jitting the method to T0.

Member Author

> easier to read (at least to me)

Same here. But that style is already used a few lines above, so I used it here too for consistency.

> This method could be annotated as [BypassReadyToRun] instead.

That attribute is new to me. Thanks for the hint.
I don't know what's better here either; any guidance is welcome.

Member

@davidwrighton what would you say the preference is between adding code just to handle the R2R fallback vs. disabling R2R?
In this case at least the fallback is trivial.
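
To illustrate why the fallback is considered trivial, here is a minimal sketch of the two non-Avx checks, assuming .NET 7 or later. It uses `Vector128` instead of the `Vector256` from the diff, and the names `AsciiCheckDemo`, `AllBytesAscii`, and `AllCharsAscii` are hypothetical rather than the PR's own.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class AsciiCheckDemo
{
    // Bytes: every element is ASCII when no element has its most significant bit (0x80) set.
    public static bool AllBytesAscii(Vector128<byte> vector)
        => vector.ExtractMostSignificantBits() == 0;

    // UTF-16 code units: every element is ASCII when none of the bits in 0xFF80 is set.
    public static bool AllCharsAscii(Vector128<ushort> vector)
        => (vector & Vector128.Create((ushort)0xFF80)) == Vector128<ushort>.Zero;

    public static void Main()
    {
        Console.WriteLine(AllBytesAscii(Vector128.Create((byte)'a')));      // True
        Console.WriteLine(AllBytesAscii(Vector128.Create((byte)0xE9)));     // False
        Console.WriteLine(AllCharsAscii(Vector128.Create((ushort)'z')));    // True
        Console.WriteLine(AllCharsAscii(Vector128.Create((ushort)0x00E9))); // False
    }
}
```

Both checks rely only on the cross-platform vector APIs, so they do not depend on a specific instruction set being available.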

private static bool Equals<TLeft, TRight, TLoader>(ref TLeft left, ref TRight right, nuint length)
    where TLeft : unmanaged, INumberBase<TLeft>
    where TRight : unmanaged, INumberBase<TRight>
    where TLoader : ILoader<TLeft, TRight>

Member

should we restrict it to struct?

Suggested change
-     where TLoader : ILoader<TLeft, TRight>
+     where TLoader : struct, ILoader<TLeft, TRight>

Member Author

Yeah, I think it makes sense: the struct constraint prevents perf bugs very early.
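
A minimal sketch of the pattern under discussion, assuming C# 11 / .NET 7 or later. The `IFolder` interface and its implementations are hypothetical stand-ins, not the PR's `ILoader`. The idea is that the runtime compiles a specialized method body for each value-type argument, so the static abstract call can be resolved per instantiation, while the `struct` constraint rejects a class implementation at compile time.

```csharp
using System;

// Hypothetical interface with a single static abstract member.
public interface IFolder
{
    static abstract uint Fold(uint value);
}

public readonly struct IdentityFolder : IFolder
{
    public static uint Fold(uint value) => value;
}

public readonly struct LowerCaseFolder : IFolder
{
    // Sets bit 0x20 for 'A'..'Z' and leaves every other value untouched.
    public static uint Fold(uint value)
        => (value - (uint)'A') <= (uint)('Z' - 'A') ? value | 0x20u : value;
}

public static class StructConstraintDemo
{
    // The struct constraint means a class that implements IFolder cannot be passed as TFolder.
    public static bool SequenceEqual<TFolder>(ReadOnlySpan<byte> left, ReadOnlySpan<byte> right)
        where TFolder : struct, IFolder
    {
        if (left.Length != right.Length)
        {
            return false;
        }

        for (int i = 0; i < left.Length; i++)
        {
            if (TFolder.Fold(left[i]) != TFolder.Fold(right[i]))
            {
                return false;
            }
        }

        return true;
    }

    public static void Main()
    {
        Console.WriteLine(SequenceEqual<IdentityFolder>("Hello"u8, "hELLO"u8));  // False
        Console.WriteLine(SequenceEqual<LowerCaseFolder>("Hello"u8, "hELLO"u8)); // True
    }
}
```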

@tarekgh tarekgh added this to the 8.0.0 milestone May 10, 2023

gfoidl commented May 11, 2023

> Could you please run the benchmarks from dotnet/performance#3016 and share the results?

Machine info
BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 10 (10.0.19045.2846)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=8.0.100-preview.5.23260.3
  [Host]     : .NET 8.0.0 (8.0.23.25213), X64 RyuJIT AVX2
  main : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  pr : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
| Method | Job | Size | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| IsValid_Bytes | main | 6 | 2.537 ns | 0.2359 ns | 1.00 |
| IsValid_Bytes | pr | 6 | 2.016 ns | 0.0397 ns | 0.80 |
| IsValid_Chars | main | 6 | 3.387 ns | 0.2208 ns | 1.00 |
| IsValid_Chars | pr | 6 | 2.742 ns | 0.0860 ns | 0.82 |
| Equals_Bytes | main | 6 | 8.860 ns | 0.4392 ns | 1.00 |
| Equals_Bytes | pr | 6 | 5.099 ns | 0.1212 ns | 0.58 |
| Equals_Chars | main | 6 | 9.591 ns | 0.3690 ns | 1.00 |
| Equals_Chars | pr | 6 | 5.742 ns | 0.0648 ns | 0.60 |
| Equals_Bytes_Chars | main | 6 | 9.366 ns | 0.5560 ns | 1.00 |
| Equals_Bytes_Chars | pr | 6 | 6.061 ns | 0.1365 ns | 0.66 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes | main | 6 | 9.026 ns | 0.4026 ns | 1.00 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes | pr | 6 | 8.822 ns | 0.1208 ns | 0.97 |
| EqualsIgnoreCase_ExactlyTheSame_Chars | main | 6 | 8.797 ns | 0.1992 ns | 1.00 |
| EqualsIgnoreCase_ExactlyTheSame_Chars | pr | 6 | 9.300 ns | 0.1225 ns | 1.06 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes_Chars | main | 6 | 7.201 ns | 0.1652 ns | 1.00 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes_Chars | pr | 6 | 9.225 ns | 0.2208 ns | 1.28 |
| EqualsIgnoreCase_DifferentCase_Bytes | main | 6 | 11.150 ns | 0.2578 ns | 1.00 |
| EqualsIgnoreCase_DifferentCase_Bytes | pr | 6 | 10.116 ns | 0.2073 ns | 0.91 |
| EqualsIgnoreCase_DifferentCase_Chars | main | 6 | 11.533 ns | 0.1493 ns | 1.00 |
| EqualsIgnoreCase_DifferentCase_Chars | pr | 6 | 10.151 ns | 0.1410 ns | 0.88 |
| EqualsIgnoreCase_DifferentCase_Bytes_Chars | main | 6 | 11.220 ns | 0.1978 ns | 1.00 |
| EqualsIgnoreCase_DifferentCase_Bytes_Chars | pr | 6 | 10.293 ns | 0.1117 ns | 0.92 |
| IsValid_Bytes | main | 128 | 2.958 ns | 0.0841 ns | 1.00 |
| IsValid_Bytes | pr | 128 | 2.956 ns | 0.0446 ns | 1.00 |
| IsValid_Chars | main | 128 | 4.798 ns | 0.1310 ns | 1.00 |
| IsValid_Chars | pr | 128 | 5.039 ns | 0.0931 ns | 1.05 |
| Equals_Bytes | main | 128 | 11.362 ns | 0.1925 ns | 1.00 |
| Equals_Bytes | pr | 128 | 6.118 ns | 0.1530 ns | 0.54 |
| Equals_Chars | main | 128 | 15.352 ns | 0.3357 ns | 1.00 |
| Equals_Chars | pr | 128 | 12.580 ns | 0.2221 ns | 0.82 |
| Equals_Bytes_Chars | main | 128 | 17.121 ns | 0.3781 ns | 1.00 |
| Equals_Bytes_Chars | pr | 128 | 13.550 ns | 0.3093 ns | 0.80 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes | main | 128 | 118.966 ns | 1.8113 ns | 1.00 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes | pr | 128 | 10.107 ns | 0.2089 ns | 0.08 |
| EqualsIgnoreCase_ExactlyTheSame_Chars | main | 128 | 153.375 ns | 2.8708 ns | 1.00 |
| EqualsIgnoreCase_ExactlyTheSame_Chars | pr | 128 | 15.381 ns | 0.3254 ns | 0.10 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes_Chars | main | 128 | 109.199 ns | 1.7805 ns | 1.00 |
| EqualsIgnoreCase_ExactlyTheSame_Bytes_Chars | pr | 128 | 16.945 ns | 0.4240 ns | 0.16 |
| EqualsIgnoreCase_DifferentCase_Bytes | main | 128 | 159.318 ns | 2.9244 ns | 1.00 |
| EqualsIgnoreCase_DifferentCase_Bytes | pr | 128 | 16.653 ns | 0.3459 ns | 0.10 |
| EqualsIgnoreCase_DifferentCase_Chars | main | 128 | 158.219 ns | 2.2267 ns | 1.00 |
| EqualsIgnoreCase_DifferentCase_Chars | pr | 128 | 28.308 ns | 0.5351 ns | 0.18 |
| EqualsIgnoreCase_DifferentCase_Bytes_Chars | main | 128 | 231.040 ns | 2.1013 ns | 1.00 |
| EqualsIgnoreCase_DifferentCase_Bytes_Chars | pr | 128 | 31.340 ns | 0.5958 ns | 0.14 |

(removed unrelated rows from the table and renamed the job for easier interpretation)

@adamsitnik adamsitnik self-assigned this May 11, 2023
Member

@adamsitnik adamsitnik left a comment

LGTM, thank you very much @gfoidl !

@tarekgh tarekgh merged commit b0b133b into dotnet:main May 11, 2023
@gfoidl gfoidl deleted the ascii-equality-pt-2 branch May 11, 2023 17:00
@adamsitnik
Member

@gfoidl thanks to the help of @cincuranet in dotnet/performance#3016 (comment) I was able to get the charts that show the improvements from your contribution:

x64 Equals:

image

x64 EqualsIgnoreCase:

image

arm64 EqualsIgnoreCase:

image

arm64 Equals:

image

gfoidl commented May 17, 2023

🎉 thanks for the info!

@ghost ghost locked as resolved and limited conversation to collaborators Jun 16, 2023