Fix GetIndexOfFirstNonAsciiByte_Vector not taken on ARM64 #90527
Tagging subscribers to this area: @dotnet/area-system-text-encoding
Issue Details: Fixes #89924
- if (!Vector512.IsHardwareAccelerated &&
-     !Vector256.IsHardwareAccelerated &&
-     (Sse2.IsSupported || AdvSimd.IsSupported))
+ if (!Vector512.IsHardwareAccelerated && !Vector256.IsHardwareAccelerated && Sse2.IsSupported)
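To make the effect of the condition change concrete, here is a minimal sketch (not the actual CoreLib code) that evaluates the old and new predicates on the current machine. The class and method names are hypothetical; only the `IsHardwareAccelerated` and `IsSupported` properties are taken from the diff. It assumes .NET 8 or later, since `Vector512` is not available in earlier versions.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not part of System.Private.CoreLib.
static class DispatchSketch
{
    // Predicate as it reads on the removed lines of the diff:
    // the _Intrinsified path is taken on SSE2 *or* AdvSimd hardware.
    static bool TakesIntrinsifiedBefore(bool v512, bool v256, bool sse2, bool advSimd)
        => !v512 && !v256 && (sse2 || advSimd);

    // Predicate as it reads on the added line of the diff:
    // only SSE2 hardware takes the _Intrinsified path.
    static bool TakesIntrinsifiedAfter(bool v512, bool v256, bool sse2)
        => !v512 && !v256 && sse2;

    static string PathName(bool intrinsified) => intrinsified ? "_Intrinsified" : "_Vector";

    static void Main()
    {
        // Query the capabilities of the machine this sketch runs on.
        bool v512 = Vector512.IsHardwareAccelerated;
        bool v256 = Vector256.IsHardwareAccelerated;
        bool sse2 = Sse2.IsSupported;
        bool advSimd = AdvSimd.IsSupported;

        Console.WriteLine($"Vector512={v512} Vector256={v256} Sse2={sse2} AdvSimd={advSimd}");
        Console.WriteLine($"before: {PathName(TakesIntrinsifiedBefore(v512, v256, sse2, advSimd))}");
        Console.WriteLine($"after:  {PathName(TakesIntrinsifiedAfter(v512, v256, sse2))}");
    }
}
```

On a machine where `AdvSimd.IsSupported` is true and neither `Vector256` nor `Vector512` is hardware accelerated, the first predicate is true and the second is false, which is the switch from the `_Intrinsified` path to the `_Vector` path that this change is about.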
If we do this, then doesn't this change leave a bunch of dead code in GetIndexOfFirstNonAsciiByte_Intrinsified that should be cleaned up? It has multiple blocks dedicated to AdvSimd which would never be reached. At which point the method should also be renamed to be SSE-specific.
But I'd also explicitly asked about this in #88532 (comment) and was told that preferring the intrinsics path was by design. Maybe it was only measured on x64 and not on Arm?
> If we do this, then doesn't this change leave a bunch of dead code in GetIndexOfFirstNonAsciiByte_Intrinsified that should be cleaned up? It has multiple blocks dedicated to AdvSimd which would never be reached. At which point the method should also be renamed to be SSE-specific.
Let me fix that.
The _Vector path is estimated to be ~40% faster on osx-arm64. It is likely the switch was not measured on ARM64.
> The _Vector path is estimated to be ~40% faster on osx-arm64. It is likely the switch was not measured on ARM64.
@neon-sunset can you clarify please? .NET 7 was using the _Intrinsified path for Arm64: https://github.com/dotnet/runtime/blob/v7.0.10/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91
The PR that introduced the _Vector path (#88532) didn't change _Intrinsified at all (aside from using the centralized Vector128.Size constant) and didn't change which path Arm64 was calling.
This then wouldn't seem like a regression, but rather simply a case where the new _Vector code is faster for Arm64. If that's the case, could you share your own numbers showing the .NET 7 vs .NET 8 difference (without this PR) and the difference for _Vector vs _Intrinsified for .NET 8 (this PR vs without this PR)?
> @neon-sunset can you clarify please? .NET 7 was using the _Intrinsified path for Arm64: https://github.com/dotnet/runtime/blob/v7.0.10/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91
You are correct, my mistake.
> This then wouldn't seem like a regression, but rather simply a case where the new _Vector code is faster for Arm64. If that's the case, could you share your own numbers showing the .NET 7 vs .NET 8 difference (without this PR) and the difference for _Vector vs _Intrinsified for .NET 8 (this PR vs without this PR)?
This is the plan; I'm just waiting for the Release build to finish so I can post detailed numbers 😄
9985c3e to a00b989
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
Fixes #89924
CC @EgorBo @anthonycanino