Use crossplat vectors in GetPointerToFirstInvalidChar #90373
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
Issue Details
This whole path is currently not used on ARM64 (it uses Vector<> instead). Sse4.1 requirement was lifted to Sse2 because of
Vector128<ushort> Min(Vector128<ushort> a, Vector128<ushort> b)
    => Vector128.Min(a, b);
; Method mytest:Min
movups xmm0, xmmword ptr [rdx]
movups xmm1, xmmword ptr [reloc @RWD00]
paddw xmm0, xmm1
movups xmm2, xmmword ptr [r8]
paddw xmm2, xmm1
pminsw xmm0, xmm2
psubw xmm0, xmm1
movups xmmword ptr [rcx], xmm0
mov rax, rcx
ret
RWD00 dq 8000800080008000h, 8000800080008000h
; Total bytes of code: 37
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
[CompExactlyDependsOn(typeof(Sse2))]
internal static Vector128<ushort> AddSaturate(Vector128<ushort> left, Vector128<ushort> right)
To be exposed as a public API with #82559
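As a rough illustration of what a cross-platform helper with this signature might look like, here is a minimal sketch. The class name `SaturatingAddSketch` and the `AddSaturatePortable` fallback are hypothetical and not taken from the PR; only `Sse2.AddSaturate`, `AdvSimd.AddSaturate`, and the `Vector128` operators are existing APIs.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

internal static class SaturatingAddSketch
{
    // Hypothetical helper mirroring the shape of the signature in the diff.
    internal static Vector128<ushort> AddSaturate(Vector128<ushort> left, Vector128<ushort> right)
    {
        if (Sse2.IsSupported)
        {
            return Sse2.AddSaturate(left, right);    // PADDUSW
        }
        if (AdvSimd.IsSupported)
        {
            return AdvSimd.AddSaturate(left, right); // UQADD
        }
        return AddSaturatePortable(left, right);
    }

    // Assumed software fallback (not from the PR): detect unsigned wrap-around
    // and clamp the affected lanes to 0xFFFF.
    internal static Vector128<ushort> AddSaturatePortable(Vector128<ushort> left, Vector128<ushort> right)
    {
        Vector128<ushort> sum = left + right;                         // wraps on overflow
        Vector128<ushort> overflowed = Vector128.LessThan(sum, left); // all-ones where the lane wrapped
        return sum | overflowed;                                      // wrapped lanes become 0xFFFF
    }
}
```

The fallback relies on the fact that an unsigned 16-bit sum wrapped around exactly when it is smaller than one of its operands.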
(force-pushed from 3e0a6df to 226929f)
{
    charIsNonAscii = AdvSimd.Min(utf16Data, vector0080);
    // Use Sse41.Min with opportunistic ISA check on R2R instead of
    // slower SSE2-baked Vector128.Min
Why is Vector128.Min slower than Sse41.Min?
Vector128.Min uses a fallback without SSE4.1:
Vector128<ushort> Min(Vector128<ushort> a, Vector128<ushort> b)
=> Vector128.Min(a, b);
; Method mytest:Min
movups xmm0, xmmword ptr [rdx]
movups xmm1, xmmword ptr [reloc @RWD00]
paddw xmm0, xmm1
movups xmm2, xmmword ptr [r8]
paddw xmm2, xmm1
pminsw xmm0, xmm2
psubw xmm0, xmm1
movups xmmword ptr [rcx], xmm0
mov rax, rcx
ret
RWD00 dq 8000800080008000h, 8000800080008000h
; Total bytes of code: 37
It can probably be improved, but Sse4.1 Min is a single instruction under a quick opportunistic check branch.
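For readers who do not read x86 assembly: the listing above computes an unsigned 16-bit minimum with the signed `pminsw` instruction by biasing both inputs by 0x8000 and removing the bias afterwards. A minimal sketch of the same idea using the cross-platform `Vector128` API follows; the class and method names are hypothetical and this is not the runtime's actual implementation.

```csharp
using System.Runtime.Intrinsics;

internal static class UnsignedMinSketch
{
    // Hypothetical illustration of the bias trick visible in the disassembly.
    internal static Vector128<ushort> MinViaSignedBias(Vector128<ushort> a, Vector128<ushort> b)
    {
        // Adding 0x8000 (mod 2^16) maps unsigned ordering onto signed ordering.
        Vector128<ushort> bias = Vector128.Create((ushort)0x8000);
        Vector128<short> biasedA = (a + bias).AsInt16();            // paddw
        Vector128<short> biasedB = (b + bias).AsInt16();            // paddw
        Vector128<short> min = Vector128.Min(biasedA, biasedB);     // pminsw
        return min.AsUInt16() - bias;                               // psubw
    }
}
```

This takes four arithmetic instructions plus a constant load, versus the single `pminuw` that SSE4.1 provides.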
I am not sure R2R codegen is that important, though.
Is that a bug in Vector128.Min then?
This pattern only matters for R2R/NAOT, which is compiled as SSE2; for the JIT everything is great as is.
The idea was to emit:
if (runtimeIsaCheck == 1)
{
// use single-instruction for Min
}
else
{
// slow fallback because there is no SSE4.1
}
But I decided to drop it because it looks like we already have this problem in many places where we only check Vector128.IsHardwareAccelerated and don't care what exactly the Vector128.* APIs need under the hood.
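In C#, the opportunistic check described in this comment could be written roughly as below. This is a sketch only: the class name is hypothetical, and how the `IsSupported` check is compiled (constant under the JIT, runtime check under R2R) is taken from the discussion in this thread rather than verified here.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

internal static class OpportunisticMinSketch
{
    // Hypothetical helper showing the "ISA check, then fallback" shape.
    internal static Vector128<ushort> Min(Vector128<ushort> a, Vector128<ushort> b)
    {
        if (Sse41.IsSupported)
        {
            // Single instruction (PMINUW) when SSE4.1 is available.
            return Sse41.Min(a, b);
        }

        // Otherwise rely on whatever Vector128.Min lowers to on this target
        // (the multi-instruction sequence on an SSE2-only baseline).
        return Vector128.Min(a, b);
    }
}
```

Both branches return the same values; the difference is only in the generated code.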
Shouldn't we be changing the fallback? EDIT: #90391
Ah, I didn't notice you already ported it, will revert mine
(force-pushed from c68f5c3 to 33ad71e)
wasm job is failing with dotnet/dnceng#450
This whole path is currently not used on ARM64 (it uses Vector<> instead). Sse4.1 requirement was lifted to Sse2