AdvSimd support for System.Text.Unicode.Utf8Utility.GetPointerToFirstInvalidByte #38653
Tagging subscribers to this area: @tannergooding
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.Validation.cs
I'm not familiar with advsimd yet so I can't comment on the intrinsics usage. But the overall control flow LGTM, modulo some comment cleanup. :)
Assuming AdvSimd.LoadVector128 allows unaligned reads, that advsimd implies little-endian, and MaxAcross(vector).ToScalar() is the arm64 equivalent of pmovmskb, the logic should be sound.
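As a rough scalar illustration of the last assumption above: pmovmskb produces one bit per byte lane, whereas a horizontal maximum produces a single byte. The sketch below models both in plain C#. The class and method names are hypothetical and are not part of the PR. The two agree on whether any byte is non-ASCII, but only the mask says which lane it is in.

```csharp
using System;

// Hypothetical scalar models, for illustration only (not the PR's code).
static class HighBitSketch
{
    // Models x86 pmovmskb: bit i of the result is the most significant
    // bit of byte lane i. Assumes at most 32 lanes (16 for a 128-bit vector).
    public static int MoveMask(ReadOnlySpan<byte> lanes)
    {
        int mask = 0;
        for (int i = 0; i < lanes.Length; i++)
        {
            mask |= (lanes[i] >> 7) << i;
        }
        return mask;
    }

    // Models a horizontal unsigned maximum over all byte lanes.
    public static byte MaxAcross(ReadOnlySpan<byte> lanes)
    {
        byte max = 0;
        foreach (byte b in lanes)
        {
            max = Math.Max(max, b);
        }
        return max;
    }
}
```

In this model, `MoveMask(lanes) != 0` holds exactly when `MaxAcross(lanes) >= 0x80`, so either can answer "is everything ASCII?", while locating the first non-ASCII byte needs the per-lane mask.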
The Linux musl arm64 leg is failing in UTF8 unit tests, which might be related to my changes. I'm investigating.
c00016b to 639b51d
3581cca to 6e27b8c
@carlossanlop - Did you measure the performance win we get from this?
@@ -15,6 +17,11 @@ namespace System.Text.Unicode
{
internal static unsafe partial class Utf8Utility
{
private static readonly Vector128<byte> s_mostSignficantBitMask = Vector128.Create((byte)0x80);
I have seen in the past that referring to a static variable is slower on ARM64 because it has to call a helper. Calling the helper is done by loading its address, which takes 3 instructions, plus the helper call itself. Since you will be accessing it inside a loop, I would recommend making these local variables. Here is a quick test:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int AdvSimdMoveMask1(Vector128<byte> value)
{
Debug.Assert(AdvSimd.Arm64.IsSupported);
// extractedBits[i] = (value[i] & 0x80) == 0x80 & (1 << i);
Vector128<byte> mostSignficantBitMask = s_mostSignficantBitMask;
Vector128<byte> mostSignificantBitIsSet = AdvSimd.CompareEqual(AdvSimd.And(value, mostSignficantBitMask), mostSignficantBitMask);
Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitMask128);
// self-pairwise add until all flags have moved to the first two bytes of the vector
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
return extractedBits.AsInt32().ToScalar();
}
Generated code for AdvSimdMoveMask1
; Assembly listing for method BitArrayTest.TestClass:Main(System.String[]):int
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 arg0 [V00 ] ( 0, 0 ) ref -> zero-ref class-hnd
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; V02 tmp1 [V02,T05] ( 3, 3 ) simd16 -> d16 HFA(simd16) "Inline stloc first use temp"
; V03 tmp2 [V03,T04] ( 2, 4 ) simd16 -> d8 HFA(simd16) "Inlining Arg"
; V04 tmp3 [V04,T01] ( 3, 6 ) simd16 -> d16 HFA(simd16) "dup spill"
; V05 tmp4 [V05,T02] ( 3, 6 ) simd16 -> d16 HFA(simd16) "dup spill"
; V06 tmp5 [V06,T03] ( 3, 6 ) simd16 -> d16 HFA(simd16) "dup spill"
;* V07 cse0 [V07,T00] ( 0, 0 ) long -> zero-ref "CSE - aggressive"
;
; Lcl frame size = 0
G_M58909_IG01:
A9BE7BFD stp fp, lr, [sp,#-32]!
6D0127E8 stp d8, d9, [sp,#16]
910003FD mov fp, sp
;; bbWeight=1 PerfScore 2.50
G_M58909_IG02:
4F00E5E8 movi v8.16b, #0x0f
D291BF00 movz x0, #0x8df8
F2BE2860 movk x0, #0xf143 LSL #16
F2CFFF60 movk x0, #0x7ffb LSL #32
52800021 mov w1, #1
6E084509 mov v9.d[0], v8.d[1]
97FF44AF bl CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
D2858700 movz x0, #0x2c38
F2B8D1C0 movk x0, #0xc68e LSL #16
F2C03BC0 movk x0, #478 LSL #32
F9400000 ldr x0, [x0]
3CC08010 ldr q16, [x0,#8]
6E180528 mov v8.d[1], v9.d[0]
4E301D11 and v17.16b, v8.16b, v16.16b
6E308E30 cmeq v16.16b, v17.16b, v16.16b
D2858800 movz x0, #0x2c40
F2B8D1C0 movk x0, #0xc68e LSL #16
F2C03BC0 movk x0, #478 LSL #32
F9400000 ldr x0, [x0]
3CC08011 ldr q17, [x0,#8]
4E311E10 and v16.16b, v16.16b, v17.16b
4E30BE10 addp v16.16b, v16.16b, v16.16b
4E30BE10 addp v16.16b, v16.16b, v16.16b
4E30BE10 addp v16.16b, v16.16b, v16.16b
4E042E00 smov x0, v16.s[0]
;; bbWeight=1 PerfScore 29.50
G_M58909_IG03:
6D4127E8 ldp d8, d9, [sp,#16]
A8C27BFD ldp fp, lr, [sp],#32
D65F03C0 ret lr
;; bbWeight=1 PerfScore 3.00
; Total bytes of code 124, prolog size 12, PerfScore 47.40, (MethodHash=ba8f19e2) for method BitArrayTest.TestClass:Main(System.String[]):int
; ============================================================
vs.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int AdvSimdMoveMask2(Vector128<byte> value, Vector128<byte> mostSignficantBitMask, Vector128<byte> s_bitMask128)
{
Debug.Assert(AdvSimd.Arm64.IsSupported);
// extractedBits[i] = (value[i] & 0x80) == 0x80 & (1 << i);
Vector128<byte> mostSignificantBitIsSet = AdvSimd.CompareEqual(AdvSimd.And(value, mostSignficantBitMask), mostSignficantBitMask);
Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitMask128);
// self-pairwise add until all flags have moved to the first two bytes of the vector
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
return extractedBits.AsInt32().ToScalar();
}
Generated code for AdvSimdMoveMask2
; Assembly listing for method BitArrayTest.TestClass:Main(System.String[]):int
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 arg0 [V00 ] ( 0, 0 ) ref -> zero-ref class-hnd
; V01 loc0 [V01,T04] ( 3, 3 ) simd16 -> d16 HFA(simd16)
;* V02 loc1 [V02 ] ( 0, 0 ) simd16 -> zero-ref HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; V04 tmp1 [V04,T05] ( 2, 2 ) simd16 -> d17 HFA(simd16)
; V05 tmp2 [V05,T03] ( 2, 4 ) simd16 -> d18 HFA(simd16) "Inlining Arg"
;* V06 tmp3 [V06 ] ( 0, 0 ) simd16 -> zero-ref HFA(simd16) "Inlining Arg"
;* V07 tmp4 [V07 ] ( 0, 0 ) simd16 -> zero-ref HFA(simd16) "Inlining Arg"
; V08 tmp5 [V08,T00] ( 3, 6 ) simd16 -> d16 HFA(simd16) "dup spill"
; V09 tmp6 [V09,T01] ( 3, 6 ) simd16 -> d16 HFA(simd16) "dup spill"
; V10 tmp7 [V10,T02] ( 3, 6 ) simd16 -> d16 HFA(simd16) "dup spill"
;
; Lcl frame size = 0
G_M58909_IG01:
A9BF7BFD stp fp, lr, [sp,#-16]!
910003FD mov fp, sp
;; bbWeight=1 PerfScore 1.50
G_M58909_IG02:
4F04E410 movi v16.16b, #0x80
D2804020 movz x0, #513
F2A10080 movk x0, #0x804 LSL #16
F2C40200 movk x0, #0x2010 LSL #32
F2F00800 movk x0, #0x8040 LSL #48
4E080C11 dup v17.2d, x0
4F00E5F2 movi v18.16b, #0x0f
4E301E52 and v18.16b, v18.16b, v16.16b
6E308E50 cmeq v16.16b, v18.16b, v16.16b
4E311E10 and v16.16b, v16.16b, v17.16b
4E30BE10 addp v16.16b, v16.16b, v16.16b
4E30BE10 addp v16.16b, v16.16b, v16.16b
4E30BE10 addp v16.16b, v16.16b, v16.16b
4E042E00 smov x0, v16.s[0]
;; bbWeight=1 PerfScore 14.00
G_M58909_IG03:
A8C17BFD ldp fp, lr, [sp],#16
D65F03C0 ret lr
;; bbWeight=1 PerfScore 2.00
; Total bytes of code 72, prolog size 8, PerfScore 24.70, (MethodHash=ba8f19e2) for method BitArrayTest.TestClass:Main(System.String[]):int
; ============================================================
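The pairwise-add reduction used by both test methods above can be modelled in scalar code, which makes it easier to see why three rounds are enough. This is only a sketch: the class and helper names are hypothetical, and the per-lane bit pattern (1, 2, 4, ..., 128, repeated for the upper eight lanes) is inferred from the constant 0x8040201008040201 in the second listing.

```csharp
using System;

// Hypothetical scalar model of the AddPairwise-based move mask (not the PR's code).
static class PairwiseMoveMaskSketch
{
    // Models AdvSimd.Arm64.AddPairwise(v, v) on 16 byte lanes: adjacent lanes
    // are summed, and because both operands are the same vector, the 8 sums
    // appear twice (once in the low half, once in the high half).
    static byte[] AddPairwiseSelf(byte[] v)
    {
        var r = new byte[16];
        for (int i = 0; i < 8; i++)
        {
            byte sum = (byte)(v[2 * i] + v[2 * i + 1]);
            r[i] = sum;
            r[i + 8] = sum;
        }
        return r;
    }

    public static int MoveMask(byte[] value)
    {
        if (value.Length != 16)
        {
            throw new ArgumentException("expected 16 lanes", nameof(value));
        }

        var bits = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            // Keep bit (i % 8) only for lanes whose most significant bit is set.
            bits[i] = (value[i] & 0x80) != 0 ? (byte)(1 << (i % 8)) : (byte)0;
        }

        // Each round halves the number of distinct partial sums: 16 -> 8 -> 4 -> 2.
        bits = AddPairwiseSelf(bits);
        bits = AddPairwiseSelf(bits);
        bits = AddPairwiseSelf(bits);

        // Byte 0 now covers lanes 0-7 and byte 1 covers lanes 8-15.
        return bits[0] | (bits[1] << 8);
    }
}
```

The sums never overflow a byte because every lane in a group of eight contributes a distinct bit. The sketch reads only the first two result bytes.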
Does the reduced instruction count justify the added complexity of hoisting what amounts to implementation detail to the callsite?
For these hot paths, probably. But it also is something we should likely explicitly track as the JIT should realistically be hoisting this itself.
The JIT won't hoist access to a static variable because it is not invariant: a different thread can change its value in the middle of the loop. See #35279 (comment)
Perhaps I'm missing something
This all changes for readonly statics since they can't be safely modified after their corresponding class constructors have completed.
If the JIT doesn't optimize readonly statics, then surely it should?
Yes, readonly static optimizes away the helper call. I didn't realize that those are readonly static. I tried it on my local machine and it looks like the s_bitMask128 variable access is getting hoisted. However, s_mostSignficantBitMask is not getting hoisted. Not saving it to a local variable (Vector128<byte> mostSignficantBitMask = s_mostSignficantBitMask;) makes its access hoisted.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ulong AllStatic(Vector128<byte> value)
{
Debug.Assert(AdvSimd.Arm64.IsSupported);
Vector128<byte> mostSignificantBitIsSet = AdvSimd.CompareEqual(AdvSimd.And(value, s_mostSignficantBitMask), s_mostSignficantBitMask);
Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitMask128);
// self-pairwise add until all flags have moved to the first two bytes of the vector
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
ulong result = extractedBits.AsUInt64().ToScalar();
return result;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ulong OneStatic(Vector128<byte> value)
{
Debug.Assert(AdvSimd.Arm64.IsSupported);
Vector128<byte> mostSignficantBitMask = s_mostSignficantBitMask;
Vector128<byte> mostSignificantBitIsSet = AdvSimd.CompareEqual(AdvSimd.And(value, s_mostSignficantBitMask), s_mostSignficantBitMask);
Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitMask128);
// self-pairwise add until all flags have moved to the first two bytes of the vector
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
ulong result = extractedBits.AsUInt64().ToScalar();
return result;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ulong NoStatic(Vector128<byte> value, Vector128<byte> mostSignficantBitMask, Vector128<byte> bitMask128)
{
Debug.Assert(AdvSimd.Arm64.IsSupported);
Vector128<byte> mostSignificantBitIsSet = AdvSimd.CompareEqual(AdvSimd.And(value, mostSignficantBitMask), mostSignficantBitMask);
Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, bitMask128);
// self-pairwise add until all flags have moved to the first two bytes of the vector
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
ulong result = extractedBits.AsUInt64().ToScalar();
return result;
}
NoStatic vs. OneStatic
NoStatic vs. AllStatic
Summary:
I need to look deeper into why the OneStatic version doesn't CSE the load, but for now, it is probably best not to store s_mostSignficantBitMask in a local variable and to use it directly in AdvSimd.And() instead.
@briansull - FYI
Interesting, thanks for the analysis!
As the CSE decision might differ based on other factors in the method, you should compare the JIT code emitted with and without static variables and choose the optimal one. You can dump the JIT code by setting an environment variable: set COMPlus_JITDisasm=GetPointerToFirstInvalidByte
Not yet but I will.
I found a unit test that is failing in ARM64 without my changes. I reverted everything, built for arm64, ran the unit tests and this unit test still failed:
Is that test run in any innerloop or outerloop arm64 runs in AzDO? (or is it intentionally disabled?)
Tests for last commit passed: System.Runtime, System.Text.Encoding, System.Utf8String.Experimental
The tests passed locally for the latest commit: System.Runtime, System.Text.Encoding, System.Utf8String.Experimental
The Windows_NT net472 x86 and x64 Release build legs hit this:
That job was executed in
It seems to be hitting the shim method here that throws
We found the root cause for the net472 issue: I declared a static field in a class that is consumed by
The changes look good to me.
We had a conversation offline and we decided we will collect the perf data after merging. @kunalspathak @jeffhandley
I posted the perf results in this comment: #39050 (comment)
@kunalspathak @echesakovMSFT I am going to address @GrabYourPitchforks's last comments, but before I merge that commit I wanted to let you know that the CI failures seem unrelated to my changes; they are consistent and pre-existing (I have been re-running the failed CI legs and they keep showing up). Here are the two failures summarized, which I suspect will show up again after I merge the new commit:
Build libraries Windows_NT AllConfigurations x64 release
Build wasm mono
Unit tests passed locally for the latest commit. And because the change affects the Sse2 code too, I ran the unit tests on my x64 PC as well:
ARM64: System.Text.Json, System.Text.Encoding, System.Runtime, System.Utf8String.Experimental, System.Formats.Asn1
X64: System.Runtime, System.Text.Encoding, System.Text.Json, System.Formats.Asn1, System.Utf8String.Experimental
// bump our input counter by that amount, and resume processing from the
// "the first byte is no longer ASCII" portion of the main loop.
// We should not expect a total number of zeroes equal or larger than 16.
We should not expect a total number of zeroes equal or larger than 16.
minor: I would re-phrase the comment to "Make sure that pInputBuffer is not advanced by more than 15 positions."
If that's ok with you, I would like to address this in another PR so that I don't reset the CI (it's taking a really long time to finish).
I have another similar TODO from the other PR (modifying a comment). We can collect similar requests and address them separately.
Sure. No problem.
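To make the bound discussed in this thread concrete, here is a minimal scalar sketch. It assumes a 64-bit mask in which each of the 16 input bytes owns one 4-bit nibble, which is what the commit note "Trailing zeroes in AdvSimd need to be divided by 4" suggests. The class and method names are hypothetical and the mask is built with a plain loop rather than intrinsics.

```csharp
using System;
using System.Numerics;

// Hypothetical scalar model, for illustration only (not the PR's code).
static class NibbleMaskSketch
{
    // Assumed layout: the lowest bit of nibble i is set when byte lane i
    // is non-ASCII (its most significant bit is set).
    public static ulong GetNonAsciiNibbleMask(ReadOnlySpan<byte> lanes)
    {
        if (lanes.Length != 16)
        {
            throw new ArgumentException("expected 16 lanes");
        }

        ulong mask = 0;
        for (int i = 0; i < 16; i++)
        {
            if ((lanes[i] & 0x80) != 0)
            {
                mask |= 1UL << (4 * i);
            }
        }
        return mask;
    }

    // Number of leading ASCII bytes, given that at least one byte is non-ASCII.
    public static int CountLeadingAsciiBytes(ReadOnlySpan<byte> lanes)
    {
        ulong mask = GetNonAsciiNibbleMask(lanes);
        if (mask == 0)
        {
            // TrailingZeroCount would be 64 here, i.e. 16 bytes; this path is
            // expected to run only after a non-ASCII byte has been detected.
            throw new ArgumentException("no non-ASCII byte present");
        }

        // Four mask bits per input byte, so divide the bit index by 4.
        return BitOperations.TrailingZeroCount(mask) >> 2;
    }
}
```

With a nonzero mask the lowest set bit is at index 60 at most, so the result is always in the range 0 to 15, which matches the suggested wording about not advancing pInputBuffer by more than 15 positions.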
Changes LGTM
This is ready to merge. One repeated unrelated failure: Libraries Build Windows_NT allConfigurations x64 Release:
The WASM leg is stuck with the yellow dot next to it (it says it's still running). There seems to be a delay between the time a CI leg finishes and the time it gets reported in the PR CI results (I asked @ViktorHofer and he confirmed this is a known issue). The actual WASM leg already finished and it was successful: https://dev.azure.com/dnceng/public/_build/results?buildId=738082&view=logs&j=108d2c4a-8a62-5a58-8dad-8e1042acc93c&t=4a4d9b63-088e-49f3-b7e4-699b664f7a06
…InvalidByte (dotnet#38653)
* AdvSimd support for System.Text.Unicode.Utf8Utility.GetPointerToFirstInvalidByte
* Move comment to the top, add shims.
* Little endian checks
* Use custom MoveMask method for AdvSimd
* Address suggestions to improve the AdvSimdMoveMask method
* Define initialMask outside MoveMask method
* UInt64 in Arm64MoveMask
* Add unit test case to verify intrinsics improvement
* Avoid casting to smaller integer type
* Typo and comment
* Use ShiftRightArithmetic instead of CompareEqual + And. Remove test case causing other unit tests to fail.
* Use AddPairwise version of GetNotAsciiBytes
* Add missing shims causing Linux build to fail
* Simplify GetNonAsciiBytes to only one AddPairwise call, shorter bitmask
* Respect data type returned by masking method
* Address suggestions - assert trailingzerocount and bring back uint mask
* Trailing zeroes in AdvSimd need to be divided by 4, and total number should not be larger than 16
* Avoid declaring static field which causes PNSE in Utf8String.Experimental (S.P.Corelib code is used for being NetStandard)
* Prefer using nuint for BitConverter.TrailingZeroCount
…#39738)
* AdvSimd support for System.Text.Unicode.Utf16Utility.GetPointerToFirstInvalidChar (#39050)
* AdvSimd support for System.Text.Unicode.Utf16Utility.GetPointerToFirstInvalidChar
* Move using directive outside #if. Improve Arm64MoveMask.
* Change overloads
* UIn64 in Arm64MoveMask
* Build error implicit conversion fix
* Rename method and use simpler version
* Use ShiftRightArithmetic instead of CompareEqual + And.
* Remove unnecessary comment
* Add missing shims causing Linux build to fail
* AdvSimd support for System.Text.Unicode.Utf8Utility.TranscodeToUtf8 (#39041)
* AdvSimd support for System.Text.Unicode.Utf8Utility.TranscodeToUtf8
* Readd using to prevent build failure. Add AdvSimd equivalent operation to TestZ.
* Inverted condition
* Address IsSupported order, improve use ExtractNarrowingSaturated usage
* Rename source to result, second argument utf16Data
* Improve CompareTest
* Add shims causing failures in Linux
* Use unsigned version of ExtractNarrowingSaturate, avoid using MinAcross and use MaxPairwise instead
* Missing support check for Sse2.X64
* Add missing case for AdvSimd
* Use MinPairwise for short
* AdvSimd support for System.Text.Unicode.Utf8Utility.GetPointerToFirstInvalidByte (#38653)
* AdvSimd support for System.Text.Unicode.Utf8Utility.GetPointerToFirstInvalidByte
* Move comment to the top, add shims.
* Little endian checks
* Use custom MoveMask method for AdvSimd
* Address suggestions to improve the AdvSimdMoveMask method
* Define initialMask outside MoveMask method
* UInt64 in Arm64MoveMask
* Add unit test case to verify intrinsics improvement
* Avoid casting to smaller integer type
* Typo and comment
* Use ShiftRightArithmetic instead of CompareEqual + And. Remove test case causing other unit tests to fail.
* Use AddPairwise version of GetNotAsciiBytes
* Add missing shims causing Linux build to fail
* Simplify GetNonAsciiBytes to only one AddPairwise call, shorter bitmask
* Respect data type returned by masking method
* Address suggestions - assert trailingzerocount and bring back uint mask
* Trailing zeroes in AdvSimd need to be divided by 4, and total number should not be larger than 16
* Avoid declaring static field which causes PNSE in Utf8String.Experimental (S.P.Corelib code is used for being NetStandard)
* Prefer using nuint for BitConverter.TrailingZeroCount
* Fix build failure in net472 debug AdvSimd Utf16Utility (#39652)
Co-authored-by: Carlos Sanchez Lopez <1175054+carlossanlop@users.noreply.github.com>
Contributes to #35035
Adds AdvSimd.Arm64 support for System.Text.Unicode.Utf8Utility.GetPointerToFirstInvalidByte() inside the file
runtime\src\libraries\System.Private.CoreLib\src\System\Text\Unicode\Utf8Utility.Validation.cs
The tests for this method live in:
runtime\src\libraries\System.Runtime\tests\System\Text\Unicode\Utf8UtilityTests.ValidateBytes.cs
I've been having difficulties testing this on my ARM device, so I want to analyze the CI results.
I manually executed an additional "Libraries Test Run" pipeline to ensure arm64 is run in all platforms.