Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

port SpanHelpers.IndexOfAny(ref byte, byte, byte, int) to Vector128/256 #73556

Closed
wants to merge 4 commits into from

Conversation

adamsitnik
Copy link
Member

It's #73384 again

I've tried to repro #73474 locally and I've failed. Now I want to use the CI to verify if the issue is gone or not.

./build.sh --os browser -c Release
make -C src/mono/wasm provision-wasm
export EMSDK_PATH=/home/adam/projects/runtime/src/mono/wasm/emsdk
./dotnet.sh build /t:Test /p:TargetOS=Browser /p:TargetArchitecture=wasm /p:RunAOTCompilation=true /p:EnableAggressiveTrimming=true -c Release ./src/libraries/System.Runtime/tests
  ~/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle ~/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle ~/projects/runtime/src/libraries/System.Runtime/tests
  [1.0.0-prerelease.22375.5+d43453b9981bfd8705025250caff58d18e2abc7e] XHarness command issued: wasm test --app=. --output-directory=/home/adam/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle/xharness-output -s dotnet.js.symbols --symbol-patterns wasm-symbol-patterns.txt --symbolicator WasmSymbolicator.dll,Microsoft.WebAssembly.Internal.SymbolicatorWrapperForXHarness --engine=V8 --engine-arg=--stack-trace-limit=1000 --js-file=test-main.js -- --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
  warn: Cannot find symbols file dotnet.js.symbols
  info: 
        
  info: Running v8 --expose_wasm --stack-trace-limit=1000 test-main.js -- --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
        
  info: Incoming arguments: --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
  info: Application arguments: --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
  info: console.debug: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
  info: console.info: Initializing.....
  info: Discovering: System.Runtime.Tests.dll (method display = ClassAndMethod, method display options = None)
  info: Discovered:  System.Runtime.Tests.dll (found 8900 of 9043 test cases)
  info: Using random seed for test cases: 1658957131
  info: Using random seed for collections: 1658957131
  info: Starting:    System.Runtime.Tests.dll
  info: Finished:    System.Runtime.Tests.dll
  info: 
  info: === TEST EXECUTION SUMMARY ===
  info: Total: 49567, Errors: 0, Failed: 0, Skipped: 85, Time: 42.670667s
  info: 
  info: Received expected 13414172 of /home/adam/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle/xharness-output/testResults.xml
  info: Finished writing 13414172 bytes of RESULTXML
  info: Xml file was written to the provided writer.
  info: Tests run: 49567 Passed: 49482 Inconclusive: 0 Failed: 0 Ignored: 0 Skipped: 85
  info: Process v8 exited with 0
        
  info: Waiting to flush log messages with a timeout of 120 secs ..
  info: Application has finished with exit code: 0
  XHarness exit code: 0
  ~/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle ~/projects/runtime/src/libraries/System.Runtime/tests
  ----- end Mon Aug 8 11:33:56 CEST 2022 ----- exit code 0 ----------------------------------------------------------
  XHarness artifacts: /home/adam/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle/xharness-output

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:04:33.82

@ghost
Copy link

ghost commented Aug 8, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

It's #73384 again

I've tried to repro #73474 locally and I've failed. Now I want to use the CI to verify if the issue is gone or not.

./build.sh --os browser -c Release
make -C src/mono/wasm provision-wasm
export EMSDK_PATH=/home/adam/projects/runtime/src/mono/wasm/emsdk
./dotnet.sh build /t:Test /p:TargetOS=Browser /p:TargetArchitecture=wasm /p:RunAOTCompilation=true /p:EnableAggressiveTrimming=true -c Release ./src/libraries/System.Runtime/tests
  ~/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle ~/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle ~/projects/runtime/src/libraries/System.Runtime/tests
  [1.0.0-prerelease.22375.5+d43453b9981bfd8705025250caff58d18e2abc7e] XHarness command issued: wasm test --app=. --output-directory=/home/adam/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle/xharness-output -s dotnet.js.symbols --symbol-patterns wasm-symbol-patterns.txt --symbolicator WasmSymbolicator.dll,Microsoft.WebAssembly.Internal.SymbolicatorWrapperForXHarness --engine=V8 --engine-arg=--stack-trace-limit=1000 --js-file=test-main.js -- --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
  warn: Cannot find symbols file dotnet.js.symbols
  info: 
        
  info: Running v8 --expose_wasm --stack-trace-limit=1000 test-main.js -- --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
        
  info: Incoming arguments: --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
  info: Application arguments: --run WasmTestRunner.dll System.Runtime.Tests.dll -notrait category=AdditionalTimezoneChecks -notrait category=OuterLoop -notrait category=failing
  info: console.debug: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
  info: console.info: Initializing.....
  info: Discovering: System.Runtime.Tests.dll (method display = ClassAndMethod, method display options = None)
  info: Discovered:  System.Runtime.Tests.dll (found 8900 of 9043 test cases)
  info: Using random seed for test cases: 1658957131
  info: Using random seed for collections: 1658957131
  info: Starting:    System.Runtime.Tests.dll
  info: Finished:    System.Runtime.Tests.dll
  info: 
  info: === TEST EXECUTION SUMMARY ===
  info: Total: 49567, Errors: 0, Failed: 0, Skipped: 85, Time: 42.670667s
  info: 
  info: Received expected 13414172 of /home/adam/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle/xharness-output/testResults.xml
  info: Finished writing 13414172 bytes of RESULTXML
  info: Xml file was written to the provided writer.
  info: Tests run: 49567 Passed: 49482 Inconclusive: 0 Failed: 0 Ignored: 0 Skipped: 85
  info: Process v8 exited with 0
        
  info: Waiting to flush log messages with a timeout of 120 secs ..
  info: Application has finished with exit code: 0
  XHarness exit code: 0
  ~/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle ~/projects/runtime/src/libraries/System.Runtime/tests
  ----- end Mon Aug 8 11:33:56 CEST 2022 ----- exit code 0 ----------------------------------------------------------
  XHarness artifacts: /home/adam/projects/runtime/artifacts/bin/System.Runtime.Tests/Release/net7.0-browser/browser-wasm/AppBundle/xharness-output

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:04:33.82

Author: adamsitnik
Assignees: -
Labels:

area-System.Memory

Milestone: -

@adamsitnik adamsitnik requested a review from jkotas August 8, 2022 12:25
@adamsitnik
Copy link
Member Author

@jkotas the Build Browser wasm Linux Release LibraryTests_AOT build is green

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

What fixed the underlying issue? The crash was caused by reading beyond end of the buffer. The CI may be green by luck - the memory after the end of the buffer might have been initialized just right to avoid the crash in the given run.

@vargaz or @lambdageek should inspect the IL to confirm that the problem is gone. Also, we should find the change that fixed it.

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

Ok, #73493 fixed the out of bound read, replaced it by rejecting the whole method with BadImageFormatException. Do we still have a functional issue at runtime?

@adamsitnik
Copy link
Member Author

Do we still have a functional issue at runtime?

System.Memory tests exercise this code path and they are passing on machine with AVX2

  ~/projects/runtime/artifacts/bin/System.Memory.Tests/Release/net7.0/browser-wasm/AppBundle ~/projects/runtime/artifacts/bin/System.Memory.Tests/Release/net7.0/browser-wasm/AppBundle ~/projects/runtime/src/libraries/System.Memory/tests
  [1.0.0-prerelease.22375.5+d43453b9981bfd8705025250caff58d18e2abc7e] XHarness command issued: wasm test --app=. --output-directory=/home/adam/projects/runtime/artifacts/bin/System.Memory.Tests/Release/net7.0/browser-wasm/AppBundle/xharness-output -s dotnet.js.symbols --symbol-patterns wasm-symbol-patterns.txt --symbolicator WasmSymbolicator.dll,Microsoft.WebAssembly.Internal.SymbolicatorWrapperForXHarness --engine=V8 --engine-arg=--stack-trace-limit=1000 --js-file=test-main.js -- --run WasmTestRunner.dll System.Memory.Tests.dll -notrait category=OuterLoop -notrait category=failing
  warn: Cannot find symbols file dotnet.js.symbols
  info: 
        
  info: Running v8 --expose_wasm --stack-trace-limit=1000 test-main.js -- --run WasmTestRunner.dll System.Memory.Tests.dll -notrait category=OuterLoop -notrait category=failing
        
  info: Incoming arguments: --run WasmTestRunner.dll System.Memory.Tests.dll -notrait category=OuterLoop -notrait category=failing
  info: Application arguments: --run WasmTestRunner.dll System.Memory.Tests.dll -notrait category=OuterLoop -notrait category=failing
  info: console.debug: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
  info: console.info: Initializing.....
  info: Discovering: System.Memory.Tests.dll (method display = ClassAndMethod, method display options = None)
  info: Discovered:  System.Memory.Tests.dll (found 2408 of 2430 test cases)
  info: Using random seed for test cases: 75981927
  info: Using random seed for collections: 75981927
  info: Starting:    System.Memory.Tests.dll
  info: Finished:    System.Memory.Tests.dll
  info: 
  info: === TEST EXECUTION SUMMARY ===
  info: Total: 47871, Errors: 0, Failed: 0, Skipped: 0, Time: 28.899475s
  info: 
  info: Process v8 exited with 0
        
  info: Received expected 12787906 of /home/adam/projects/runtime/artifacts/bin/System.Memory.Tests/Release/net7.0/browser-wasm/AppBundle/xharness-output/testResults.xml
  info: Waiting to flush log messages with a timeout of 120 secs ..
  info: Finished writing 12787906 bytes of RESULTXML
  info: Xml file was written to the provided writer.
  info: Tests run: 47871 Passed: 47871 Inconclusive: 0 Failed: 0 Ignored: 0 Skipped: 0
  info: Application has finished with exit code: 0
  XHarness exit code: 0
  ~/projects/runtime/artifacts/bin/System.Memory.Tests/Release/net7.0/browser-wasm/AppBundle ~/projects/runtime/src/libraries/System.Memory/tests
  ----- end Mon Aug 8 14:50:58 CEST 2022 ----- exit code 0 ----------------------------------------------------------
  XHarness artifacts: /home/adam/projects/runtime/artifacts/bin/System.Memory.Tests/Release/net7.0/browser-wasm/AppBundle/xharness-output

Build succeeded.
    0 Warning(s)
    0 Error(s)

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

Do the tests exercise the AOT code of method, or are they falling back to the interpreter silently? If it is the latter, we are likely going to see perf regressions.

@adamsitnik
Copy link
Member Author

Do the tests exercise the AOT code of method, or are they falling back to the interpreter silently? If it is the latter, we are likely going to see perf regressions.

I am using instructions provided by @lambdageek in #73474 (comment) to run these tests

/p:RunAOTCompilation=true /p:EnableAggressiveTrimming=true 

I would expect that they exercise the AOT code of method. How can I check if they don't fall back to interpreter silently?

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

I don’t know how to check that. Somebody from the Mono team needs to help you with at that.

@EgorBo
Copy link
Member

EgorBo commented Aug 8, 2022

I guess MONO_VERBOSE_METHOD=methodName might help (it's not expected to print anything when you run already AOT'd binary)

@adamsitnik
Copy link
Member Author

I've tried to benchmark WASM AOT but I don't think that it's currently possible: dotnet/BenchmarkDotNet#1818 (comment)

@lambdageek
Copy link
Member

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lambdageek
Copy link
Member

lambdageek commented Aug 8, 2022

I'm trying the System.Memory tests locally. I would expect any call to IndexOfAny to throw BIFE on wasm AOT. there shouldn't be a fallback to the interpreter, AFAIK. Perhaps we're skipping some of these tests already for other reasons

@lambdageek
Copy link
Member

lambdageek commented Aug 8, 2022

Ah interesting.
Apparently in this configuration the AOT compiler does just skip the method and relies on the interpreter to throw.
But the interpreter doesn't throw - it doesn't have a check that the fallthru on a conditional branch is within the bounds of a method. and dynamically the branch is always taken.

So this PR is good; i'm working, separately, on adding a fallthru check to the interpreter

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

this PR is good

Do you agree that this PR is going to introduce perf regression since the method won’t be AOT compiled anymore?

@lambdageek
Copy link
Member

this PR is good

Do you agree that this PR is going to introduce perf regression since the method won’t be AOT compiled anymore?

Yes, I should qualify "good" - this PR wont' crash the AOT compiler or the runtime anymore. The perf won't be good.

@lambdageek
Copy link
Member

dynamically the branch is always taken.

actually dynamically this code is unreachable

@adamsitnik
Copy link
Member Author

@jkotas @lambdageek @vitek-karas I am using similar patterns in few other PRs and aiming to port all SpanHelpers code to Vector128/256 before we ship 7 (the PRs are ready). The thing I don't understand is what exactly is wrong here. Based on #73474 (comment) I know that linker is producing bad IL.

But why? Is it because I am using Vector128/256 types?
Is my understanding correct that NativeAOT is also using the linker and it's going to face same issues?

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

Yes, something in the refactored methods exposed a bug in the IL linker.

Native AOT does not have the bug. It is not sharing the relevant part of the linker.

@lambdageek
Copy link
Member

But why? Is it because I am using Vector128/256 types?

I'm not sure, but I think I have a workaround:

if I change the last else if (Vector.IsHardwareAccelerated) to:

            else
             {
                Debug.Assert (Vector.IsHardwareAccelerated);

then I can also delete

            Debug.Fail("Unreachable");
            goto NotFound; 

and then in the IL that comes out of the linker, I don't see a conditional jump at the end of the method anymore.

(I'm not sure if this has any undesired impact on CoreCLR performance)

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

My guess is that the bug is triggered by certain shape of the method flowgraph. It is not about types used.

@vitek-karas
Copy link
Member

Sorry - I didn't get to this yet (first on my list after emails/management)...
My theory is that:
The method ends with an if which looks like:

Early1:
Early2:

if (something)
{
    goto Early1;
}
else
{
    goto Early2;
}

Basically the method ends with an if/else where both branches go backwards. In that case it's invalid to remove the "false" branch, because there's nothing left in the method - and even though that branch is technically not reachable, it needs to exist for validity.

@EgorBo
Copy link
Member

EgorBo commented Aug 8, 2022

Can it still remove it and fold ldc.0 + brfalse to just br ? or there can be other similar shapes (e.g. a local instead of constant)

@vitek-karas
Copy link
Member

I'll have to go through the code - but in general the code in the linker is very simplistic and tries to avoid changing the number of instructions (it's problematic to move code around in that setup) - so it tries to keep the same number of instructions. It's also relatively problematic to fully remove the ldc.0 and so on - if the condition is over a call then removing the entire call is relatively complicated - and we don't do that. Which means we need to "pop" somehow.

I'm considering either disabling the optimization in this case or to force insert a dummy ret at the end of the method.

@jkotas
Copy link
Member

jkotas commented Aug 8, 2022

force insert a dummy ret at the end of the method.

ldnull + throw may be better. Ret requires exactly zero or one items on the evaluation stack.

@adamsitnik
Copy link
Member Author

@jkotas @lambdageek @vitek-karas Thanks for the explanation! I've removed all the gotos, similified the code and ensured that there are no regressions for configs that I can benchmark (CLR x64 AVX2 AVX & arm64 AdvSIMD).
It was a good excercise and I am defnitely not going to recommend using goto in the guidelines that I am currently working on ;)

The perf has not regressed, it's on par. When reading the Ratio columns please keep in mind that for small number of elements we are dealing with nanobenchmarks and 3.4 ns vs 3.2 ns is a 6% difference.

AVX2 (x64)

BenchmarkDotNet=v0.13.1.1845-nightly, OS=Windows 11 (10.0.22000.795/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]     : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  Job-FHASLE : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-JRHZCS : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX2

LaunchCount=9
Method Toolchain Size Mean Ratio
IndexOfAnyTwoValues \PR\corerun.exe 1 1.922 ns 0.82
IndexOfAnyTwoValues \baseline\corerun.exe 1 2.357 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 4 2.513 ns 0.91
IndexOfAnyTwoValues \baseline\corerun.exe 4 2.768 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 8 3.459 ns 1.08
IndexOfAnyTwoValues \baseline\corerun.exe 8 3.216 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 32 2.202 ns 0.68
IndexOfAnyTwoValues \baseline\corerun.exe 32 3.223 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 33 2.200 ns 0.69
IndexOfAnyTwoValues \baseline\corerun.exe 33 3.202 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 63 2.224 ns 0.69
IndexOfAnyTwoValues \baseline\corerun.exe 63 3.231 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 64 2.773 ns 0.75
IndexOfAnyTwoValues \baseline\corerun.exe 64 3.689 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 512 6.235 ns 0.76
IndexOfAnyTwoValues \baseline\corerun.exe 512 8.181 ns 1.00

Code gen before:

; System.SpanHelpers.IndexOfAny(Byte ByRef, Byte, Byte, Int32)
       push      rsi
       vzeroupper
       movzx     edx,dl
       mov       eax,edx
       movzx     r8d,r8b
       mov       r10d,r8d
       xor       r11d,r11d
       mov       esi,r9d
       movsxd    r9,r9d
       add       r9,0FFFFFFFFFFFFFFF0
       js        near ptr M00_L01
       mov       rsi,r9
       jmp       near ptr M00_L13
M00_L00:
       add       rsi,0FFFFFFFFFFFFFFF8
       movzx     edx,byte ptr [rcx+r11]
       cmp       eax,edx
       je        near ptr M00_L05
       cmp       r10d,edx
       je        near ptr M00_L05
       movzx     edx,byte ptr [rcx+r11+1]
       cmp       eax,edx
       je        near ptr M00_L06
       cmp       r10d,edx
       je        near ptr M00_L06
       movzx     edx,byte ptr [rcx+r11+2]
       cmp       eax,edx
       je        near ptr M00_L07
       cmp       r10d,edx
       je        near ptr M00_L07
       movzx     edx,byte ptr [rcx+r11+3]
       cmp       eax,edx
       je        near ptr M00_L08
       cmp       r10d,edx
       je        near ptr M00_L08
       movzx     edx,byte ptr [rcx+r11+4]
       cmp       eax,edx
       je        near ptr M00_L09
       cmp       r10d,edx
       je        near ptr M00_L09
       movzx     edx,byte ptr [rcx+r11+5]
       cmp       eax,edx
       je        near ptr M00_L10
       cmp       r10d,edx
       je        near ptr M00_L10
       movzx     edx,byte ptr [rcx+r11+6]
       cmp       eax,edx
       je        near ptr M00_L11
       cmp       r10d,edx
       je        near ptr M00_L11
       movzx     edx,byte ptr [rcx+r11+7]
       cmp       eax,edx
       je        near ptr M00_L12
       cmp       r10d,edx
       je        near ptr M00_L12
       add       r11,8
M00_L01:
       cmp       rsi,8
       jae       near ptr M00_L00
       cmp       rsi,4
       jb        short M00_L02
       add       rsi,0FFFFFFFFFFFFFFFC
       movzx     edx,byte ptr [rcx+r11]
       cmp       eax,edx
       je        short M00_L05
       cmp       r10d,edx
       je        short M00_L05
       movzx     edx,byte ptr [rcx+r11+1]
       cmp       eax,edx
       je        short M00_L06
       cmp       r10d,edx
       je        short M00_L06
       movzx     edx,byte ptr [rcx+r11+2]
       cmp       eax,edx
       je        short M00_L07
       cmp       r10d,edx
       je        short M00_L07
       movzx     edx,byte ptr [rcx+r11+3]
       cmp       eax,edx
       je        short M00_L08
       cmp       r10d,edx
       je        short M00_L08
       add       r11,4
M00_L02:
       test      rsi,rsi
       je        short M00_L04
M00_L03:
       movzx     edx,byte ptr [rcx+r11]
       cmp       eax,edx
       je        short M00_L05
       cmp       r10d,edx
       je        short M00_L05
       inc       r11
       dec       rsi
       jne       short M00_L03
M00_L04:
       mov       eax,0FFFFFFFF
       vzeroupper
       pop       rsi
       ret
M00_L05:
       mov       eax,r11d
       jmp       near ptr M00_L20
M00_L06:
       lea       eax,[r11+1]
       jmp       near ptr M00_L20
M00_L07:
       lea       eax,[r11+2]
       jmp       near ptr M00_L20
M00_L08:
       lea       eax,[r11+3]
       jmp       near ptr M00_L20
M00_L09:
       lea       eax,[r11+4]
       jmp       near ptr M00_L20
M00_L10:
       lea       eax,[r11+5]
       jmp       near ptr M00_L20
M00_L11:
       lea       eax,[r11+6]
       jmp       near ptr M00_L20
M00_L12:
       lea       eax,[r11+7]
       jmp       near ptr M00_L20
M00_L13:
       cmp       rsi,10
       jb        short M00_L16
       vmovd     xmm0,edx
       vpbroadcastb ymm0,xmm0
       vmovd     xmm1,r8d
       vpbroadcastb ymm1,xmm1
       add       rsi,0FFFFFFFFFFFFFFF0
       je        short M00_L15
M00_L14:
       vmovupd   ymm2,[rcx+r11]
       vpcmpeqb  ymm3,ymm0,ymm2
       vpcmpeqb  ymm2,ymm1,ymm2
       vpor      ymm2,ymm3,ymm2
       vpmovmskb eax,ymm2
       test      eax,eax
       jne       near ptr M00_L19
       add       r11,20
       cmp       rsi,r11
       ja        short M00_L14
M00_L15:
       vmovupd   ymm2,[rcx+rsi]
       mov       r11,rsi
       vpcmpeqb  ymm0,ymm0,ymm2
       vpcmpeqb  ymm1,ymm1,ymm2
       vpor      ymm0,ymm0,ymm1
       vpmovmskb eax,ymm0
       test      eax,eax
       jne       short M00_L19
       jmp       near ptr M00_L04
M00_L16:
       vmovd     xmm0,edx
       vpbroadcastb xmm0,xmm0
       vmovd     xmm1,r8d
       vpbroadcastb xmm1,xmm1
       test      rsi,rsi
       je        short M00_L18
M00_L17:
       vmovupd   xmm2,[rcx+r11]
       vpcmpeqb  xmm3,xmm0,xmm2
       vpcmpeqb  xmm2,xmm1,xmm2
       vpor      xmm2,xmm3,xmm2
       vpmovmskb eax,xmm2
       test      eax,eax
       jne       short M00_L19
       add       r11,10
       cmp       rsi,r11
       ja        short M00_L17
M00_L18:
       vmovupd   xmm2,[rcx+rsi]
       mov       r11,rsi
       vpcmpeqb  xmm0,xmm0,xmm2
       vpcmpeqb  xmm1,xmm1,xmm2
       vpor      xmm0,xmm0,xmm1
       vpmovmskb eax,xmm0
       test      eax,eax
       je        near ptr M00_L04
M00_L19:
       tzcnt     eax,eax
       add       r11,rax
       jmp       near ptr M00_L05
M00_L20:
       vzeroupper
       pop       rsi
       ret
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       int       3
       sbb       [rcx],eax
       add       [rax],eax
       add       [rax],esp
; Total bytes of code 663

Codegen after:

.NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX2

; System.SpanHelpers.IndexOfAny(Byte ByRef, Byte, Byte, Int32)
       vzeroupper
       xor       r10d,r10d
       cmp       r9d,20
       jl        short M00_L02
       nop       dword ptr [rax]
       mov       eax,r9d
       and       eax,0FFFFFFE0
       movzx     r11d,dl
       vmovd     xmm0,r11d
       vpbroadcastb ymm0,xmm0
       movzx     r11d,r8b
       vmovd     xmm1,r11d
       vpbroadcastb ymm1,xmm1
M00_L00:
       vmovdqu   ymm2,ymmword ptr [rcx+r10]
       vpcmpeqb  ymm3,ymm0,ymm2
       vpcmpeqb  ymm2,ymm1,ymm2
       vpor      ymm2,ymm3,ymm2
       vptest    ymm2,ymm2
       jne       short M00_L01
       add       r10,20
       cmp       r10,rax
       jb        short M00_L00
       jmp       short M00_L05
       nop       dword ptr [rax]
M00_L01:
       vpmovmskb r9d,ymm2
       xor       eax,eax
       tzcnt     eax,r9d
       add       eax,r10d
       jmp       near ptr M00_L09
M00_L02:
       cmp       r9d,10
       jl        short M00_L05
       mov       eax,r9d
       and       eax,0FFFFFFF0
       movzx     r11d,dl
       vmovd     xmm0,r11d
       vpbroadcastb xmm0,xmm0
       movzx     r11d,r8b
       vmovd     xmm1,r11d
       vpbroadcastb xmm1,xmm1
M00_L03:
       vmovdqu   xmm2,xmmword ptr [rcx+r10]
       vpcmpeqb  xmm3,xmm0,xmm2
       vpcmpeqb  xmm2,xmm1,xmm2
       vpor      xmm2,xmm3,xmm2
       vptest    xmm2,xmm2
       jne       short M00_L04
       add       r10,10
       cmp       r10,rax
       jb        short M00_L03
       jmp       short M00_L05
       xchg      ax,ax
M00_L04:
       vpmovmskb edx,xmm2
       xor       eax,eax
       tzcnt     eax,edx
       add       eax,r10d
       jmp       short M00_L09
M00_L05:
       movzx     eax,dl
       movzx     edx,r8b
       mov       r8d,r9d
       cmp       r10,r8
       jae       short M00_L07
M00_L06:
       movzx     r9d,byte ptr [rcx+r10]
       cmp       eax,r9d
       je        short M00_L08
       cmp       edx,r9d
       je        short M00_L08
       inc       r10
       cmp       r10,r8
       jb        short M00_L06
M00_L07:
       mov       eax,0FFFFFFFF
       vzeroupper
       ret
M00_L08:
       mov       eax,r10d
M00_L09:
       vzeroupper
       ret
       add       [rcx],bl
       add       [rax],al
       add       [rax],al
       add       [rax],al
       add       [rax],al
       add       [rax],al
       add       [rax],al
       add       [rax],al
       add       [rax-55C021C],bh
       jg        short M00_L10
M00_L10:
       add       [rbp+48],dl
       (bad)
; Total bytes of code 291

AVX (x64)

BenchmarkDotNet=v0.13.1.1845-nightly, OS=Windows 11 (10.0.22000.795/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]     : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  Job-YLMNVS : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX
  Job-DFRNLR : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX

EnvironmentVariables=COMPlus_EnableAVX2=0 LaunchCount=9
Method Toolchain Size Mean Ratio
IndexOfAnyTwoValues \PR\corerun.exe 1 1.913 ns 0.80
IndexOfAnyTwoValues \baseline\corerun.exe 1 2.399 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 4 3.005 ns 1.17
IndexOfAnyTwoValues \baseline\corerun.exe 4 2.566 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 8 3.457 ns 1.12
IndexOfAnyTwoValues \baseline\corerun.exe 8 3.043 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 32 3.045 ns 0.81
IndexOfAnyTwoValues \baseline\corerun.exe 32 3.749 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 33 3.037 ns 0.72
IndexOfAnyTwoValues \baseline\corerun.exe 33 4.188 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 63 3.010 ns 0.72
IndexOfAnyTwoValues \baseline\corerun.exe 63 4.152 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 64 3.674 ns 0.77
IndexOfAnyTwoValues \baseline\corerun.exe 64 4.787 ns 1.00
IndexOfAnyTwoValues \PR\corerun.exe 512 11.500 ns 0.94
IndexOfAnyTwoValues \baseline\corerun.exe 512 12.203 ns 1.00

AdvSIMD(arm64)

BenchmarkDotNet=v0.13.1.1845-nightly, OS=ubuntu 20.04
Unknown processor
.NET SDK=7.0.100-rc.1.22408.5
  [Host]     : .NET 7.0.0 (7.0.22.40308), Arm64 RyuJIT AdvSIMD
  Job-NZXMHK : .NET 7.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-LQCIXE : .NET 7.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

LaunchCount=9
Method Toolchain Size Mean Ratio
IndexOfAnyTwoValues /PR/corerun 1 6.295 ns 0.98
IndexOfAnyTwoValues /main/corerun 1 6.841 ns 1.00
IndexOfAnyTwoValues /PR/corerun 4 6.846 ns 1.00
IndexOfAnyTwoValues /main/corerun 4 7.454 ns 1.00
IndexOfAnyTwoValues /PR/corerun 8 8.182 ns 1.05
IndexOfAnyTwoValues /main/corerun 8 7.793 ns 1.00
IndexOfAnyTwoValues /PR/corerun 32 6.823 ns 1.03
IndexOfAnyTwoValues /main/corerun 32 6.936 ns 1.00
IndexOfAnyTwoValues /PR/corerun 33 7.015 ns 1.07
IndexOfAnyTwoValues /main/corerun 33 6.920 ns 1.00
IndexOfAnyTwoValues /PR/corerun 63 7.337 ns 1.05
IndexOfAnyTwoValues /main/corerun 63 6.988 ns 1.00
IndexOfAnyTwoValues /PR/corerun 64 11.086 ns 1.00
IndexOfAnyTwoValues /main/corerun 64 12.297 ns 1.00
IndexOfAnyTwoValues /PR/corerun 512 39.362 ns 1.03
IndexOfAnyTwoValues /main/corerun 512 38.550 ns 1.00

@vitek-karas
Copy link
Member

I think I have a tentative fix for the linker bug (with the gotos - I couldn't find any other way to get to that state 😉 ). I'll clean it up and if possible we should try it on your previous change... anyway, thanks a lot for finding it!

@adamsitnik adamsitnik added this to the 7.0.0 milestone Aug 10, 2022
@SwapnilGaikwad
Copy link
Contributor

Hi @adamsitnik, I'm looking at the char equivalent of IndexOfAny. It's half done (blocked by #73777) but can add as a draft PR if you are going to find it useful.

@SwapnilGaikwad
Copy link
Contributor

Hi @adamsitnik, I'm looking at the char equivalent of IndexOfAny. It's half done (blocked by #73777) but can add as a draft PR if you are going to find it useful.

#73788

@adamsitnik
Copy link
Member Author

We don't need it anymore due to #73556

@adamsitnik adamsitnik closed this Aug 17, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants