superpmicollect access violation in pgo Bytemark #76991

Closed
markples opened this issue Oct 13, 2022 · 10 comments · Fixed by #77147
@markples
Member

win-arm64 superpmicollect pgo
(may be broader than pgo - full superpmi collection runs have been failing at times on arm64)

https://dev.azure.com/dnceng-public/public/_build/results?buildId=49331&view=logs&j=63f2a02a-7891-5505-e9da-45b13105e16b
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-bb1a6462f51c458492/JIT.1/1/console.516d02e2.log?helixlogtype=result

set COMPlus_TieredCompilation=1
set COMPlus_ReadyToRun=0
set COMPlus_TC_QuickJitForLoops=1
set COMPlus_TieredPGO=1
set COMPlus_JitClassProfiling=0
set COMPlus_JitDelegateProfiling=1
set COMPlus_JitVTableProfiling=1
set COMPlus_JitRandomGuardedDevirtualization=1
set COMPlus_JitRandomlyCollect64BitCounts=1

D:\h\w\A77F0919\w\BEF10A17\e>set PATH=D:\h\w\A77F0919\p\dotnet-cli;C:\python3.7.0\lib\site-packages\pywin32_system32;C:\python3.7.0\lib\site-packages\pywin32_system32;C:\python3.7.0\Scripts\;C:\python3.7.0\;D:\Windows\system32;D:\Windows;D:\Windows\System32\Wbem;D:\Windows\System32\WindowsPowerShell\v1.0\;D:\Windows\System32\OpenSSH\;D:\Users\runner\AppData\Local\Microsoft\WindowsApps 

D:\h\w\A77F0919\w\BEF10A17\e>set DOTNET_ROOT=D:\h\w\A77F0919\p\dotnet-cli 

D:\h\w\A77F0919\w\BEF10A17\e>set DOTNET_CLI_TELEMETRY_OPTOUT=1 

D:\h\w\A77F0919\w\BEF10A17\e>set DOTNET_CLI_HOME=D:\h\w\A77F0919\w\BEF10A17\e\.dotnet 

D:\h\w\A77F0919\w\BEF10A17\e>set NUGET_PACKAGES=D:\h\w\A77F0919\w\BEF10A17\e\.nuget 

D:\h\w\A77F0919\w\BEF10A17\e>dotnet D:\h\w\A77F0919\p\xunit\xunit.console.dll JIT\BBT\JIT.BBT.XUnitWrapper.dll JIT\CheckProjects\JIT.CheckProjects.XUnitWrapper.dll JIT\opt\JIT.opt.XUnitWrapper.dll JIT\Performance\JIT.Performance.XUnitWrapper.dll JIT\RyuJIT\JIT.RyuJIT.XUnitWrapper.dll JIT\superpmi\JIT.superpmi.XUnitWrapper.dll -parallel collections -nocolor -noshadow -xml testResults.xml 
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 7.0.0-rc.1.22426.10)
  Discovering: JIT.BBT.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.BBT.XUnitWrapper (found 1 test case)
  Starting:    JIT.BBT.XUnitWrapper (parallel test collections = on, max threads = 8)
  Finished:    JIT.BBT.XUnitWrapper
  Discovering: JIT.CheckProjects.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.CheckProjects.XUnitWrapper (found 1 test case)
  Starting:    JIT.CheckProjects.XUnitWrapper (parallel test collections = on, max threads = 8)
  Finished:    JIT.CheckProjects.XUnitWrapper
  Discovering: JIT.opt.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.opt.XUnitWrapper (found 247 test cases)
  Starting:    JIT.opt.XUnitWrapper (parallel test collections = on, max threads = 8)
  Finished:    JIT.opt.XUnitWrapper
  Discovering: JIT.Performance.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.Performance.XUnitWrapper (found 101 test cases)
  Starting:    JIT.Performance.XUnitWrapper (parallel test collections = on, max threads = 8)
  Finished:    JIT.Performance.XUnitWrapper
  Discovering: JIT.RyuJIT.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.RyuJIT.XUnitWrapper (found 1 test case)
  Starting:    JIT.RyuJIT.XUnitWrapper (parallel test collections = on, max threads = 8)
  Finished:    JIT.RyuJIT.XUnitWrapper
  Discovering: JIT.superpmi.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.superpmi.XUnitWrapper (found 3 test cases)
  Starting:    JIT.superpmi.XUnitWrapper (parallel test collections = on, max threads = 8)
    JIT\superpmi\superpmicollect\superpmicollect.cmd [FAIL]
      
      Assert failure(PID 3040 [0x00000be0], Thread: 18220 [0x472c]): !"Access violation while Jitting!"
      
      CORECLR! EEFilterException + 0x60 (0x00007ff8`c26d6bb8)
      CORECLR! `CEEInfo::runWithErrorTrap'::`1'::filt$0 + 0x60 (0x00007ff8`c2be3740)
      CORECLR! _C_ExecuteExceptionFilter + 0x38 (0x00007ff8`c2632d88)
      CORECLR! _C_specific_handler + 0x214 (0x00007ff8`c2b40584)
      NTDLL! chkstk + 0x1CC (0x00007ff9`06e2387c)
      NTDLL! LdrControlFlowGuardEnforced + 0x8D4 (0x00007ff9`06e85ff4)
      NTDLL! KiUserExceptionDispatcher + 0x24 (0x00007ff9`06e23434)
      SUPERPMI-SHIM-COLLECTOR! <no symbol> + 0x0 (0x00007ff8`e3a71660)
      SUPERPMI-SHIM-COLLECTOR! jitStartup + 0x205C (0x00007ff8`e3a7b20c)
      SUPERPMI-SHIM-COLLECTOR! jitStartup + 0x1695C (0x00007ff8`e3a8fb0c)
          File: D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp Line: 10346
          Image: D:\h\w\A77F0919\p\corerun.exe
      
      ERROR: Test D:\h\w\A77F0919\w\BEF10A17\e\JIT\superpmi\superpmicollect\Bytemark\Bytemark.cmd failed
      
      Return code:      1
      Raw output file:      D:\h\w\A77F0919\w\BEF10A17\uploads\Reports\JIT.superpmi\superpmicollect\superpmicollect.output.txt
      Raw output:
      BEGIN EXECUTION
       "D:\h\w\A77F0919\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  superpmicollect.dll 
      SuperPMI collection and playback - BEGIN
      Setting environment variables:
          SuperPMIShimLogPath=D:\h\w\A77F0919\t\ifdy230s.3lcSPMI
          SuperPMIShimPath=D:\h\w\A77F0919\p\clrjit.dll
          COMPlus_JitName=superpmi-shim-collector.dll
      Running: D:\Windows\system32\cmd.exe /c D:\h\w\A77F0919\w\BEF10A17\e\JIT\superpmi\superpmicollect\Bytemark\Bytemark.cmd
      BEGIN EXECUTION
       "D:\h\w\A77F0919\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  Bytemark.dll 
      BBBBBB   YYY   Y  TTTTTTT  EEEEEEE
      BBB   B  YYY   Y    TTT    EEE
      BBB   B  YYY   Y    TTT    EEE
      BBBBBB    YYY Y     TTT    EEEEEEE
      BBB   B    YYY      TTT    EEE
      BBB   B    YYY      TTT    EEE
      BBBBBB     YYY      TTT    EEEEEEE
      
      BYTEmark (tm) C# Mode Benchmark ver. 2 (06/99)
      NUMERIC SORT(jagged):  Iterations/sec: 1003.96474  Index: 25.74731
      NUMERIC SORT(rectangle):Expected: 100
      Actual: -1073740286
      END EXECUTION - FAILED
      FAILED
      SuperPMI collection and playback - FAILED
      Expected: 100
      Actual: 101
      END EXECUTION - FAILED
      FAILED
      Test Harness Exitcode is : 1
      To run the test:
      > set CORE_ROOT=D:\h\w\A77F0919\p
      > D:\h\w\A77F0919\w\BEF10A17\e\JIT\superpmi\superpmicollect\superpmicollect.cmd

@dotnet/jit-contrib

@markples markples added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs labels Oct 13, 2022
@markples markples added this to the 8.0.0 milestone Oct 13, 2022
@ghost
ghost commented Oct 13, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@BruceForstall
Member

Unfortunately no symbols there for SUPERPMI-SHIM-COLLECTOR :-(

@BruceForstall
Member

BruceForstall commented Oct 13, 2022

When I run spmi crossgen2 collection on win-arm64 over core_root, I get:

[image]

for 2 assemblies: (1) System.Management.dll, (2) System.Net.Quic.dll

[This easily could be a separate issue]

@markples
Member Author

If I'm reading the current run logs correctly, those two didn't fail. (could be separate, could seem to be nondeterministic, could be my misreading, etc.)

@AndyAyersMS
Member

I'll investigate.

@AndyAyersMS AndyAyersMS self-assigned this Oct 17, 2022
@AndyAyersMS
Member

Not able to repro this so far. Has it happened repeatedly in CI?

@AndyAyersMS
Member

Ah, I see 6 failures in the past two weeks, including two today. Will keep trying to repro.

@AndyAyersMS
Member

Still no luck with local repro (~50 retries). One of today's failures has a dump file, so will take a look at that.

@AndyAyersMS
Member

As Bruce noted, there are no symbols for the collector shim, but from the dump we can see the entry point was getPgoInstrumentationResults, and (forcing in locally built symbols) we seem to die within recGetPgoInstrumentationResults during a memcpy.

0c 0000002a`dfdfad10 00007ffb`e5691660     ntdll!KiUserExceptionDispatcher+0x24 [minkernel\ntos\rtl\arm64\trampoln.asm @ 541] 
0d 0000002a`dfdfb160 00007ffb`e569b20c     superpmi_shim_collector!__memcpy_forward_large_neon+0xc0 [D:\a\_work\1\s\Intermediate\crt\vcruntime\build\base\xmt\libvcruntime_kernel32\libvcruntime_kernel32.nativeproj\objd\arm64\memcpy.i @ 4191] 
0e 0000002a`dfdfb160 00007ffb`e56afb04     superpmi_shim_collector!LightWeightMapBuffer::AddBuffer+0xfc [C:\repos\runtime0\src\coreclr\tools\superpmi\superpmi-shared\lightweightmap.h @ 47] 
0f 0000002a`dfdfb1a0 00007ffb`e5696d8c     superpmi_shim_collector!MethodContext::recGetPgoInstrumentationResults+0x14c [C:\repos\runtime0\src\coreclr\tools\superpmi\superpmi-shared\methodcontext.cpp @ 5703] 
10 0000002a`dfdfb250 00007ffb`c0f8f2b0     superpmi_shim_collector!interceptor_ICJI::getPgoInstrumentationResults+0xbc
11 0000002a`dfdfb2a0 00007ffb`c0f8cc78     clrjit!Compiler::compInitOptions+0x718 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 2706] 
12 0000002a`dfdfb480 00007ffb`c0f8c2ac     clrjit!Compiler::compCompileHelper+0x110 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 6265]

Not sure why this happens though.

@AndyAyersMS
Member

Actually, it looks like we're trying to read too much from *pInstrumentationData. In the dump, this address is

0x0000024ecaa73fa0 and maxOffset is 0x64, so we will try to read from 0x0000024ecaa74000, which is not mapped.

The last schema entry is

    [23]             [Type: ICorJitInfo::PgoInstrumentationSchema]
        [+0x000] Offset           : 0x5c [Type: unsigned __int64]
        [+0x008] InstrumentationKind : BasicBlockIntCount (65) [Type: ICorJitInfo::PgoInstrumentationKind]
        [+0x00c] ILOffset         : 262 [Type: int]
        [+0x010] Count            : 1 [Type: int]
        [+0x014] Other            : 0 [Type: int]

and so the data only extends to (Offset + sizeof(BasicBlockIntCount)) --> (0x5c + 0x4) --> 0x60

So the computation of maxOffset is wrong because it's not taking the data size into account.
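
A minimal sketch of the size mismatch, using the numbers from the dump. This is not the actual SPMI code: `SchemaEntry`, `Kind`, `itemSize`, `kPointerSize`, and both `extent...` functions are hypothetical, simplified stand-ins, and the 4-byte and 8-byte item sizes are assumptions based on the "Int" and "Long" count kinds. It only illustrates how assuming a pointer-sized item for a 4-byte count turns an extent of 0x60 into 0x64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the schema types; only the fields needed
// for the extent computation are modeled.
enum class Kind { BasicBlockIntCount, BasicBlockLongCount };

struct SchemaEntry {
    std::size_t offset; // byte offset of this entry's data in the buffer
    Kind        kind;   // determines the size of each data item
    int         count;  // number of data items for this entry
};

// Assumed pointer size on a 64-bit target such as arm64.
constexpr std::size_t kPointerSize = 8;

// Assumed item sizes: 4 bytes for a 32-bit count, 8 bytes for a 64-bit count.
constexpr std::size_t itemSize(Kind k) {
    return k == Kind::BasicBlockIntCount ? 4 : 8;
}

// Flawed extent: treats every data item as pointer-sized.
std::size_t extentAssumingPointerSize(const std::vector<SchemaEntry>& schema) {
    std::size_t maxOffset = 0;
    for (const SchemaEntry& e : schema) {
        const std::size_t end =
            e.offset + static_cast<std::size_t>(e.count) * kPointerSize;
        if (end > maxOffset) maxOffset = end;
    }
    return maxOffset;
}

// Corrected extent: uses the size implied by each entry's kind.
std::size_t extentUsingItemSize(const std::vector<SchemaEntry>& schema) {
    std::size_t maxOffset = 0;
    for (const SchemaEntry& e : schema) {
        const std::size_t end =
            e.offset + static_cast<std::size_t>(e.count) * itemSize(e.kind);
        if (end > maxOffset) maxOffset = end;
    }
    return maxOffset;
}

// Checks the two computations against the values quoted from the dump.
void checkAgainstDump() {
    const std::vector<SchemaEntry> schema = {
        {0x5c, Kind::BasicBlockIntCount, 1}}; // last schema entry, [23]
    const std::uint64_t base = 0x0000024ecaa73fa0ULL; // *pInstrumentationData

    assert(extentAssumingPointerSize(schema) == 0x64); // maxOffset in the dump
    assert(extentUsingItemSize(schema) == 0x60);       // actual data extent
    // The real data ends exactly at the unmapped address, so copying
    // 0x64 bytes reads 4 bytes past it.
    assert(base + extentUsingItemSize(schema) == 0x0000024ecaa74000ULL);
}
```

Calling `checkAgainstDump()` from a test or `main` exercises both computations. With the corrected size the copy would stop at 0x0000024ecaa74000 rather than reading past it.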

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Oct 18, 2022
When recording the profile data into the method context, SPMI was
assuming all data items were `sizeof(uintptr_t)` which is not guaranteed.
Use the proper size.

Fixes dotnet#76991.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 18, 2022
AndyAyersMS added a commit that referenced this issue Oct 18, 2022
When recording the profile data into the method context, SPMI was
assuming all data items were `sizeof(uintptr_t)` which is not guaranteed.
Use the proper size.

Fixes #76991.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 18, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 17, 2022