GitHub_17777 test failing on linux-arm64 #64162

Closed
am11 opened this issue Jan 23, 2022 · 12 comments · Fixed by #65153

@am11
Member

am11 commented Jan 23, 2022

From outerloop (Pri1) test log: https://helix.dot.net/api/2019-06-17/jobs/e865e8bd-955d-4179-af2a-4c8d0686f1cc/workitems/JIT.Regression.JitBlue/console

+ dotnet /root/helix/work/correlation/xunit/xunit.console.dll JIT/Regression/JIT.Regression.XUnitWrapper.dll -parallel collections -nocolor -noshadow -xml testResults.xml -trait TestGroup=JIT.Regression.JitBlue
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 6.0.0-rtm.21522.10)
  Discovering: JIT.Regression.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.Regression.XUnitWrapper (found 224 of 1542 test cases)
  Starting:    JIT.Regression.XUnitWrapper (parallel test collections = on, max threads = 4)
    JIT/Regression/JitBlue/GitHub_17777/GitHub_17777/GitHub_17777.sh [FAIL]
      
      Assert failure(PID 526 [0x0000020e], Thread: 526 [0x020e]): Assertion failed 'emitCurIG != emitPrologIG' in 'Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int' during 'Generate code' (IL size 2418)
      
          File: /__w/1/s/src/coreclr/jit/emit.cpp Line: 8610
          Image: /root/helix/work/correlation/corerun

This assert fails:

assert(emitCurIG != emitPrologIG);

After that, createdump fails this assert in libunwind:

assert (ip >= di->start_ip && ip < di->end_ip);

      createdump: /__w/1/s/src/coreclr/pal/src/libunwind/src/dwarf/Gfind_proc_info-lsb.c:929: int _Uaarch64_dwarf_search_unwind_table(unw_addr_space_t, unw_word_t, unw_dyn_info_t *, unw_proc_info_t *, int, void *): Assertion `ip >= di->start_ip && ip < di->end_ip' failed.
      /root/helix/work/workitem/e/JIT/Regression/JitBlue/GitHub_17777/GitHub_17777/GitHub_17777.sh: line 411:   526 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
      
      Return code:      1
      Raw output file:      /root/helix/work/workitem/uploads/Reports/JIT.Regression/JitBlue/GitHub_17777/GitHub_17777/GitHub_17777.output.txt
      Raw output:
      BEGIN EXECUTION
      /root/helix/work/correlation/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false GitHub_17777.dll ''
      Expected: 100
      Actual: 134
      END EXECUTION - FAILED
      Test Harness Exitcode is : 1
      To run the test:
      > set CORE_ROOT=/root/helix/work/correlation
      > /root/helix/work/workitem/e/JIT/Regression/JitBlue/GitHub_17777/GitHub_17777/GitHub_17777.sh
      Expected: True
      Actual:   False
      Stack Trace:
           at JIT_Regression._JitBlue_GitHub_17777_GitHub_17777_GitHub_17777_._JitBlue_GitHub_17777_GitHub_17777_GitHub_17777_sh()
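
The `Actual: 134` exit code in the log is consistent with the `Aborted (core dumped)` line: a shell reports 128 plus the signal number when a process is killed by a signal, and SIGABRT is 6 on Linux. A minimal sketch of that decoding follows; the helper name is hypothetical and not part of the test harness.

```cpp
#include <cassert>

// Hypothetical helper (not part of the test harness): recover the signal
// number from a shell-reported exit code. Returns 0 when the code does not
// follow the "128 + signal" convention.
int SignalFromExitCode(int exitCode) {
    assert(exitCode >= 0 && exitCode <= 255);
    return exitCode > 128 ? exitCode - 128 : 0;
}
```

For the codes in this log, `SignalFromExitCode(134)` gives 6 (SIGABRT on Linux), while the expected code 100 gives 0, meaning no signal.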
@dotnet-issue-labeler (bot) added the area-CodeGen-coreclr and untriaged labels Jan 23, 2022
@ghost

ghost commented Jan 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

@am11
Member Author

am11 commented Jan 23, 2022

@jkotas, @janvorli, if it is not known, should we create a separate arm64 issue for libunwind assert during createdump?

@jkotas
Member

jkotas commented Jan 23, 2022

if it is not known, should we create a separate arm64 issue for libunwind assert during createdump?

Yes. Opened #64168

@JulieLeeMSFT removed the untriaged label Jan 25, 2022
@JulieLeeMSFT added this to the 7.0.0 milestone Jan 25, 2022
@AndyAyersMS
Member

@BruceForstall if you end up bumping up the IG size for this let's make sure it is big enough for OSR too.

@BruceForstall
Member

@BruceForstall if you end up bumping up the IG size for this let's make sure it is big enough for OSR too.

@AndyAyersMS Do you have any idea what the maximum number of instructions might be for OSR? Or, put another way, what is the maximum amount bigger (in number of instructions) that an OSR prolog might be compared to a non-OSR prolog (for arm64/x64)?

This issue has only occurred (AFAIK) on arm64 so far, when we have very large frames that require loading large local-variable constant offsets into a temp register for many initializations.

I'm inclined to just double the arm64 max IG size.
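
To illustrate why a large frame offset costs extra instructions: a 32-bit constant that does not fit in 16 bits is typically built from two 16-bit halves, one placed by `movz` and one by `movk`. The sketch below only shows that split. The type and function names are hypothetical and are not taken from the JIT.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers only; these names are hypothetical and not JIT code.

// The two 16-bit halves that a movz/movk pair places into a 32-bit register.
struct Imm32Halves {
    uint16_t low;   // immediate for: movz w8, #low
    uint16_t high;  // immediate for: movk w8, #high, LSL #16
};

// Split a 32-bit constant into its low and high 16-bit halves.
Imm32Halves SplitImm32(uint32_t value) {
    return Imm32Halves{static_cast<uint16_t>(value & 0xFFFFu),
                       static_cast<uint16_t>(value >> 16)};
}

// Reassemble the constant the same way the movz/movk pair does.
uint32_t CombineImm32(Imm32Halves h) {
    return (static_cast<uint32_t>(h.high) << 16) | h.low;
}

// Sanity check using the offset 0x4eaf0 as an example value.
void CheckExample() {
    Imm32Halves h = SplitImm32(0x4EAF0u);
    assert(h.low == 0xEAF0 && h.high == 0x4);
    assert(CombineImm32(h) == 0x4EAF0u);
}
```

With the offset in a temp register, one more instruction performs the actual load or store, so each access to a far-away local can cost three instructions instead of one.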

@AndyAyersMS
Member

AndyAyersMS commented Feb 9, 2022

Looks like OSR is trying to load up 25 enregistered locals, each load takes 3 instructions on arm64:

IN48985:                           movz    w8, #0xeaf0
IN48986:                           movk    w8, #4 LSL #16
IN48987:                           ldr     w10, [fp, x8]

and there are at least 25 other instructions in the prolog before this (and possibly a few after).

So perhaps 120 instructions worth?

If this is unworkable I can probably common some of the address math and squeeze this down a bit.
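
The arithmetic behind that estimate can be written out as a small sketch. The struct and function names are hypothetical, and the inputs are just the numbers quoted in this comment, not values read from the JIT.

```cpp
#include <cassert>

// Back-of-the-envelope model of the estimate above. All names are
// hypothetical; nothing here is taken from the JIT sources.
struct PrologEstimate {
    int enregisteredLocals;  // locals reloaded from the Tier0 frame
    int instrsPerLoad;       // movz + movk + ldr = 3 for large offsets
    int otherInstrs;         // remaining prolog instructions
};

// Total instruction count: one load sequence per local plus everything else.
int EstimatePrologInstrs(const PrologEstimate& e) {
    assert(e.enregisteredLocals >= 0 && e.instrsPerLoad >= 0 && e.otherInstrs >= 0);
    return e.enregisteredLocals * e.instrsPerLoad + e.otherInstrs;
}
```

With 25 locals, 3 instructions per load, and 25 other instructions, `EstimatePrologInstrs(PrologEstimate{25, 3, 25})` gives 100, so a figure of 120 leaves roughly 20 instructions of slack for anything emitted after the loads.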

@AndyAyersMS
Member

I think we're OK on x64 for now; the latest OSR stress run only has arm64 failures.

@BruceForstall
Member

Looks like OSR is trying to load up 25 enregistered locals

That's for this particular test case? Where does the number "25" come from?

In general, is this number capped (is there a maximum?), or can it be unbounded?

@AndyAyersMS
Member

AndyAyersMS commented Feb 9, 2022

It's limited by the number of allocatable registers (I think just int registers as we initialize FP registers outside the OS prolog).

@BruceForstall
Member

It's limited by the number of allocatable registers (I think just int registers as we initialize FP registers outside the OS prolog).

Actually, since we care here about the JIT prolog insGroup, it includes everything in the JIT prolog, including zero inits, moving registers to the LSRA assigned locations, etc. -- not just the OS prolog.

@AndyAyersMS
Member

Well then it might include initializing all the allocatable FP registers and all the allocatable INT registers.

@AndyAyersMS
Member

In case you're wondering -- OSR is "unusual" in that a lot more things can be live on entry than in a normal method. Anything live on entry and enregistered gets loaded up from the Tier0 frame in the OSR prolog.

I suppose we could defer all of this loading but it would complicate things quite a bit.

BruceForstall added a commit to BruceForstall/runtime that referenced this issue Feb 10, 2022
We require that the maximum number of prolog instructions all fit in one
instruction group. Recent changes appear to have increased the number of
instructions we generate in the prolog, leading to a NOWAY assert on
Release builds and a test failure on linux-arm64.

Bump up the number to avoid this problem, and leave some headroom for
possible additional needs.

Fixes dotnet#64162, dotnet#64793.
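
The "prolog must fit in one instruction group" requirement can be pictured with a toy model. This is a simplified sketch, not the emitter in emit.cpp: the class, its members, and the capacity values are all hypothetical, and the only idea carried over is that overflowing a group starts a new one, which is the state `assert(emitCurIG != emitPrologIG)` reports.

```cpp
#include <cassert>

// Toy model only. It is NOT the JIT emitter; every name here is hypothetical.
class ToyEmitter {
public:
    explicit ToyEmitter(int maxInstrsPerGroup) : max_(maxInstrsPerGroup) {
        assert(max_ > 0);
    }

    // Emit one instruction, opening a new group if the current one is full.
    void Emit() {
        if (countInGroup_ == max_) {
            ++currentGroup_;
            countInGroup_ = 0;
        }
        ++countInGroup_;
    }

    // True while everything emitted so far still lives in group 0 (the prolog group).
    bool StillInPrologGroup() const { return currentGroup_ == 0; }

private:
    int max_;
    int currentGroup_ = 0;
    int countInGroup_ = 0;
};

// Emit `prologInstrs` instructions and report whether they stayed in one group.
bool PrologFits(int prologInstrs, int maxInstrsPerGroup) {
    ToyEmitter emitter(maxInstrsPerGroup);
    for (int i = 0; i < prologInstrs; ++i) {
        emitter.Emit();
    }
    return emitter.StillInPrologGroup();
}
```

In this model a 100-instruction prolog fits when the group capacity is 128 but spills into a second group when it is 64, which is why raising the per-group limit with some headroom addresses the failure.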
@ghost added the in-pr label Feb 10, 2022
BruceForstall added a commit that referenced this issue Feb 11, 2022
We require that the maximum number of prolog instructions all fit in one
instruction group. Recent changes appear to have increased the number of
instructions we generate in the prolog, leading to a NOWAY assert on
Release builds and a test failure on linux-arm64.

Bump up the number to avoid this problem, and leave some headroom for
possible additional needs.

Fixes #64162, #64793.
@ghost removed the in-pr label Feb 11, 2022
@ghost locked as resolved and limited conversation to collaborators Mar 14, 2022
6 participants