Project segfaults on Ubuntu ( ARM64 ) #76051

Closed
budgetdevv opened this issue Sep 23, 2022 · 8 comments · Fixed by #76061
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI regression-from-last-release
Milestone

Comments

@budgetdevv

Description

The project @ https://github.com/budgetdevv/JITBugRepro segfaults on my Ubuntu (ARM64) machine when run. However, it seems to run fine on my Windows x64 machine.

Reproduction Steps

  1. Clone the project @ https://github.com/budgetdevv/JITBugRepro
  2. Run Program.cs located in the BugRepro folder

Expected behavior

The program should run indefinitely, given that there is an await Task.Delay(Timeout.Infinite)

Actual behavior

The program segfaults. Interestingly, Services.GetRequiredService<InvitesModule>().OnReady() runs to completion (the program prints "Bleh") before the segfault. When debugging with GDB, the program exited with: Thread 7 ".NET Tiered Com" received signal SIGSEGV, Segmentation fault.

Both observations suggest to me that it is a JIT bug.

Regression?

No response

Known Workarounds

Use Windows :p

Configuration

Which version of .NET is the code running on?

  • .NET 7 RC 1

What OS and version, and what distro if applicable?

  • Ubuntu 20.04.3 LTS (GNU/Linux 5.15.0-1017-oracle aarch64)

What is the architecture (x64, x86, ARM, ARM64)?

  • ARM64

Do you know whether it is specific to that configuration

  • Seems to be the case; it works on my Windows x64 machine

Other information

Suspects

  • Executing project using NonBlocking.ConcurrentDictionary bundled in referenced project

  • Services.GetRequiredService<MemberDB>() calls into ctor, which calls ConnectionPool.CreateOrGetConnection("") which reads from a static readonly NonBlocking.ConcurrentDictionary field

  • Services.GetRequiredService<InvitesModule>().OnReady() contains a foreach loop that writes into a NonBlocking.ConcurrentDictionary. The write happens after an await, which might also be responsible for the weird behavior

  • This bug seems to be caused by tiered compilation ( GDB mentioned TC and threadpool thread )

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 23, 2022
@EgorBo EgorBo added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 23, 2022
@ghost

ghost commented Sep 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@EgorBo EgorBo added this to the 7.0.0 milestone Sep 23, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 23, 2022
@AndyAyersMS
Member

AndyAyersMS commented Sep 23, 2022

I can repro with x64 win, WSL2, arm64 altjit (repros with main).

We are morphing a complex tree in NonBlocking.Counter32:Increment():this

fgMorphIndexAddr (before remorph):
               [000135] -A-X-------                         *  COMMA     byref
               [000122] -A-X-------                         +--*  ASG       int
               [000121] D------N---                         |  +--*  LCL_VAR   int    V12 tmp9
               [000079] ---X-------                         |  \--*  CAST      int <- long
               [000078] ---X-------                         |     \--*  UMOD      long
               [000072] -----------                         |        +--*  LCL_VAR_ADDR long   V09 tmp6
               [000077] ---------U-                         |        \--*  CAST      long <- ulong <- uint
               [000076] -----------                         |           \--*  LCL_VAR   int   (AX) V09 tmp6

               [000134] ---X-------                         \--*  COMMA     byref
               [000126] ---X-------                            +--*  BOUNDS_CHECK_Rng void
               [000123] -----------                            |  +--*  LCL_VAR   int    V12 tmp9
               [000125] ---X-------                            |  \--*  ARR_LENGTH int
               [000042] -----------                            |     \--*  LCL_VAR   ref    V07 tmp4
               [000133] -----------                            \--*  ARR_ADDR  byref ref[]
               [000132] -----------                               \--*  ADD       byref
               [000131] -----------                                  +--*  ADD       byref
               [000120] -----------                                  |  +--*  LCL_VAR   ref    V07 tmp4
               [000130] -----------                                  |  \--*  CNS_INT   long   16
               [000129] -----------                                  \--*  MUL       long
               [000127] ---------U-                                     +--*  CAST      long <- uint
               [000124] -----------                                     |  \--*  LCL_VAR   int    V12 tmp9

               [000128] -------N---                                     \--*  CNS_INT   long   8

Morphing MOD/UMOD [000078] to Sub/Mul/Div

It looks like in fgMorphModToSubMulDiv we assume gtClone will always succeed, which is not the case.
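
To illustrate the failure mode described above, here is a minimal C++ sketch. It is not the JIT's code: `Node`, `cloneSimple`, `cloneDeep`, `modToSubMulDiv`, and `eval` are hypothetical stand-ins. The rewrite of `a % b` into `a - (a / b) * b` uses each operand twice, so it needs a copy of both. A clone routine that returns null for shapes it does not handle must either be checked or be replaced by one that handles every shape.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy expression tree. These are hypothetical names, not the JIT's GenTree types.
enum class Op { Const, AddrOf, Sub, Mul, Div, Mod };

struct Node {
    Op op = Op::Const;
    std::uint64_t value = 0;         // payload for Const and AddrOf leaves
    std::unique_ptr<Node> lhs, rhs;  // operands of binary operators
};
using NodePtr = std::unique_ptr<Node>;

NodePtr leaf(Op op, std::uint64_t value) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->value = value;
    return n;
}

NodePtr binary(Op op, NodePtr lhs, NodePtr rhs) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// Cheap clone: only understands constants and returns nullptr for anything else.
NodePtr cloneSimple(const Node& n) {
    return n.op == Op::Const ? leaf(Op::Const, n.value) : nullptr;
}

// Full clone: recursively copies a tree of any shape and never returns nullptr.
NodePtr cloneDeep(const Node& n) {
    auto c = leaf(n.op, n.value);
    if (n.lhs) c->lhs = cloneDeep(*n.lhs);
    if (n.rhs) c->rhs = cloneDeep(*n.rhs);
    return c;
}

// Rewrites a % b as a - (a / b) * b. Both operands are used twice,
// so each one is copied with the clone that handles every shape.
NodePtr modToSubMulDiv(NodePtr a, NodePtr b) {
    auto div = binary(Op::Div, cloneDeep(*a), cloneDeep(*b));
    auto mul = binary(Op::Mul, std::move(div), std::move(b));
    return binary(Op::Sub, std::move(a), std::move(mul));
}

// Evaluates a tree; AddrOf leaves simply yield their stored value.
std::uint64_t eval(const Node& n) {
    switch (n.op) {
        case Op::Const:
        case Op::AddrOf: return n.value;
        case Op::Sub: return eval(*n.lhs) - eval(*n.rhs);
        case Op::Mul: return eval(*n.lhs) * eval(*n.rhs);
        case Op::Div: { const auto d = eval(*n.rhs); assert(d != 0); return eval(*n.lhs) / d; }
        case Op::Mod: { const auto d = eval(*n.rhs); assert(d != 0); return eval(*n.lhs) % d; }
    }
    return 0;
}
```

In this model, `cloneSimple` returns `nullptr` for an `AddrOf` operand while `cloneDeep` succeeds. A caller that used the cheap clone and dereferenced the result without checking would crash, which is the kind of assumption pointed out above.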

@EgorBo
Member

EgorBo commented Sep 23, 2022

Minimal repro:

unsafe int GetIndex(uint cellCount) => (int)((ulong)&cellCount % cellCount);

with arm64 jit (or altjit)
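
For readers less familiar with C# `unsafe` code, the shape of that expression can be sketched in C++. This is only an analog of the expression, not the repro itself: it does not exercise the .NET JIT, and the precondition `cellCount != 0` is an assumption added here.

```cpp
#include <cassert>
#include <cstdint>

// C++ analog of the expression shape only; it does not reproduce the JIT bug.
int GetIndex(std::uint32_t cellCount) {
    assert(cellCount != 0);  // assumed precondition: the divisor must be nonzero
    // Dividend: the address of the local itself, viewed as an unsigned integer.
    const auto address = reinterpret_cast<std::uintptr_t>(&cellCount);
    // Divisor: the value stored in that same local.
    return static_cast<int>(address % cellCount);
}
```

The point of interest is that the dividend is an address-of-local expression rather than a plain local or constant, which matches the `LCL_VAR_ADDR` operand under `UMOD` in the dump above.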

@AndyAyersMS
Member

This seems to be a result of #69770. Before then, we were using fgMakeMultiUse, which invoked gtCloneExpr for simple things and introduced commas for more complex ones.
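
The clone-or-spill idea mentioned here can be sketched as follows. This is a string-based toy model and not the JIT's fgMakeMultiUse: `TempPool`, `isSimple`, and `makeMultiUse` are hypothetical names, and real JIT trees are not strings.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: expressions are plain strings, temps are named "tmpN".
struct TempPool {
    std::vector<std::string> assignments;  // emitted "tmpN = expr" stores
};

// "Simple" here means a lone identifier or integer literal.
bool isSimple(const std::string& expr) {
    for (unsigned char c : expr) {
        if (!std::isalnum(c) && c != '_') return false;
    }
    return !expr.empty();
}

// Returns two independent uses of expr: simple expressions are duplicated,
// anything else is evaluated once into a fresh temp that both uses read.
std::pair<std::string, std::string> makeMultiUse(const std::string& expr, TempPool& pool) {
    assert(!expr.empty());
    if (isSimple(expr)) return {expr, expr};
    const std::string temp = "tmp" + std::to_string(pool.assignments.size());
    pool.assignments.push_back(temp + " = " + expr);
    return {temp, temp};
}
```

Spilling to a temp never needs a clone of the complex expression, so it sidesteps the question of whether the clone can fail, at the cost of an extra store.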

@jakobbotsch
Member

I don't think this was #69770; if I read the diff right, fgMakeMultiUse was also using gtClone before that.

I think this was exposed by #68484. Before that, the IsInvariant check was returning false for GT_LCL_VAR_ADDR, so we would introduce a temp in this case.

It is odd that gtClone does not handle GT_LCL_VAR_ADDR, yet does handle GT_LCL_FLD_ADDR.

@jakobbotsch jakobbotsch self-assigned this Sep 23, 2022
@jakobbotsch
Member

Hmm, actually this reproduces on the parent of #68484 too. From a bisection it seems to actually be introduced with #65118.

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Sep 23, 2022
… trees

We may get here for any invariant dividend/divisor but these can be
'complex' address-of trees that gtClone does not handle.

Fix dotnet#76051
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 23, 2022
@AndyAyersMS
Member

From a bisection it seems to actually be introduced with #65118.

There goes my plan for trying to pin this on somebody else.

jakobbotsch added a commit that referenced this issue Sep 23, 2022
… trees (#76061)

We may get here for any invariant dividend/divisor but these can be
'complex' address-of trees that gtClone does not handle.

Fix #76051
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 23, 2022
github-actions bot pushed a commit that referenced this issue Sep 26, 2022
… trees

We may get here for any invariant dividend/divisor but these can be
'complex' address-of trees that gtClone does not handle.

Fix #76051
carlossanlop pushed a commit that referenced this issue Sep 28, 2022
…ress-of expressions (#76171)

* Add a test

* JIT: Use gtCloneExpr in fgMorphModToSubMulDiv for potentially complex trees

We may get here for any invariant dividend/divisor but these can be
'complex' address-of trees that gtClone does not handle.

Fix #76051

* Fix test build

Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
@ghost ghost locked as resolved and limited conversation to collaborators Oct 26, 2022
4 participants