Application crashes in release mode throwing System.ArgumentOutOfRangeException #100758

Closed
daiger14 opened this issue Apr 8, 2024 · 27 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI needs-author-action An issue or pull request that requires more info or actions from the author. needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration

Comments

@daiger14

daiger14 commented Apr 8, 2024

Description

Hello,
I'm trying to update our application to .NET 8, but it started crashing once deployed.
It crashes only in Release mode, and only when we calculate large projects.
If I enable "Suppress JIT optimization on module load", it runs without the exception.
The project has worked on every version since .NET Core 2.0.

We are not able to understand how and where this exception happens.
When I log the array index and length via System.Diagnostics.Trace.WriteLine, every value looks fine right before the exception.
[Screenshot]
The screenshot shows the array index just before the line of code that throws the exception.
The index is 4 and the array has 5 elements.
The Debug output also already shows Trace.WriteLine('MATRIX1') before the exception, even though that call comes after the throwing line.
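One way to see where such an exception is first raised, independent of where it is later reported, is to hook the first-chance exception event and record the call stack at the moment of the throw. The following is only a minimal sketch: `FirstChanceLogger`, `Install`, and `Captured` are hypothetical names that do not come from the application in this report, and filtering on these two exception types is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper; the class and member names are illustrative only.
public static class FirstChanceLogger
{
    private static readonly object Gate = new object();

    // Kept in memory so the captured entries can be inspected later.
    public static readonly List<string> Captured = new List<string>();

    public static void Install()
    {
        // The handler runs on the throwing thread before any catch block executes.
        AppDomain.CurrentDomain.FirstChanceException += (sender, e) =>
        {
            // Assumption: only these two exception types are of interest here.
            bool relevant = e.Exception is ArgumentOutOfRangeException
                         || e.Exception is IndexOutOfRangeException;
            if (!relevant)
                return;

            // Capture the current call stack, with file and line info when available.
            string entry = e.Exception.GetType().FullName + ": " + e.Exception.Message
                + Environment.NewLine + new StackTrace(true);

            lock (Gate) { Captured.Add(entry); }
            Trace.WriteLine(entry);
        };
    }
}
```

Calling `FirstChanceLogger.Install()` once at startup is enough. The handler should stay small, because an exception thrown inside it would raise the event again.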

Reproduction Steps

I'm not able to create a project that reproduces this issue.
Previous versions of .NET worked fine in both Release and Debug builds.
Now, on .NET 8, it works only in Debug mode or with JIT optimization suppressed.

Expected behavior

Should not crash the application.
There should be a valid stack trace of the exception.

Actual behavior

The application crashes with an "unreal" (spurious) System.ArgumentOutOfRangeException.
The exception should show a real call stack.

Regression?

No response

Known Workarounds

Build in Debug mode, or enable "Suppress JIT optimization on module load".

Configuration

.NET 8.0.203

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 8, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 8, 2024
@huoyaoyuan huoyaoyuan added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 8, 2024
@AndyAyersMS
Member

@daiger14 we will probably need more information from you to figure out what is going wrong.

Which method is getting an error?

If you add [MethodImpl(MethodImplOptions.NoOptimization)] to the method where the problem is happening, does it go away?

If so, can you share the (unmodified) assembly with us?

cc @dotnet/jit-contrib in case this seems familiar to anyone.
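For reference, a minimal sketch of what applying that attribute looks like. `MatrixOps` and `SumRow` are hypothetical stand-ins, not code from the reporter's application. `NoInlining` is added here on the assumption that it keeps the body from being folded into an optimized caller; it may be redundant.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type and method, used only to show where the attribute goes.
public static class MatrixOps
{
    // Asks the JIT not to optimize this one method; the rest of the assembly
    // is still compiled with optimizations.
    [MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
    public static double SumRow(double[] row, int count)
    {
        double sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += row[i];
        }
        return sum;
    }
}
```

If the failure disappears with the attribute on one method and returns when it is removed, that method (or something inlined into it) is a good candidate for a closer look.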

@AndyAyersMS AndyAyersMS added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 13, 2024
@AndyAyersMS
Member

@daiger14 please help us understand more about this so we can figure out what is going wrong.

@daiger14
Author

daiger14 commented Apr 16, 2024

> @daiger14 please help us understand more about this so we can figure out what is going wrong.

Hi, we are trying to figure out which method it is.
Every time we place [MethodImpl(MethodImplOptions.NoOptimization)] on a method,
the exception just appears in another file or line of code.
It does not seem to be tied to one particular method.
Roughly once every 10-20 runs of our big project, the application works fine without throwing the exception.
Is there something more we can do to find what is wrong?

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 16, 2024
@AndyAyersMS
Member

If the issue can't be pinned down to a specific method, it could mean there is similar code in some number of methods, or there is perhaps some kind of memory corruption happening.

Since you are able to reproduce the problem, can you try running under windbg, and once the exception happens, use the !VerifyHeap command from SOS to check for corruption?

@daiger14
Author

I'm not familiar with windbg, but will try tomorrow.
Thanks

@daiger14
Author

daiger14 commented Apr 17, 2024

> If the issue can't be pinned down to a specific method, it could mean there is similar code in some number of methods, or there is perhaps some kind of memory corruption happening.
>
> Since you are able to reproduce the problem, can you try running under windbg, and once the exception happens, use the !VerifyHeap command from SOS to check for corruption?

Hi, I ran the !VerifyHeap command and no heap corruption was detected.
Adding comments from my colleague who is a developer of the application core:
"Observations, conclusions:

  1. the exception thrown is: index out of range
  2. the exception moves when we add logging or extra checks
  3. when we manage to log directly before the exception is thrown, all relevant values are OK (index, array size), e.g. 0 <= index < size
  4. the same code runs many times before the exception is thrown
  5. no exception if runtime optimization is off
  6. the problem happens only with huge data
  7. IMHO there is some problem when the runtime optimization collects its data; something goes wrong (like a counter overflow, etc.), which results in a bad optimization"
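The kind of check described in observations 2 and 3 can be sketched as a small helper that validates the index itself and puts the observed values into the exception message, so they are recorded even when separate logging shifts the failure elsewhere. This is an illustration only: `CheckedAccess`, `Get`, and the `site` label are hypothetical and not taken from the application.

```csharp
using System;

// Hypothetical helper; names are illustrative only.
public static class CheckedAccess
{
    public static T Get<T>(T[] values, int index, string site)
    {
        // Read the length once so the reported value is the one that was tested.
        int length = values.Length;

        // The unsigned comparison covers both index < 0 and index >= length.
        if ((uint)index >= (uint)length)
        {
            throw new InvalidOperationException(
                $"{site}: index={index}, length={length}");
        }
        return values[index];
    }
}
```

A call such as `CheckedAccess.Get(row, i, "MATRIX1")` would then fail with a distinct exception type and message if the index really is out of range. If the original exception still fires while this check passes, that points away from the application's own index arithmetic.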

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone May 3, 2024
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 3, 2024
@AndyAyersMS
Member

@daiger14 thanks for checking with windbg. Since this seems to happen almost every run, is there any chance we could set up some kind of collaborative debugging session?

@dotnet/jit-contrib any other thoughts on how we could figure out what is going wrong?

@EgorBo
Member

EgorBo commented May 3, 2024

> @dotnet/jit-contrib any other thoughts on how we could figure out what is going wrong?

We've fixed two similar bug reports (#96839 and #100809) in .NET 9.0 and backported the fixes to 8.0 (not yet available). They might be related; we can wait for the fixes to propagate to 8.0 and check again.

@AndyAyersMS
Member

Thanks @EgorBo -- I thought this issue sounded familiar but couldn't find those issues. Do you know if both of those fixes will be in 8.0.5?

@EgorBo
Member

EgorBo commented May 3, 2024

> Thanks @EgorBo -- I thought this issue sounded familiar but couldn't find those issues. Do you know if both of those fixes will be in 8.0.5?

Sadly, only in 8.0.6. I think they should also be available in the next (if not the current) preview of 9.0, if that is an option to try.

@daiger14
Author

daiger14 commented May 6, 2024

> @daiger14 thanks for checking with windbg. Since this seems to happen almost every run, is there any chance we could set up some kind of collaborative debugging session?
>
> @dotnet/jit-contrib any other thoughts on how we could figure out what is going wrong?

Hi @AndyAyersMS, sure, we can set up a collaborative session; we just need to agree on a time ;)
Hi @EgorBo, can I somehow check these commits locally?
Thank you guys!

@AndyAyersMS
Member

@EgorBo do you know for sure if the fixes will be in 9.0 Preview 4? Seems likely.

It should be available later this month. @daiger14 simplest thing might be for you to download this once it's available and try... you don't need to rebuild anything, just run it.

Alternatively, you can build a version of the 8.0.6 JIT yourself, or I can build one and make it available to you, and I can tell you how to patch it into an existing 8.0 installation for testing, but I understand if you'd rather not.

@AndyAyersMS AndyAyersMS added the Priority:2 Work that is important, but not critical for the release label May 8, 2024
@EgorBo
Member

EgorBo commented May 8, 2024

> @EgorBo do you know for sure if the fixes will be in 9.0 Preview 4? Seems likely.

I've just checked: yes, it will be in Preview 4.

@daiger14
Author

daiger14 commented May 9, 2024

@AndyAyersMS, @EgorBo Thank you guys, I will wait for preview 4 and will write the results of the tests.

@daiger14
Author

Hi @AndyAyersMS @EgorBo, I checked with version 9.0 preview 4, and the issue is still present and replicable with our project :(

@EgorBo
Member

EgorBo commented May 29, 2024

> Hi @AndyAyersMS @EgorBo, I checked with version 9.0 preview 4, and the issue is still present and replicable with our project :(

That is sad to hear. Unfortunately, it's unlikely we can diagnose this issue without a repro (and presumably a memory dump won't help much here since it sounds like a silent codegen bug).

Can you check whether disabling TieredCompilation (and/or TieredPGO) makes it reproduce more reliably? The MSBuild properties are <TieredCompilation>false</TieredCompilation> and <TieredPGO>false</TieredPGO>.
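When flipping these settings, it can help to confirm what the deployed process was actually configured with. The sketch below assumes the documented names: the runtimeconfig.json knobs `System.Runtime.TieredCompilation` and `System.Runtime.TieredPGO`, and the environment variables `DOTNET_TieredCompilation` and `DOTNET_TieredPGO`. `TieredSettings` and `Describe` are hypothetical names. It reports only what was configured, not what the JIT ended up doing.

```csharp
using System;

// Hypothetical diagnostic helper; names are illustrative only.
public static class TieredSettings
{
    public static string Describe()
    {
        string[] knobs = { "TieredCompilation", "TieredPGO" };
        var lines = new string[knobs.Length];

        for (int i = 0; i < knobs.Length; i++)
        {
            // Values from runtimeconfig.json (which the MSBuild properties feed into).
            var fromConfig = AppContext.GetData("System.Runtime." + knobs[i]);

            // Values from the environment of the current process.
            var fromEnv = Environment.GetEnvironmentVariable("DOTNET_" + knobs[i]);

            lines[i] = $"{knobs[i]}: runtimeconfig={fromConfig ?? "(not set)"}, env={fromEnv ?? "(not set)"}";
        }
        return string.Join(Environment.NewLine, lines);
    }
}
```

Writing `TieredSettings.Describe()` to the log at startup makes it easy to tell a run where the setting was really off from one where the property never reached the deployed runtimeconfig.json. "(not set)" means the runtime default applies.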

@daiger14
Author

@EgorBo Turning off TieredCompilation changed nothing.
@AndyAyersMS I found a function where setting [MethodImpl(MethodImplOptions.NoOptimization)] avoids the exception.
I can share it with you; please tell me how to do this privately.
Is it okay to use [MethodImpl(MethodImplOptions.NoOptimization)] in a production environment?
Thank you!

@AndyAyersMS
Member

Yes, you can use [MethodImpl(MethodImplOptions.NoOptimization)] in production if necessary.

To share your code example privately, it is best to open a parallel issue on the .NET Developer Community site: https://developercommunity.microsoft.com/dotnet

Once that is set up you can add private attachments.

@AndyAyersMS
Member

@daiger14 can we set up an interactive debug session for this?

@AndyAyersMS AndyAyersMS added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 27, 2024
@AndyAyersMS
Member

@daiger14 we're still very interested in trying to resolve this.

@daiger14
Author

daiger14 commented Aug 8, 2024

> @daiger14 we're still very interested in trying to resolve this.

Hi @AndyAyersMS, @EgorBo
I'm no longer working on this project, but I sent the link and all the information to the principal developer of this application.
I appreciate your support.

@dotnet-policy-service dotnet-policy-service bot added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity labels Aug 8, 2024
@AndyAyersMS AndyAyersMS added the needs-author-action An issue or pull request that requires more info or actions from the author. label Aug 8, 2024
Contributor

This issue has been marked needs-author-action and may be missing some important information.

@AndyAyersMS
Member

> > @daiger14 we're still very interested in trying to resolve this.
>
> Hi @AndyAyersMS, @EgorBo I'm no longer working on this project, but I sent the link and all the information to the principal developer of this application. I appreciate your support.

Thank you for following up.

@AndyAyersMS
Member

Still not actionable, so moving to future.

@AndyAyersMS AndyAyersMS removed the Priority:2 Work that is important, but not critical for the release label Aug 14, 2024
@AndyAyersMS AndyAyersMS modified the milestones: 9.0.0, Future Aug 14, 2024
Contributor

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

Contributor

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@dotnet-policy-service dotnet-policy-service bot removed this from the Future milestone Sep 12, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 13, 2024