superpmi tool crashing with "Heap contamination detected!" #101708

Closed
jkotas opened this issue Apr 30, 2024 · 11 comments · Fixed by #101826
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs

@jkotas
Member

jkotas commented Apr 30, 2024

This assert fired in the runtime-coreclr superpmi-collect pipeline:

[18:03:51] Invoking: /tmp/helix/working/AF7F0923/p/coreclr/superpmi -p -f /tmp/helix/working/AF7F0923/w/B2C509AA/u/spmi_collect/basefail.mcl /tmp/helix/working/AF7F0923/w/B2C509AA/u/spmi_collect/base.mch /tmp/helix/working/AF7F0923/p/coreclr/libclrjit.dylib
[18:04:10] 
[18:04:10] Assert failure(PID 25678 [0x0000644e], Thread: 278306 [0x43f22]): !"Heap contamination detected! HeapFree was called on a heap other than the one that memory was allocated from.\n" "Possible cause: you used new (executable) to allocate the memory, but didn't use DeleteExecutable() to free it."
[18:04:10]     File: /Users/runner/work/1/s/src/coreclr/utilcode/clrhost_nodependencies.cpp:290
[18:04:10]     Image: /private/tmp/helix/working/AF7F0923/p/coreclr/superpmi
@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 30, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 30, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@kunalspathak
Member

@jkotas - can you please point to the run that failed with this error? I went through some of the recent runs on main, and these are the collections that are failing, but I do not see the error reported in this issue in the superpmi logs.

[Two screenshots of the failing collections; images not preserved.]

Are they happening in different runs?

@jkotas
Member Author

jkotas commented May 1, 2024

This was originally reported by @JulieLeeMSFT at #55517 (comment). I just moved it to a fresh issue to avoid confusion. @JulieLeeMSFT Could you please point @kunalspathak to the runs where this is happening?

(I currently cannot access any superpmi logs from last week, so I cannot point you to the exact run.)

@kunalspathak
Member

Ok, I found it now. They were under the libraries_tests.run collection for osx/arm64

@kunalspathak
Member

stack trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x1000f6bbc)
  * frame #0: 0x00000001000f6bbc superpmi`DBG_DebugBreak
    frame #1: 0x000000010009904c superpmi`DebugBreak + 820
    frame #2: 0x0000000100094cc8 superpmi`DbgAssertDialog + 160
    frame #3: 0x00000001000806f4 superpmi`operator delete(void*) + 64
    frame #4: 0x0000000100078504 superpmi`MethodContextReader::CheckForPairedFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 840
    frame #5: 0x00000001000787d8 superpmi`MethodContextReader::MethodContextReader(char const*, int const*, int, char*, int, int) + 288
    frame #6: 0x000000010000f310 superpmi`main + 1224
    frame #7: 0x000000018c6be0e0 dyld`start + 2360

@kunalspathak
Member

When using substr, we end up allocating through libc++ malloc and then freeing tmp through ClrFree. This happens with Clang 14 (the version in CI). When I built locally with Clang 15 on osx/arm64, it does not repro, and substr does use ClrMalloc. Thanks @jkoritzinsky for helping validate this theory. Talking offline with @jkoritzinsky, we discussed whether we should just stop using the new/delete overrides and use the standard operator new/delete. This is definitely unrelated to superpmi.

@jkotas or @davidwrighton - any idea how to proceed with this and assign it to the right owner?

PS: It is surprising that we don't see this failure for every test running on osx/arm64. Superpmi-collect still gathers the collection, apart from some of the failing tests, so this is not a completely blocking issue.
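
For readers unfamiliar with this class of bug, here is a minimal sketch of how an allocator can detect that a block is being freed on a heap other than the one it came from. It is not the check in clrhost_nodependencies.cpp, which is not shown in this thread. `HeapId`, `BlockHeader`, `TaggedAlloc`, and `TaggedFree` are hypothetical names invented for illustration, and the "Library" heap merely stands in for memory obtained through libc++ malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical heap identifiers: "Host" stands in for the runtime's heap,
// "Library" for memory handed out by the C++ standard library.
enum class HeapId : std::uint64_t { Host = 1, Library = 2 };

// Hypothetical header stored in front of every block. The alignment keeps
// the user pointer that follows it suitably aligned for any object type.
struct alignas(std::max_align_t) BlockHeader {
    HeapId owner;
};

// Allocate `size` bytes and record which heap owns the block.
void* TaggedAlloc(HeapId heap, std::size_t size) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) {
        return nullptr;
    }
    BlockHeader* header = new (raw) BlockHeader{heap};
    return header + 1;  // user memory starts right after the header
}

// Free a block on behalf of `heap`. Returns false and leaves the block
// untouched when the block belongs to a different heap, which is the
// situation the "Heap contamination detected!" assert complains about.
bool TaggedFree(HeapId heap, void* ptr) {
    if (ptr == nullptr) {
        return true;
    }
    BlockHeader* header = static_cast<BlockHeader*>(ptr) - 1;
    if (header->owner != heap) {
        return false;  // allocated by one heap, freed by another
    }
    std::free(header);
    return true;
}
```

In this sketch, a block obtained with `TaggedAlloc(HeapId::Library, n)` and passed to `TaggedFree(HeapId::Host, p)` is rejected. That mirrors the mismatch described above, where the allocation goes through one allocator and the free goes through the other.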

@kunalspathak kunalspathak removed their assignment May 2, 2024
@jkotas
Member Author

jkotas commented May 2, 2024

> use the standard operator new/delete.

We use a custom operator new to make it possible for our EH infrastructure to catch our custom out-of-memory exceptions. If we were to switch to the standard operator new, we would need to update our EH infrastructure to do the same for the std out-of-memory exceptions. I am not sure what it would take.
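
As a rough illustration of that coupling (a sketch under stated assumptions, not the runtime's actual code), the example below uses a tagged `operator new` overload that throws a custom exception type, plus a handler that recognizes only that type. `HostOutOfMemory`, `HostAllocTag`, `host_alloc`, `g_host_budget`, `HostFree`, and `RuntimeHandlerCaught` are all hypothetical names, and the byte budget exists only to trigger the failure path deterministically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical exception type standing in for a runtime-specific OOM exception.
struct HostOutOfMemory {};

// Hypothetical tag that selects the custom allocation path.
struct HostAllocTag {};
constexpr HostAllocTag host_alloc{};

// Hypothetical fault-injection knob: requests larger than this many bytes fail.
std::size_t g_host_budget = 64;

// Custom allocation function: reports failure with the custom exception
// type instead of std::bad_alloc.
void* operator new(std::size_t size, const HostAllocTag&) {
    if (size > g_host_budget) {
        throw HostOutOfMemory{};
    }
    void* p = std::malloc(size != 0 ? size : 1);
    if (p == nullptr) {
        throw HostOutOfMemory{};
    }
    return p;
}

// Matching placement delete, used if a constructor throws during `new (host_alloc) T`.
void operator delete(void* p, const HostAllocTag&) noexcept {
    std::free(p);
}

// Releases memory obtained from the tagged operator new above.
void HostFree(void* p) noexcept {
    std::free(p);
}

// Stand-in for an exception handler that knows only the custom type.
// Returns true if it caught an out-of-memory condition. A std::bad_alloc
// is not matched here and propagates to the caller.
template <typename Body>
bool RuntimeHandlerCaught(Body body) {
    try {
        body();
    } catch (const HostOutOfMemory&) {
        return true;
    }
    return false;
}
```

With a handler like this, an allocation failure raised as `std::bad_alloc` would pass straight through, which is why switching to the standard operator new would also require teaching the handler about the standard exception.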

To fix the immediate problem with superpmi, you may want to build a custom utilcode without the operator new or stop linking utilcode into superpmi completely (how much utilcode is used by superpmi?).

cc @AaronRobinsonMSFT

@kunalspathak
Member

kunalspathak commented May 2, 2024

> how much utilcode is used by superpmi?

Is removing utilcodestaticnohost good enough?

2cbda5a

@jkotas
Member Author

jkotas commented May 2, 2024

I think so - if it builds.

@kunalspathak
Member

> I think so - if it builds.

Yes, it builds on Windows and, as expected, not on Linux. I will see if I can pull the dependencies out.

@kunalspathak
Member

There is quite a bit of code that I tried to bring into superpmi, starting from DbgAssertDialog, PAL_CPP_TRY, EX_TRY, etc., and the list keeps growing. It seems @jkoritzinsky is fixing this problem, as I read in #101811 (comment). I will probably wait for that rather than pursuing this one.
