Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MulticoreJIT accesses m_GenericInfoCount field from multiple threads without synchronization #46778

Closed
agocke opened this issue Jan 9, 2021 · 8 comments · Fixed by #46783
Closed
Labels
area-TieredCompilation-coreclr bug untriaged New issue has not been triaged by the area owner

Comments

@agocke
Copy link
Member

agocke commented Jan 9, 2021

See MultiCoreJitRecord::WriteOutput for an example.

This looks like it can cause memory corruption.

@agocke agocke added bug area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jan 9, 2021
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jan 9, 2021
@agocke
Copy link
Member Author

agocke commented Jan 9, 2021

@mangod9 It looks like this is the cause of the crashes in the PGO runs. It may also be a severe problem for other users of multi-core JIT.

cc @davidwrighton

@jkotas
Copy link
Member

jkotas commented Jan 9, 2021

#39996 cc @clamp03

@mangod9
Copy link
Member

mangod9 commented Jan 9, 2021

@agocke do you have a link to the pgo jobs which are failing?

@agocke
Copy link
Member Author

agocke commented Jan 9, 2021

https://dev.azure.com/dnceng/internal/_build/results?buildId=939897&view=logs&j=4859eead-9322-5aef-9825-19eb054d3a82

Although I wouldn't suggest trying to debug through that -- I can provide a dump, but simply doing a volatile load from that field at the beginning of WriteOutput and using a local from then on was enough to reduce the race to no longer produce memory corruption.

I'm not sure what the appropriate level of synchronization for that field is, though.

@clamp03
Copy link
Member

clamp03 commented Jan 9, 2021

Hi, @agocke I cannot access that dump. In the dumped backtrace, is there "WriteMulticoreJitProfiler"?
I found that both recording and writeoutput can run if writeouput is executed from WriteMulticoreJitProfile in non-unix.
If not, could you please share the backtrace?

clamp03 added a commit to clamp03/runtime that referenced this issue Jan 9, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 9, 2021
@jkotas jkotas added area-TieredCompilation-coreclr and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jan 9, 2021
@agocke
Copy link
Member Author

agocke commented Jan 9, 2021

Yup, stack trace in the crash looks like

 	ntdll.dll!RtlReportFatalFailure�()	Unknown
 	ntdll.dll!RtlReportCriticalFailure�()	Unknown
 	ntdll.dll!RtlpHeapHandleError�()	Unknown
 	ntdll.dll!RtlpHpHeapHandleError�()	Unknown
 	ntdll.dll!RtlpLogHeapFailure�()	Unknown
 	ntdll.dll!RtlpFreeHeapInternal()	Unknown
 	ntdll.dll!RtlFreeHeap()	Unknown
>	coreclr.dll!ClrFree(void * p) Line 302	C++
 	coreclr.dll!operator delete[](void * p) Line 410	C++
 	coreclr.dll!MulticoreJitRecorder::WriteOutput(IStream * pStream) Line 557	C++
 	coreclr.dll!MulticoreJitRecorder::WriteOutput() Line 155	C++
 	coreclr.dll!MulticoreJitRecorder::StopProfile(bool) Line 951	C++
 	coreclr.dll!MulticoreJitRecorder::WriteMulticoreJitProfiler(_TP_CALLBACK_INSTANCE * pInstance, void * pvContext, _TP_TIMER * pTimer) Line 810	C++
 	ntdll.dll!TppTimerpExecuteCallback()	Unknown
 	ntdll.dll!TppWorkerThread()	Unknown
 	kernel32.dll!00007fff9a567034()	Unknown
 	ntdll.dll!RtlUserThreadStart�()	Unknown

@agocke agocke mentioned this issue Jan 11, 2021
2 tasks
jkotas pushed a commit that referenced this issue Jan 12, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 12, 2021
@agocke
Copy link
Member Author

agocke commented Jan 12, 2021

@jkotas This manifested as memory corruption in the test case. Do we want to consider backporting the fix?

@jkotas
Copy link
Member

jkotas commented Jan 12, 2021

#39996 that introduced this crash is in .NET 6 only.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-TieredCompilation-coreclr bug untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants