Fix race condition between GCInfo and Rundown #70609

davmason · 2022-06-11T09:19:15Z

- Adds a check to see whether the GCInfo is published yet before trying to emit a rundown event.
- Publishes GCInfo under the code heap crst, so a reader can only see an uninitialized GCInfo or a complete one.
- Moves the freeing of the temporary code heaps in LCG methods to after we call FreeCodeMemory, so we don't race on deletion.

jkotas · 2022-06-11T15:11:09Z

This makes the JIT/EE contract complicated. Publishing the code and its related artifacts is a VM implementation detail, so instead of teaching the JIT those intricacies, it would be better to publish everything in the right order once the JIT returns here:

runtime/src/coreclr/vm/jitinterface.cpp

Line 12954 in 86a59cd

jitInfo.WriteCode(jitMgr);

We are doing some of the publishing in this place already.

davmason · 2022-06-15T08:20:33Z

@jkotas Looking at the prior art, it seems like EEJitManager is a good place to put this sort of logic. Is that right?

Then CEEJitInfo::WriteCode can call EEJitManager::PublishGCInfo, which takes the code heap lock. In rundown we can then check and either get fully published GCInfo, or see it uninitialized and skip it.
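To make the proposal concrete, here is a minimal sketch of "publish under the lock, skip if unpublished". Everything in it is a hypothetical stand-in (`ToyCodeHeader`, `ToyJitManager`, `CountEmittable`); it is not the CoreCLR code, and it uses `std::mutex` where the runtime would use its own crst.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a code header; not the CoreCLR type.
struct ToyCodeHeader {
    const std::uint8_t* gcInfo = nullptr;  // nullptr means "not published yet"
};

// Hypothetical stand-in for the JIT manager that owns the code heap lock.
class ToyJitManager {
public:
    // Writer side: the GC info is fully built before this call, and the
    // pointer is stored while holding the same lock that readers take.
    void PublishGCInfo(ToyCodeHeader& header, const std::uint8_t* completedInfo) {
        std::lock_guard<std::mutex> hold(m_codeHeapLock);
        header.gcInfo = completedInfo;
    }

    // Reader side (rundown): returns nullptr when nothing is published,
    // so the caller can skip the method instead of decoding garbage.
    const std::uint8_t* TryGetGCInfo(const ToyCodeHeader& header) {
        std::lock_guard<std::mutex> hold(m_codeHeapLock);
        return header.gcInfo;
    }

private:
    std::mutex m_codeHeapLock;
};

// Counts how many headers a rundown pass would emit events for.
inline int CountEmittable(ToyJitManager& mgr, const std::vector<ToyCodeHeader>& headers) {
    int emitted = 0;
    for (const ToyCodeHeader& h : headers) {
        if (mgr.TryGetGCInfo(h) != nullptr) {
            ++emitted;
        }
    }
    return emitted;
}
```

The point of the sketch is only that the reader has two possible observations, "null" or "complete", because both sides go through the same lock.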

jkotas · 2022-06-15T16:43:41Z

> @jkotas Looking at the prior art, it seems like EEJitManager is a good place to put this sort of logic. Is that right?

Yep.

davmason · 2022-06-16T00:10:19Z

Before this change, the test app below would hit asserts on a checked build within a minute. With the change, I have run it for an hour or so on Windows x64 with no issue. On x86 it runs out of memory after ~25k dynamic methods, but works fine up until then.

I am going to run it against Linux arm32 to make sure, but I have no reason to suspect it will be any different there.


using Microsoft.Diagnostics.NETCore.Client;
using Microsoft.Diagnostics.Tracing;
using System.Diagnostics;
using System.Diagnostics.Tracing;
using System.Reflection;
using System.Reflection.Emit;

Console.WriteLine("Hello, World!");

long numDynamicMethods = 0;
List<Thread> threads = new List<Thread>();
int numThreads = 100;
for (int i = 0; i < numThreads; i++)
{
    Thread t = new Thread(MakeDynamicMethods);
    t.Start();
    threads.Add(t);
}

Thread gcTriggerThread = new Thread(() =>
    {
        while (true)
        {
            Thread.Sleep(100);
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    });

gcTriggerThread.Start();
threads.Add(gcTriggerThread);

while (true)
{
    Console.WriteLine($"New EventPipe session, dynamic methods={numDynamicMethods}");

    int processId = Process.GetCurrentProcess().Id;
    DiagnosticsClient client = new DiagnosticsClient(processId);

    int numEvents = 0;
    List<EventPipeProvider> providers = new List<EventPipeProvider>()
    {
        new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Verbose),
        new EventPipeProvider("Microsoft-Windows-DotNETRuntimeRundown", EventLevel.Verbose),
        new EventPipeProvider("Microsoft-DotNETCore-SampleProfiler", EventLevel.Verbose),
    };
    using (EventPipeSession session = client.StartEventPipeSession(providers, /* requestRunDown */ true))
    { 
        EventPipeEventSource source = new EventPipeEventSource(session.EventStream);
        source.Dynamic.All += (TraceEvent traceEvent) =>
        {
            ++numEvents;
        };

        Thread processingThread = new Thread(new ThreadStart(() =>
        {
            source.Process();
            Console.WriteLine($"Saw {numEvents} events.");
        }));
        processingThread.Start();

        Thread.Sleep(100);

        // Stopping the session requests rundown, which walks the jitted methods while
        // the other threads are still creating and collecting dynamic methods.
        session.Stop();

        processingThread.Join();
    }
}

void MakeDynamicMethods(object? obj)
{
    Random random = new Random();
    while (true)
    {
        AssemblyName name = new AssemblyName(GetRandomName());
        AssemblyBuilder dynamicAssembly = AssemblyBuilder.DefineDynamicAssembly(name, AssemblyBuilderAccess.RunAndCollect);
        ModuleBuilder dynamicModule = dynamicAssembly.DefineDynamicModule(GetRandomName());

        Type[] methodArgs = { typeof(int) };

        DynamicMethod squareIt = new DynamicMethod(
            "SquareIt",
            typeof(long),
            methodArgs,
            dynamicModule);

        ILGenerator il = squareIt.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Conv_I8);
        il.Emit(OpCodes.Dup);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        OneParameter<long, int> invokeSquareIt =
            (OneParameter<long, int>)
            squareIt.CreateDelegate(typeof(OneParameter<long, int>));

        invokeSquareIt(random.Next());

        Interlocked.Increment(ref numDynamicMethods);
    }
}

static string GetRandomName()
{
    return Guid.NewGuid().ToString();
}

delegate long SquareItInvoker(int input);

delegate TReturn OneParameter<TReturn, TParameter0>
    (TParameter0 p0);

jkotas · 2022-06-16T00:21:40Z

src/coreclr/vm/eventtrace.cpp

+ // it is not in use yet.
+#ifdef TARGET_X86
+ hdrInfo gcInfo;
+ DecodeGCHdrInfo(codeInfo.GetGCInfoToken(),


Should this rather check for NULL GetGCInfoToken? I do not think we should be attempting to decode the GC info that has not been published.

Or maybe this change is not needed with the new approach?

davmason · 2022-06-17

I spent a couple of hours today trying to convince myself whether this check is needed, and the more I look at it, the more convinced I am that the race condition is different from the original hypothesis.

The EECodeHeapIterator uses MethodSectionIterator to go through all active jitted methods, and it will only return a method if the appropriate index in HeapList::pHdrMap is set.

But we only set the index in pHdrMap in EEJitManager::NibbleMapSet, which is called from CEEJitInfo::WriteCode, after the GCInfo is generated.
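The ordering argument above can be pictured with a small model: a method only becomes visible to the iterator once its map entry is set, and the entry is set last. This is a sketch under assumptions, not the runtime's implementation. `ToyMethod`, `FinishJitAndPublish`, and `SumVisibleGcInfo` are made-up names, and the release/acquire atomics stand in for whatever synchronization the real nibble map relies on.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a jitted method and its header map entry.
struct ToyMethod {
    int gcInfoLength = 0;                  // filled in while jitting
    std::atomic<bool> inHeaderMap{false};  // stand-in for the pHdrMap entry
};

// Writer: finish all per-method data first, then set the map entry last.
// The release store keeps the earlier write from being reordered after it.
inline void FinishJitAndPublish(ToyMethod& m, int gcInfoLength) {
    m.gcInfoLength = gcInfoLength;
    m.inHeaderMap.store(true, std::memory_order_release);
}

// Reader: only visit methods whose map entry is set. The acquire load
// pairs with the release store, so visible methods have complete data.
inline int SumVisibleGcInfo(const ToyMethod* methods, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (methods[i].inHeaderMap.load(std::memory_order_acquire)) {
            total += methods[i].gcInfoLength;
        }
    }
    return total;
}
```

If the real iterator behaves like `SumVisibleGcInfo`, a method with half-built GC info is never handed to rundown, which is why the extra "is it published" check looks redundant.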

Either I'm missing something, or the real issue is a combination of two things: the freeing happens in the wrong order on all architectures, and on ARM architectures there is pointer tearing because the publishing for code headers happens outside the lock.

runtime/src/coreclr/vm/jitinterface.cpp

Line 10927 in 48adb84

memcpy(codeWriterHolder.GetRW(), m_CodeHeaderRW, m_codeWriteBufferSize);

If I run my repro app with the fix to move freeing the code header to after freeing the code data, I no longer hit the assert on x64, which suggests but does not confirm my hypothesis

davmason · 2022-06-17

Hmm, the memcpy I point out there also happens before NibbleMapSet, so I think I am missing something.

jkotas · 2022-06-16T00:24:37Z

src/coreclr/vm/jitinterface.cpp

@@ -10970,6 +10970,12 @@ void CEEJitInfo::WriteCode(EEJitManager * jitMgr)
 UnwindInfoTable::PublishUnwindInfoForMethod(m_moduleBase, m_CodeHeader->GetUnwindInfo(0), m_totalUnwindInfos);
 #endif // defined(TARGET_AMD64)

+ {
+ ExecutableWriterHolder<BYTE *> gcInfoWriterHolder(m_CodeHeader->GetGCInfoAddr(), sizeof(void *));


ExecutableWriterHolder is not a cheap operation. WriteCodeBytes has one already. Can we refactor such that we have just one ExecutableWriterHolder for both operations?
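A rough illustration of the suggested refactoring: create one writable mapping that covers both the code bytes and the GC info slot, and do both writes through it. `ToyWriterHolder` and `ToyMethodImage` are hypothetical and only count how many mappings get created; the real ExecutableWriterHolder has a different interface and cost model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical RAII holder. Each construction models one (expensive)
// writable mapping and bumps a caller-supplied counter.
class ToyWriterHolder {
public:
    ToyWriterHolder(unsigned char* target, std::size_t size, int& mappingCounter)
        : m_rw(target), m_size(size) {
        ++mappingCounter;
    }
    unsigned char* GetRW() const { return m_rw; }
    std::size_t Size() const { return m_size; }

private:
    unsigned char* m_rw;
    std::size_t m_size;
};

// Hypothetical layout: code bytes followed by a slot for the GC info pointer.
struct ToyMethodImage {
    unsigned char code[16];
    std::uintptr_t gcInfoSlot;
};

// Writes the code bytes and the GC info slot through a single holder.
// Returns the number of mappings created, or -1 if the code does not fit.
inline int PublishWithOneHolder(ToyMethodImage& image, const unsigned char* code,
                                std::size_t codeSize, std::uintptr_t gcInfo) {
    if (codeSize > sizeof(image.code)) {
        return -1;
    }
    int mappings = 0;
    {
        ToyWriterHolder holder(reinterpret_cast<unsigned char*>(&image),
                               sizeof(image), mappings);
        ToyMethodImage* rw = reinterpret_cast<ToyMethodImage*>(holder.GetRW());
        std::memcpy(rw->code, code, codeSize);  // first operation
        rw->gcInfoSlot = gcInfo;                // second operation, same holder
    }
    return mappings;
}
```

With a separate holder per write the counter would end at 2; sharing the holder keeps it at 1.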

jkotas · 2022-06-16T00:25:53Z

src/coreclr/vm/codeman.cpp

@@ -3215,21 +3215,22 @@ BYTE* EEJitManager::allocGCInfo(CodeHeader* pCodeHeader, DWORD blockSize, size_t
 } CONTRACTL_END;

 MethodDesc* pMD = pCodeHeader->GetMethodDesc();


You can just pass in the MethodDesc instead of the whole CodeHeader. This method does not need the CodeHeader anymore.

jkotas · 2022-06-16T00:26:22Z

src/coreclr/vm/jitinterface.cpp

- block = m_jitManager->allocGCInfo(m_CodeHeaderRW,(DWORD)size, &m_GCinfo_len);
- if (!block)
+ m_pGCInfo = m_jitManager->allocGCInfo(m_CodeHeaderRW,(DWORD)size, &m_GCinfo_len);
+ if (!m_pGCInfo)


This should not be needed. allocGCInfo throws on OOM.

tommcdon · 2022-06-28T15:13:13Z

@davmason is this PR still active or should we move to draft mode?

davmason · 2022-06-29T10:45:37Z

> @davmason is this PR still active or should we move to draft mode?

Still active, I just didn't finish it before I took vacation.

davmason · 2022-06-30T10:25:10Z

@jkotas - I've tested the heck out of it and have convinced myself the only change needed is to free the dynamic code heaps after freeing the code data. I can run my test program for hours without a crash with just that change
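For readers skimming the thread, the final shape of the fix is an ordering change in teardown. The sketch below is a toy model of that ordering only. `ToyCodeHeap`, `ToyDynamicMethod`, and `DestroyInSafeOrder` are invented for illustration, and the assumption that freeing a code block consults the heap's bookkeeping is mine, not something taken from the runtime sources.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

// Hypothetical temporary code heap that tracks its live code blocks.
class ToyCodeHeap {
public:
    int Allocate() {
        int id = m_next++;
        m_live.insert(id);
        return id;
    }
    // Freeing a block touches the heap's bookkeeping, so the heap must
    // still be alive when this runs.
    void FreeCodeMemory(int id) { m_live.erase(id); }
    std::size_t LiveBlocks() const { return m_live.size(); }

private:
    int m_next = 0;
    std::set<int> m_live;
};

// Hypothetical dynamic (LCG) method that owns a temporary heap.
struct ToyDynamicMethod {
    std::unique_ptr<ToyCodeHeap> temporaryHeap;
    int codeBlock = -1;
};

// Frees the code data first and deletes the heap second.
// Returns true if the code block was released through a live heap.
inline bool DestroyInSafeOrder(ToyDynamicMethod& m) {
    bool freed = false;
    if (m.temporaryHeap && m.codeBlock >= 0) {
        m.temporaryHeap->FreeCodeMemory(m.codeBlock);  // 1. heap still alive
        freed = (m.temporaryHeap->LiveBlocks() == 0);
        m.codeBlock = -1;
    }
    m.temporaryHeap.reset();                           // 2. delete the heap last
    return freed;
}
```

Doing step 2 before step 1 would leave a window in which the code data still refers to a heap that no longer exists, which is the kind of window a concurrent rundown could walk into.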

jkotas

Thanks

jkotas · 2022-06-30T14:02:25Z

The test failure is #70450

davmason requested a review from a team June 11, 2022 09:19

davmason requested a review from MichalStrehovsky as a code owner June 11, 2022 09:19

dotnet-issue-labeler bot added the area-Tracing-coreclr label Jun 11, 2022

davmason self-assigned this Jun 11, 2022

davmason added this to the 7.0.0 milestone Jun 11, 2022

jkotas reviewed Jun 16, 2022


Move freeing the dynamic heaps until after freeing the code data

9236c0b

davmason force-pushed the tracing_bug branch from 11f0b86 to 9236c0b Compare June 30, 2022 10:22

jkotas approved these changes Jun 30, 2022


jkotas merged commit 02b840c into dotnet:main Jun 30, 2022

ghost locked as resolved and limited conversation to collaborators Jul 30, 2022
