EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

davmason · 2020-05-07T21:20:05Z

tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh fails under gcstress (as seen in #35884)

Copying the analysis from this comment: #35884 (comment)

An EventSource will call dispose on its underlying providers when it is finalized or Dispose is called, and in turn the EventPipeProvider will call EventUnregister which calls EventPipeInternal::DeleteProvider.

Up until this point everything is ok, but inside of EventPipe::DeleteProvider there is a check for Enabled(), which checks if there are any active sessions, and if so will put the providers in a deferred delete state.

runtime/src/coreclr/src/vm/eventpipe.cpp

Lines 616 to 627 in c2b1018

    
           if (pProvider != NULL) 
        
           { 
        
               if (Enabled()) 
        
               { 
        
                   // Save the provider until the end of the tracing session. 
        
                   pProvider->SetDeleteDeferred(); 
        
               } 
        
               else 
        
               { 
        
                   // Delete the provider now. 
        
                   s_config.DeleteProvider(pProvider); 
        
               }

Which means if any other provider is active when the EventSource tries to delete itself from EventPipe, it will get placed in the deferred list. For example, in this test Microsoft-Windows-DotNETRuntime is also active. This is fundamentally flawed for interacting with managed providers, because as soon as they are disposed/finalized the EventProvider's m_etwCallback will be collected and later when EventPipe goes to delete the deferred providers we will attempt to reverse pinvoke to a collected delegate.

I suspect this has gone undetected for so long because we don't have a ton of testing around EventSources and how they interact with EventPipe, and likely the tests that do test the scenario just happen to be lucky and the delegate doesn't get collected in the timeframe. Even this test passes fine under normal circumstances and it takes GCStress to fail.

ghost · 2020-05-07T21:20:07Z

Tagging subscribers to this area: @tommcdon
Notify danmosemsft if you want to be subscribed.

davmason · 2020-05-07T21:21:33Z

Fixing this will likely involve either doing something to root the EventSource in question until after shutdown occurs, or cleaning up the provider during DeleteProvider instead of marking it as deferred. A fix still needs investigated, but we should fix this for 5.0.

davmason · 2020-05-07T21:26:25Z

Here is a repro program, if you run withCOMPLUS_GCStress=3 it will repro consistently.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Diagnostics.Tracing;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using Microsoft.Diagnostics.NETCore.Client;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Etlx;

namespace MultiThreadedApp
{
    [EventData]
    struct NestedSerializableClass
    {
        public int i;
    }

    [EventData]
    struct SerializableClass
    {
        public int i;
        public NestedSerializableClass nested;
        public DateTime time;
    }

    class MyEventSource : EventSource
    {
        public MyEventSource()
            : base(EventSourceSettings.EtwSelfDescribingEventFormat)
        {

        }

        [Event(1, Level = EventLevel.LogAlways)]
        public void BasicEvent(int i)
        {
            WriteEvent(1, i);
        }

        protected override void OnEventCommand(EventCommandEventArgs command)
        {
            Console.WriteLine($"command={command.Command}");
        }
    }

    class Program
    {
        public static EventPipeSession AttachEventPipeSessionToSelf(IEnumerable<EventPipeProvider> providers)
        {
            int processId = Process.GetCurrentProcess().Id;
            DiagnosticsClient client = new DiagnosticsClient(processId);
            return client.StartEventPipeSession(providers, /* requestRunDown */ false);
        }

        static void Main(string[] args)
        {
            try
            {
                List<EventPipeProvider> providers = new List<EventPipeProvider>
                {
                    new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, (long)Keywords.Default),
                    new EventPipeProvider("MyEventSource", EventLevel.Verbose, (long)0xf00000000000)
                };

                using (EventPipeSession session = AttachEventPipeSessionToSelf(providers))
                {
                    WriteEvents();

                    ManualResetEvent allEventsReceivedEvent = new ManualResetEvent(false);

                    int eventCount = 0;
                    var source = new EventPipeEventSource(session.EventStream);
                    source.Dynamic.All += (TraceEvent traceEvent) =>
                    {
                        if (traceEvent.ProviderName != "Microsoft-Windows-DotNETRuntime")
                        {
                            Console.WriteLine($"{traceEvent.ToString()} {traceEvent.Keywords.ToString("X")}");
                        }

                        if (traceEvent.ProviderName != "MyEventSource")
                        {
                            return;
                        }

                        Console.WriteLine($"Got event {traceEvent.EventName} with payload count {traceEvent.PayloadNames.Length}");
                        ++eventCount;
                        if (eventCount == 3)
                        {
                            allEventsReceivedEvent.Set();
                        }
                    };

                    Thread processingThread = new Thread(new ThreadStart(() =>
                    {
                        source.Process();
                    }));

                    Console.WriteLine("Starting processing thread");
                    processingThread.Start();

                    session.Stop();

                    Console.WriteLine("Waiting on event");
                    allEventsReceivedEvent.WaitOne(TimeSpan.FromSeconds(5));
                    Console.WriteLine("Joining processing thread");
                    processingThread.Join();
                }
            }
            catch (Exception e)
            {
                Console.WriteLine($"Got exception {e.Message} while processing events.");
            }
        }

        private static void WriteEvents()
        {
            MyEventSource eventSource = new MyEventSource();

            Console.WriteLine("Writing events from MyEventSource");
            eventSource.BasicEvent(12);

            Console.WriteLine("Done");
        }
    }
}

davmason · 2020-05-30T08:08:35Z

I'm going to look at this as the next bug on my plate

tommcdon · 2020-07-29T20:59:48Z

@davmason is this still an issue we are targeting for 5.0?

davmason · 2020-07-31T00:17:01Z

Yes, this should still go in for 5.0. It will be a relatively low risk fix and have a lot of value. We regularly run in to it in our test runs

davmason added the area-Diagnostics-coreclr label May 7, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 7, 2020

sywhang added this to the 5.0 milestone May 7, 2020

tommcdon added p1 and removed untriaged New issue has not been triaged by the area owner labels May 7, 2020

davmason added area-Tracing-coreclr and removed area-Diagnostics-coreclr labels May 8, 2020

davmason self-assigned this May 30, 2020

sywhang mentioned this issue Jun 5, 2020

Test failure: tracing\\eventpipe\\eventsourceerror\\eventsourceerror\\eventsourceerror.cmd #37238

Closed

davmason mentioned this issue Jul 31, 2020

Test failed: eventpipe_readevents "A callback was made on a garbage collected delegate" #39682

Closed

davmason mentioned this issue Jul 31, 2020

Unregister callback when setting a provider for deferred delete #40191

Merged

davmason closed this as completed in #40191 Jul 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

davmason commented May 7, 2020

ghost commented May 7, 2020

davmason commented May 7, 2020

davmason commented May 7, 2020 •

edited

Loading

davmason commented May 30, 2020

tommcdon commented Jul 29, 2020

davmason commented Jul 31, 2020

EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

Comments

davmason commented May 7, 2020

ghost commented May 7, 2020

davmason commented May 7, 2020

davmason commented May 7, 2020 • edited Loading

davmason commented May 30, 2020

tommcdon commented Jul 29, 2020

davmason commented Jul 31, 2020

davmason commented May 7, 2020 •

edited

Loading