Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

Closed
davmason opened this issue May 7, 2020 · 6 comments · Fixed by #40191
Assignees
Milestone

Comments

@davmason
Copy link
Member

davmason commented May 7, 2020

tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh fails under gcstress (as seen in #35884)

Copying the analysis from this comment: #35884 (comment)

An EventSource will call dispose on its underlying providers when it is finalized or Dispose is called, and in turn the EventPipeProvider will call EventUnregister which calls EventPipeInternal::DeleteProvider.

Up until this point everything is ok, but inside of EventPipe::DeleteProvider there is a check for Enabled(), which checks if there are any active sessions, and if so will put the providers in a deferred delete state.

if (pProvider != NULL)
{
if (Enabled())
{
// Save the provider until the end of the tracing session.
pProvider->SetDeleteDeferred();
}
else
{
// Delete the provider now.
s_config.DeleteProvider(pProvider);
}

Which means if any other provider is active when the EventSource tries to delete itself from EventPipe, it will get placed in the deferred list. For example, in this test Microsoft-Windows-DotNETRuntime is also active. This is fundamentally flawed for interacting with managed providers, because as soon as they are disposed/finalized the EventProvider's m_etwCallback will be collected and later when EventPipe goes to delete the deferred providers we will attempt to reverse pinvoke to a collected delegate.

I suspect this has gone undetected for so long because we don't have a ton of testing around EventSources and how they interact with EventPipe, and likely the tests that do test the scenario just happen to be lucky and the delegate doesn't get collected in the timeframe. Even this test passes fine under normal circumstances and it takes GCStress to fail.

@ghost
Copy link

ghost commented May 7, 2020

Tagging subscribers to this area: @tommcdon
Notify danmosemsft if you want to be subscribed.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 7, 2020
@davmason
Copy link
Member Author

davmason commented May 7, 2020

Fixing this will likely involve either doing something to root the EventSource in question until after shutdown occurs, or cleaning up the provider during DeleteProvider instead of marking it as deferred. A fix still needs investigated, but we should fix this for 5.0.

@sywhang sywhang added this to the 5.0 milestone May 7, 2020
@davmason
Copy link
Member Author

davmason commented May 7, 2020

Here is a repro program, if you run withCOMPLUS_GCStress=3 it will repro consistently.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Diagnostics.Tracing;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using Microsoft.Diagnostics.NETCore.Client;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Etlx;

namespace MultiThreadedApp
{
    [EventData]
    struct NestedSerializableClass
    {
        public int i;
    }

    [EventData]
    struct SerializableClass
    {
        public int i;
        public NestedSerializableClass nested;
        public DateTime time;
    }

    class MyEventSource : EventSource
    {
        public MyEventSource()
            : base(EventSourceSettings.EtwSelfDescribingEventFormat)
        {

        }

        [Event(1, Level = EventLevel.LogAlways)]
        public void BasicEvent(int i)
        {
            WriteEvent(1, i);
        }

        protected override void OnEventCommand(EventCommandEventArgs command)
        {
            Console.WriteLine($"command={command.Command}");
        }
    }

    class Program
    {
        public static EventPipeSession AttachEventPipeSessionToSelf(IEnumerable<EventPipeProvider> providers)
        {
            int processId = Process.GetCurrentProcess().Id;
            DiagnosticsClient client = new DiagnosticsClient(processId);
            return client.StartEventPipeSession(providers, /* requestRunDown */ false);
        }

        static void Main(string[] args)
        {
            try
            {
                List<EventPipeProvider> providers = new List<EventPipeProvider>
                {
                    new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, (long)Keywords.Default),
                    new EventPipeProvider("MyEventSource", EventLevel.Verbose, (long)0xf00000000000)
                };

                using (EventPipeSession session = AttachEventPipeSessionToSelf(providers))
                {
                    WriteEvents();

                    ManualResetEvent allEventsReceivedEvent = new ManualResetEvent(false);

                    int eventCount = 0;
                    var source = new EventPipeEventSource(session.EventStream);
                    source.Dynamic.All += (TraceEvent traceEvent) =>
                    {
                        if (traceEvent.ProviderName != "Microsoft-Windows-DotNETRuntime")
                        {
                            Console.WriteLine($"{traceEvent.ToString()} {traceEvent.Keywords.ToString("X")}");
                        }

                        if (traceEvent.ProviderName != "MyEventSource")
                        {
                            return;
                        }

                        Console.WriteLine($"Got event {traceEvent.EventName} with payload count {traceEvent.PayloadNames.Length}");
                        ++eventCount;
                        if (eventCount == 3)
                        {
                            allEventsReceivedEvent.Set();
                        }
                    };

                    Thread processingThread = new Thread(new ThreadStart(() =>
                    {
                        source.Process();
                    }));

                    Console.WriteLine("Starting processing thread");
                    processingThread.Start();

                    session.Stop();

                    Console.WriteLine("Waiting on event");
                    allEventsReceivedEvent.WaitOne(TimeSpan.FromSeconds(5));
                    Console.WriteLine("Joining processing thread");
                    processingThread.Join();
                }
            }
            catch (Exception e)
            {
                Console.WriteLine($"Got exception {e.Message} while processing events.");
            }
        }

        private static void WriteEvents()
        {
            MyEventSource eventSource = new MyEventSource();

            Console.WriteLine("Writing events from MyEventSource");
            eventSource.BasicEvent(12);

            Console.WriteLine("Done");
        }
    }
}

@tommcdon tommcdon added p1 and removed untriaged New issue has not been triaged by the area owner labels May 7, 2020
@davmason davmason self-assigned this May 30, 2020
@davmason
Copy link
Member Author

I'm going to look at this as the next bug on my plate

@tommcdon
Copy link
Member

@davmason is this still an issue we are targeting for 5.0?

@davmason
Copy link
Member Author

Yes, this should still go in for 5.0. It will be a relatively low risk fix and have a lot of value. We regularly run in to it in our test runs

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants