Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker service sometimes hangs at profiler startup #191

Closed
dolly22 opened this issue Feb 14, 2023 · 4 comments
Closed

Worker service sometimes hangs at profiler startup #191

dolly22 opened this issue Feb 14, 2023 · 4 comments
Assignees
Labels
status: investigating Issue in investigating, fix not available yet. troubleshooting Tracking for troubleshooting
Milestone

Comments

@dolly22
Copy link

dolly22 commented Feb 14, 2023

Hello, I have implemented service profiler in worker service for Orleans silo grain calls based on #140. Sometimes i run into a problem that whole process just hangs up at startup. I tried to replicate this issue locally to create isolated repro, but was not yet successful. I do have stuck process memory dumps from our test environment. The application is hosted as ServiceFabric stateless service.

The process is stuck when the host is starting background services somewhere in Microsoft.ApplicationInsights.Profiler.AspNetCore.dll!Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerBackgroundService.ExecuteAsync(System.Threading.CancellationToken stoppingToken)

I think this might have something to do with how ServiceProfilerBackgroundService.ExecuteAsync is implemented and when it truly becomes asynchronous (dotnet/runtime#36063).

This is a partial stacktrace of stuck startup thread:

ntdll.dll!00007ffa2d7508b4()	Unknown
KERNELBASE.dll!00007ffa2a4f98a7()	Unknown
System.IO.Pipes.dll!00007ff9fd209e3a()	Unknown
[Managed to Native Transition]	
System.IO.Pipes.dll!Interop.Kernel32.ReadFile(System.Runtime.InteropServices.SafeHandle handle, byte* bytes, int numBytesToRead, out int numBytesRead, nint mustBeZero)	Unknown
System.IO.Pipes.dll!System.IO.Pipes.PipeStream.ReadCore(System.Span<byte> buffer)	Unknown
System.Private.CoreLib.dll!System.IO.Stream.ReadAtLeastCore(System.Span<byte> buffer, int minimumBytes, bool throwOnEndOfStream) Line 1065	C#
System.Private.CoreLib.dll!System.IO.BinaryReader.ReadBytes(int count) Line 448	C#
Microsoft.Diagnostics.NETCore.Client.dll!Microsoft.Diagnostics.NETCore.Client.IpcHeader.Parse(System.IO.BinaryReader reader) Line 71	C#
Microsoft.Diagnostics.NETCore.Client.dll!Microsoft.Diagnostics.NETCore.Client.IpcMessage.Parse(System.IO.Stream stream) Line 97	C#
Microsoft.Diagnostics.NETCore.Client.dll!Microsoft.Diagnostics.NETCore.Client.IpcClient.Read(System.IO.Stream stream) Line 95	C#
Microsoft.Diagnostics.NETCore.Client.dll!Microsoft.Diagnostics.NETCore.Client.IpcClient.SendMessageGetContinuation(Microsoft.Diagnostics.NETCore.Client.IpcEndpoint endpoint, Microsoft.Diagnostics.NETCore.Client.IpcMessage message) Line 34	C#
Microsoft.Diagnostics.NETCore.Client.dll!Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Start(Microsoft.Diagnostics.NETCore.Client.IpcEndpoint endpoint, System.Collections.Generic.IEnumerable<Microsoft.Diagnostics.NETCore.Client.EventPipeProvider> providers, bool requestRundown, int circularBufferMB) Line 34	C#
Microsoft.Diagnostics.NETCore.Client.dll!Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.StartEventPipeSession(System.Collections.Generic.IEnumerable<Microsoft.Diagnostics.NETCore.Client.EventPipeProvider> providers, bool requestRundown, int circularBufferMB) Line 55	C#
Microsoft.ApplicationInsights.Profiler.Core.dll!Microsoft.ApplicationInsights.Profiler.Core.TraceControls.DiagnosticsClientTraceControl.Enable(string traceFilePath) Line 103	C#
Microsoft.ApplicationInsights.Profiler.Core.dll!Microsoft.ApplicationInsights.Profiler.Core.ServiceProfilerProvider.StartServiceProfilerAsync(Microsoft.ServiceProfiler.Orchestration.IProfilerSource source, System.Threading.CancellationToken cancellationToken)	C#
Microsoft.ServiceProfiler.EventPipe.Orchestration.dll!Microsoft.ApplicationInsights.Profiler.Core.Orchestration.OrchestratorEventPipe.StartProfilingAsync(Microsoft.ServiceProfiler.Orchestration.SchedulingPolicy policy, System.Threading.CancellationToken cancellationToken)	C#
Microsoft.ServiceProfiler.Orchestration.dll!Microsoft.ServiceProfiler.Orchestration.SchedulingPolicy.StartPolicyAsync(System.Threading.CancellationToken cancellationToken)	C#
Microsoft.ServiceProfiler.EventPipe.Orchestration.dll!Microsoft.ApplicationInsights.Profiler.Core.Orchestration.OrchestratorEventPipe.StartAsync(System.Threading.CancellationToken cancellationToken)	C#
Microsoft.ApplicationInsights.Profiler.AspNetCore.dll!Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerAgentBootstrap.ActivateAsync(System.Threading.CancellationToken cancellationToken)	C#
Microsoft.ApplicationInsights.Profiler.AspNetCore.dll!Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerBackgroundService.ExecuteAsync(System.Threading.CancellationToken stoppingToken) Line 22	C#
Microsoft.Extensions.Hosting.Abstractions.dll!Microsoft.Extensions.Hosting.BackgroundService.StartAsync(System.Threading.CancellationToken cancellationToken)	Unknown
Microsoft.Extensions.Hosting.dll!Microsoft.Extensions.Hosting.Internal.Host.StartAsync(System.Threading.CancellationToken cancellationToken)	Unknown
@xiaomi7732 xiaomi7732 self-assigned this Feb 17, 2023
@xiaomi7732
Copy link
Member

Hey @dolly22, thanks for filing the issue. Could you fill me in on why do you think the Profiler background service is blocking the whole host? By the callstack, it looks like the background service is running as expected. What do I miss?

@xiaomi7732 xiaomi7732 added the troubleshooting Tracking for troubleshooting label Feb 17, 2023
@dolly22
Copy link
Author

dolly22 commented Feb 17, 2023

HI @xiaomi7732, thanks for your time looking into this.

  1. The last lines in application log indicated there is some problem with profiler startup. There were no other log messages in next 24h until process was forcibly restarted. There was no indication in log that other background services would be started.
Debug	2023-01-31 15:00:40.7073	Microsoft.ApplicationInsights.Profiler.Core.ServiceProfilerProvider		Call TraceControl.Enable().		
Debug	2023-01-31 15:00:40.7073	Microsoft.ApplicationInsights.Profiler.Core.ServiceProfilerProvider		Trace File Path: D:\SvcFab\_App\eVzp.Idm_App1464\temp\6fcdb54a-a293-4034-a65a-943541c2d377.netperf		
Debug	2023-01-31 15:00:40.7063	Microsoft.ApplicationInsights.Profiler.Core.Orchestration.OneTimeSchedulingPolicy		Scheduling Policy: OneTimeSchedulingPolicy requesting StartProfilingSession for 00:02:00		
Debug	2023-01-31 15:00:40.7063	Microsoft.ApplicationInsights.Profiler.Core.ServiceProfilerProvider		Entering StartServiceProfilerAsync.		
Debug	2023-01-31 15:00:40.7063	Microsoft.ApplicationInsights.Profiler.Core.ServiceProfilerProvider		Got the semaphore. Try starting the Profiler.		
Debug	2023-01-31 15:00:40.7053	Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerAgentBootstrap		Good major version. Pass Runtime Compatibility test.		
Debug	2023-01-31 15:00:40.7053	Microsoft.ApplicationInsights.Profiler.Core.Orchestration.OrchestratorEventPipe		5 scheduling policies has been activated.		
Debug	2023-01-31 15:00:40.7053	Microsoft.ApplicationInsights.Profiler.Core.Orchestration.OrchestratorEventPipe		Starting the orchestrator.		
Debug	2023-01-31 15:00:40.7043	Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerAgentBootstrap		User Settings:
{"bufferSizeInMB":250,"duration":"00:02:00","initialDelay":"00:00:00","configurationUpdateFrequency":"00:00:05","provideAnonymousTelemetry":true,"isDisabled":false,"randomProfilingOverhead":0.05,"isSkipCompatibilityTest":false,"endpoint":null,"skipUpload":false,"uploadMode":"OnSuccess","preserveTraceFile":false,"skipEndpointCertificateValidation":false,"cpuTriggerThreshold":80,"cpuTriggerCooldown":"04:00:00","memoryTriggerThreshold":80,"memoryTriggerCooldown":"04:00:00","standaloneMode":false,"localCacheFolder":"D:\\SvcFab\\_App\\eVzp.Idm_App1464\\temp\\","allowsCrash":false,"uploaderEnvironment":"Production","namedPipe":{"connectionTimeout":"00:00:30","defaultMessageTimeout":"00:02:00"},"customEventPipeProviders":null}		
Info	2023-01-31 15:00:40.6713	Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerAgentBootstrap		Starting application insights profiler with instrumentation key: [removed]	
  1. Before restarting the process i have obtained memory dump for further analysis. If you look how base BackgroundService ExecuteAsync is implemented it seems from the stack trace the thread is still executing synchronous part of ExecuteAsync...
        public virtual Task StartAsync(CancellationToken cancellationToken)
        {
            // Create linked token to allow cancelling executing task from provided token
            _stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);

            // Store the task we're executing
            _executeTask = ExecuteAsync(_stoppingCts.Token);

            // If the task is completed then return it, this will bubble cancellation and failure to the caller
            if (_executeTask.IsCompleted)
            {
                return _executeTask;
            }

            // Otherwise it's running
            return Task.CompletedTask;
        }
  1. In the discussion of mentioned dotnet/runtime issue there are further details how synchronous part of ExecuteAsync BackgroundService blocked the execution of whole host dotnet/runtime#36063 (comment) delays further host startup. There is also link do docs that mention this explicitly.

  2. I was able to workaround this issue by conceptually changing the profiler startup method not to block startup and initialize on thread pool.

        protected override async Task ExecuteAsync(CancellationToken ct)
        {
            await Task.Yield();
            await _bootstrap.ActivateAsync(stoppingToken);
        }

The profiler is now starting on thread pool without blocking, and i did not encounter any further startup hangups in our test environment. I suspect the core issue might be something similar to dotnet/diagnostics#1060 (that blocking the host startup somehow causes no diagnostics events to be emitted and EventPipeSession.Start not to complete).

Microsoft.ApplicationInsights.Profiler.AspNetCore.dll!Microsoft.ApplicationInsights.Profiler.AspNetCore.ServiceProfilerBackgroundService.ExecuteAsync(System.Threading.CancellationToken stoppingToken) Line 22	C#
Vzp.CloudFabric.Common.ConsoleHost.dll!Vzp.CloudFabric.Common.ApplicationInsights.StartupWrapperBuilder.ExecuteAsync(System.Func<System.Threading.CancellationToken, System.Threading.Tasks.Task> baseExecuteAsync, System.Threading.CancellationToken ct) Line 101	C#
[Resuming Async Method]	
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)	Unknown
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<Vzp.CloudFabric.Common.ApplicationInsights.StartupWrapperBuilder.<ExecuteAsync>d__1>.MoveNext(System.Threading.Thread threadPoolThread)	Unknown
System.Private.CoreLib.dll!System.Runtime.CompilerServices.YieldAwaitable.YieldAwaiter.OutputCorrelationEtwEvent.AnonymousMethod__6_0(System.Action innerContinuation, System.Threading.Tasks.Task continuationIdTask)	Unknown
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()	Unknown
System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()	Unknown

The only problem with this workaround is that due to ServiceProfilerBackgroundService and ServiceProfilerAgentBootstrap being internal it involves some ugly reflection and ILGenerator.Emit that i would prefer not to have to maintain.

BTW: Thanks very much for implementing worker service support. Besides of this issue the profiler works wonderfully. Implementing support for profiling Orleans silo grain calls was pretty straight forward.

I did have some issues when implementing support for profiling Service Bus message processing in our worker service, because ApplicationInsights tracking for Azure Sdk telemetry (Microsoft.ApplicationInsights.DependencyCollector.DependencyTrackingTelemetryModule) internally does not Start() created RequestTelemetry objects. That causes service profiler no to see any interesting events/requests in traces. Again the workaround involves some reflection to internal implementation and it would be wonderful if it just worked out of the box.

@xiaomi7732 xiaomi7732 added the status: investigating Issue in investigating, fix not available yet. label Feb 17, 2023
@xiaomi7732
Copy link
Member

Hey @dolly22 thanks for the detailed analysis. It didn't make sense to me until I see this line of code in our codebase:

...
await _orchestrator.StartAsync(cancellationToken).ConfigureAwait(true);  // ConfigureAwait to true? Why?

I am going to issue a PR:

  1. To fix the bug above.
  2. Update the background service:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    await Task.Yield();
    await _bootstrap.ActivateAsync(stoppingToken).ConfigureAwait(false);
}

Thanks again for your report.

@xiaomi7732 xiaomi7732 added this to the 2.5.1 milestone Feb 17, 2023
@xiaomi7732
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: investigating Issue in investigating, fix not available yet. troubleshooting Tracking for troubleshooting
Projects
None yet
Development

No branches or pull requests

2 participants