Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test process crash on Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll #55639

Closed
dibarbet opened this issue Aug 16, 2021 · 10 comments
Closed
Assignees
Labels
Area-IDE Flaky Infraswat Test Test failures in roslyn-CI
Milestone

Comments

@dibarbet
Copy link
Member

dibarbet commented Aug 16, 2021

Multiple runs have the test runner crash during the Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll helix work item.

Example output from console log (from https://runfo.azurewebsites.net/view/build/?number=1290779)

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:43.03]     Microsoft.CodeAnalysis.Editor.UnitTests.Workspaces.ProjectCacheHostServiceFactoryTests.TestImplicitCacheMonitoring [SKIP]
Dumping info before call to failfast
Dumping info before call to failfast
Exception info
Exception info
Thread was being aborted.
Thread was being aborted.
   at System.Diagnostics.StackFrameHelper.InitializeSourceInfo(Int32 iSkip, Boolean fNeedFileInfo, Exception exception)
   at System.Diagnostics.StackTrace.CaptureStackTrace(Int32 iSkip, Boolean fNeedFileInfo, Thread targetThread, Exception e)
   at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
   at System.Environment.get_StackTrace()
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonExtensions.SetExtraParameters(IFaultUtility fault, Exception exception, Boolean emptyCallstack) in /_/src/VisualStudio/Core/Def/Implementation/Watson/WatsonExtensions.cs:line 37
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(Exception exception) in /_/src/VisualStudio/Core/Def/Implementation/Watson/WatsonReporter.cs:line 121
   at System.Diagnostics.StackFrameHelper.InitializeSourceInfo(Int32 iSkip, Boolean fNeedFileInfo, Exception exception)
   at System.Diagnostics.StackTrace.CaptureStackTrace(Int32 iSkip, Boolean fNeedFileInfo, Thread targetThread, Exception e)
   at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
   at System.Environment.get_StackTrace()
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonExtensions.SetExtraParameters(IFaultUtility fault, Exception exception, Boolean emptyCallstack) in /_/src/VisualStudio/Core/Def/Implementation/Watson/WatsonExtensions.cs:line 37
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(Exception exception) in /_/src/VisualStudio/Core/Def/Implementation/Watson/WatsonReporter.cs:line 121
Stack trace of handler
Stack trace of handler
   at Microsoft.CodeAnalysis.FailFast.DumpStackTrace(Exception exception, String message)
   at Microsoft.CodeAnalysis.FailFast.OnFatalException(Exception exception)
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(Exception exception)
   at Microsoft.CodeAnalysis.ErrorReporting.FatalError.Report(Exception exception, Action`1 handler)
   at Microsoft.CodeAnalysis.ErrorReporting.FatalError.ReportAndCatch(Exception exception)
   at Roslyn.Utilities.TaskExtensions.ReportNonFatalErrorWorker(Task task, Object continuationFunction)
   at System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.Tasks.Task.ExecutionContextCallback(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.TaskScheduler.TryRunInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.TaskContinuation.InlineIfPossibleOrElseQueue(Task task, Boolean needsProtection)
   at System.Threading.Tasks.StandardTaskContinuation.Run(Task completedTask, Boolean bCanInlineContinuationTask)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean bUserDelegateExecuted)
   at System.Threading.Tasks.Task`1.TrySetException(Object exceptionObject)
   at System.Threading.Tasks.UnwrapPromise`1.TrySetFromTask(Task task, Boolean lookForOce)
   at System.Threading.Tasks.UnwrapPromise`1.ProcessInnerTask(Task task)
   at System.Threading.Tasks.UnwrapPromise`1.ProcessCompletedOuterTask(Task task)
   at System.Threading.Tasks.UnwrapPromise`1.InvokeCore(Task completingTask)
   at System.Threading.Tasks.UnwrapPromise`1.Invoke(Task completingTask)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean bUserDelegateExecuted)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

   at Microsoft.CodeAnalysis.FailFast.DumpStackTrace(Exception exception, String message)
   at Microsoft.CodeAnalysis.FailFast.OnFatalException(Exception exception)
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(Exception exception)
   at Microsoft.CodeAnalysis.ErrorReporting.FatalError.Report(Exception exception, Action`1 handler)
   at Microsoft.CodeAnalysis.ErrorReporting.FatalError.ReportAndCatch(Exception exception)
   at Roslyn.Utilities.TaskExtensions.ReportNonFatalErrorWorker(Task task, Object continuationFunction)
   at System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.Tasks.Task.ExecutionContextCallback(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.TaskScheduler.TryRunInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.TaskContinuation.InlineIfPossibleOrElseQueue(Task task, Boolean needsProtection)
   at System.Threading.Tasks.StandardTaskContinuation.Run(Task completedTask, Boolean bCanInlineContinuationTask)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean bUserDelegateExecuted)
   at System.Threading.Tasks.Task`1.TrySetException(Object exceptionObject)
   at System.Threading.Tasks.UnwrapPromise`1.TrySetFromTask(Task task, Boolean lookForOce)
   at System.Threading.Tasks.UnwrapPromise`1.ProcessInnerTask(Task task)
   at System.Threading.Tasks.UnwrapPromise`1.ProcessCompletedOuterTask(Task task)
   at System.Threading.Tasks.UnwrapPromise`1.InvokeCore(Task completingTask)
   at System.Threading.Tasks.UnwrapPromise`1.Invoke(Task completingTask)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean bUserDelegateExecuted)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

The active test run was aborted. Reason: Test host process crashed
Results File: C:\h\w\B1FE0A07\w\AAE60987\e\Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll.3_net472_x86_test_results.xml

Passed!  - Failed:     0, Passed:   483, Skipped:     0, Total:   483, Duration: 38 s - Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll (net472)
Test Run Aborted with error System.Exception: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Exception: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer) in System.Net.Sockets.dll:token 0x600011d+0x68
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer) in System.Net.Sockets.dll:token 0x600011d+0x68
   at System.Net.Sockets.NetworkStream.ReadByte() in System.Net.Sockets.dll:token 0x600011e+0x13
   at System.IO.BinaryReader.Read7BitEncodedInt() in System.Private.CoreLib.dll:token 0x6005866+0x0
   at System.IO.BinaryReader.ReadString() in System.Private.CoreLib.dll:token 0x600585c+0xfb
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable() in Microsoft.TestPlatform.CommunicationUtilities.dll:token 0x6000026+0x0
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken) in Microsoft.TestPlatform.CommunicationUtilities.dll:token 0x6000064+0xa0.
Build Kind Start Time
1284007 Rolling 2021-09-08
1288443 Rolling 2021-11-08
1289467 Rolling 2021-12-08
1290779 Rolling 2021-12-08
1293089 Rolling 2021-13-08
1309160 Rolling 2021-20-08
1309486 Rolling 2021-21-08
1310075 Rolling 2021-21-08
1313140 Rolling 2021-23-08
1313896 Rolling 2021-24-08
1329845 Rolling 2021-30-08
1339887 Rolling 2021-03-09
1340048 Rolling 2021-03-09
1341334 Rolling 2021-03-09
1341823 Rolling 2021-03-09
@dibarbet dibarbet added Test Test failures in roslyn-CI Flaky labels Aug 16, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Aug 16, 2021
@dibarbet
Copy link
Member Author

We have no dumps for any of these failures, so I'm first investigating that.

I believe this is due to incorrect configuration when we pass our test runs off to helix.

Example failure - Pipelines - Run 20210812.14 logs (azure.com)
This is an interesting case as we're actually hitting an Environment.FailFast triggered by a failure in ReportNonFatal.  (One of these two paths- WatsonReporter.cs (sourceroslyn.io)).   So we're definitely hitting an Environment.FailFast.  See https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-roslyn-refs-heads-release-dev171-previ2e63555f8ece43b79a/Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll.3/1/console.c94e2288.log?sv=2019-07-07&se=2021-09-01T19%3A39%3A14Z&sr=c&sp=rl&sig=Eic8%2Fu2RWTuZmz0aggZ%2BLSChx5xOERsPU%2F8tga83Iio%3D

This issue I think is that our RunTests project is not correctly configuring the windows dump collection registry key on the helix work item.  The dump collection is configured here - roslyn/Program.cs at 315c2e1 · dotnet/roslyn (github.com) which runs before the code where we pass the job off to helix (roslyn/Program.cs at 315c2e1 · dotnet/roslyn (github.com)​​​​​​​).  
This has two problems

  1. The registry key is not set on the machine running the helix work item.
  2. The path that we would set in the registry key is not necessarily the correct helix path to set.

@dibarbet
Copy link
Member Author

dibarbet commented Aug 16, 2021

After discussing with coreeng, the windows error reporting registry keys should be set on the machines running helix work items. However some other user could have changed these values and wiped them.

Next step:
Add logging to verify if the windows error reporting keys are set, and potentially set them if not set.

If we get failures with no dumps even when set, it likely means windows error reporting is not capturing these failures.

@dibarbet
Copy link
Member Author

dibarbet commented Aug 17, 2021

I made a change to always throw in the NFW path to check dump uploading. Only one work item with Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll failed, other work items on the same tests (but different legs) successfully uploaded dumps.
https://dev.azure.com/dnceng/public/_build/results?buildId=1297372&view=ms.vss-test-web.build-test-results-tab&runId=38256264&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab&resultId=219212

In all the runs however, the registry keys look set correctly.

C:\h\w\B85209B1\w\AF3B096B\e>reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" 

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps
    DumpType    REG_DWORD    0x2
    DumpFolder    REG_SZ    C:\cores
    DumpCount    REG_DWORD    0x2

I do notice multiple calls to FailFast in the run that failed to upload a dump. Next step is to add some more logging to verify that no dump was produced.

@dibarbet
Copy link
Member Author

Attempting to reproduce the issue locally in #55657

I have successfully managed to crash the test process numerous times without producing a dump file (but it is very inconsistent). In those cases as best as I can tell WER is not attempting to do anything. Though from the windows event viewer the test host process crash is logged

Application: testhost.net472.x86.exe
Framework Version: v4.0.30319
Description: The application requested process termination through System.Environment.FailFast(string message).
Message: System.Exception: Throwing a random exception in the report non fatal code path
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(Exception exception) in C:\Users\dabarbet\source\repos\roslyn\src\VisualStudio\Core\Def\Implementation\Watson\WatsonReporter.cs:line 130
Stack:
   at System.Environment.FailFast(System.String, System.Exception)
   at Microsoft.CodeAnalysis.FailFast.OnFatalException(System.Exception)
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(System.Exception)
   at Microsoft.CodeAnalysis.ErrorReporting.FatalError.Report(System.Exception, System.Action`1<System.Exception>)
   at Microsoft.CodeAnalysis.ErrorReporting.FatalError.ReportAndCatch(System.Exception)
   at Microsoft.CodeAnalysis.Editor.UnitTests.TestNFWThrows+<>c__DisplayClass2_0.<Throws>b__0()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.Tasks.Task.ExecutionContextCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean)
   at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

@dibarbet
Copy link
Member Author

Some notes.

All the debug failures that did not upload dumps show multiple calls to the NFW handling that result in multiple Environment.FailFast calls. I think there is some kind of race where calling Environment.FailFast multiple times causes windows error reporting to not upload a dump.

I don't know exactly how to follow that thread further (likely need to get in touch with runtime? / WER folks). So for now my immediate next path is to see if procdump will capture dumps locally when WER does not. If it does, then I plan to switch the test runs over to proc dump temporarily in hopes of capturing some dumps of the actual CI failures.

@dibarbet
Copy link
Member Author

#55939 potentially fixes the issue with collecting dumps

@dibarbet
Copy link
Member Author

dibarbet commented Sep 3, 2021

Latest dump from debug + non-spanish run here https://dev.azure.com/dnceng/public/_build/results?buildId=1341334&view=ms.vss-test-web.build-test-results-tab&runId=39293838&resultId=216584&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

Looks like calling Environment.StackTrace is throwing an exception, causing the FailFast

Dumping info before call to failfast
Exception info
Thread was being aborted.
   at System.Diagnostics.StackFrameHelper.InitializeSourceInfo(Int32 iSkip, Boolean fNeedFileInfo, Exception exception)
   at System.Diagnostics.StackTrace.CaptureStackTrace(Int32 iSkip, Boolean fNeedFileInfo, Thread targetThread, Exception e)
   at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
   at System.Environment.get_StackTrace()
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonExtensions.SetExtraParameters(IFaultUtility fault, Exception exception, Boolean emptyCallstack) in /_/src/VisualStudio/Core/Def/Implementation/Watson/WatsonExtensions.cs:line 37
   at Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.ReportNonFatal(Exception exception) in /_/src/VisualStudio/Core/Def/Implementation/Watson/WatsonReporter.cs:line 121

@jinujoseph jinujoseph removed the untriaged Issues and PRs which have not yet been triaged by a lead label Sep 7, 2021
@jinujoseph jinujoseph added this to the 17.1 milestone Sep 7, 2021
@dibarbet
Copy link
Member Author

dibarbet commented Nov 9, 2021

this hasn't shown up in runfo in quite a while, closing

@dibarbet dibarbet closed this as completed Nov 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE Flaky Infraswat Test Test failures in roslyn-CI
Projects
None yet
Development

No branches or pull requests

2 participants