Test host crashes at cleanup with COMException #4222

Closed
kunom opened this issue Dec 23, 2022 · 8 comments · Fixed by #4291

@kunom

kunom commented Dec 23, 2022

On our build agents, we occasionally observe red test runs even though all test cases are green. It seems that the crash happens at test cleanup:

Unhandled exception. System.Runtime.InteropServices.COMException (0x80070006): The handle is invalid. (0x80070006 (E_HANDLE))
at System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(Int32 errorCode)
at Interop.Kernel32.ProcessWaitHandle..ctor(SafeProcessHandle processHandle)
at System.Diagnostics.Process.UpdateHasExited()
at System.Diagnostics.Process.get_HasExited()
at Microsoft.VisualStudio.TestPlatform.PlatformAbstractions.ProcessHelper.TryGetExitCode(Object process, Int32& exitCode)
at Microsoft.TestPlatform.TestHostProvider.Hosting.TestHostManagerCallbacks.ExitCallBack(IProcessHelper processHelper, Object process, StringBuilder testHostProcessStdError, Action`1 onHostExited)
at Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.Hosting.DotnetTestHostManager.<get_ExitCallBack>b__37_0(Object process)
at Microsoft.VisualStudio.TestPlatform.PlatformAbstractions.ProcessHelper.<>c__DisplayClass1_0.<<LaunchProcess>b__3>d.MoveNext()
--- End of stack trace from previous location ---
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()

The crucial part here is that Process.get_HasExited is documented to throw InvalidOperationException, Win32Exception, or NotSupportedException, none of which covers COMException.

I wonder whether there is a missing call to Process.WaitForExit() as documented in the same location for asynchronous event handlers, but in general I have no clue what is going on. I tend to think that there is a race condition in the Process class:

(image attached in the original issue)

This issue occurred 3 times in the last 600 analyzed runs.
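
To illustrate the point about exception types: a handler limited to the three documented types would let the observed COMException escape. The following is a minimal sketch, not code from the test platform; the helper name IsDocumentedForHasExited is hypothetical and exists only for this illustration.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper: true only for the exception types that the
// Process.HasExited documentation lists.
static bool IsDocumentedForHasExited(Exception ex) =>
    ex is InvalidOperationException or Win32Exception or NotSupportedException;

// The exception from the stack trace above: HRESULT 0x80070006 (E_HANDLE).
var observed = new COMException("The handle is invalid.", unchecked((int)0x80070006));

// COMException is not one of the documented types.
Console.WriteLine(IsDocumentedForHasExited(observed));              // False

// Win32Exception is documented (6 is ERROR_INVALID_HANDLE).
Console.WriteLine(IsDocumentedForHasExited(new Win32Exception(6))); // True

// Both types share the ExternalException base class.
Console.WriteLine(observed is ExternalException);                   // True
```

Because COMException and Win32Exception both derive from ExternalException, a caller that wants to tolerate this failure could catch either COMException explicitly or the shared base type.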

kunom changed the title from "Test host crashes at cleanup with COMExeption" to "Test host crashes at cleanup with COMException" on Dec 23, 2022
@Evangelink
Member

Hi @kunom, thank you for the analysis and the report. We will investigate the issue.

@beinjamin

hi

@MarcoRossignoli
Contributor

@kunom how are you running the tests on your agent?
VSTest@2 task, dotnet test, vstest.console.exe, or something else?

@kunom
Author

kunom commented Jan 11, 2023

@MarcoRossignoli Sorry for the slow feedback, I was on holiday.

At least some of the red test runs are executed with

- task: AzureCLI@2
  displayName: "dotnet test (solution) with azure connection"
  inputs:
    # Referencing the 'azureSubscription' within this 'AzureCLI' task implicitly enables us to make use of the existing service connection from Azure DevOps to Azure.
    # This allows us to authorize ourselves via new DefaultAzureCredential() within our integration tests against Azure resources.
    azureSubscription: ${{ parameters.azureSubscription }}
    scriptType: pscore
    scriptLocation: inlineScript
    inlineScript: dotnet test ${{parameters.solutionPath}} --configuration ${{parameters.buildConfiguration}} ${{parameters.appliedTestFilter}} --no-restore --no-build --logger trx --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory) --  DataCollectionRunSettings.DataCollectors.DataCollector.Configuration.Format=opencover DataCollectionRunSettings.DataCollectors.DataCollector.Configuration.Exclude=[*Tests?]*
  env:
    ${{ each parameter in parameters.testEnvironmentVariables }}:
      ${{ parameter.Key }}: ${{ parameter.Value }}

Other occurrences are with

- task: DotNetCoreCLI@2
  displayName: "dotnet test (solution)"
  inputs:
    command: test
    arguments: '--configuration $(BuildConfiguration) $(excludeFlakyAndQuarantined) --no-restore --no-build --collect:"XPlat Code Coverage" --  DataCollectionRunSettings.DataCollectors.DataCollector.Configuration.Format=opencover DataCollectionRunSettings.DataCollectors.DataCollector.Configuration.Exclude=[*Tests?]*'
    projects: $(Build.SourcesDirectory)/NexusPlatform/Nexus.sln
  env:
    TESTING_ASB_CONNINFO: $(TESTING_ASB_CONNINFO) # secret variables must be explicitly mapped
    TESTING_SLOWDOWNFACTOR: $(TESTING_SLOWDOWNFACTOR)

@kunom
Author

kunom commented Feb 7, 2023

@MarcoRossignoli Do you need any more information? The issue keeps occurring in our pipelines, and I am inclined to write a retry loop to work around it.

@MarcoRossignoli
Contributor

I don't need any other information, thanks. I think it's a race condition where we try to get information from a host process that has already closed.
I'll try to find a good solution to avoid it. As far as I know we don't have any other case like this, so I think the race is really specific to your use case.

@MarcoRossignoli
Contributor

MarcoRossignoli commented Feb 8, 2023

@kunom I made a small fix to avoid the crash. We will still return -1 in this case, so you will see the run as failed: we cannot access the real exit value of the test host, so we cannot be sure that the host didn't fail some tests.
It should ship in 17.6.
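
A minimal sketch of the behavior described in this comment, assuming the exit-code read is wrapped in a delegate. It is not the actual patch from #4291; the names ExitCodeOrFallback and readExitCode are hypothetical, and the list of caught exception types is an assumption based on the stack trace and the documented exceptions mentioned earlier in the thread.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper (not the vstest implementation): try to read the exit
// code and return the fallback when the process can no longer be queried.
static int ExitCodeOrFallback(Func<int> readExitCode, int fallback = -1)
{
    try
    {
        return readExitCode();
    }
    catch (Exception ex) when (ex is InvalidOperationException
                                  or NotSupportedException
                                  or Win32Exception
                                  or COMException)
    {
        // The real exit code is unknown, so the run is reported as failed
        // instead of crashing the caller.
        return fallback;
    }
}

// Simulate the failure from the stack trace: E_HANDLE (0x80070006).
int crashed = ExitCodeOrFallback(
    () => throw new COMException("The handle is invalid.", unchecked((int)0x80070006)));

// A read that succeeds passes its value through unchanged.
int clean = ExitCodeOrFallback(() => 0);

Console.WriteLine($"{crashed} {clean}"); // -1 0
```

In a real caller the delegate would wrap the reads of Process.HasExited and Process.ExitCode. Returning -1 instead of 0 is the conservative choice named in the comment: without the real exit code, a failed host cannot be told apart from a clean one.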

@kunom
Author

kunom commented Feb 9, 2023

@MarcoRossignoli Thanks!
