Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dotnet test is hanging after updating xUnit and test platform #10355

Open
jaredpar opened this issue Sep 17, 2024 · 7 comments
Open

dotnet test is hanging after updating xUnit and test platform #10355

jaredpar opened this issue Sep 17, 2024 · 7 comments

Comments

@jaredpar
Copy link
Member

jaredpar commented Sep 17, 2024

Description

dotnet test is timing out after roslyn [pgraded it's xUnit references][roslyn-pr]

Steps to reproduce

  1. Clone https://github.com/dotnet/roslyn
  2. Sync to commit 7a3a133247f22f710d06c5bbaa0c23375462383c
  3. Build Roslyn.sln
  4. Run `dotnet test artifacts\bin\Microsoft.CodeAnalysis.LanguageServer.UnitTests\Debug\net8.0\Microsoft.CodeAnalysis.LanguageServer.UnitTests.dll

This is only happening in this project. Our other projects are executing just fine.

Expected behavior

The tests run.

Actual behavior

VSTest version 17.12.0-preview-24353-03 (x64)                                                                                         
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Failed to negotiate protocol, waiting for response timed out after 90 seconds. This may occur due to machine slowness, please set environment variable VSTEST_CONNECTION_TIMEOUT to increase timeout.

Test Run Aborted.

Diagnostic logs

After attaching to the testhost process I see the following exception:

> System.MissingMethodException: 'Method not found: 'Void Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.Interfaces.ICommunicationChannel.add_MessageReceived(System.EventHandler`1<Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.Interfaces.MessageReceivedEventArgs>)'.'

With the stack trace

	System.Private.CoreLib.dll!System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(object obj, System.Span<object> copyOfArgs, System.Reflection.BindingFlags invokeAttr) Line 178	C#
 	System.Private.CoreLib.dll!System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, object[] parameters, System.Globalization.CultureInfo culture) Line 145	C#
 	System.Private.CoreLib.dll!System.Delegate.DynamicInvokeImpl(object[] args) Line 85	C#
 	Microsoft.TestPlatform.CoreUtilities.dll!Microsoft.VisualStudio.TestPlatform.Utilities.MulticastDelegateUtilities.SafeInvoke(System.Delegate delegates, object sender, object args, string traceDisplayName) Line 59	C#
 	Microsoft.TestPlatform.CoreUtilities.dll!Microsoft.VisualStudio.TestPlatform.Utilities.MulticastDelegateUtilities.SafeInvoke(System.Delegate delegates, object sender, System.EventArgs args, string traceDisplayName) Line 29	C#
 	Microsoft.TestPlatform.CommunicationUtilities.dll!Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.SocketClient.OnServerConnected(System.Threading.Tasks.Task connectAsyncTask) Line 94	C#
 	System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread threadPoolThread, System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 264	C#
 	System.Private.CoreLib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot, System.Threading.Thread threadPoolThread) Line 2349	C#
 	System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 989	C#
 	System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() Line 102	C#

Errata

Getting to this stack trace was challenging and took hours of debugging and trying different ideas out. After putting dotnet test under a debugger it became clear that testhost is the process that was actually responsible for the problem. But testhost cannot be run as an independent process because if you don't specify items like parentid it will crash on startup. That meant there was no reasonable way to debug it. If I started dotnet test and attached to testhost when it started the exception had already happened. I spent significant time reading through the process options, generating diagnostics and none had any leads.

Eventually I reached out in a internal channel and someone told me about setting %VSTEST_HOST_DEBUG%=1 to enable child process debugging. I feel like that should not be a hidden environment variable but rather a first class option on dotnet test. As someone who just went through this I really feel like there needs to be some breadcrumb in the dotnet test arguments that show you a reasonable path to debugging the testhost process. Maybe even a first class switch like --debug-testhost. Otherwise when it goes wrong there is no obvious way to debug it.

@jaredpar
Copy link
Member Author

After a hint from @nohwnd I was able to figure out what is going on here. Given the message there was likely a tearing issue between DLLs here so I grabbed a binlog of the build of this test and noticed this:

image

Parts of test platform are coming from 17.10 and others from 17.5. Changed around my references so that everything was 17.5 and the problem went away.

This does raise the question though of why is this allowed? It seems there is an API dependency between these layers that is subject to tearing when the versions differ. Given that why don't we error or force exact NuGet matches here so that the tearing can't happen?

I feel like the path that lead us here is one that customers are likely to follow. NuGet audit warnings lead us to upgrade our xunit references. Or incentive here is to upgrade to latest packages and that meant moving to xunit.runner.visualstudio 2.8.2. That in turn required us to move our Microsoft.TestPlatform.ObjectModel reference to 17.10.0. After that build succeeded and the majority of our tests ran. But our Microsoft.NET.Test.Sdk reference holds us back to 17.5.0 for testhost and now we have the tearing problem.

image

@nohwnd
Copy link
Member

nohwnd commented Sep 17, 2024

Translation Layer is VisualStudio client this package is not normally referenced by test projects. In roslyn it is being used in LanguageServer and version is determined by <MicrosoftTestPlatformVersion>17.10.0 which in your commit is out of sync with <MicrosoftNETTestSdkVersion>17.5.0 that arcade defines.

The problem here is that the test.sdk is outdated (from Feb 2023), and the update brings TestPlatform.ObjectModel CommunicationUtilities that is too new. Testhost then fails to run because of api mismatch, and Nuget won't warn us because it allows updating packages, and setting maximum version is not very usual.

In ideal case arcade should force you to update to newer or newest test.sdk, the same way package analyzer pushes you to newer xunit.

Unfortunately we cannot force hard dependencies between test platform and xunit versions to force users to update.

Once you update also test.sdk your tests will run:

S:\c\roslyn [(7a3a133...) +2 ~1 -0 !]> dotnet test artifacts\bin\Microsoft.CodeAnalysis.LanguageServer.UnitTests\Debug\net8.0\Microsoft.CodeAnalysis.LanguageServer.UnitTests.dll --diag:logs2\log.txt
VSTest version 17.12.0-preview-24353-03 (x64)

Starting test execution, please wait...
Logging Vstest Diagnostics in file: S:\c\roslyn\logs2\log.txt
A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:    11, Skipped:     0, Total:    11, Duration: 6 s - Microsoft.CodeAnalysis.LanguageServer.UnitTests.dll (net8.0)

In our new testing platform that xunit is onboarding to the version to use is defined by the dependency that xunit defines, and there is no additional "test.sdk" that needs to be kept in sync. So hopefully this situation will eventually get better.


I agree that testhost is hard to debug, but first thing we would ask you to do is to run with diag logs: --diag:logs\log.txt, there you can see the exception on the top:

TpTrace Error: 0 : 61288, 9, 2024/09/17, 20:16:21.797, 5254755191848, testhost.dll, MulticastDelegateUtilities.SafeInvoke: 1: Invoking callback 1/Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler for .SocketClient: ServerConnected, failed after 1 ms with: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.MissingMethodException: Method not found: 'Void Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.Interfaces.ICommunicationChannel.add_MessageReceived(System.EventHandler`1<Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.Interfaces.MessageReceivedEventArgs>)'.
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.<InitializeCommunication>b__32_0(Object sender, ConnectedEventArgs connectedArgs)
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
   --- End of inner exception stack trace ---
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Delegate.DynamicInvokeImpl(Object[] args)
   at Microsoft.VisualStudio.TestPlatform.Utilities.MulticastDelegateUtilities.SafeInvoke(Delegate delegates, Object sender, Object args, String traceDisplayName) in /_/src/Microsoft.TestPlatform.CoreUtilities/Utilities/MulticastDelegateUtilities.cs:line 59.
TpTrace Verbose: 0 : 61288, 9, 2024/09/17, 20:16:21.797, 5254755192965, testhost.dll, Connected to server, and starting MessageLoopAsync

Followed by connection loop and then

TpTrace Error: 0 : 47248, 5, 2024/09/17, 20:17:51.804, 5255655259532, vstest.console.dll, ProxyExecutionManager.StartTestRun: Failed to start test run: Microsoft.VisualStudio.TestPlatform.ObjectModel.TestPlatformException: Failed to negotiate protocol, waiting for response timed out after 90 seconds. This may occur due to machine slowness, please set environment variable VSTEST_CONNECTION_TIMEOUT to increase timeout.

In the main (vstest.console) log.

Meaning that we did not get through the initial protocol negotiation.

@nohwnd
Copy link
Member

nohwnd commented Sep 17, 2024

I feel like the path that lead us here is one that customers are likely to follow.

By the look of the error, and seeing that only LanguageServer project failed, this is most likely caused by CommunicationUtilities library mismatch, rather than ObjectModel mismatch. Xunit does not depend on CommunicationUtilities, but TranslationLayer does. BUT translation layer is not commonly used by anyone except for VS, VSCode, Stryker and other IDE integration authors, so this is not a common scenario.

@jaredpar
Copy link
Member Author

Unfortunately we cannot force hard dependencies between test platform and xunit versions to force users to update.

Understood. But you can force hard dependencies between your own packages. Then if customers pick versions of xUnit and test platform that break this dependencies there will be a warning at restore time.

@nohwnd
Copy link
Member

nohwnd commented Sep 17, 2024

The break happened because testhost and translation layer both ship communicationUtilities dll in the nuget, and translation layer overwrote the testhost dll with a newer version. So we would have to say: install only when other package is not installed at all or is installed with the same version.

@jaredpar
Copy link
Member Author

Why are they shipping private dependencies this way vs. making that a NuPkg they can both reference?

@nohwnd
Copy link
Member

nohwnd commented Sep 23, 2024

They each normally live in a separate process, and we don't want others to take dependencies on the library that facilitates communication between the processes. Also because it has been done like that since forever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants