CollectDumpOnTestSessionHang doesn't produce a dump file #2375

Closed
provegard opened this issue Mar 26, 2020 · 22 comments

@provegard

Description

I'm trying to troubleshoot hanging builds on a CI server. I found this which seems very promising:

https://github.com/microsoft/vstest-docs/blob/master/RFCs/0028-BlameCollector-Hang-Detection.md

However, when I use the hang detector, I don't get a dump file.

Steps to reproduce

The test hangs are intermittent, so they are hard to reproduce.

dotnet vstest is invoked with:

<lots of DLLs> --Parallel --logger:"trx;LogFileName=NUnitTestsCore.trx" --logger:"console;verbosity=minimal" --ResultsDirectory:.../build/test-reports --Settings:...\tmpCF7A.tmp

The settings file is auto generated and contains something like this:

<RunSettings>
  <RunConfiguration>
    <MaxCpuCount>4</MaxCpuCount>
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="blame" enabled="True">
        <Configuration>
          <ResultsDirectory>...\build</ResultsDirectory>
          <CollectDumpOnTestSessionHang TestTimeout="120000" DumpType="full"/>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>    
</RunSettings>

Expected behavior

I expect the hang detector to detect a hang and produce a crash dump file.

Actual behavior

The hang detector did detect a hang after ~2 minutes:

The active test run was aborted. Reason: Test host process crashed
...
Test Run Aborted.
Attachments:
  ...\build\test-reports\4a680b77-23cd-471a-9b82-ead6630865fa\Sequence_af08f6cfd55f4dd5989add68f10ea91f.xml

However, it only produces a sequence file, not a crash dump.

Note that the sequence file ends up in the result directory used on the command line, rather than the results directory in the settings file.

Diagnostic logs

None produced by the above command.

Environment

Windows Server 2012
.NET Core version 3.0.100

@provegard
Author

I should add that the sequence file was very helpful, but I was still surprised not to see a crash dump.

@nohwnd
Member

nohwnd commented Mar 27, 2020

After a quick analysis I found quite a few problems. Most of them would be mitigated by logging errors to the client correctly:

  • If the blame results directory does not exist, it does not attempt to create it and fails with an error that it does not report to the console. The relevant code should be in StartHangBasedProcessDump in ProcessDumpUtility:
System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Users\jajares\source\repos\UnitTestProject60\blame\Sequence_94c02868a34143d9866f6cd1392e90cf.xml'.
  • If the folder exists, it does not use it and uses the test results directory instead:
TpTrace Verbose: 0 : 22140, 4, 2020/03/27, 09:32:50.999, 2575613428195, vstest.console.dll, DataCollectionRequestSender.SendAfterTestRunStartAndGetResult : Received message: (DataCollection.AfterTestRunEndResult) -> [
  {
    "Uri": "datacollector://Microsoft/TestPlatform/Extensions/Blame/v1",
    "DisplayName": "Blame",
    "Attachments": [
      {
        "Description": "",
        "Uri": "file://C:/Users/jajares/source/repos/UnitTestProject60/TestResults/ddf1714c-a2b3-499c-90c2-a5c57993e28c/Sequence_d40ebf81ba11467e95dd761bd9e0b5a7.xml"
      }
    ]
  }
]

  • It fails to collect the process dump when the PROCDUMP_PATH environment variable is not set, but does not report that to the console. This happens in GetProcDumpExecutable, which imho should at least try to search PATH:
TpTrace Error: 0 : 34124, 10, 2020/03/27, 09:22:48.201, 2569585445775, datacollector.dll, BlameCollector.CollectDumpAndAbortTesthost: Failed with error Microsoft.VisualStudio.TestPlatform.ObjectModel.TestPlatformException: Required environment variable PROCDUMP_PATH was null or empty. Set PROCDUMP_PATH to path of folder containing appropriate procdump executable.
   at Microsoft.TestPlatform.Extensions.BlameDataCollector.ProcessDumpUtility.GetProcDumpExecutable(Int32 processId)
   at Microsoft.TestPlatform.Extensions.BlameDataCollector.ProcessDumpUtility.StartHangBasedProcessDump(Int32 processId, String dumpFileGuid, String testResultsDirectory, Boolean isFullDump)
   at Microsoft.TestPlatform.Extensions.BlameDataCollector.BlameCollector.CollectDumpAndAbortTesthost()
  • Is the ... in your config just you shortening the path? I did not know that you can create a ... folder in Windows and that it will be recursive on its parent, thanks for breaking my filesystem 😁

I am afraid we won't be able to get to this anytime soon, but it should be reasonably simple to fix, please consider PRing this 🙂

<RunSettings>
  <RunConfiguration>
    <MaxCpuCount>4</MaxCpuCount>
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="blame" enabled="True">
        <Configuration>
          <ResultsDirectory>blame</ResultsDirectory>
          <CollectDumpOnTestSessionHang TestTimeout="5000" DumpType="full"/>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>

using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace UnitTestProject60
{
    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void TestMethod1()
        {
            // Sleep for 100 s, far longer than the 5000 ms TestTimeout, so the hang detector fires.
            System.Threading.Thread.Sleep(100_000);
        }
    }
}

I am using dotnet test --settings .\settings.runsettings --diag:c:\temp\log.txt and DebugView++ to easily see the logs from vstest.console, datacollector and testhost. The path to the log does not matter, it's just the simplest way to enable verbose logging for all components.
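
Until the collector creates the blame results directory itself, one workaround is to create it before the run. The sketch below writes the same settings as the repro above and creates the directory up front. write_blame_runsettings is a hypothetical helper name, not something vstest ships, and the sketch only avoids the DirectoryNotFoundException from the first bullet; it does not change which directory the collector ends up using.

```sh
#!/bin/sh
# Hypothetical helper (not part of vstest): writes a blame runsettings file
# and creates the blame results directory up front, because the collector
# reportedly does not create it.
write_blame_runsettings() {
  results_dir="$1"   # directory for blame output, e.g. "blame"
  timeout_ms="$2"    # hang timeout in milliseconds, e.g. 5000
  out_file="$3"      # where to write the runsettings file

  mkdir -p "$results_dir" || return 1

  cat > "$out_file" <<EOF
<RunSettings>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="blame" enabled="True">
        <Configuration>
          <ResultsDirectory>$results_dir</ResultsDirectory>
          <CollectDumpOnTestSessionHang TestTimeout="$timeout_ms" DumpType="full"/>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>
EOF
}
```

Usage would be something like write_blame_runsettings blame 5000 settings.runsettings, followed by the dotnet test command above.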

@provegard
Author

Sorry, ... is just where I replaced irrelevant portions of the paths. :-)

I didn't have PROCDUMP_PATH set, so that's likely the issue. I can't believe I missed that in the documentation.
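
A preflight check along these lines could catch the missing variable before a CI run. This is only a sketch: check_procdump_path is a made-up helper name, and it assumes the folder should contain both procdump.exe and procdump64.exe. It is meant for a POSIX-style shell such as Git Bash on the build agent.

```sh
#!/bin/sh
# Hypothetical preflight check (not part of vstest): verifies that
# PROCDUMP_PATH is set and points at a folder containing the procdump
# executables. Assumption: both procdump.exe and procdump64.exe are expected.
check_procdump_path() {
  dir="${PROCDUMP_PATH:-}"
  if [ -z "$dir" ]; then
    echo "PROCDUMP_PATH is not set or is empty" >&2
    return 1
  fi
  for exe in procdump.exe procdump64.exe; do
    if [ ! -f "$dir/$exe" ]; then
      echo "missing $dir/$exe" >&2
      return 1
    fi
  done
  echo "procdump found in $dir"
}
```

Calling check_procdump_path at the start of the build script and failing the build on a non-zero exit status makes the problem visible, since the collector itself stays silent about it.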

I'll see if I can create a PR for the problems you found.

@ShreyasRmsft
Member

@provegard please do. You can tag me to help out with the review.
It was a pet project of mine but I never got round to polishing it, will help in any way I can.

@provegard
Author

@ShreyasRmsft I have trouble building the repo code using build.cmd. I get:

Failed to find VS installation with requirements: Microsoft.Component.MSBuild Microsoft.Net.Component.4.6.TargetingPack Microsoft.VisualStudio.Component.VSSDK

I'm following the contribution guidelines.

I have .NET 4.6.1 and 4.7. I cannot install 4.6.2 because it says there's a newer one installed already (4.7, obviously). The doc says "4.6.2 or higher", so that should be fine.

I have installed the two MSIs for the 4.6 targeting pack.

Any ideas?

@ShreyasRmsft
Member

@provegard do you have VS2017 or VS2019? I remember someone else facing issues with 2019, the build scripts are slightly outdated.

If you are on VS2019, try getting VS2017.

But before that also try setting up 4.6.2 developer pack from https://dotnet.microsoft.com/download/dotnet-framework/net462.

Also one other thing to try is to install the 4.6.2 sdk from visual studio installer itself and maybe remove the MSIs.

@provegard
Author

Problem solved, the part about 4.6 targeting pack was a red herring. Installing VS extension development support in VS2017 did it.

@ShreyasRmsft self-assigned this Apr 8, 2020
@provegard
Author

The unit tests pass. Smoke and platform tests fail (the platform tests actually hang).

Also, when I open the solution in VS2017, I get:

The current .NET SDK does not support targeting .NET Standard 2.0. Either target .NET Standard 1.6 or lower, or use a version of the .NET SDK that supports .NET Standard 2.0.

I have .NET 4.6.1, 4.6.2, 4.7, 4.7.2 as well as a long range of .NET Core SDKs:

2.1.401 [C:\Program Files\dotnet\sdk]
2.1.402 [C:\Program Files\dotnet\sdk]
2.1.500 [C:\Program Files\dotnet\sdk]
2.1.506 [C:\Program Files\dotnet\sdk]
2.1.508 [C:\Program Files\dotnet\sdk]
2.1.509 [C:\Program Files\dotnet\sdk]
2.1.512 [C:\Program Files\dotnet\sdk]
2.1.801 [C:\Program Files\dotnet\sdk]
2.1.802 [C:\Program Files\dotnet\sdk]
2.2.106 [C:\Program Files\dotnet\sdk]
2.2.108 [C:\Program Files\dotnet\sdk]
2.2.110 [C:\Program Files\dotnet\sdk]
2.2.401 [C:\Program Files\dotnet\sdk]
2.2.402 [C:\Program Files\dotnet\sdk]
3.0.100 [C:\Program Files\dotnet\sdk]
3.1.100 [C:\Program Files\dotnet\sdk]
3.1.201 [C:\Program Files\dotnet\sdk]

Any ideas?

@provegard
Author

Example of smoke test failure:

  X RunAllTestExecution [1s 531ms]
  Error Message:
   Assert.IsTrue failed. Test SampleUnitTestProject.UnitTest1.PassingTest does not appear in passed tests list.
  Stack Trace:
     at Microsoft.TestPlatform.TestUtilities.IntegrationTestBase.ValidatePassedTests(String[] passedTests) in C:\kod\projects\vstest\test\Microsoft.TestPlatform.TestUtilities\IntegrationTestBase.cs:line 265
   at Microsoft.TestPlatform.SmokeTests.ExecutionTests.RunAllTestExecution() in C:\kod\projects\vstest\test\Microsoft.TestPlatform.SmokeTests\ExecutionTests.cs:line 18

Another:

... .. . Failed tests:
... .. .. . 1. Microsoft.TestPlatform.SmokeTests.DiscoveryTests.DiscoverAllTests
... .. .. .. .ErrorMessage:
Assert.IsTrue failed. Test SampleUnitTestProject.UnitTest1.PassingTest does not appear in discovered tests list.
Std Output:
Std Error:  Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.VisualStudio.CodeCoverage.Shim, Version=15.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified. at Microsoft.VisualStudio.TestPlatform.CommandLine.Program.Main(String[] args)
... .. .. .. .StackTrace:
   at Microsoft.TestPlatform.TestUtilities.IntegrationTestBase.ValidateDiscoveredTests(String[] discoveredTestsList) in C:\kod\projects\vstest\test\Microsoft.TestPlatform.TestUtilities\IntegrationTestBase.cs:line 315
   at Microsoft.TestPlatform.SmokeTests.DiscoveryTests.DiscoverAllTests() in C:\kod\projects\vstest\test\Microsoft.TestPlatform.SmokeTests\DiscoveryTests.cs:line 17

Are they environment-specific? I'm on Windows as noted before.

@provegard
Author

Platform test failures:

  X InitializeShouldSubscribeToDataCollectionEvents [79ms]
  Error Message:
   Test method Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests.EventLogDataCollectorTests.InitializeShouldSubscribeToDataCollectionEvents threw exception:
System.NullReferenceException: Object reference not set to an instance of an object.
  Stack Trace:
      at Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests.TestableDataCollectionEvents.GetTestHostLaunchedInvocationList() in C:\kod\projects\vstest\test\DataCollectors\Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests\EventLogDataCollectorTests.cs:line 456
   at Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests.EventLogDataCollectorTests.InitializeShouldSubscribeToDataCollectionEvents() in C:\kod\projects\vstest\test\DataCollectors\Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests\EventLogDataCollectorTests.cs:line 221

And this one is fun:


  X TestCaseSerialize [472ms]
  Error Message:
   Test method Microsoft.TestPlatform.PerformanceTests.ProtocolV1Tests.TestCaseSerialize threw exception:
System.IO.DirectoryNotFoundException: Could not find a part of the path 'E:\ProtocolPerf.txt'.
  Stack Trace:
      at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.StreamWriter.CreateFile(String path, Boolean append, Boolean checkHost)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize, Boolean checkHost)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding)
   at System.IO.File.InternalAppendAllText(String path, String contents, Encoding encoding)
   at System.IO.File.AppendAllText(String path, String contents)
   at Microsoft.TestPlatform.PerformanceTests.ProtocolV1Tests.VerifyPerformanceResult(String scenario, Int64 expectedElapsedTime, Int64 elapsedTime) in C:\kod\projects\vstest\test\Microsoft.TestPlatform.PerformanceTests\ProtocolV1Tests.cs:line 121
   at Microsoft.TestPlatform.PerformanceTests.ProtocolV1Tests.TestCaseSerialize() in C:\kod\projects\vstest\test\Microsoft.TestPlatform.PerformanceTests\ProtocolV1Tests.cs:line 58

I need to mount another drive letter it seems. :-)

@provegard
Author

I'm not trying to be annoying here, just wondering if you (maintainers/collaborators) are seeing these test errors as well?

@ShreyasRmsft
Member

I think you need VS enterprise, some of these dlls are only shipped on the enterprise version like "Microsoft.VisualStudio.CodeCoverage.Shim".

Hehe these are pretty outdated, go ahead with only the UTs locally. The acceptance and smoke tests will get validated on the CI. Plus I don't think blame data collector has any E2E tests.

@provegard
Author

Still having bad luck with VS. Lots of errors. It seems I need to have .NET 4.5.1 installed as well to get things to compile. I feel that my disk is slowly filling up with all possible .NET versions. :-D

@ShreyasRmsft
Member

@nohwnd did you end up facing any of these initial setup issues when you first cloned the repo? I never encountered these because my VS was already pretty much loaded with all the possible skus and extensions.

@nohwnd
Member

nohwnd commented Apr 8, 2020

@provegard sorry, been busy with release. Hope you did not give up. I am installing VS Community to see if I can build. I am pretty sure I was able to do it before.

@nohwnd
Member

nohwnd commented Apr 8, 2020

We are all actually using 2019, sorry. I am sure you need at least these workloads. The Visual Studio Extension development should be optional if you skip the vsix generating step in the script, see below.

[screenshot of the required workloads]

And then from the individual components you'd need the Portable Pack and .NET 4.5.1.
[screenshot of the individual components]

I almost never run all acceptance tests locally. You should be good to go with just unit tests or at best smoke tests.

I did see the same issues (and more) when joining this project, and never got to go back and update the installation guide. Sorry about that. I will be changing our release pipeline a lot, and imho you don't need to build the vsix locally in most cases. You can comment out these steps in the build.ps1 and it should still build. If you need more help ping me on twitter or here, I can spend 15 minutes showing you stuff. :)

[screenshot of the build.ps1 steps to comment out]

@provegard
Author

Thanks, I'll give it a try!

@nohwnd
Member

nohwnd commented Apr 22, 2020

Related #2380

@hvinett
Contributor

hvinett commented Jul 16, 2020

@provegard , any update on this issue?

@nohwnd
Member

nohwnd commented Jul 16, 2020

Actually there is a lot. In the latest net5.0 release (I think since preview6), we are leveraging the Diagnostics NetCore client to create hang dumps. This works on Windows (with any target framework) and Linux (with netcoreapp3.1 and newer). There is no need for procdump.exe when creating hang dumps, or for the temporary folder.

To trigger a hang dump you can now simply do: dotnet test --blame-hang-timeout 2min or vstest.console /Blame:"CollectHangDump;TestTimeout=2min".

For crash dumps the situation is similar to before, but it errors out a bit better. There you still need procdump, because that flow needs to attach to a running process and detect failure, which is no easy task. But luckily crash dumps are usually way less interesting than hang dumps, because when the process crashes it often has an easy-to-see reason.

From dotnet test help:

  --blame                                  Runs the tests in blame mode. This option is helpful in isolating problematic tests that cause the test host to crash or hang.
                                           When a crash is detected, it creates an sequence file in TestResults/guid/guid_Sequence.xml that captures the order of tests that were run before the crash.
                                           Based on the additional settings, hang dump or crash dump can also be collected.
                                           Example:
                                           Timeout the test run when test takes more than the default timeout of 1 hour, and collect crash dump when the test host exits unexpectedly.
                                           (Crash dumps require additional setup, see below.)
                                           dotnet test --blame-hang --blame-crash
                                           Example:
                                           Timeout the test run when a test takes more than 20 minutes and collect hang dump.
                                           dotnet test --blame-hang-timeout 20min
  --blame-crash                            Runs the tests in blame mode and enables collecting crash dump when testhost exits unexpectedly.
                                           This option is currently only supported on Windows, and requires procdump.exe and procdump64.exe to be available in PATH.
                                           Or PROCDUMP_PATH environment variable to be set, and point to a directory that contains procdump.exe and procdump64.exe.
                                           The tools can be downloaded here: https://docs.microsoft.com/en-us/sysinternals/downloads/procdump
                                           Implies --blame.
  --blame-crash-dump-type <DUMP_TYPE>      The type of crash dump to be collected. Implies --blame-crash.
  --blame-crash-collect-always             Enables collecting crash dump on expected as well as unexpected testhost exit.
  --blame-hang                             Run the tests in blame mode and enables collecting hang dump when test exceeds the given timeout. Implies --blame-hang.
  --blame-hang-dump-type <DUMP_TYPE>       The type of crash dump to be collected. When None, is used then test host is terminated on timeout, but no dump is collected. Implies --blame-hang.
  --blame-hang-timeout <TIMESPAN>          Per-test timeout, after which hang dump is triggered and the testhost process is terminated.
                                           The timeout value is specified in the following format: 1.5h / 90m / 5400s / 5400000ms. When no unit is used (e.g. 5400000), the value is assumed to be in milliseconds.
                                           When used together with data driven tests, the timeout behavior depends on the test adapter used. For xUnit and NUnit the timeout is renewed after every test case,
                                           For MSTest, the timeout is used for all testcases.
                                           This option is currently supported only on Windows together with netcoreapp2.1 and newer. And on Linux with netcoreapp3.1 and newer. OSX and UWP are not supported.
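
To make the timeout format concrete, here is a rough sketch of how such values map to milliseconds. timeout_to_ms is a hypothetical helper, not the parser vstest actually uses, and it only handles the unit suffixes that appear in this thread (h, m, min, s, ms, or no unit).

```sh
#!/bin/sh
# Hypothetical helper (not the vstest parser): converts a timeout such as
# 1.5h, 90m, 2min, 5400s, 5400000ms or a bare 5400000 to milliseconds.
# Only the unit suffixes seen in this thread are handled; anything else
# makes the function return a non-zero status and print nothing.
timeout_to_ms() {
  printf '%s\n' "$1" | awk '
    {
      # Leading number, optionally with a fractional part.
      if (!match($0, /^[0-9]+(\.[0-9]+)?/)) exit 1
      value = substr($0, 1, RLENGTH) + 0
      unit = substr($0, RLENGTH + 1)

      if (unit == "" || unit == "ms")       factor = 1
      else if (unit == "s")                 factor = 1000
      else if (unit == "m" || unit == "min") factor = 60000
      else if (unit == "h")                 factor = 3600000
      else exit 1

      printf "%d\n", value * factor
    }'
}
```

For example, timeout_to_ms 2min gives the value in milliseconds that corresponds to the 2 minute timeout in the command above.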

@provegard
Author

@hvinett sorry, this hasn't been a priority for me and given all the problems I had with the setup, I put it aside.

But judging from @nohwnd's answer, anything I could do would be pointless anyway. :)

@nohwnd
Member

nohwnd commented Sep 10, 2021

The additional options were shipped a long time ago. There is still a dependency on procdump for some of the workflows (most importantly collecting crash dumps on Windows).

It is very complicated internally, because we are dealing with multiple platforms, multiple approaches to dumping that do not always work, and manual deployment of procdump. For azdo agents, the easiest way to deploy procdump is using Chocolatey. It will install it in PATH and vstest will pick it up.

For diagnosing issues with dumps, I highly suggest you enable the diagnostic logs, e.g. --diag:logs\log.txt, and inspect the datacollector log that was produced. Search it for "dump": it prints output from procdump, reports when it was not found, etc.
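
A small sketch of that search, for a POSIX-style shell. find_dump_lines is a hypothetical helper name, and it assumes the datacollector log lands next to the path given to --diag with "datacollector" somewhere in its file name; adjust the glob if your logs are named differently.

```sh
#!/bin/sh
# Hypothetical helper: prints every line mentioning "dump" (case-insensitive)
# from the datacollector diagnostic logs in the given directory.
# Assumption: the log file names contain "datacollector" and end in .txt.
find_dump_lines() {
  log_dir="$1"
  # /dev/null is passed as an extra file so grep always prefixes matches
  # with the file name, even when only one log matches the glob.
  grep -i -n "dump" "$log_dir"/*datacollector*.txt /dev/null
}
```

With --diag:logs/log.txt, calling find_dump_lines logs lists the matching lines with file name and line number, and returns a non-zero status when nothing matches.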

@nohwnd closed this as completed Sep 10, 2021