Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily use procdump to capture testhost dumps on helix #55657

Closed
wants to merge 19 commits into from

Conversation

dibarbet
Copy link
Member

@dibarbet dibarbet commented Aug 17, 2021

Normally, dumps on helix are captured by windows error reporting (configured by registry values on helix machines) and output as attachments to the helix work item.

However, WER is seemingly unable to capture dumps for failures listed in #55639 (more details on the issue).
Only procdump configured to record dumps on process exit seems to consistently produce dumps. This change temporarily introduces procdump to the EditorFeatures.UnitTests work items to capture dumps on exit so that we can investigate the test failures. This should be removed once we have dumps.

Only need to review the current changes, I will do a squash merge on this.

CI run with dumps -https://dev.azure.com/dnceng/public/_build/results?buildId=1316212&view=ms.vss-test-web.build-test-results-tab&runId=38724164&resultId=215635&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

@dibarbet

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@dibarbet

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@dibarbet dibarbet changed the title testing crash in error reporting Temporarily use procdump to capture testhost dumps on helix Aug 25, 2021
@dibarbet dibarbet marked this pull request as ready for review August 25, 2021 19:05
@dibarbet dibarbet requested a review from a team as a code owner August 25, 2021 19:05
// This should be removed once we have enough dumps to investigate the above issue.
if (!string.IsNullOrEmpty(options.ProcDumpFilePath))
{
ConsoleUtil.WriteLine($"Copying procdump files from {options.ProcDumpFilePath} to {duplicateDir}");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the most expedient place to put this. The actual test run step runs build.ps1 which will download procdump (doesn't necessarily happen in prepare tests). Since this is temporary I think its fine to copy the files to the correlation payload here.

@@ -75,6 +75,8 @@ void MaybeAddSeparator(char separator = '|')
builder.AppendFormat($@" --logger {sep}html;LogFileName={GetResultsFilePath(assemblyInfo, "html")}{sep}");
}

builder.Append(" --blame");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this output useful when running local tests as it outputs more data on the test that was running when the testhost crashed
See https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-test#options

The other flags like --blame-crash claim to produce a dump, but similar to WER never for these failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's just keep an eye out for possible regressions in test run time/artifact upload time.

@RikkiGibson
Copy link
Contributor

@dibarbet
Copy link
Member Author

dibarbet commented Aug 25, 2021

Does this also happen in a successful run? Should we divert some of this output elsewhere?

Yes, unfortunately the only way to capture dumps for this failure is if procdump records a dump whenever the process exits (including successful runs). Hence the reason to limit to only that specific dll. I find the procdump output useful as well as it also shows exceptions the process is hitting and that it actually recorded the dump. And the output isn't generally longer than the one you linked.

@RikkiGibson RikkiGibson self-assigned this Aug 25, 2021
@@ -160,6 +178,17 @@ string makeHelixWorkItemProject(AssemblyInfo assemblyInfo)
var rehydrateCommand = isUnix ? $"./{rehydrateFilename}" : $@"call .\{rehydrateFilename}";
var setRollforward = $"{(isUnix ? "export" : "set")} DOTNET_ROLL_FORWARD=LatestMajor";
var setPrereleaseRollforward = $"{(isUnix ? "export" : "set")} DOTNET_ROLL_FORWARD_TO_PRERELEASE=1";

var procDumpCommand = @"echo ""Skip""";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this message slightly more detailed, for example "Skipping procdump collection."

// to capture dumps on test host exit. Typically WER is good enough,
// but https://github.com/dotnet/roslyn/issues/55639 requires procdump dumps.
// This should be removed once we have enough dumps to investigate the above issue.
if (!string.IsNullOrEmpty(options.ProcDumpFilePath))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this step is temporary, I think it's ok to do this here. If it were permanent, I would prefer for copy of this tool to happen as part of the build of the relevant test projects, and to happen automatically as part of running tests on the relevant project from as many environments as possible--not just in the Helix environment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only plan to do this temporarily as WER is generally preferred over procdump for stability.

@sharwell sharwell marked this pull request as draft August 25, 2021 22:16
if (!isUnix && Equals("Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll", assemblyInfo.AssemblyName))
{
ConsoleUtil.WriteLine("Enabling procdump collection for this work item.");
procDumpCommand = @$"start /b ""ProcDump"" %HELIX_CORRELATION_PAYLOAD%\procdump.exe /accepteula -ma -w -t -e testhost ""C:\cores""";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's C:\cores? I think Helix provides us with a path where we can write out artifacts that we want uploaded after the work item completes. Are we using that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c:\cores is the directory that WER is configured to store dumps in, was just an expedient place to get the dumps automatically uploaded.

@RikkiGibson
Copy link
Contributor

Done with review pass (iteration 19).

A general tip for making changes in this area: you might find it helpful to push a branch to the dotnet/roslyn remote and manually kick off CI for it to verify your changes. Unlike launching CI runs via PR, you can test multiple different modifications concurrently without each one canceling the ones before it.

@dibarbet
Copy link
Member Author

Talked with Sam offline - based on his advice I will instead manually run these tests a bunch of times with procdump rather than introduce more instability into main caused via procdump itself.
@RikkiGibson I'll take your advice there and just queue them up manually from a branch. instead of triggering /azp run a bunch :)

@dibarbet
Copy link
Member Author

Also investigating locally adding synchronization to the failfast calls

@dibarbet
Copy link
Member Author

not needed, replaced by #55939

@dibarbet dibarbet closed this Aug 31, 2021
@dibarbet dibarbet deleted the helix_dump_logging branch August 31, 2021 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants