Temporarily use procdump to capture testhost dumps on helix #55657
// This should be removed once we have enough dumps to investigate the above issue.
if (!string.IsNullOrEmpty(options.ProcDumpFilePath))
{
    ConsoleUtil.WriteLine($"Copying procdump files from {options.ProcDumpFilePath} to {duplicateDir}");
This was the most expedient place to put this. The actual test run step runs build.ps1, which downloads procdump (that doesn't necessarily happen in prepare tests). Since this is temporary, I think it's fine to copy the files to the correlation payload here.
@@ -75,6 +75,8 @@ void MaybeAddSeparator(char separator = '|')
builder.AppendFormat($@" --logger {sep}html;LogFileName={GetResultsFilePath(assemblyInfo, "html")}{sep}");
}

builder.Append(" --blame");
I found this output useful when running tests locally, as it gives more data on the test that was running when the testhost crashed.
See https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-test#options
The other flags, such as --blame-crash, claim to produce a dump but, like WER, never do for these failures.
Sounds good. Let's just keep an eye out for possible regressions in test run time/artifact upload time.
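For readers following the --blame change, here is a minimal sketch of how such a flag gets appended to a `dotnet test` argument string. It is not the actual RunTests code: `BuildTestArguments`, its parameters, and the sample file names are hypothetical, and only the `--logger` and `--blame` options themselves come from the `dotnet test` documentation linked above.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the PR): assembles a `dotnet test` argument string.
static string BuildTestArguments(string assemblyPath, string htmlLogPath, bool includeBlame)
{
    var builder = new StringBuilder();
    builder.Append($"test \"{assemblyPath}\"");

    // Same logger shape as the diff above: html logger with an explicit log file name.
    builder.Append($" --logger \"html;LogFileName={htmlLogPath}\"");

    if (includeBlame)
    {
        // --blame makes the test platform record which test was running when the testhost crashed.
        builder.Append(" --blame");
    }

    return builder.ToString();
}

// Sample inputs are placeholders chosen for illustration.
Console.WriteLine(BuildTestArguments("Tests.dll", "Tests.html", includeBlame: true));
```

Keeping the flag behind a boolean makes it easy to turn off again if test run time or artifact upload time regresses.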
I saw a lot of what looked like procdump output in this log: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-roslyn-refs-pull-55657-merge-60762369fadd42aebe/Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll.1/1/console.5894deb5.log?sv=2019-07-07&se=2021-09-14T02%3A20%3A43Z&sr=c&sp=rl&sig=knq4d%2F3O8sRZtpuLdRG5N7YwUEF5g1b%2FhmiXbrlB4mQ%3D Does this also happen in a successful run? Should we divert some of this output elsewhere?
Yes, unfortunately the only way to capture dumps for this failure is to have procdump record a dump whenever the process exits (including successful runs). That is why this is limited to that specific dll. I find the procdump output useful as well, since it shows the exceptions the process is hitting and confirms that it actually recorded the dump. The output generally isn't longer than the one you linked.
@@ -160,6 +178,17 @@ string makeHelixWorkItemProject(AssemblyInfo assemblyInfo)
var rehydrateCommand = isUnix ? $"./{rehydrateFilename}" : $@"call .\{rehydrateFilename}";
var setRollforward = $"{(isUnix ? "export" : "set")} DOTNET_ROLL_FORWARD=LatestMajor";
var setPrereleaseRollforward = $"{(isUnix ? "export" : "set")} DOTNET_ROLL_FORWARD_TO_PRERELEASE=1";

var procDumpCommand = @"echo ""Skip""";
Consider making this message slightly more detailed, for example "Skipping procdump collection."
// to capture dumps on test host exit. Typically WER is good enough,
// but https://github.com/dotnet/roslyn/issues/55639 requires procdump dumps.
// This should be removed once we have enough dumps to investigate the above issue.
if (!string.IsNullOrEmpty(options.ProcDumpFilePath))
As long as this step is temporary, I think it's OK to do this here. If it were permanent, I would prefer that the copy of this tool happen as part of the build of the relevant test projects, and that it happen automatically when running tests on the relevant project from as many environments as possible, not just in the Helix environment.
I only plan to do this temporarily as WER is generally preferred over procdump for stability.
if (!isUnix && Equals("Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll", assemblyInfo.AssemblyName))
{
    ConsoleUtil.WriteLine("Enabling procdump collection for this work item.");
    procDumpCommand = @$"start /b ""ProcDump"" %HELIX_CORRELATION_PAYLOAD%\procdump.exe /accepteula -ma -w -t -e testhost ""C:\cores""";
What's C:\cores? I think Helix provides us with a path where we can write out artifacts that we want uploaded after the work item completes. Are we using that?
C:\cores is the directory that WER is configured to store dumps in. It was just an expedient place to get the dumps uploaded automatically.
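To make the selection logic in this thread easier to follow, here is a small self-contained sketch of it. This is an illustration rather than the code in the PR: `BuildProcDumpCommand` and its parameters are hypothetical names, and the skip message uses the wording suggested earlier in the review. The procdump arguments are copied from the diff (`-ma` full dump, `-w` wait for the process to launch, `-t` dump on termination, `-e` dump on unhandled exception).

```csharp
using System;

// Hypothetical helper (not from the PR): returns the command a work item would run
// before its tests. Only the Windows EditorFeatures work item gets procdump.
static string BuildProcDumpCommand(bool isUnix, string assemblyName)
{
    const string targetAssembly = "Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll";

    if (isUnix || !string.Equals(assemblyName, targetAssembly, StringComparison.Ordinal))
    {
        // Default: a no-op command so the generated script stays uniform.
        return @"echo ""Skipping procdump collection.""";
    }

    // Runs procdump in the background, waiting for testhost and writing dumps to C:\cores,
    // the folder WER already uses on the Helix machines according to the thread above.
    return @"start /b ""ProcDump"" %HELIX_CORRELATION_PAYLOAD%\procdump.exe /accepteula -ma -w -t -e testhost ""C:\cores""";
}

Console.WriteLine(BuildProcDumpCommand(isUnix: false, "Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll"));
Console.WriteLine(BuildProcDumpCommand(isUnix: true, "Microsoft.CodeAnalysis.EditorFeatures.UnitTests.dll"));
```

Returning a harmless `echo` for every other work item means the script template does not need a conditional around the procdump step.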
Done with review pass (iteration 19). A general tip for making changes in this area: you might find it helpful to push a branch to the dotnet/roslyn remote and manually kick off CI for it to verify your changes. Unlike launching CI runs via PR, you can test multiple different modifications concurrently without each one canceling the ones before it.
Talked with Sam offline. Based on his advice, I will instead manually run these tests a bunch of times with procdump rather than introduce more instability into main caused by procdump itself.
Also investigating locally adding synchronization to the failfast calls.
Not needed; replaced by #55939.
Normally, dumps on Helix are captured by Windows Error Reporting (configured by registry values on Helix machines) and output as attachments to the Helix work item.
However, WER is seemingly unable to capture dumps for the failures listed in #55639 (more details on the issue).
Only procdump configured to record dumps on process exit seems to produce dumps consistently. This change temporarily introduces procdump to the EditorFeatures.UnitTests work items to capture dumps on exit so that we can investigate the test failures. This should be removed once we have dumps.
Only the current changes need review; I will do a squash merge on this.
CI run with dumps: https://dev.azure.com/dnceng/public/_build/results?buildId=1316212&view=ms.vss-test-web.build-test-results-tab&runId=38724164&resultId=215635&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab