-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test hang investigation support #3094
Comments
Bonus would be if we could dump all stacks to the console when the system grabs the dump. (This is what we've historically done for crashes of course). It's a lot of text, but then hangs are bad 😈 This can be really useful if it's an issue we're already investigating - saves pulling down another dump and cracking it when it's not new. There are some extra SOS commands that are likely to be worth running on a hang, also. Eg., possibly related to in flight async operations. |
From @davidfowl - ASP.NET Core has something for xunit https://github.com/aspnet/AspNetCore/blob/730db07d2905fdc536073c064dab321c9616d25a/src/Servers/Kestrel/test/FunctionalTests/UnixDomainSocketsTests.cs#L31 but it's per test and used for certain failures. I've been thinking more and more about ways to integrate the test runner experience in visual studio with CI system running tests in order to look at live progress and view things like "what tests are currently being executed" as it's too much of a black box today. From @sharwell - We still use it. It’s part of the RunTests tool. From @tmat - Right, Roslyn has its own custom test runner that provides this functionality and also optimized scheduling. I did talk to the VSTest team (see attached thread) about the reasons we currently need custom test runners in Roslyn and in Arcade SDK. From @singhsarab - Here are key takeaways: |
@tmat @markwilkie We have added the support for the html logger in the VS Test platform. Talking about diagnostics, and flaky behaviour like crash and hang, we have a feature called blame, which can be used to identify the problematic tests and also take dumps for such scenarios. |
@singhsarab With xunit we have the ability to enable diagnostic message and set a timeout for long running tests detection (https://xunit.net/docs/configuration-files#longRunningTestSeconds). We have this mode enabled by default so that in case a test hangs we don't need to switch a flag before being able to detect the test. Is there a downside to the blame mode or can it be enabled by default? |
@ViktorHofer Yes we have something called TestTimeout (https://github.com/microsoft/vstest/blob/041133fef03c327d3d759d215f868fdecebcdf45/src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Constants.cs#L81) that if configured can be used to track such things. Blame comes in as a datacollector and monitors the test host process, hence can also take dumps (integrates with procdump for that). There will be some downside perf wise, but then it would not be huge. |
@ViktorHofer Also, Since we are using apphost feature for .NET core scenario and also added the support for environment variables in runsettings, I thought it should unblock corefx scenarios that we had discussed. Can you please help validate those ? |
The difference that I see here is that the timeout in xunit defines when diagnostic messages should be logged that shows what test is currently executed and how much time is already spent on it. The timeout in vstest is a hard kill cap. I don't want to argue what is the better solution but I really like a dump is collected automatically.
Sure, the remaining items before we can use those features in corefx are:
As soon as those two are in, corefx can take a dependency on VSTest and we will be able to dogfood test your changes. |
@ViktorHofer Let me come back with numbers from perf side for blame option. Eagerly waiting for your PR and so we can eventually see the things work end to end. |
We have the long running test notification running and it's definitely useful. What remains here is to have Helix create dumps on hangs. I'm closing this @markwilkie against an issue tracking that specifically dotnet/dnceng#1216 |
Can I see hanging tests in helix easily now? |
The following should be built into Helix:
• Understand what process is hanging.
• Understand what test is hanging.
• Take a dump of the process and have it be a downloadable artifact.
The text was updated successfully, but these errors were encountered: