Long Running Test - System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow #35506
Tagging subscribers to this area: @eiriktsarpalis
It timed out; it wasn't just long running. It's not obvious why: we need dump files on hangs (dotnet/dnceng#1216).
I've seen it on OSX as well. Maybe in this case the test can enforce a reasonable timeout and Assert to cause a coredump.
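The pattern suggested above — capping the test's wait and failing loudly instead of hanging the whole run — can be sketched language-agnostically. The actual test is C# in dotnet/runtime; this Python sketch only illustrates the shape of the idea, and the helper name is hypothetical:

```python
import subprocess
import sys

def wait_child_with_timeout(args, timeout_s=30):
    """Spawn a child and wait at most timeout_s, failing loudly instead of hanging."""
    child = subprocess.Popen(args)
    try:
        return child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Fail the test explicitly so the harness can grab a dump / process
        # list, rather than letting the whole test run time out silently.
        raise AssertionError(f"child {child.pid} did not exit within {timeout_s}s")

# Quick demo: a child that exits immediately.
rc = wait_child_with_timeout([sys.executable, "-c", "pass"], timeout_s=10)
print(rc)  # prints 0
```

The point is that the assertion fires while the child is still alive, so a dump of it can still be collected.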
Not a bad idea, although the core dump won't help if the issue somehow is that the child process won't exit. I'll make the change.
I started as well, as I'm planning to dump the process list on failure. I'd be happy to stop and let you have some fun @danmosemsft. I think for now the focus should be to cap test duration and collect useful info on failure.
Oh, you go ahead then, since that sounds way better than the simple thing I was going to do. For the RemoteExec case we have a pretty good setup right now for timeouts and for gathering info on hangs, active processes, etc. Do we have a way to trigger this when we're not in a RemoteExec context? It would be super handy to have a general mechanism; I can think of all kinds of ways we could extend it. (I hope we can avoid 2+ implementations of it.)
And ultimately, we need to find a way to hook into xunit so that it triggers for hangs in arbitrary tests, without special effort in each test.
I think the in-process approach may be tricky. We may be able to mark the test as a failure, but I don't know if there is a reliable and safe way to terminate a running function.
@wfurt one way would be to spawn the RemoteExecutor with a special flag "make a dump of my process". It would run the same code I linked above but against the PID provided. When it returned, the test could either throw XUnitException to fail itself and continue, or terminate the process.
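The "spawn a helper with the target PID" idea might look like the minimal sketch below. Python is used purely for illustration (RemoteExecutor and the actual dump code are C#), and the helper's behavior is hypothetical — a real implementation would create a core dump of the target PID rather than just print it:

```python
import os
import subprocess
import sys

# Hypothetical helper mode: when invoked with a PID argument, the helper
# collects diagnostics about that process instead of running a test body.
# Here it only echoes the PID to show the plumbing.
HELPER = """
import sys
pid = int(sys.argv[1])
print(f"collecting diagnostics for pid {pid}")
"""

def request_dump_of_self():
    # Spawn the helper, hand it our own PID, and wait for it to finish.
    # The caller can then fail the test (and keep running) or terminate.
    out = subprocess.run(
        [sys.executable, "-c", HELPER, str(os.getpid())],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

msg = request_dump_of_self()
print(msg)
```

Because the dumping happens in a separate process, it avoids the "a process cannot safely dump itself" problem mentioned later in this thread.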
cc @stephentoub in case he has another suggestion.
Doesn't the vstest infrastructure we're about to switch to make it straightforward to get dumps, or am I misremembering?
Yes, VSTest can be configured to kill the testhost after a specified timeout and will then collect the dump. @nohwnd has the details and can talk about cross-plat support of the dump collection.
It might be interesting to be able to get a dump, fail only that test, and continue. It would also be good to have a hook to gather other data if and when we need it, and to understand what platforms we can create dumps on. @nohwnd could you share more about what vstest offers?
@danmosemsft Update: I got my ARM and AMD abbreviations confused. Hang dumps currently work only on Windows x86 and x64. We are investigating how to make hang and crash dumps cross-platform in the Helix prototype effort you are also part of.
That would be very nice, but it is complicated by the fact that the test host cannot safely dump itself — it is not always safe to do that across platforms. So the blame data collector would have to detect the hang (that is already happening), dump the process when the test timeout is reached, and associate that dump with the test somehow, but NOT kill the process as it does now. Instead it would wait until all tests finish running, or until they are all past their timeout threshold. In the worst case this would generate one full dump per test, if all tests are hanging and a full dump is requested.

Once all tests are finished or timed out, the test host would stay hanging, because it cannot terminate until its threads are done running, and because .NET Core does not allow threads to be aborted we need to kill the process externally (or it may be able to terminate itself, I am not sure now).

The upside is that this approach should be test-framework agnostic, because it happens above the test adapter level. And because there is a special parser for the Sequence file that is produced, AzDo is also able to mark those unfinished tests as aborted.
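The flow described above — dump on per-test timeout, but defer killing the host until every test has either finished or timed out — can be modeled as a small watchdog function. This is an illustrative sketch of the control flow only, not vstest's actual blame collector:

```python
# Illustrative model of the blame-collector flow: each running test has a
# deadline. When a deadline passes we "dump" (here: report) the test but keep
# the host alive; only when every test is finished or past its deadline is it
# safe to kill the host process externally.
def watch_tests(tests, now):
    """tests: {name: (deadline, finished)} -> (names_to_dump, safe_to_kill)."""
    to_dump = [name for name, (deadline, finished) in tests.items()
               if not finished and now >= deadline]
    all_settled = all(finished or now >= deadline
                      for deadline, finished in tests.values())
    return to_dump, all_settled

tests = {"TestA": (10.0, True),   # finished normally
         "TestB": (10.0, False)}  # still running (possibly hung)

print(watch_tests(tests, now=5.0))   # ([], False) — nothing timed out yet
print(watch_tests(tests, now=11.0))  # (['TestB'], True) — dump TestB, then kill host
```

In the worst case (every test hung), `to_dump` would contain every test name, matching the "one full dump per test" upper bound described above.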
What kind of data would that be? I did not find that in this thread.
Windows x86 and x64. I am experimenting with improving the overall experience here, where I collect dumps via vstest console across multiple operating systems; ideally, by the end of the script all the ticks would be green: https://dev.azure.com/jajares/blame/_build/results?buildId=18&view=results
Thanks @nohwnd for working on this!
It would depend on the test (or the test failure), but one example that came up elsewhere was logging whether the machine was heavily loaded and what other processes were running. I could imagine we might want to log config files or registry keys, the versions of certain libraries, etc. Certainly this is secondary to reliably getting usable dumps.
It seems like dotnet/dnceng#1216 is essentially a dup of https://github.com/dotnet/core-eng/issues/5380. The second one outlines steps for the *nix platforms, and it should not depend on architecture. Alternatively, we could use tools from the diagnostics repo. As far as the auxiliary information goes, load, a tail of the kernel log, and process lists with states and stats would be a good start IMHO. For networking tests, a dump of interfaces, DNS configuration, the routing table, and connections would be awesome.
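Gathering the auxiliary information mentioned above (load, process list, network configuration) could be as simple as shelling out to standard tools on failure. The commands below are illustrative Unix examples, not what the test infrastructure actually runs, and a real implementation would need per-OS equivalents (e.g. `tasklist` on Windows instead of `ps`):

```python
import platform
import subprocess

# Illustrative per-label diagnostic commands for Unix-like systems.
UNIX_COMMANDS = {
    "load": ["uptime"],
    "processes": ["ps", "aux"],
    "interfaces": ["ip", "addr"],  # useful for networking tests
}

def collect_diagnostics(commands):
    """Run each command, capturing its output; never let collection itself hang."""
    results = {}
    for label, argv in commands.items():
        try:
            out = subprocess.run(argv, capture_output=True, text=True, timeout=10)
            results[label] = out.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool or a hung command should not break diagnostics.
            results[label] = f"<failed: {exc}>"
    return results

if platform.system() != "Windows":
    for label, text in collect_diagnostics(UNIX_COMMANDS).items():
        print(f"== {label} ==")
        print(text[:200])
```

Capping each command with a timeout matters here: a diagnostic helper that can itself hang would just move the problem.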
@danmosemsft that could be achieved with a (diagnostics) data logger:
from https://github.com/Microsoft/vstest-docs/blob/master/docs/extensions/datacollector.md
https://dev.azure.com/dnceng/public/_build/results?buildId=618671&view=ms.vss-test-web.build-test-results-tab&runId=19376690&resultId=184383&paneView=attachments
console.a6c9799c.log:
https://helix.dot.net/api/2019-06-17/jobs/5607cdb0-5b11-499c-a476-0d70c6599ef4/workitems/System.Diagnostics.Process.Tests/files/console.a6c9799c.log
Configuration:
netcoreapp5.0-Windows_NT-Release-x86-CoreCLR_release-Windows.7.Amd64.Open