Add -showDebugAddress to enable debugging hanging parallel tests #461
base: master
Conversation
If this information is available on the coordinating node, why wouldn't a combination of running ginkgo with a new flag (sorry, couldn't come up with a good name) &
I would like to point out that the stacktrace is very important to me when debugging hanging tests.
I was imagining you might want to get visibility into the test without it hanging, but could explore some of these options.
Actually - I've received an alternative solution suggestion that would be easier to implement and reason about. Instead, what if: caused each node to stream its output to a file on disk? Then the user could examine those files to understand what each node is doing and whether it is stuck. In this mode, a signal could be interpreted by the coordinating process as a prompt to have the nodes emit their stack traces, perhaps to the same file. Thoughts?
A bit more on the 'files on disk' suggestion:
I think I lean towards a flag over an env-var as it is
My only concern with files on disk is that it could be harder to retrieve those files if the tests are running on Concourse, for example. Collecting the information and printing it to stdout could help debug hanging tests in CI, provided Concourse sends a SIGTERM and allows ginkgo enough time to print the information to stdout.
Files on disk are easy to look at when debugging failures in CI with hijack, etc., and easy to follow live locally. Stdout isn't suitable, both because of redaction concerns in CI and because of the difficulty of watching output from multiple nodes in a single stream locally. I prefer writing to a file as a solution over both an API and stdout.
@onsi, I don’t have a strong view on whether it’s a flag or an environment variable. I would lean towards an environment variable if it’s a “pre-release” feature for which I want feedback from specific users and which might change radically; a command-line flag is more appropriate for something that’s stable and ready for use by anyone.
I've pushed a file-based approach. With this approach, ginkgo will emit output from each node to files on disk.
@onsi, after what must seem like a geological age, here are some very quick thoughts on the file-based approach:
Thanks for taking a look @blgm. I'd love to find an interface that works well. I think the file approach has merit, and I agree that ending up with lots of files all over the place isn't great. Any thoughts on an alternative design? As for timestamps: unfortunately, given the syscall hackery involved in rerouting stdout, I don't think I can inject them.
Perhaps we could create them all in the same directory with a common naming scheme? I'd be in favour of merging this as is, with the caveat that it's an experimental feature subject to change. Most users would not add the debug flag.
Just out of curiosity, what were the reasons for not following the suggestion from @vito in #323? I am not super familiar with the way Ginkgo handles parallelism, but each node could trap SIGQUIT and output the stack dump to a known file. Maybe we could also create a new process group containing the coordinating node and the nodes running the tests, so that a user could kill them all with one command and get the stack dumps. The advantage of this approach is that there's no need to start Ginkgo with special flags. A disadvantage is that the stack dump may be harder to read than logs from the nodes.
Oh, I had missed the second comment on this thread.
Fair enough, but it assumes users are starting ginkgo with the debug flag.
Good point @nodo. We should update the docs here to recommend that folks add the debug flag in CI environments.
I've merged the file solution via #499. I'll leave this open to remind us to update the docs. |
Is it working @blgm ? :) |
The experience when parallel tests hang currently sucks. (See #323 and #206)
I had some time recently to try out a few ideas and landed on this PR that I'd like feedback on.
The basic premise is that when you now run

    ginkgo -p -showDebugAddress

Ginkgo will emit a debug HTTP address that delivers a JSON payload representing, among other things, the `GinkgoWriter` writes emitted by each test so far. The hope is that this information can be useful to help debug the tests. The JSON output is a bit noisy (it's a serialization of some of the `reporter` types).

This would ultimately benefit from some sort of user-friendly client (e.g. `ginkgo debug <DEBUG ADDRESS>` could emit some pretty-printed output, or one could imagine a simple web client to render the output nicely). There are also alternative ways to do this (e.g. send a signal to `ginkgo` to emit this JSON to stdout instead of hitting an HTTP endpoint).

Would appreciate any and all feedback, and also contributions to improve the user experience.