Log agent crashes to file and retrieve from console #335
Comments
Why not use the existing agent log file?
Because that only (currently) exists under the …

I envisaged this as a separate, temporary file - one which is written and then deleted by the engine, and simply used as a way to pass information between processes, post-crash.
It is an interesting idea. It wouldn't work when/if we ever get agents running on additional machines, but we haven't made much progress in that direction and it might help with some immediate issues.

What are you thinking of logging? Just the crashing exception and stack trace? Is the idea that it would be another mechanism to pass crash information back to the console runner when the remoting channel fails, so that we don't just display the socket exception? Not sure this would work for classes of exceptions that we cannot catch, which might limit its use. Or are you thinking of logging more? If so, what?
Fair. I imagine multiple agents on a single machine will remain the main use case, however, which would make this worthwhile in my eyes. 🙂
Yes to everything. My thinking was, if we can reliably record everything we can catch, then we can say with more confidence that an 'unrecorded' crash is likely e.g. a StackOverflowException. (For my benefit - what other kinds can't we catch?) And we can then supply some more useful information, e.g. try running …

I'm not totally sure how this will tie in with the timing/shutdown issues Joseph has been looking at, but I'm hopeful those will be solved by an eventual move to a new communication method. We're suspicious, however, that there are also other problems - but they're proving flaky and difficult to pin down. I think there's room for us to do better at recording/reporting these errors, so we (or our users, if it's in their code!) might eventually be able to get them fixed.
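As a rough sketch of the console-runner side of that idea - showing the recorded crash rather than only the socket exception when the channel to the agent drops - something like the following could work. All of the names below are invented for illustration; none of this is existing NUnit console/engine code.

```csharp
using System;
using System.IO;

// Hypothetical sketch only: how the console runner might fall back to a
// crash file when the remoting channel to the agent fails.
static class CrashReportFallback
{
    public static void ReportAgentFailure(Exception channelError, string crashFilePath)
    {
        // If the agent managed to record its unhandled exception before dying,
        // show that instead of the bare socket/remoting exception.
        if (File.Exists(crashFilePath) && new FileInfo(crashFilePath).Length > 0)
        {
            Console.Error.WriteLine("Agent process crashed. Recorded exception:");
            Console.Error.WriteLine(File.ReadAllText(crashFilePath));
        }
        else
        {
            // An empty or missing file suggests a failure the agent could not
            // catch (e.g. StackOverflowException) rather than a handled crash.
            Console.Error.WriteLine("Agent process crashed without writing a report: "
                + channelError.Message);
        }
    }
}
```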
I see. That makes sense. However, in addition, I think you just pointed out a weakness in our existing trace logging. We should obviously be capturing unhandled exceptions and logging them as errors in every agent process we create. Not sure if we need a separate issue for that, however, since it would all be implemented in one handler.

Let's stick with the convention of using the process id in any file names we create and make sure to create them in the defined work directory along with everything else. In fact, I don't even think we should delete such a file, but just keep it sitting next to the log files.

I'm for doing this with exception info, including inner exceptions, first and then adding more info if it proves necessary. Since a single agent can have multiple AppDomains running tests, we should indicate which domain caused the crash in the report as well.
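For illustration, here is a minimal sketch of the kind of handler being described - process id in the file name, written to the work directory, full exception chain plus the AppDomain name. Everything here (the class, the file-name pattern, where Install would be called) is an assumption for the sketch rather than existing engine code; in practice the hook would presumably need to be registered in each AppDomain the agent creates for tests so the right domain name is reported.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical agent-side crash logger (not the existing engine code).
static class AgentCrashLogger
{
    public static void Install(string workDirectory)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
        {
            // Follow the existing convention: process id in the file name,
            // file created in the agreed work directory.
            string path = Path.Combine(workDirectory,
                $"nunit-agent_{Process.GetCurrentProcess().Id}.crash");

            using (var writer = new StreamWriter(path))
            {
                // Record which AppDomain was running when the crash happened.
                writer.WriteLine($"AppDomain: {AppDomain.CurrentDomain.FriendlyName}");

                // Walk the exception chain so inner exceptions are not lost.
                for (var ex = args.ExceptionObject as Exception; ex != null; ex = ex.InnerException)
                {
                    writer.WriteLine(ex.GetType().FullName + ": " + ex.Message);
                    writer.WriteLine(ex.StackTrace);
                    writer.WriteLine();
                }
            }
        };
    }
}
```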
This issue was moved to a discussion.
You can continue the conversation there.
Agent crashes are often hard to debug, as the exception information is lost with the crashed process. If the console could display the exception stack trace, as it would if the tests were run in process, that would help debugging - especially of flaky cases where the crash isn't readily reproducible.
How about we use a temporary log file? The engine could pass the path to the agent on creation. We add an UnhandledExceptionEventHandler to the agent to write any exception to this file. If the console detects the agent has crashed, it then looks for any content in the log file to write out to the user. (And of course, it cleans up the log file after itself!)
Thoughts?
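A minimal sketch of the engine-side lifecycle being proposed - choose a crash-file path, hand it to the agent on creation, surface the contents when the agent dies unexpectedly, then clean up. Every identifier below (including the --crash-log argument) is invented for illustration and is not an existing engine option.

```csharp
using System;
using System.IO;

// Hypothetical engine-side flow for the proposal above (names are invented):
// 1. choose a crash-file path, 2. hand it to the agent on creation,
// 3. on an agent crash, surface the file's contents, 4. clean up.
class AgentCrashFileLifecycle
{
    public string CrashFilePath { get; } =
        Path.Combine(Path.GetTempPath(), "nunit-agent-" + Guid.NewGuid().ToString("N") + ".crash");

    public string BuildAgentArguments(string existingArguments)
    {
        // Passed on the command line when the agent process is created
        // (an assumed mechanism; it could equally go through remoting setup).
        return existingArguments + " --crash-log=\"" + CrashFilePath + "\"";
    }

    public void OnAgentExitedUnexpectedly()
    {
        try
        {
            if (File.Exists(CrashFilePath))
            {
                string report = File.ReadAllText(CrashFilePath);
                if (report.Length > 0)
                    Console.Error.WriteLine("Agent crash report:" + Environment.NewLine + report);
            }
        }
        finally
        {
            // The engine cleans up after itself, as suggested above.
            if (File.Exists(CrashFilePath))
                File.Delete(CrashFilePath);
        }
    }
}
```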