Separate logging and queueing? #34
Hey @ShuyangCao. Nicely done! Keep up the good work using ts! In fact, logging is handled by the client, not the server, so you can still see your progress via the offline log files in … Above all, the crash should not happen at all. It is probably due to poor error handling in the GPU query. I pushed a simple fix for this in the branch … Btw, there's a log file in …
Thanks! Yes, I can still check the files in … The error messages are:
Thanks for the reply @ShuyangCao. The first error is probably caused by a message that an orphaned client sent to the restarted server; this is harmless in most cases. The second error will return a … Please let me know if there are still any problems.
Thanks again for your work. Your tool helps me push out a lot of great work. Feel free to check out my website.
Recently, our workstation has had an unstable connection to the GPUs (possibly a driver issue). Basically, nvidia-smi would return … When this issue occurs, the ts session breaks down and restarts. Access to the previous session is lost, but the jobs launched by that session keep running, and we can no longer track their log output with ts -t.
I guess it might be better to separate logging and queueing, so that the logging module does not depend on the GPU status and keeps working when a GPU error occurs.
Thanks!
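The separation proposed above can be illustrated, in its simplest form, by a reader that follows a job's output file directly instead of asking the queue server for it, so it keeps working even if the server has restarted. This is a minimal Python sketch, not a ts feature: it assumes you already know the path of the job's output file (the thread does not give it), and the helper name `follow` and its parameters are hypothetical.

```python
import time
from typing import Iterator, Optional


def follow(path: str, poll_s: float = 1.0, max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield lines appended to `path`, similar in spirit to `tail -f`.

    Reads the file directly, so it does not depend on any queue server being
    alive. Stops after `max_idle_polls` consecutive polls with no new data
    (None means follow forever).
    """
    idle = 0
    pending = ""  # holds a partial line until its newline arrives
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            chunk = f.readline()
            if chunk:
                idle = 0
                pending += chunk
                if pending.endswith("\n"):
                    yield pending.rstrip("\n")
                    pending = ""
                continue
            # No new data at the moment.
            if max_idle_polls is not None and idle >= max_idle_polls:
                if pending:
                    yield pending  # flush a trailing line without newline
                return
            idle += 1
            time.sleep(poll_s)
```

Because the reader only touches the file, a GPU or driver error that takes down the queue server has no effect on it; the trade-off is that the user has to know which file belongs to which job.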