Driver start recoverable #1891
@dadgar I'm still seeing the following under
Shouldn't those be recoverable too?
Seeing a similar error on
In this case, the timeout is happening when starting the container. Should this timeout be retryable as well?
@pdalbora Was Docker functioning properly on that machine? That endpoint timing out makes me think Docker was unresponsive, in which case failing would be correct (future versions could use that signal to push the task onto another driver).
@dadgar Yes, I too thought it was strange for the start endpoint to time out, but Docker otherwise seemed to be working fine. My guess is that dockerd was overloaded, as we were also pulling some pretty hefty containers on the same machine. Are there any particular logs that would help you to look at? This was in a test environment that I've since destroyed, but it's reproducible.
@pdalbora The most useful thing would be the reproduction steps!
@dadgar It's reproducible in our rather complex testing environment. It will take me some time to narrow it down to a portable, reproducible test case.
@pdalbora Hmm, okay. If you can, that would be awesome, because I haven't been able to reproduce it and thus can't fix it :(
What is the issue, and what is the fix for it? We are facing the same issue. nomad version docker version Server:
Hi @goutham-sabapathy! This is a long-closed issue and was on a version that had a very different model for task driver plugins. Please open a new issue describing what you're seeing. Thanks!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
This PR fixes a regression in which we weren't handling recoverable errors, and adds unit tests to the task runner and Docker driver to prevent the regression from recurring.
Fixes #1858