Permanent errors don't cause job failure #28

Closed
jlewi opened this issue Aug 21, 2017 · 0 comments

jlewi commented Aug 21, 2017

If a container crashes with an exit code of 1, this should be considered a permanent error and cause the job to fail.

This doesn't happen because isRetryableTerminationState in
https://github.com/jlewi/mlkube.io/blob/master/pkg/trainer/training.go
requires that a termination message be set in order for the exit code to be trusted.
This is legacy code that is no longer applicable. It assumes we are using a launcher.sh script, which users aren't.

We should get rid of that check.
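
A rough sketch of the proposed change, not the actual code in training.go. The containerTermination type, the isRetryableExitCode helper, and the exit-code convention (128 and above retryable, everything else not) are assumptions made for illustration; only the name isRetryableTerminationState and the termination-message check come from this issue. The legacy function is my reading of the behavior described above, where a missing message means the exit code is ignored.

```go
package main

// containerTermination is a hypothetical, simplified stand-in for the
// terminated-container state the operator inspects.
type containerTermination struct {
	ExitCode int32
	Message  string // termination message; empty when no launcher script writes one
}

// isRetryableExitCode encodes an ASSUMED convention: exit codes of 128 and
// above (typically the process was killed by a signal) are retryable, and
// lower codes such as 1 are permanent errors.
func isRetryableExitCode(code int32) bool {
	return code >= 128
}

// legacyIsRetryableTerminationState sketches the behavior described in the
// issue: without a termination message the exit code is not trusted, so the
// failure is treated as retryable and the job never fails.
func legacyIsRetryableTerminationState(s containerTermination) bool {
	if s.Message == "" {
		return true
	}
	return isRetryableExitCode(s.ExitCode)
}

// isRetryableTerminationState is the proposed version: the message check is
// gone and the decision depends only on the exit code.
func isRetryableTerminationState(s containerTermination) bool {
	return isRetryableExitCode(s.ExitCode)
}
```

With this sketch, a container that exits with code 1 and no termination message is retryable under the legacy function but permanent under the proposed one, which is the behavior change this issue asks for.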
