Description
Currently, the retry mechanism does not work with `TrainingStep` and `ProcessingStep`. The full job name must be passed to the step constructor, so if the step fails after the job has already been created, every retry fails to submit the job because that job name is already in use.
This happens for almost every error type, including capacity errors; throttling errors are the only exception.
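To make the failure mode concrete, here is a small simulation. It does not use SageMaker: `FakeJobService`, `DuplicateJobNameError` and `run_with_retries` are hypothetical stand-ins that only model the rule described above, namely that a job name can be used once, even if the job it created later fails (for example with a capacity error).

```python
class DuplicateJobNameError(Exception):
    """Hypothetical stand-in for the error raised when a job name is reused."""


class FakeJobService:
    """Hypothetical in-memory model of a job API with unique job names.

    The first `fail_after_create` jobs are created successfully but then fail,
    which models an error that happens after the job name has been consumed.
    """

    def __init__(self, fail_after_create: int):
        self.used_names = set()
        self.fail_after_create = fail_after_create
        self.created = 0

    def create_and_run(self, name: str) -> str:
        if name in self.used_names:
            raise DuplicateJobNameError(name)
        self.used_names.add(name)  # the name is consumed even if the job fails
        self.created += 1
        return "Failed" if self.created <= self.fail_after_create else "Completed"


def run_with_retries(service, make_name, max_attempts=3):
    """Retry job submission, asking `make_name(attempt)` for a name each time."""
    outcomes = []
    for attempt in range(max_attempts):
        try:
            status = service.create_and_run(make_name(attempt))
        except DuplicateJobNameError:
            status = "DuplicateName"
        outcomes.append(status)
        if status == "Completed":
            break
    return outcomes


# A fixed name, as with the current step constructors: every retry is rejected.
fixed = run_with_retries(FakeJobService(fail_after_create=1), lambda attempt: "my-training-job")
# fixed == ["Failed", "DuplicateName", "DuplicateName"]

# A fresh name per attempt: the second attempt can actually run.
fresh = run_with_retries(FakeJobService(fail_after_create=1), lambda attempt: f"my-training-job-{attempt}")
# fresh == ["Failed", "Completed"]
```

The only difference between the two runs is whether each attempt gets a new name, which is the point of the prefix-based proposal.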
A possible solution would be to add an alternative parameter that accepts a job name prefix instead of a full name, and let SageMaker append a random suffix to it.