Feature request: allowing for Retry to work with SageMaker steps #140

Open
@lucaboulard

Description

Currently, the Retry mechanism does not work with TrainingStep and ProcessingStep. The full job name must be passed to the step constructor, so if a step fails after its job has already been created, every retry fails at submission because the job name is already in use.
This happens for almost every error (including capacity errors); throttling errors are the exception.

A possible solution might be to add an alternative parameter that specifies a job name prefix instead of a full name, and let SageMaker append a random suffix.
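To illustrate the naming scheme suggested above, here is a minimal sketch of what "prefix plus random suffix" could look like. The helper name `unique_job_name` and its `suffix_length` parameter are hypothetical and not part of any SDK. The 63-character limit and the alphanumeric-and-hyphen pattern reflect my understanding of SageMaker's job name constraints and should be checked against the service documentation.

```python
import re
import uuid

# Assumed SageMaker job name constraints: at most 63 characters,
# alphanumerics and hyphens only, starting and ending with an alphanumeric.
MAX_JOB_NAME_LENGTH = 63
_VALID_NAME = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def unique_job_name(prefix: str, suffix_length: int = 8) -> str:
    """Hypothetical helper: return a fresh job name built from `prefix`."""
    # A random hex suffix makes each generated name distinct in practice.
    suffix = uuid.uuid4().hex[:suffix_length]
    # Trim the prefix so that prefix + "-" + suffix fits within the limit.
    max_prefix = MAX_JOB_NAME_LENGTH - len(suffix) - 1
    trimmed = prefix[:max_prefix].rstrip("-")
    name = f"{trimmed}-{suffix}"
    if not _VALID_NAME.match(name):
        raise ValueError(f"invalid job name: {name!r}")
    return name


if __name__ == "__main__":
    # Two calls with the same prefix yield two different job names.
    print(unique_job_name("my-training-job"))
    print(unique_job_name("my-training-job"))
```

Note that a helper like this only helps if the name is generated again on every attempt. Calling it once when the workflow is defined would still bake a single fixed name into the step, which is exactly the situation described in this issue; hence the request for the suffix to be added at submission time.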
