If I have a spot instance (or regular on-demand) worker and it fails during batch processing, how can I ensure that the batch is processed again?
Or how can I get information about the batches that failed to be processed, so that I can start processing them again?
The Batch API currently only keeps track of the status of the overall job and does not monitor the status of each individual batch in the job. Failed batches are currently discarded, making it difficult to perform retries at the batch level.
I have created these two tickets #1540 and #1541 to address these issues.
Support for Batch is a recent addition to Cortex so there is a lot of room for improvement. I would be happy to jump on a call to discuss workarounds for these issues and other potential improvements that can be made to Cortex. You can reach me at vishal@cortexlabs.com if you are interested.
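Until batch-level tracking exists, one possible workaround is to track batch status on the client side. The following is only a minimal sketch of that idea, not part of Cortex: `split_into_batches`, `run_with_retries`, and the `process_batch` callable are hypothetical names, and `process_batch` stands in for whatever code submits a single batch and raises an exception when it fails.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def run_with_retries(
    batches: List[List[T]],
    process_batch: Callable[[List[T]], None],  # hypothetical: raises on failure
    max_attempts: int = 3,
) -> Dict[int, str]:
    """Process every batch, retrying the failed ones up to max_attempts times.

    Returns a mapping from batch index to "succeeded" or "failed", so the
    caller knows exactly which batches still need to be reprocessed.
    """
    status: Dict[int, str] = {i: "pending" for i in range(len(batches))}
    for _ in range(max_attempts):
        # Only batches that have not succeeded yet are attempted again.
        remaining = [i for i, s in status.items() if s != "succeeded"]
        if not remaining:
            break
        for i in remaining:
            try:
                process_batch(batches[i])
                status[i] = "succeeded"
            except Exception:
                # For example, the worker was terminated mid-batch.
                status[i] = "failed"
    return status
```

Any index still marked `"failed"` in the returned mapping identifies a batch that can be resubmitted later. Since a batch may be partially processed before a worker dies, this approach assumes that reprocessing a batch is safe (idempotent).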