When a batch fails to be processed, the failure counter for the Job is incremented but the batch is discarded.
Rather than discarding failed batches, persist them so that users can identify which batches failed and handle them (for example, by retrying).
Suggestions
Cortex enqueues batches onto a queue (SQS) to distribute work across workers. Cortex should also create a dead letter queue to store the failed batches. When a batch fails, the worker enqueues it onto the dead letter queue. If a job completes with failures, users can consume the dead letter queue to find out which batches failed and retry them afterwards.
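A minimal sketch of the dead letter queue idea, to make the flow concrete. It is not Cortex code: in-memory `deque` objects stand in for the SQS queues, and `run_job`, `process`, and the message layout are hypothetical names chosen for illustration.

```python
from collections import deque


def run_job(batches, process):
    """Process every batch; move failed batches to a dead letter queue.

    `process` is a hypothetical per-batch handler that raises on failure.
    Both queues are in-memory stand-ins for SQS queues.
    """
    queue = deque(batches)       # stand-in for the main job queue
    dead_letter_queue = deque()  # stand-in for the dead letter queue
    failed_count = 0             # mirrors the job's failure counter

    while queue:
        batch = queue.popleft()
        try:
            process(batch)
        except Exception as exc:
            failed_count += 1
            # Keep the batch (and the reason) instead of discarding it.
            dead_letter_queue.append({"batch": batch, "error": str(exc)})

    return failed_count, dead_letter_queue
```

After the job finishes, a user could drain `dead_letter_queue` to see which batches failed and pass them to a new job. With real SQS, a queue can also be configured with a redrive policy so that messages received too many times are moved to a dead letter queue automatically; whether that or an explicit enqueue fits Cortex better is left open here.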
As batches are placed onto the queue, Cortex persists each batch and its metadata to storage such as S3. When a batch completes successfully or fails, the metadata for that batch is updated accordingly. After the job has completed, users can browse the batch metadata to find the failed batches and resubmit them.
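The metadata approach could look roughly like the sketch below. Again, this is an illustration under assumptions: a plain `dict` stands in for an S3 bucket, and the key layout, status values, and function names (`submit_batches`, `mark_batch`, `failed_batches`) are hypothetical.

```python
import json


def submit_batches(store, job_id, batches):
    """Persist each batch with a 'pending' status; return the storage keys.

    `store` is a dict standing in for an S3 bucket (key -> JSON string).
    """
    keys = []
    for index, batch in enumerate(batches):
        key = f"{job_id}/batches/{index}.json"  # hypothetical key layout
        store[key] = json.dumps({"batch": batch, "status": "pending"})
        keys.append(key)
    return keys


def mark_batch(store, key, status, error=None):
    """Update a batch's metadata once it has succeeded or failed."""
    record = json.loads(store[key])
    record["status"] = status
    if error is not None:
        record["error"] = error
    store[key] = json.dumps(record)


def failed_batches(store, job_id):
    """Return the batches of a job whose metadata says 'failed'."""
    prefix = f"{job_id}/batches/"
    failed = []
    for key, value in store.items():
        record = json.loads(value)
        if key.startswith(prefix) and record["status"] == "failed":
            failed.append(record["batch"])
    return failed
```

A worker would call `mark_batch` with `"succeeded"` or `"failed"` after each batch, and a user could call `failed_batches` once the job is done to collect the batches to resubmit.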