
fix: Don't reset failed embedding jobs #9877

Merged: Dschoordsch merged 1 commit into master from fix/9842/dontResetFailedJobs on Jun 26, 2024

Dschoordsch (Contributor) commented on Jun 26, 2024

Description

Fixes #9842
When resetting stalled jobs, we did not check whether a job was still running or had already failed. Failed jobs were therefore reset to queued as well, so the embedder retried them indefinitely.
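A minimal TypeScript sketch of the corrected check, assuming an in-memory job list with hypothetical `state`, `stateChangedAt`, and `retryCount` fields (the actual job queue schema is not shown in this PR, so names and storage will differ). Only jobs that are still `running` past the timeout are requeued; `failed` jobs are skipped.

```typescript
// Hypothetical job states; the real schema may use different names.
type JobState = "queued" | "running" | "failed";

interface EmbeddingJob {
  id: number;
  state: JobState;
  stateChangedAt: number; // epoch ms when the state last changed (assumed field)
  retryCount: number;
}

// Assumed default, matching the 5m timeout mentioned in the testing scenarios.
const STALLED_JOB_TIMEOUT_MS = 5 * 60 * 1000;

// Requeue jobs that are still marked "running" but have not changed state
// within the timeout. Jobs in "failed" are deliberately left alone so that
// a permanently failing job is not retried forever.
function resetStalledJobs(
  jobs: EmbeddingJob[],
  now: number,
  timeoutMs: number = STALLED_JOB_TIMEOUT_MS
): number {
  let resetCount = 0;
  for (const job of jobs) {
    const isStalled = now - job.stateChangedAt > timeoutMs;
    if (job.state === "running" && isStalled) {
      job.state = "queued";
      job.stateChangedAt = now;
      resetCount++;
    }
  }
  return resetCount;
}
```

Before the fix, the equivalent condition checked only the timeout, so a `failed` job older than the timeout was also moved back to `queued`.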


Testing scenarios


  • Have some permanently failing jobs in the job queue (for example, jobs with a null embeddingsMetadataId)
  • Optionally, decrease the stalled-job reset timeout from 5m to something shorter (it is set in 2 places)
  • Verify that the retry count stops increasing after 5
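The last scenario implies that a failing job is retried at most 5 times and then stays failed. The PR does not show that retry logic, so the following TypeScript sketch is only an illustration under stated assumptions: `MAX_RETRIES`, `handleJobFailure`, and `runUntilSettled` are hypothetical names, and the worker is simulated as always failing (as a job with a null embeddingsMetadataId would).

```typescript
// Hypothetical cap inferred from the testing scenario ("after 5").
const MAX_RETRIES = 5;

type JobState = "queued" | "running" | "failed";

interface EmbeddingJob {
  id: number;
  state: JobState;
  retryCount: number;
}

// Called when processing a job throws. The job is requeued until it has
// used up its retries; after that it is parked in "failed" permanently.
function handleJobFailure(job: EmbeddingJob): void {
  if (job.retryCount >= MAX_RETRIES) {
    job.state = "failed";
    return;
  }
  job.retryCount += 1;
  job.state = "queued";
}

// Simulate the embedder repeatedly picking up one permanently failing job.
// The loop ends as soon as the job leaves the "queued" state.
function runUntilSettled(job: EmbeddingJob, maxIterations: number = 100): void {
  for (let i = 0; i < maxIterations && job.state === "queued"; i++) {
    job.state = "running";
    handleJobFailure(job); // the simulated worker always fails
  }
}
```

With the fix, a stalled-job reset no longer pulls such a `failed` job back into `queued`, so its retry count stays at 5.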

Final checklist

  • I checked the code review guidelines
  • I have added Metrics Representative as reviewer(s) if my PR involves metrics/data/analytics-related changes
  • I have performed a self-review of my code, the same way I'd do it for any other team member
  • I have tested all cases I listed in the testing scenarios and I haven't found any issues or regressions
  • Whenever I took a non-obvious choice I added a comment explaining why I did it this way
  • I added the label Skip Maintainer Review, indicating that the PR only requires reviewer review and can be merged right after it's approved, if the PR introduces only minor changes, contains no architectural changes, and introduces no new patterns, and I think one review is sufficient
  • PR title is human readable and could be used in changelog

Dschoordsch merged commit 882443c into master on Jun 26, 2024
8 checks passed
Dschoordsch deleted the fix/9842/dontResetFailedJobs branch on June 26, 2024 at 15:52
Linked issue: Bug: stop retrying to embed stuff