Resilience Updates #450

Merged
merged 7 commits into from
Oct 2, 2020

@evantahler (Member) commented Oct 2, 2020

  1. Move the act of popping a job off a queue into a Redis transaction using WATCH and MULTI.

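    • This will have speed implications. In a busy system, writes to the queue often land between WATCH and EXEC, which aborts the transaction and forces the pop to be retried.
    • We may want to move to a Lua implementation to be truly blocking: a Lua script runs atomically on the Redis server, so there is no optimistic retry.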
  2. Add `queue.retryStuckJobs()`, a single method that retries jobs which failed because of the worker timeout.

  3. Add `options.retryStuckJobs` to the Scheduler so that it runs `queue.retryStuckJobs()` automatically on a periodic basis.
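Item 1 can be sketched as an optimistic transaction: WATCH the queue key, read the head job, then queue an LPOP plus a write of the job to a per-worker key inside MULTI/EXEC, and retry if EXEC is aborted. This is a minimal sketch, not node-resque's implementation: `FakeRedis` is a hypothetical in-memory stand-in (not a real Redis client) that models only WATCH's abort-on-change rule, and `popJob`, the `working:worker1` key, and the `beforeExec` hook are illustrative names.

```typescript
// Hypothetical in-memory stand-in for the few Redis commands the pop needs.
// It only models WATCH/EXEC's optimistic check: queued commands are discarded
// if a watched key was written after WATCH.
class FakeRedis {
  private lists = new Map<string, string[]>();
  private strings = new Map<string, string>();
  private versions = new Map<string, number>();

  private touch(key: string): void {
    this.versions.set(key, (this.versions.get(key) ?? 0) + 1);
  }
  rpush(key: string, value: string): void {
    const list = this.lists.get(key) ?? [];
    list.push(value);
    this.lists.set(key, list);
    this.touch(key);
  }
  lpop(key: string): string | null {
    const value = this.lists.get(key)?.shift() ?? null;
    if (value !== null) this.touch(key);
    return value;
  }
  lindex(key: string, index: number): string | null {
    return this.lists.get(key)?.[index] ?? null;
  }
  llen(key: string): number {
    return this.lists.get(key)?.length ?? 0;
  }
  set(key: string, value: string): void {
    this.strings.set(key, value);
    this.touch(key);
  }
  get(key: string): string | null {
    return this.strings.get(key) ?? null;
  }
  // WATCH: remember the current version of each key.
  watch(keys: string[]): Map<string, number> {
    return new Map(keys.map((k): [string, number] => [k, this.versions.get(k) ?? 0]));
  }
  // EXEC: run the queued commands only if no watched key changed.
  // Returns false when aborted (real Redis replies with a null EXEC).
  exec(watched: Map<string, number>, commands: Array<() => void>): boolean {
    const unchanged = Array.from(watched.entries()).every(
      ([key, version]) => (this.versions.get(key) ?? 0) === version,
    );
    if (!unchanged) return false;
    commands.forEach((cmd) => cmd());
    return true;
  }
}

// Pop the head of queueKey and record it under workingKey in one transaction,
// retrying whenever a concurrent write invalidates the WATCH. beforeExec is a
// test hook that simulates another client writing between WATCH and EXEC.
function popJob(
  redis: FakeRedis,
  queueKey: string,
  workingKey: string,
  maxAttempts = 10,
  beforeExec?: (attempt: number) => void,
): { job: string | null; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const watched = redis.watch([queueKey]);
    const job = redis.lindex(queueKey, 0);
    if (job === null) return { job: null, attempts: attempt }; // real code would UNWATCH here
    beforeExec?.(attempt);
    const committed = redis.exec(watched, [
      () => { redis.lpop(queueKey); },
      () => { redis.set(workingKey, job); },
    ]);
    if (committed) return { job, attempts: attempt };
  }
  throw new Error(`gave up popping ${queueKey} after ${maxAttempts} attempts`);
}

// Usage: another client enqueues job3 between WATCH and EXEC on attempt 1,
// so that EXEC is aborted and the pop succeeds on attempt 2.
const demo = new FakeRedis();
demo.rpush("queue:default", "job1");
demo.rpush("queue:default", "job2");
const result = popJob(demo, "queue:default", "working:worker1", 10, (attempt) => {
  if (attempt === 1) demo.rpush("queue:default", "job3");
});
console.log(result); // { job: 'job1', attempts: 2 }
```

Because the WATCH covers the whole queue key, any concurrent RPUSH aborts the pop even when the head job is unchanged; that is the contention cost described under item 1, and a server-side Lua script would avoid the retry loop entirely.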

If we can be certain that a job can't be lost between popping it and working it, and we have ways to retry stuck jobs, we can be fairly sure we won't lose any jobs!
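The retry half of that argument (items 2 and 3) can be sketched in memory: filter the failed list for timeout-caused failures, re-enqueue them, and let a scheduler call that sweep at most once per interval. This is a minimal sketch under assumptions, not node-resque's code: the PR only states that `queue.retryStuckJobs()` retries jobs that failed due to the worker timeout and that the Scheduler can run it periodically. The `WORKER_TIMEOUT` marker string, the simplified job shapes, and the `InMemoryQueue`, `StuckJobSweeper`, and `tick` names are hypothetical.

```typescript
// Hypothetical, simplified shapes; a real failed-job record carries more fields.
interface Job { queue: string; class: string; args: unknown[] }
interface FailedJob { payload: Job; exception: string; worker: string }

// Assumed marker for failures caused by the worker timeout (illustrative only).
const WORKER_TIMEOUT = "Worker Timeout";

class InMemoryQueue {
  queues = new Map<string, Job[]>();
  failed: FailedJob[] = [];

  enqueue(job: Job): void {
    const list = this.queues.get(job.queue) ?? [];
    list.push(job);
    this.queues.set(job.queue, list);
  }

  // Re-enqueue every failed job whose failure was the worker timeout and drop
  // it from the failed list; other failures are left alone. Returns the count.
  retryStuckJobs(): number {
    const stuck = this.failed.filter((f) => f.exception === WORKER_TIMEOUT);
    this.failed = this.failed.filter((f) => f.exception !== WORKER_TIMEOUT);
    stuck.forEach((f) => this.enqueue(f.payload));
    return stuck.length;
  }
}

// Scheduler-side wrapper: run retryStuckJobs at most once per intervalMs.
// `now` is passed in to keep the sketch deterministic; a real scheduler would
// call this from its polling loop with Date.now().
class StuckJobSweeper {
  private lastRun = -Infinity;
  constructor(private queue: InMemoryQueue, private intervalMs: number) {}

  tick(now: number): number {
    if (now - this.lastRun < this.intervalMs) return 0;
    this.lastRun = now;
    return this.queue.retryStuckJobs();
  }
}

// Usage: one timed-out job and one ordinary failure.
const demoQueue = new InMemoryQueue();
demoQueue.failed.push(
  { payload: { queue: "math", class: "add", args: [1, 2] }, exception: WORKER_TIMEOUT, worker: "w1" },
  { payload: { queue: "math", class: "add", args: [3, 4] }, exception: "TypeError", worker: "w2" },
);
const sweeper = new StuckJobSweeper(demoQueue, 60_000);
const first = sweeper.tick(0); // 1: only the timed-out job is re-enqueued
const tooSoon = sweeper.tick(30_000); // 0: the interval has not elapsed yet
```

Keeping the sweep separate from the pop means a job orphaned by a killed worker is recovered even if the transactional pop itself never fails.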

@evantahler changed the title from Resilience Fixes to Resilience Updates on Oct 2, 2020
@evantahler marked this pull request as ready for review on October 2, 2020 at 01:04
Review thread on __tests__/core/queue.ts (outdated, resolved)