Many mutex failures after #49000a30fd34ac21cabdc7d #57
Hey @Saicheg! I'm not sure, though, that this will prevent two quick jobs from duplicating: if they check without blocking, they might have outdated info and still enqueue the same child twice. What do you think?
@pokonski I think we should actually increase the blocking timeout (up to 5-10 seconds) and, instead of waiting for the lock, fail all the other jobs trying to enter that section. That way, even if several jobs reach that point, only one will enqueue the final job and the others will fail.
@pokonski have you had a chance to think this through?
Yeah, quietly failing makes sense, since the job will be enqueued anyway.
I also got those mutex errors with 1200 jobs in the queue. I tried the proposed fix, but when the last workers ran, the job to be run `:after` the batch got enqueued multiple times, once for each running worker.
I was able to fix this by setting `block` to 0 and rescuing the error. Since the jobs that couldn't get the mutex lock don't enqueue anything, the one that did acquire the lock ran and enqueued or skipped as appropriate. However, I noticed that in my case this only happens in the first batch of jobs run on a Sidekiq server, and only if they finish too close to each other. Subsequent jobs on the same server do not seem to hit this lock-conflict issue.
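For illustration, here is a minimal sketch of that workaround, assuming the redis-mutex gem's `RedisMutex.with_lock` API; the lock key and the `enqueue_outgoing_jobs` helper are placeholders rather than Gush's actual internals:

```ruby
require 'redis-mutex'

# Only one of the workers racing here acquires the lock; with block: 0 the
# others raise RedisMutex::LockError immediately instead of waiting.
def enqueue_outgoing_jobs_safely(workflow_id, job)
  RedisMutex.with_lock("gush_enqueue_#{workflow_id}_#{job.name}", block: 0) do
    enqueue_outgoing_jobs(workflow_id, job) # placeholder for the real enqueue step
  end
rescue RedisMutex::LockError
  # Another worker holds the lock and will enqueue (or skip) the child jobs,
  # so failing quietly here avoids duplicate enqueues.
end
```

Changing the `block:` value (for example to the 5-10 seconds suggested earlier in the thread) only affects how long a contender keeps retrying before giving up with the same error.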
Was a resolution reached here on what the proper fix is?
# Summary
This commit tries to fix the RedisMutex::LockError issue on chaps-io#57. Whenever we hit the error, it simply enqueues another worker. At the beginning of that worker, if the job has succeeded, we simply call `enqueue_outgoing_jobs` again.
Can anyone help me review my PR?
* Try to enqueue outgoing jobs in another worker: whenever we hit RedisMutex::LockError (#57), we enqueue another worker, and at the beginning of that worker, if the job has succeeded, we call `enqueue_outgoing_jobs` again (see the sketch below).
* Delay the next job's execution to avoid hammering the lock.
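Below is a rough, hedged sketch of what that retry-in-another-worker approach could look like using Sidekiq's `perform_in`; the worker class, the `find_job`/`enqueue_outgoing_jobs` helpers, and the delay value are illustrative assumptions, not the actual code from the merged PR:

```ruby
require 'sidekiq'
require 'redis-mutex'

class RetryEnqueueOutgoingJobsWorker
  include Sidekiq::Worker

  RETRY_DELAY = 2 # seconds between attempts; illustrative value

  def perform(workflow_id, job_name)
    job = find_job(workflow_id, job_name)   # placeholder lookup
    return unless job.succeeded?

    # Assumed to take the Redis mutex internally and raise
    # RedisMutex::LockError when the lock cannot be acquired.
    enqueue_outgoing_jobs(workflow_id, job)
  rescue RedisMutex::LockError
    # Rather than blocking on the lock, schedule another attempt shortly,
    # which avoids hammering the mutex while other workers finish.
    self.class.perform_in(RETRY_DELAY, workflow_id, job_name)
  end
end
```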
The PR was merged back then, so I am closing this issue :)
Hey @pokonski!
We started seeing many mutex failures after your commit #49000a30fd34ac21cabdc7d.
The problem is that when you have thousands of really fast jobs, they all start checking whether the final job should be enqueued and conflict too much.
I think you are actually wrapping the wrong part with the mutex;
`client.enqueue_job(workflow_id, out)`
feels like the more appropriate place. What do you think? @pokonski
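For comparison, a hedged sketch of what the reporter seems to be proposing: take a per-child lock around the `client.enqueue_job(workflow_id, out)` call itself instead of around the whole check-and-enqueue section (again using redis-mutex's `with_lock` and Gush-like placeholder names such as `client` and `job.outgoing`):

```ruby
def enqueue_outgoing_jobs(workflow_id, job)
  job.outgoing.each do |out|
    begin
      # One short-lived lock per outgoing job keeps thousands of fast jobs
      # from contending on a single coarse mutex.
      RedisMutex.with_lock("gush_enqueue_#{workflow_id}_#{out}", block: 0) do
        client.enqueue_job(workflow_id, out)
      end
    rescue RedisMutex::LockError
      # Another worker is already enqueueing this child; skip it quietly.
    end
  end
end
```

Whether this alone is enough depends on whether the "already enqueued?" check also happens inside the lock, which is essentially the concern raised in the first reply of the thread.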