Many mutex fails after #49000a30fd34ac21cabdc7d #57

Closed
Saicheg opened this issue Jul 16, 2018 · 9 comments
Comments

Saicheg commented Jul 16, 2018

Hey @pokonski !

We started seeing many mutex failures after your commit #49000a30fd34ac21cabdc7d.

The problem is that when you have thousands of really fast jobs, they all check whether the final job should be enqueued at the same time and conflict with each other too much.

I think the mutex is actually wrapping the wrong part; `client.enqueue_job(workflow_id, out)` feels like the right thing to lock:

def enqueue_outgoing_jobs
  job.outgoing.each do |job_name|
    out = client.find_job(workflow_id, job_name)

    if out.ready_to_start?
      RedisMutex.with_lock("gush_enqueue_outgoing_jobs_#{workflow_id}-#{job_name}", sleep: 0.3, block: 2) do
        client.enqueue_job(workflow_id, out)
      end
    end
  end
end

What do you think? @pokonski

@Saicheg Saicheg changed the title Many mutex-fails after #49000a30fd34ac21cabdc7d Many mutex fails after #49000a30fd34ac21cabdc7d Jul 16, 2018
pokonski commented Jul 16, 2018

Hey @Saicheg!

I'm not sure this will prevent two quick jobs from duplicating, though: if they check without blocking, they might have outdated info and still enqueue the same child twice. What do you think?

Saicheg commented Jul 16, 2018

@pokonski I think we should actually increase the blocking timeout (up to 5-10 seconds), and instead of waiting for the lock we should fail all the other jobs that are trying to enter that section.

So even if they all reach that point, only one job will enqueue the final one and the others will fail.
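
For context on the options being discussed: in the redis-mutex gem, `block:` is how long `with_lock` keeps retrying before it raises `RedisMutex::LockError`, and `sleep:` is the polling interval between attempts. A minimal illustration of the proposal (the values here are just examples, not Gush's actual settings):

# Wait up to 5 seconds, polling every 0.3s. If the lock is still held after that,
# RedisMutex::LockError is raised and this sibling gives up instead of blocking longer.
RedisMutex.with_lock("gush_enqueue_outgoing_jobs_#{workflow_id}-#{job_name}", block: 5, sleep: 0.3) do
  client.enqueue_job(workflow_id, out)
end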

Saicheg commented Jul 18, 2018

@pokonski have you had a chance to think this through?

@pokonski

Yeah quietly failing makes sense, since the job will be enqueued anyway.

schorsch commented Nov 6, 2018

I also got those mutex errors with 1200 jobs in the queue. I tried the proposed fix, but when the last workers ran, the job that should run after the batch got enqueued multiple times, once for each running worker.

treyperkins commented Jun 26, 2019

I was able to fix this by setting `block` to 0 and rescuing the resulting error. Since the jobs that couldn't get the mutex lock don't enqueue anything, the one that did acquire the lock enqueued or skipped the child as appropriate.

However, I did notice that in my case this only occurs on the first batch of jobs run on a Sidekiq server, and only if they finish too close to each other. Subsequent jobs on the same server do not seem to have this lock conflict issue.
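
A minimal sketch of that fix, reusing the `enqueue_outgoing_jobs` shape from above (treat the exact structure as an assumption, not merged code): `block: 0` makes redis-mutex raise `RedisMutex::LockError` immediately instead of waiting, and the rescue swallows it because whichever sibling holds the lock will enqueue or skip the child itself.

def enqueue_outgoing_jobs
  job.outgoing.each do |job_name|
    out = client.find_job(workflow_id, job_name)

    begin
      # Try the lock exactly once; losers raise instead of sleeping.
      RedisMutex.with_lock("gush_enqueue_outgoing_jobs_#{workflow_id}-#{job_name}", block: 0) do
        # Re-check readiness while holding the lock so only one sibling enqueues the child.
        client.enqueue_job(workflow_id, out) if out.ready_to_start?
      end
    rescue RedisMutex::LockError
      # Another finished sibling holds the lock and will enqueue this child if it's ready.
    end
  end
end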

@anirbanmu

Was a resolution reached here on what the proper fix is?

suonlight added a commit to suonlight/gush that referenced this issue Oct 10, 2019
# Summary

This commit tries to fix the RedisMutex::LockError issue on
chaps-io#57.
Whenever we hit the error, it simply enqueues another worker.
At the beginning of that worker, if the job has already succeeded, we simply
call `enqueue_outgoing_jobs` again.
@suonlight

Can anyone help me review my PR?

pokonski pushed a commit that referenced this issue Oct 10, 2019
* Try to enqueue outgoing jobs in another worker

# Summary

This commit tries to fix the RedisMutex::LockError issue on
#57.
Whenever we hit the error, it simply enqueues another worker.
At the beginning of that worker, if the job has already succeeded, we simply
call `enqueue_outgoing_jobs` again.

* Delay next job's execution to avoid hammering the lock
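
A rough sketch of what those two commits describe (the class and method names loosely follow Gush's async worker, but the exact code here is an assumption for illustration, not the merged implementation): when the fan-out hits `RedisMutex::LockError`, the worker schedules a fresh copy of itself a couple of seconds later instead of failing, and on re-entry an already-succeeded job only retries `enqueue_outgoing_jobs`.

class AsyncWorker
  include Sidekiq::Worker

  def perform(workflow_id, job_name)
    # ... load the workflow and job; if the job already succeeded on a previous
    # attempt, skip straight to the fan-out, otherwise run it first ...
    enqueue_outgoing_jobs
  rescue RedisMutex::LockError
    # Couldn't get the lock: retry the fan-out in a new worker after a short delay
    # so the finishing siblings don't keep hammering the same lock.
    self.class.perform_in(2, workflow_id, job_name)
  end
end
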
@pokonski

The PR was merged back then, so I am closing this issue :)
