Jobs executing and immediately returning #418
Pretty sure the problem is the following:
It should not be silent though. If anything, we need to make sure there is something helpful logged from the middleware at that point. I will have a quick look before cutting a new release.
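A logging hook of the sort described might be shaped like this. This is an illustrative sketch of the Sidekiq client middleware contract, not the gem's actual middleware; the class name and log message are hypothetical:

```ruby
require "logger"

# Hypothetical sketch: a client middleware that logs instead of staying
# silent when a job is dropped (e.g. because a unique lock is held).
class DuplicateJobLogger
  def initialize(logger)
    @logger = logger
  end

  # Sidekiq client middleware contract: the block returns the job item,
  # or nil/false when the job should not be enqueued.
  def call(worker_class, item, queue, _redis_pool = nil)
    result = yield
    unless result
      @logger.warn("#{worker_class} (jid=#{item["jid"]}) was not enqueued to #{queue}")
    end
    result
  end
end
```

In a real app this would be registered in `Sidekiq.configure_client` via the middleware chain; here it only demonstrates the "log on silent drop" idea.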
I'm having this problem as well. Also, I only recently upgraded this gem from ~5.0 to ~6.0. Before the upgrade, with the same setup, I didn't have this error. I hope any of that is helpful!
Also, as an extra note, I'm not 100% confident in the order of my upgrading and when this started. It might have been before I upgraded the front end to 5.2.
@cpiazza @glennfu version 6.0.15 fixed a minor but persistent bug with locks. You might want to try it out. I also need some people to help me test version 7. When someone can point me to something concrete, I can fix it; otherwise, it took a really huge refactoring of both tests and implementation to get to a reliable locking mechanism.
I'm running
EDIT: If it's helpful to know, my queue pretty much never empties out. I also have 3 Sidekiq instances running. Could instance 1 be running the job successfully while instances 2 and 3 also try to run it at the same time and fail? I'm getting 2 errors every time it happens, so that's my naive guess at what could be happening. Also, here is my configuration info:
EDIT 2: This error also gets thrown a lot when visiting /sidekiq/queues/default in the Sidekiq web UI.
More importantly, it seems that this problem gets worse over time until it kills all sidekiq processes permanently. Even my app instances are getting and throwing this error when attempting to enqueue jobs.
@glennfu I know what the problem is. I honestly didn't consider multiple Sidekiq instances, so I will work on fixing that before going live.
@glennfu basically what happens is that the orphan reaper runs on all processes, so the Lua script may run from all three at the same time. I can imagine how that would cause problems.
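The fix for that kind of race is usually to let only one process win the right to reap per cycle. A plain-Ruby sketch of the idea (an in-process stand-in for a Redis SET NX style lock; `SingleRunnerGuard` is an illustrative name, not the gem's API):

```ruby
# Illustrative sketch: guard a periodic task so that, out of several
# concurrent runners, only the one winning the compare-and-set executes it.
class SingleRunnerGuard
  def initialize
    @mutex = Mutex.new
    @held  = false
  end

  # Returns true for exactly one caller until released (compare-and-set).
  def try_acquire
    @mutex.synchronize do
      return false if @held
      @held = true
    end
  end

  def release
    @mutex.synchronize { @held = false }
  end
end

guard = SingleRunnerGuard.new
ran = Queue.new
threads = 3.times.map do
  Thread.new do
    # Only one "process" gets to reap in this cycle.
    ran << :reaped if guard.try_acquire
  end
end
threads.each(&:join)
```

In a multi-process deployment the flag would live in Redis (with an expiry, so a crashed holder doesn't block reaping forever), but the shape of the guard is the same.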
Good to know. Also, maybe give some consideration to the shutdown procedure. I scaled down to a single process and thought I was OK, but as I re-deployed, there were technically 2 running at the same time for a short period. It would be nice if this also didn't cause the kaboom.
Also, just FYI, I've rolled back to 5.0.11 and everything seems good there.
@glennfu I will fix the issue with multiple reapers now.
I just tested 7.0.0beta8 on my staging server, and quickly this problem resurfaced. I even went into the UI to clear all the locks and test here. Here's some log output:
As you can see, uniquejobs spits out a warning, and then nothing enters the queue. @mhenrixon, any new ideas here?
No new ideas yet. I'm struggling a bit with how to debug this. I found some issues while running things using pry, but it is cumbersome to say the least. Then I was out with back pain most of December, my wife had surgery, and I've not had a chance to do anything except daddying until this week. I'll get back to it soon, but I'm open to suggestions. Could caching of classes and eager loading be an issue at all in your staging environment?
My staging is a mirror of production, and I can reproduce the issue on both. Here's an interesting bit happening right now on production (sidekiq_unique_jobs v5, although I can get similar behavior on staging with v7beta8). Do you have any ideas of things I can do to dive more deeply into this? Debug flags to flip, maybe? Also, best of luck with the home life! I hope it gets easier for you soon. I'm going to deploy v7beta8 to production shortly (even though I see similar problems on staging) and see if the new stuff gives me any more insight.
@glennfu hot tip is to turn on lock info. Lock info contains information about each lock, and the history is like an audit trail for each lock digest.
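If I recall the v7 configuration correctly, that is toggled through the gem's configure block; a hedged sketch (option name as per the v7 README, worth verifying against your installed version):

```ruby
# config/initializers/sidekiq_unique_jobs.rb (assumed location)
SidekiqUniqueJobs.configure do |config|
  # Record metadata about each lock so the Web UI can display it.
  # Off by default, since it costs extra Redis writes per lock.
  config.lock_info = true
end
```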
I just tried with these settings:
Unfortunately, I instantly had a massive surge of duplicated jobs. I usually float around 20-40 jobs in the queue at any given time, and after a few minutes I had 470 jobs. My clock kept crashing (no error in logs) and I didn't see any output from sidekiq-unique-jobs in any of my log files, even though I'm watching everything, including the Sidekiq logs, where I expected to see it. I had to revert rather quickly after that. Sorry! I don't think I can try that again; it was too messed up.
On the upside, the process of upgrading/downgrading cleared out my locks. EDIT: I found this in there:
OK, I felt bold and tried again a bit more slowly. I deployed it again without my clock, and manually enqueued some jobs for testing. I'm using this configuration:
I haven't changed that config setup in pretty much forever, and it works correctly on v5. EDIT: I also tried with this, and still get duplicated jobs:
@cpiazza I accidentally introduced a bug in 6.0.15 that prevented unlock from working in some configurations. I've added a warning when installing said 6.0.15-18 by re-releasing the gem with a post_install message.
Out of curiosity, @glennfu, what is the reason you limit yourself to ActiveJob like that? A legacy system? I don't mean to be a dick; I just feel like it is like owning a sports car (Sidekiq) and driving it like a Fiat (ActiveJob).
There is no support for ActiveJob anymore: due to all the weird issues people posted, I decided to put that focus towards pure Sidekiq usage instead. Mostly because I am not using (and most likely never will be using) ActiveJob, it becomes too hard to make it work properly. If someone wants to help me get the gem working for ActiveJob users, I'm more than happy to reconsider, but I need someone to commit some time to that.
As for the arguments: ActiveJob handles sidekiq arguments a little differently. I think the first argument is the ActiveJob job_id or something like that, so you might want to skip that one as well.
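Skipping that leading id could be sketched like this. Everything here is illustrative: `strip_first_arg` is a hypothetical name, the sample argument arrays are made up, and whether the first serialized argument really is the ActiveJob job_id should be verified against your own payloads:

```ruby
# Hypothetical sketch: derive the unique digest from all but the first
# serialized argument, on the assumption that the first is a per-enqueue
# ActiveJob id (which would make every enqueue look unique and defeat
# the lock).
strip_first_arg = ->(args) { args.drop(1) }

# Two enqueues of the "same" job that differ only in that leading id:
a = ["9b21...", "user_42", "welcome_email"]
b = ["f03c...", "user_42", "welcome_email"]

strip_first_arg.call(a) == strip_first_arg.call(b) # now compare equal
```

With sidekiq-unique-jobs such a filter would be wired into the worker's `sidekiq_options` (the v6 README documents a `unique_args:` option for this); consult the README for the exact option name in your version.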
I am running on 7.0.0beta22 and got this error:
Then nothing happened; no job entered the queue. UPDATE:
@radilr1 if you configure according to the README for v7.0.1, it will now add a death handler automatically for you. I strongly recommend having the reaper running as well, to reduce the impact of orphaned locks.
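A hedged sketch of such a reaper configuration (option names as I recall them from the v7 README; the values are illustrative defaults, so check your installed version's documentation before relying on them):

```ruby
SidekiqUniqueJobs.configure do |config|
  config.reaper          = :ruby # or :lua; which orphan reaper to run
  config.reaper_count    = 1000  # max orphaned locks cleaned per cycle
  config.reaper_interval = 600   # seconds between reaper runs
end
```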
@mhenrixon I was finally able to upgrade this and simultaneously replace ActiveJob with Sidekiq::Worker (in most places) about a month ago. So far it's been smooth sailing. Thanks for your hard work on this! |
Describe the bug
We've had two instances in which we were using
sidekiq_options unique: :until_and_while_executing
on a worker, and after some time, calling .perform_async
on the worker would result in the job getting enqueued, a job ID assigned, the job picked up by a worker, and then exiting immediately. For example, we would see this in the sidekiq.log:
In all cases, there were no other instances of
MyWorker
enqueued or running. Upon changing to
sidekiq_options unique: :until_executing
the problem went away in both cases.
Expected behavior
The worker to execute as normal.
Current behavior
See above. Note in the class below we are logging start, and success or failure - in these cases we see nothing logged. These are also typically workers that run for several minutes.
Worker class
Additional context
Anything we can look for here? Ideally we'd want to use
:until_and_while_executing
if possible.
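For reference, a worker of the shape described might look like the following hypothetical sketch. The class name, queue of work, and logging calls are illustrative (the reporter's actual class was not quoted), and `log_duplicate_payload` is an option I recall from the gem's README, worth verifying for your version:

```ruby
# Hypothetical sketch of a long-running worker with the lock described
# above plus start/success/failure logging. Illustrative only.
class MyWorker
  include Sidekiq::Worker

  sidekiq_options unique: :until_and_while_executing,
                  log_duplicate_payload: true # surface silently dropped runs

  def perform(*args)
    logger.info("MyWorker starting jid=#{jid}")
    do_long_running_work(*args) # typically runs for several minutes
    logger.info("MyWorker succeeded jid=#{jid}")
  rescue StandardError => e
    logger.error("MyWorker failed jid=#{jid}: #{e.message}")
    raise
  end
end
```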