Manager failed: stack level too deep with v3.1 #396
Comments
Hi @pjg Could you share your |
Nothing fancy on my end:
|
@pjg did you get that error just by running Shoryuken? Or was that during stop? |
I got it just by running. And it's been flooding my log. But it auto-restarted too, with 3.1 |
And I didn't change my config at all. |
@pjg I couldn't reproduce it, but I believe Is this error easy for you to reproduce? If so, can I do a quick change, for you to test? |
Another developer on my team and I were not able to reproduce it on our MacBooks. I only saw the issue on a fairly old Ubuntu server, 12.04.4 LTS, 3.2.0-32 kernel. When I saw it I downgraded, so I cannot reproduce it now. FWIW the queue processing was working fine, just that the system would constantly break down and restart itself (or via my monit setup, couldn't tell). |
@pjg got it! I will test it on Ubuntu and let you know. |
Hi @pjg I couldn't reproduce that even on an Ubuntu box. Anyway, I made a change that I hope fixes the issue. Would you be able to test it before I release a new hotfix version? gem 'shoryuken', git: 'git://github.com/phstc/shoryuken.git', ref: '484efc0' |
I can reproduce it with two queues, one with priority Similar backtrace:
|
@pjg thanks! at least the backtrace is different now. I will try to reproduce it again. |
I'm wondering since the error is the same (no matter the backtrace), if this |
@pjg if you have a chance to test before me, new SHA is But I don't know if that would help much TBH, that sleep has been around since 3.x. Maybe a difference now is that we have multiple threads trying to use the |
Sorry, but it still breaks. Just takes a while longer to break (whole 40 minutes) of idling (I've also upgraded aws-sdk this time).
|
I reminded myself that we also have a custom
And in the initializer:
|
@pjg what's your aws-sdk version? I'm wondering if it's an issue with not being thread-safe. |
I tried with two versions. One latest, one little older. You can see the exact versions in the logs above. |
oops 🤦♂️ |
@pjg it took me a very long while, but it seems to be fixed now, I can no longer reproduce it. I will keep the process running during the night, let's see how it goes. Would you mind testing this SHA |
I've been running |
yay awesome! Thanks for reporting and helping with the fix 🍻 OSS 🤘 |
Thanks for the fast fix! |
@pjg BTW 3.1.2 is out with the fix |
I have now upgraded all my production servers to use Shoryuken 3.1.12 (from 3.0.7) and something is not right. The moment we upgraded, this graph started to look bad: Our retention period is 5 days, that's why those messages accumulate on this graph. I think that some worker process just takes a job, marks it as being processed and then dies silently, without ever releasing it back. I have a strong feeling that this is still a regression related to this issue 😞

==== UPDATE ====
Never mind. It's actually the new exception handling for ActiveJob in the sentry-ruby gem. When it cannot find the serialized object (ActiveRecord), it will not be rescued properly and the message won't be deleted from the queue. Now trying to find a fix for that...

==== UPDATE TWO ====
Just found this guide... Should have read it first before jumping into the new version with native ActiveJob support: |
@pjg sorry for that. but I'm not sure if I understood the issue, is it because you have non-ActiveJob messages mixed in a queue also consumed by ActiveJob workers? Shoryuken does not force a message format, but ActiveJob does. |
I got to the bottom of this. My issue is that I have |
@pjg Shoryuken does not auto delete messages if an exception is raised. Rescuing that error will do, but you can also:

def perform(site_id)
  return unless site = Site.where(id: site_id).first
  # ...
end

BTW are you using a DL queue? If not, have a look at them, I strongly recommend using DL queues. |
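For anyone skimming the thread, here is a minimal plain-Ruby sketch of the behaviour described above: a message is removed only when perform returns without raising, so an early return for a missing record lets it be deleted. Everything in it (FakeQueue, Message, SITES, the two workers, drain) is a hypothetical stand-in made up for illustration; it is not Shoryuken's API and does not talk to SQS.

```ruby
# Hypothetical stand-ins for illustration only; none of this is Shoryuken code.
Message = Struct.new(:body)

class FakeQueue
  attr_reader :messages

  def initialize(bodies)
    @messages = bodies.map { |body| Message.new(body) }
  end

  def delete(message)
    @messages.delete(message)
  end
end

# Toy record store standing in for Site.where(id: site_id).first
SITES = { 1 => "example.com" }.freeze

class RaisingWorker
  def perform(site_id)
    site = SITES.fetch(site_id) # raises KeyError when the record is gone
    site.upcase
  end
end

class GuardedWorker
  def perform(site_id)
    site = SITES[site_id]
    return unless site # record is gone: nothing to do, so the message can be deleted
    site.upcase
  end
end

# Assumed semantics: delete a message only if perform did not raise.
def drain(queue, worker)
  queue.messages.dup.each do |message|
    begin
      worker.perform(message.body)
      queue.delete(message)
    rescue KeyError
      # Message stays on the queue; a real queue would redeliver it later.
    end
  end
  queue.messages.map(&:body)
end

RAISING_LEFTOVER = drain(FakeQueue.new([1, 2]), RaisingWorker.new)
GUARDED_LEFTOVER = drain(FakeQueue.new([1, 2]), GuardedWorker.new)
```

With the raising worker, the message for the missing record (id 2) stays behind and would keep coming back until the retention period expires or a DL queue takes it; with the guarded worker, both messages are removed.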
I'm serializing whole objects ( |
I just bumped to 3.1 and now need to revert. Ruby 2.3.3p222.
And then the log goes on and on for megabytes of backtraces with no end in sight.
I suppose it's caused by this: #389