Memory Leak? #88
hey @sschepens which version of Shoryuken are you using? Old versions of Shoryuken had a problem with this.
@phstc I'm using the latest version.
ruby -v?
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux-gnu]
What sort of growth are you talking about? Can you give numbers / New Relic graphs etc?
I'll leave a process running for a few hours recording heap metrics and will come back with that info.
Any chance it's your worker itself causing the memory leak? Could you try to run Shoryuken with an empty perform? Some questions for troubleshooting:
Only one process with 5 concurrency, only one queue with a priority of 2. I don't think running with an empty perform will do much, as I'm hanging on an empty queue.
What happens if you stick in an occasional GC.start?
That is exactly what I was doing just now, recording metrics before and after GC.start.
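(A minimal sketch of that kind of measurement, with made-up labels and intervals; the GC.stat key names are the Ruby 2.2 ones, older rubies use slightly different names.)

```ruby
# Sketch (not from this thread): log heap stats before and after a forced GC,
# so steady growth in live objects can be told apart from memory that is
# simply awaiting collection.
def heap_snapshot(label)
  stat = GC.stat
  puts format('%-7s live_slots=%d old_objects=%d total_allocated=%d',
              label, stat[:heap_live_slots], stat[:old_objects],
              stat[:total_allocated_objects])
end

loop do
  heap_snapshot('before')
  GC.start
  heap_snapshot('after')
  sleep 60 # sample once a minute while the worker idles on an empty queue
end
```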
@sschepens sorry I missed that part:
You are right, no point in testing an empty perform. I will try to run the same test here to see how it goes.
I kept it running for a few hours with default options (checking an empty queue) and the real memory size was consistent around 52MB.
dotenv bin/shoryuken -q default -r ./examples/default_worker.rb -p shoryuken.pid -l shoryuken.log -d
top -pid $(cat shoryuken.pid)
@sschepens are you using the default delay? My ruby version
Yes, I'm using the default delay.
I still have to record some useful heap information to get a clue of what's going on.
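(For context, the examples/default_worker.rb referenced in the command above is essentially a no-op worker; a sketch of that idea, not necessarily the exact file, looks like this.)

```ruby
# A do-nothing worker, useful for isolating Shoryuken's own memory behaviour
# from anything the application does in #perform.
class DefaultWorker
  include Shoryuken::Worker

  shoryuken_options queue: 'default', auto_delete: true

  def perform(sqs_msg, body)
    # intentionally empty: the process just polls and idles on the queue
  end
end
```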
Hm, I will try to run that with Rails, because running with Ruby 2.2.1 the real memory size was consistent around 30MB (better than with 2.0).
OK, I'm using Ruby 2.2.0 and it seems to have a memory leak in symbol GC: https://bugs.ruby-lang.org/issues/10686
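(ruby-lang #10686 is about dynamically created symbols not being reclaimed. A tiny illustrative probe of that class of problem, not a reproduction of the exact bug, could be:)

```ruby
# Create many dynamic symbols, force GC, and check whether the symbol table
# shrinks back; on a Ruby affected by a symbol leak it keeps growing.
before = Symbol.all_symbols.size
100_000.times { |i| "temp_symbol_#{i}".to_sym }
GC.start
after = Symbol.all_symbols.size
puts "symbols before=#{before}, after GC=#{after}"
```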
We are also experiencing memory run-out problems. We are running a worker process in Docker with a memory cap of 500MB. We have filled our queue with fake data that can be partially processed and will generate an exception after a few lines of code. This keeps the worker busy and makes it easier to monitor its memory usage behavior. After 12 hours the process hits the memory cap and is shut down. We are running Ruby MRI 2.1.5 and a Rails 4.1 environment with Shoryuken 1.0.1. The same issue occurs with JRuby 9.0.0.0.pre1 and Shoryuken 1.0.2. Essentially the process grows over time, then hits some upper bound, and that's all she wrote.
@curtislinden could you try Ruby 2.2.1?
I still have leaks on Ruby 2.2.1 and 2.2.2 :(
Same issues with 2.2.2. I found a similar conversation in Celluloid and have some insight to share. I believe this will affect Shoryuken: in particular, when an exception is caught and the Celluloid actor is killed, it doesn't have its memory cleaned up. It may be true for actors that complete a task successfully as well. celluloid/celluloid#463 (comment): "@curtislinden try to delete actor instance variables in the celluloid finalizer. The 'celluloidized' object is known to be never collected by GC."
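(A sketch of what that suggestion looks like in Celluloid terms, with illustrative class and variable names; whether it helps in Shoryuken's case is exactly what the rest of this thread debates.)

```ruby
require 'celluloid'

# Illustrative actor that registers a Celluloid finalizer to drop its
# instance variables, so a terminated actor holds no references that could
# keep large objects alive.
class LeakyActor
  include Celluloid

  finalizer :release_references

  def initialize
    @payloads = [] # stand-in for state that can grow large
  end

  def handle(message)
    @payloads << message
  end

  private

  def release_references
    instance_variables.each { |ivar| remove_instance_variable(ivar) }
  end
end
```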
The problem, I believe, is that Shoryuken doesn't recreate actors; they are long-lived and only recreated if an exception happened (and only workers, I believe).
Since Celluloid was written in the spirit of Erlang, I assume that "fail fast" and "let it die" were good patterns to follow with our workers. In that spirit, when our workers can't process a job, they throw exceptions for various cases. In practice these exceptions are common but temporary. Since we are throwing exceptions often with our current data set, and given @sschepens' belief about recreation, I suspect exception handling in Shoryuken may be the place to add cleanup.
I know I've seen exceptions being swallowed in Shoryuken, for example here: fetcher. I don't know if there are more occurrences of these, and I'm not really sure if those are causing issues.
@phstc @curtislinden @leemhenson
I've also removed some code which I didn't need right now, such as the ActiveJob integration. Anyway, I have been testing and my processes have now been running for a week more or less, and I have not had any memory issues. Cheers!
Fetcher exceptions should be rare. And if a processor dies, it will be handled by manager/trap_exit. I totally believe you, and I'm sure this issue is happening for you. But I couldn't reproduce it in my production and development environments.
👍 for the initiative! Feel free to copy any code you need! I did the same with Sidekiq. Remember to keep the LGPLv3 license. @sschepens @curtislinden as you can reproduce the issue, could you try 🙏 following niamster/dcell@61d4e0d?
Hey! Yeah, we can try this.
hey @curtislinden did you have a chance to try that?
Hey, super interested in trying this out, just have to get some time set aside.
Hi, tried it out, and it appears that processor isn't an instance variable, and also that remove_instance_variable is not an expected method... I figured this might be because of the "poetic mode", but it doesn't make sense to remove_instance_variable(:@processor) as that is a local var.
2015-05-08T20:42:57.461708742Z 20:42:57 worker.1 | 2015-05-08T20:42:57Z 14 TID-otkh73en8 ERROR: /usr/local/bundle/gems/celluloid-0.16.0/lib/celluloid/proxies/sync_proxy.rb:23:in
Good point! The processors are elements inside arrays, not instance variables themselves. The processor flow:
Based on what I understood from this thread, the processors might be kept in memory when they die, increasing the memory usage.
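(A simplified sketch of that hypothesis, deliberately not Shoryuken's actual code and with invented names: if a dead processor is never removed from whatever collection tracks it, the manager keeps a reference and the actor can never be reclaimed.)

```ruby
# Invented example of the bookkeeping-style leak under discussion.
class Processor
  def process(work); end
end

class Manager
  def initialize(concurrency)
    @ready = Array.new(concurrency) { Processor.new }
    @busy  = []
  end

  def assign(work)
    processor = @ready.pop
    @busy << processor
    processor.process(work)
  end

  def processor_died(processor)
    # Forgetting this delete would keep one dead processor referenced per
    # crash, which looks exactly like a slow, unbounded memory leak.
    @busy.delete(processor)
    @ready << Processor.new # replace it with a fresh actor
  end
end
```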
Interesting... I think the hypothesis is indeed that the processors might be kept in memory. This is an interesting problem.
@Senjai Not resolved yet. But this is tricky; for example, I couldn't reproduce it. Did you? I closed it because of the lack of activity on this thread, but maybe we can re-open it. Do you have thoughts on it?
I believe Sidekiq is in the process of migrating (or has already migrated) to concurrent-ruby instead of Celluloid, because it was having similar issues. Rails 5 is also going to depend on concurrent-ruby.
@phstc We've run into the issue. We don't really care if Heroku kills our dyno because of it, as our system is designed around that possibility, so it's rather low priority. But it still appears to be a problem. 👍 for dropping Celluloid though, but that's a larger discussion.
Any progress on this? I currently have the same issue: Shoryuken gets to 60% memory usage just by listening for new messages in SQS...
@twizzzlers there's a big chance it's more related to what you are doing in your worker than to Shoryuken itself. What are you processing?
I'm executing some jobs with normal database functions; it could be heavy while executing. But my question is: why is the memory so high even when it's only listening for messages?
It's just Shoryuken; you can get the same issue with a worker that does nothing.
So what is my option here? Is there a workaround for this? Should I change to Sidekiq or something else?
@twizzzlers you can try Sidekiq, it's using concurrent-ruby; please, if you do, let me know the results. I couldn't easily reproduce the memory issue, but it's definitely happening for some people. A long time ago, when I was using Sidekiq (the Celluloid implementation), I used to auto-restart Sidekiq with monit service checks, but my workers were doing some (headless) browser operations with libraries known for leaking memory.
Either way, it's an issue, and until it can be nailed down to a particular library or code path, the issue should stay open.
I agree with @Senjai. Since this is an outstanding issue, it should remain open regardless of the amount of activity. We are seeing this in our app, but I did not realize the gem had a memory leak. Our process has been dying in the perform method due to some data missing in the body that we were expecting. Error on Heroku: It's causing our memory quota to reach 1GB within 6 hours. See image below.
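(Illustration only, with invented queue and field names: under the leak hypothesis above, a perform that raises on every malformed message also churns through processors; guarding for the missing fields keeps the worker alive either way.)

```ruby
class ImportWorker
  include Shoryuken::Worker

  shoryuken_options queue: 'imports', auto_delete: false, body_parser: :json

  def perform(sqs_msg, body)
    # Tolerate messages that are missing the fields we expect instead of
    # letting the exception kill the processor.
    unless body.is_a?(Hash) && body['record_id']
      Shoryuken.logger.warn "skipping malformed message #{sqs_msg.message_id}"
      return
    end

    # ... real work with body['record_id'] would go here ...
    sqs_msg.delete
  end
end
```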
@Senjai @kwokster10 I don't mind reopening it, but unfortunately I can't work on this at the moment. Not sure we can do much with Celluloid; I guess the fix would be to migrate to concurrent-ruby like the other cool guys.
We are staying with this issue here in my company for now while we study other options, although we'd like to continue with Shoryuken since it works so much better with AWS SQS...
Hey! I started to work on the
@Senjai @twizzzlers @kwokster10 @gshutler are you using any middleware? From Shoryuken or any other repo?
Does not matter. I found an issue. It was not Celluloid, it was not Ruby (or at least not just Celluloid and not just Ruby); it was the way we handled the bookkeeping of thread objects.
@mariokostelac amazing work on this 👏 Shoryuken 2.1.2 is out with your fix. For more details, see the CHANGELOG.
@mariokostelac I don't believe we are using any middleware, only our custom
@kwokster10 it does not matter. We've explained what the problem was and how we've fixed it. Thanks for replying.
Hey all, Shoryuken v3 is out 🎉 using concurrent-ruby instead of Celluloid; it should fix this leaking issue. Based on this (thanks @paul), the overall memory footprint is lower and more consistent. Besides that, I added some cool/handy SQS commands. Full CHANGELOG.
That's great! Will have to try this one out. BTW @phstc, does Shoryuken use semver? I.e., when you incremented the major version, does that mean it's not backwards compatible?
Hi @elsurudo, Shoryuken uses semver. Version 3.0.0 introduces some incompatible API changes depending on how you use Shoryuken, for instance deprecation warnings and setting up
Hi all, just released a new version, Shoryuken 3.1.0, with a new
Shoryuken now also supports processing groups (sets of queues), which means it supports concurrency, exclusive fetcher, long polling, etc. per group 🤘
Original issue description:
Memory seems to grow slowly but without limit when using Shoryuken with a Rails environment, a single worker, and waiting on an empty queue.
Any ideas what could be causing this behaviour?