Writes fail with ConnectionPool::PoolShuttingDownError #345
Comments
I started getting this error too. Restarting solved it. Now I want to know why this is happening. I have these settings:

```yaml
production:
  sessions:
    default:
      uri: <%= ENV['MONGO_URI'] %>
      options:
        timeout: 20
        pool_timeout: 5
        pool_size: 32
```

I'm using puma with 1 worker and 16:64 threads.
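A general sizing note, not something stated in the thread: with up to 64 Puma threads per worker but only 32 connections in the pool, concurrent requests can outnumber pool slots. A hypothetical sketch of matching the two:

```ruby
# config/puma.rb (hypothetical sketch). With threads 16:64 and pool_size: 32,
# up to 64 request threads can contend for 32 Moped connections; sizing the
# pool to at least the per-worker thread maximum avoids that contention.
workers 1
threads 16, 64
# ...and in mongoid.yml, set pool_size to >= 64 to match the thread ceiling.
```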
Same error here. Similar configuration, but using Mongoid 4.0.1 and Moped 2.0.3, with Puma running 3 workers and 8:16 threads. BTW, and not sure if it's related, I have pool_timeout set to 5 and am still getting a "Waited 0.5 sec" timeout error.
Got this issue too since my last update of moped and mongoid.
Seeing this as well. Mongoid 4.0.1 / Moped 2.0.3. Running pool sizes 5-50 on different parts of my app. Default pool timeout. Seems most frequent in multi-threaded environments (sidekiq + my own celluloid app). Don't think I've seen this in my "single-threaded" Rails instances.
I think I see the issue. Check out moped/node.rb line 153:
@pool will hold its value indefinitely, even after the pool it references has been shut down. I'll monkey-patch and report back.
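For context, a paraphrased sketch of the method being referenced (not the exact Moped source; `Connection::Manager.pool` is Moped 2.x's pool factory):

```ruby
# Paraphrase of Moped::Node#pool around moped/node.rb:153. The ||= memoization
# caches the pool on the node forever; if that pool is later shut down (e.g.
# during a replica set refresh), every subsequent checkout from the stale
# cached pool raises ConnectionPool::PoolShuttingDownError.
def pool
  @pool ||= Connection::Manager.pool(self)
end
```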
I have a PR that has been sitting there for all of these issues: PR #338. IMHO Durran's connection fix isn't adequate. There are still some issues to resolve with the way moped handles stepdowns/failovers etc., but I think first we have to get the connection pool behaving properly.
I've got this issue again using master branch.
FYI, been running this well in production at artsy.net for the last 72 hours. Haven't seen this error again.
I bumped to latest master, 2.0.4 (mongoid 4.0.1), and am still seeing this error like crazy.
This issue is happening to us as well, on Mongoid 4.0.2. Is there a particular old version that is not experiencing this error? I am thinking of just using an old version, since (as per this thread) it seems to be a regression.
@ardeearam try this in your Gemfile:
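The Gemfile line itself was lost from this archive; judging by the follow-up mention of the operation_timeout branch, it was presumably something along these lines (the repo path and branch name are assumptions):

```ruby
# Hypothetical reconstruction: point Bundler at a patched Moped branch.
# The fedenusy/moped repo and operation_timeout branch name are assumptions
# inferred from the follow-up comment below.
gem 'moped', github: 'fedenusy/moped', branch: 'operation_timeout'
```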
Fixed this error almost entirely on my 150+ concurrency sidekiq instances. #348 still pops up though.
Thanks @fedenusy! Will try it and will let you know.
@fedenusy, restarted the servers with the operation_timeout branch deployed... seems OK now. Will observe for a few days, then I'll get back to you all.
Seeing this particular error pop up like crazy using moped 2.0.4; currently deploying the patches mentioned by @fedenusy to see how that goes. If it helps, the backtraces consistently end in ConnectionPool::PoolShuttingDownError.
The patches mentioned by @fedenusy seem to do the trick, I'm not seeing the pool errors anymore.
Still seeing this error on 2.0.4 😞
I'm also still seeing this issue on 2.0.4
Same issue on 2.0.4.
I'm also seeing this on Windows with version 2.0.4. It seems to happen (this is just a hypothesis) when my process spawns a number of threads (each one connecting to mongo via mongoid) greater than the mongoid pool size.
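A minimal sketch of that hypothesis, assuming a configured Mongoid app (the model name and config path are illustrative):

```ruby
require "mongoid"

# Assumption: mongoid.yml configures a pool_size well below 64.
Mongoid.load!("config/mongoid.yml", :production)

# Minimal illustrative model; any persisted document class would do.
class User
  include Mongoid::Document
end

# Spawn more threads than there are pool slots; each query checks a
# connection out of the node's pool, so checkouts queue up and can time
# out or land on a pool that was shut down underneath them.
threads = Array.new(64) { Thread.new { User.count } }
threads.each(&:join)
```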
@fedenusy / @wandenberg, what are the commits that fix this? I have my own fork because of other patches that we need, and I'd like to cherry-pick them into it. We just upgraded production last night and this error took part of us down until all the processes were restarted.
+1
We "fixed" it by reverting to |
Still happening on v2.0.4.
+1
Working so far, too.
Give this branch a try. We upgraded from 1.5 to 2.0 about 3 weeks ago and have seen absolutely horrible failover handling with Moped 2.0. We finally can now do stepdowns in production without a single error and haven't seen this error anymore. I cherry-picked in various commits from other pulls that address this and also added many commits of my own to handle different failure scenarios. https://github.com/jonhyman/moped/tree/feature/15988-and-logging It has some extra logging in there that I've been using as we've been doing failover testing, so feel free to fork and remove that if you inspect your Moped logs. We've also tested it.
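To try the branch, the Gemfile entry would look something like this (repo and branch name are taken from the comment above; pinning a commit SHA is optional but makes the build reproducible):

```ruby
# Points Bundler at the fork/branch linked above; add a :ref to pin a SHA.
gem 'moped', github: 'jonhyman/moped', branch: 'feature/15988-and-logging'
```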
@jonhyman, your branch is fantastic! When can we expect a PR?
Thanks. Yeah, we've used my branch through multiple stepdowns and incidents (like a mongos running out of memory, a physical host failing, etc.) and it holds up great now. I don't think there's any support for Moped anymore. @durran are you interested in me cleaning up my branch (basically just removing the log statements I added) and letting me submit a pull?
@jonhyman Yeah please do... I'll merge in and release a new version then.
See #380 for the up-to-date PR fixing this.
Doing an update now. Should I expect any side-effects from this update on the bson side (3.2.1)? EDIT: And thanks to all involved in fixing this, great work.
I have not tested 3.2.1. We are using bson 2.3.0.
@jonhyman: Doesn't 2.0.7 include your fixes?
We're still on my branch I made on May 20. I never bothered to upgrade to 2.0.7 because my branch is working fine for us, and the next mongoid upgrade we are doing is to mongoid 5 / official driver.
@jonhyman: Ah OK, thanks. Has anyone else tested moped 2.0.7 with bson 3.2.1?
I've been running moped 2.0.7 + bson 3.2.1 for about a week now. Seems good.
@fedenusy: Thank you very much. Nice to know 👍
We are using moped at 2.0.7 and bson at 3.2.1 and are still getting about 6k of these errors.
I also got a few of them with that combination.
I have a broken replica set where the slave is DOWN and the primary is OK, so I got tons of these errors. I downgraded as a workaround.
I upgraded everything and am now running the most recent mongoid 4.0.x / moped 2.0.7 stack. This is, I think, the last good version of mongoid before 5.0.0 and the switch away from moped entirely. If bumping everything up to that spec is minimal impact, it might be worth a try...
I can confirm that downgrading moped made the errors go away for us as well.
To the people who downgraded their version of moped: which errors were you seeing, exactly?
I forget the exact errors, it's been a while, but it was something like dropped connections or excessive shutdowns. I think I recall something with "Pool" in it... connection-level stuff. But this spammed me with TONS of warnings about connection considerations, even with the log level turned up.
Something happens in our MongoDB configuration (likely a stepdown of a master or some kind of failure) that causes all servers to start reporting the following error on database writes to the primary:

ConnectionPool::PoolShuttingDownError

This is recent; we've seen it take out all of our writes twice.

Moped 2.0.3
Mongoid 4.0.0
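A stopgap some readers might reach for is rescuing the error and retrying the write. A hedged sketch, assuming an illustrative `Post` model; note that several commenters above found only a process restart cleared the error, so a retry helps only if the node's pool actually gets rebuilt:

```ruby
require "connection_pool"  # defines ConnectionPool::PoolShuttingDownError

begin
  retries ||= 0
  Post.create!(title: "hello")  # Post is an illustrative Mongoid model
rescue ConnectionPool::PoolShuttingDownError
  sleep 0.5                     # give the driver a moment to rebuild the pool
  retries += 1
  retry if retries <= 3
  raise
end
```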