-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for disconnect issues in #732 #784
Conversation
Connection.disconnect() were ripping connections out from under the threads or processes that owned them redis#732
Extended testing still looks good on the fix. |
@rolette I don't see any tests included with the PR. Maybe you forgot to push them? |
@andymccurdy I didn't create any new redis-py tests... I ran the existing tests to make sure I didn't break anything. The rest of the testing I was referring to was testing we do in my product where I was seeing the various issues originally. |
@andymccurdy To add a little more context, I'm not really sure how you'd go about adding a test that fits into the redis-py test infrastructure. The errors that are generated due to the disconnects ripping connections out from under the owner are very timing dependent. It requires some run time to generate the problem and there's really no fixed number of iterations you can do and say definitively that it is fixed. I was previously able to repro some form of the issue in a reasonable amount of time within my product, but it's a complex environment (see my previous description in the bug thread). With this patch, I haven't been able to reproduce it yet. Beyond gaining more confidence in the fix with accumulated runtime, IMO, code review to validate the fix is probably the way to go. |
I've been running a month with this fix and haven't seen the disconnect issues from #732 since applying it. I'm calling it done from my side. Can we get this merged into the main repo @andymccurdy? |
Would love to see this merged. We run into a similar issue as well - especially when using the sentinel connection pool. When the sentinel elects a new master, we get a lot of exceptions until fully restarting the API servers |
@jhgg - Yeah, we don't use It's been 5 months since I submitted the the PR, so not sure @andymccurdy has any interest in merging it, but I'll try to take a look at your additions this weekend. |
Thanks! I also implemented some tests for the blocking connection pool logic. |
@rolette I've finally got around to resolving this. We already have a "unique-ish" id in the PID on both the Connection and the ConnectionPool so I'm using that as the guard against a child calling shutdown on a socket that wasn't created by them. I've also added some tests using the multiprocessing lib (that uses fork) to make sure we're doing the right thing. Sorry again for the length of time but thanks for raising this issue and putting together this PR. It was definitely helpful. |
3.2.0 has been released with these changes. |
I haven't finished all of my testing yet, but initial signs are all good and it passes your tox tests. There were a couple of behavior changes required to fully fix it, so I wanted to get your feedback on my approach.
socket.shutdown()
call in Connection.disconnect() wasn't safe if the process was forked and the child process inherited the file descriptor. Since theshutdown()
call is desirable for faster cleanup in the non-forked case, I added a newfork_safe
option toConnection.__init__()
that let's the code do the safe thing by default or take advantage of faster socket cleanup in a non-forked environment.That also allowed us to skip doing anything in
_checkpid()
if we know the process isn't forked.ConnectionPool.disconnect()
so it doesn't rip connections that are in-use out from under the threads that own them. I use a generation counter to flush connections safely.A side-effect of this is that
ConnectionPool.disconnect()
doesn't actually disconnect anything immediately. Connections are disconnected gradually. I added an option to force the old behavior of "damn the torpedoes!" killing all connections, but the default is the safe path.Let me know what you think.