Fix for disconnect issues in #732 #784

Closed
wants to merge 1 commit

Conversation

@rolette commented Sep 10, 2016

I haven't finished all of my testing yet, but initial signs are all good and it passes your tox tests. There were a couple of behavior changes required to fully fix it, so I wanted to get your feedback on my approach.

  1. The socket.shutdown() call in Connection.disconnect() wasn't safe if the process was forked and the child process inherited the file descriptor. Since the shutdown() call is desirable for faster cleanup in the non-forked case, I added a new fork_safe option to Connection.__init__() that lets the code do the safe thing by default or take advantage of faster socket cleanup in a non-forked environment.

That also allowed us to skip doing anything in _checkpid() if we know the process isn't forked.

  2. The other change fixes ConnectionPool.disconnect() so it doesn't rip connections that are in use out from under the threads that own them. I use a generation counter to flush connections safely.

A side-effect of this is that ConnectionPool.disconnect() doesn't actually disconnect anything immediately; connections are disconnected gradually instead. I added an option to force the old "damn the torpedoes!" behavior of killing all connections at once, but the default is the safe path. A rough sketch of both changes follows below.
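
To make the approach concrete, here's a rough sketch of the shape of both changes (simplified, not the actual diff: fork_safe is the option described above, but the _generation counter name and the get_connection()/release() plumbing shown here are just illustrative):

```python
import os
import socket


class Connection(object):
    """Illustrative only -- not the actual redis-py Connection."""

    def __init__(self, fork_safe=True):
        # fork_safe defaults to the safe behavior; callers that never fork can
        # pass fork_safe=False to keep the faster socket.shutdown() cleanup
        # (and _checkpid() can then be skipped entirely).
        self.fork_safe = fork_safe
        self.pid = os.getpid()      # used by _checkpid() in the real class
        self.generation = None      # stamped by the pool when handed out
        self._sock = None

    def disconnect(self):
        if self._sock is None:
            return
        try:
            if not self.fork_safe:
                # Only safe when no forked child shares this file descriptor.
                self._sock.shutdown(socket.SHUT_RDWR)
            self._sock.close()
        except OSError:
            pass
        self._sock = None


class ConnectionPool(object):
    """Generation-counter flush: disconnect() retires connections lazily."""

    def __init__(self, connection_class=Connection, **connection_kwargs):
        self.connection_class = connection_class
        self.connection_kwargs = connection_kwargs
        self._generation = 0
        self._available_connections = []
        self._in_use_connections = set()

    def get_connection(self):
        connection = None
        while self._available_connections:
            candidate = self._available_connections.pop()
            if candidate.generation == self._generation:
                connection = candidate
                break
            # Created before the last disconnect() call -- retire it now.
            candidate.disconnect()
        if connection is None:
            connection = self.connection_class(**self.connection_kwargs)
        connection.generation = self._generation
        self._in_use_connections.add(connection)
        return connection

    def release(self, connection):
        self._in_use_connections.discard(connection)
        if connection.generation != self._generation:
            # Stale connection; its owner is done with it, so drop it here.
            connection.disconnect()
        else:
            self._available_connections.append(connection)

    def disconnect(self, force=False):
        if force:
            # Old behavior: tear everything down immediately, including
            # connections other threads are still using.
            conns = self._available_connections + list(self._in_use_connections)
            for connection in conns:
                connection.disconnect()
        else:
            # New default: close nothing here; bumping the generation makes
            # every existing connection stale, and each one is closed safely
            # as it passes back through get_connection()/release().
            self._generation += 1
```

The key property is that the pool's disconnect() never touches a socket another thread currently owns unless you explicitly pass force=True.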

Let me know what you think.

Connection.disconnect() was ripping connections out from under the threads or processes that owned them

redis#732
@rolette (Author) commented Sep 15, 2016

Extended testing still looks good on the fix.

@andymccurdy (Contributor) commented

@rolette I don't see any tests included with the PR. Maybe you forgot to push them?

@rolette (Author) commented Sep 15, 2016

@andymccurdy I didn't create any new redis-py tests... I ran the existing tests to make sure I didn't break anything. The rest of the testing I was referring to was done in my product, where I was originally seeing the various issues.

@rolette (Author) commented Sep 16, 2016

@andymccurdy To add a little more context, I'm not really sure how you'd go about adding a test that fits into the redis-py test infrastructure.

The errors generated by disconnects ripping connections out from under their owners are very timing-dependent. It takes some run time to trigger the problem, and there's no fixed number of iterations you can run and then say definitively that it is fixed.

I was previously able to repro some form of the issue in a reasonable amount of time within my product, but it's a complex environment (see my previous description in the bug thread). With this patch, I haven't been able to reproduce it yet.

Beyond gaining more confidence in the fix with accumulated runtime, IMO code review is probably the best way to validate it.

@rolette (Author) commented Oct 7, 2016

I've been running with this fix for a month and haven't seen the disconnect issues from #732 since applying it. I'm calling it done from my side. Can we get this merged into the main repo, @andymccurdy?

@jhgg (Contributor) commented Feb 9, 2017

Would love to see this merged. We run into a similar issue as well, especially when using the sentinel connection pool: when the sentinel elects a new master, we get a lot of exceptions until we fully restart the API servers.

@jhgg (Contributor) commented Feb 21, 2017

We've merged this into our fork and found that this implementation does not work with BlockingConnectionPool. I implemented similar logic on top of it in our fork: discord#1

@rolette - take a look if you have a chance and see what you think, and whether you want to add that to this PR.

@rolette (Author) commented Feb 22, 2017

@jhgg - Yeah, we don't use BlockingConnectionPool in my product, so I didn't touch it.

It's been 5 months since I submitted the PR, so I'm not sure @andymccurdy has any interest in merging it, but I'll try to take a look at your additions this weekend.

@jhgg (Contributor) commented Feb 22, 2017

Thanks! I also implemented some tests for the blocking connection pool logic.

@andymccurdy (Contributor) commented

@rolette I've finally gotten around to resolving this. We already have a "unique-ish" id in the PID stored on both the Connection and the ConnectionPool, so I'm using that as the guard against a child calling shutdown() on a socket it didn't create. I've also added some tests using the multiprocessing lib (which uses fork) to make sure we're doing the right thing.
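
For reference, the guard looks roughly like this (a minimal sketch of the PID-based approach described above, plus a fork-style check in the spirit of the new multiprocessing tests; not necessarily the exact code that shipped):

```python
import multiprocessing
import os
import socket


class Connection(object):
    """Illustrative stand-in; the real class lives in redis.connection."""

    def __init__(self, host='localhost', port=6379):
        self.pid = os.getpid()          # the process that created this connection
        self.host, self.port = host, port
        self._sock = None

    def connect(self):
        self._sock = socket.create_connection((self.host, self.port))

    def disconnect(self):
        if self._sock is None:
            return
        try:
            if os.getpid() == self.pid:
                # Only the creating process shuts the socket down; a forked
                # child that inherited the descriptor just closes its copy.
                self._sock.shutdown(socket.SHUT_RDWR)
            self._sock.close()
        except OSError:
            pass
        self._sock = None


def _child_disconnect(conn):
    # Runs in the child: os.getpid() differs from conn.pid, so shutdown() is
    # skipped and only the child's copy of the descriptor is closed.
    conn.disconnect()


if __name__ == '__main__':
    # Assumes something is listening on localhost:6379 and a platform where
    # the 'fork' start method is available (i.e. not Windows).
    conn = Connection()
    conn.connect()
    child = multiprocessing.get_context('fork').Process(
        target=_child_disconnect, args=(conn,))
    child.start()
    child.join()
    # The parent's socket was never shut down, so it is still usable.
    assert conn._sock is not None
```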

Sorry again for the length of time, but thanks for raising this issue and putting together this PR. It was definitely helpful.

@andymccurdy (Contributor) commented

3.2.0 has been released with these changes.
