Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentinel: Master didn't update after sentinel failover without loosing connection to old master #435

Closed
reclosedev opened this issue Feb 5, 2014 · 8 comments

Comments

@reclosedev
Copy link
Contributor

I have sentinel managed redis instance:
redis = sentinel.master_for(...)
If failover happens without closing connection to old master, e.g. in redis-cli on sentinel:
> sentinel failover service_name
Then, after trying to write something (redis.set('test', 'test') I got error
ResponseError("READONLY You can't write against a read only slave.")

The reason is execute_command recconects only in case of ConnectionError.

I came with following workaround:

class SentinelAwareStrictRedis(StrictRedis):
    """ Handles case when failover happened without loosing connection to old master
    """
    def execute_command(self, *args, **options):
        try:
            return super(SentinelAwareStrictRedis, self).execute_command(*args, **options)
        except ResponseError as e:
            if e.message.startswith("READONLY"):
                self.connection_pool.get_master_address()
                return super(SentinelAwareStrictRedis, self).execute_command(*args, **options)
            raise

usage:

...
sentinel.master_for(..., redis_class=SentinelAwareStrictRedis)

If this solution is fine, I think it could be added as default redis_class for Sentinel.

@abe-winter
Copy link

This is an important fix -- routine network blips in AWS cause sentinel reelections once a week or so in my cluster. My redis-py clients go nuts when it happens, have to be reset.

@reclosedev
Copy link
Contributor Author

After more testing, it appears that connection_pool.disconnect() has to be called.
Also, if you use pipelines, you need to handle ResponseErrors in pipeline.
Code that I use now (logging removed):

class SentinelAwareStrictRedis(StrictRedis):
    """ Handles case when failover happened without loosing connection to old master
    """
    def execute_command(self, *args, **options):
        try:
            return super(SentinelAwareStrictRedis, self).execute_command(*args, **options)
        except ResponseError as e:
            if not e.message.startswith("READONLY"):
                raise
            old_master = self.connection_pool.master_address
            new_master = self.connection_pool.get_master_address()
            self.connection_pool.disconnect()
            return super(SentinelAwareStrictRedis, self).execute_command(*args, **options)

    def pipeline(self, transaction=True, shard_hint=None):
        return SentinelAwareStrictPipeline(
            self.connection_pool,
            self.response_callbacks,
            transaction,
            shard_hint
        )


class SentinelAwareStrictPipeline(StrictPipeline):
    def execute(self, raise_on_error=True):
        stack = self.command_stack
        try:
            return super(SentinelAwareStrictPipeline, self).execute(raise_on_error)
        except ResponseError as e:
            if "READONLY" not in e.message:
                raise
            if self.watching:
                raise WatchError("Sentinel failover occurred while watching one or more keys")
            # restore all commands
            self.command_stack = stack
            old_master = self.connection_pool.master_address
            new_master = self.connection_pool.get_master_address()
            self.reset()
            self.connection_pool.disconnect()
            return super(SentinelAwareStrictPipeline, self).execute(raise_on_error)

I'm not sure if SentinelAwareStrictPipeline implementation is correct, because sometimes pipe.execute() returns empty response, but I couldn't reproduce it in test environmet and had no time to debug it.

@andymccurdy
Copy link
Contributor

@abe-winter @reclosedev

I believe the fix I just pushed should fix your issue. I just caught the ReadOnlyError in SentinelManagedConnection.send_command. You shouldn't have to modify the client or pipeline classes to make this work.

If you have any issues with this change, please feel free to re-open the issue.

@reclosedev
Copy link
Contributor Author

I've just tried v2.10.0.

ReadOnlyError exception occures in connection.read_response(). So fix from ef05364 doesn't force to query new master.

Overriding read_response() with "try/except readonly/disconnect" works.

@andymccurdy
Copy link
Contributor

@reclosedev You're right, the try/execpt readonly/disconnect logic should be in release_response(). Not sure why I put it in send_command().

@andymccurdy
Copy link
Contributor

This is fixed in 2.10.1, which just went out on pypi. I need to get a better test suite for sentinel logic.

@abe-winter
Copy link

I have some test cases for this which I wrote when the READONLY error first started happening. Do you want the code?

@reclosedev
Copy link
Contributor Author

@andymccurdy Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants