Skip to content

Add middleware to ensure Router only handles errors for its clients #310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

KJTsanaktsidis
Copy link
Contributor

@KJTsanaktsidis KJTsanaktsidis commented Jan 8, 2024

If you nest usages of RedisClient (perhaps becuase you're scanning from one client whilst inserting into a different cluster, perhaps), it's possible for errors which were raised from one cluster's RedisClients to instead be handled by the wrong cluster.

This is bad, because it can mean processing bogus topology "updates" when e.g. MOVED or ASKING responses get caught by the wrong client.

To fix this, we can add a middleware to our RedisClient instances (through the documented middleware interface), and tag errors at the source based on what config object they came from. This can then be used inside the router to identify errors that it should and should not handle.


Although at the moment you have to try pretty hard to get into this situation (sscan and hscan are probably the main ways to get it), solving this is important for my plan for transaction support. Recall that last year one of the reasons I wanted to wrap the returned RedisClient instances from#with in a proxy was to make sure they didn't leak errors to each other: #298 (comment)

One reason I think we might need the proxy implementation, though, is what happens if we have two different client instances, that point to totally different Redis clusters? What happens if we get a MOVED response here:

client1.with(key: '{slot1}', retry_count: 1) do |conn1|
  client2.with(key: '{slot2}', retry_count: 1) do |conn2|
    conn1.call('SET', '{slot1}key', 'value')
  end
end

The MOVED response is for client1's cluster, but client2's #with method is what catches the exception. We'd then wind up moving the slot on the wrong client instance. The benefit of the proxy object around conn is that we can catch the exceptions right there with the client that generated them, so we know for sure we're the cluster that the exception was for.

If we do this in RedisClient middleware instead, we can identify errors correctly right at the source, and that should make it safe to pass the raw RedisClient instances to callers, removing the need for a proxy over the top of it.

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/handle_own_errors branch 2 times, most recently from 95eab7f to 64fb6ab Compare January 8, 2024 05:51
If you nest usages of RedisClient (perhaps becuase you're scanning from
one client whilst inserting into a different cluster, perhaps), it's
possible for errors which were raised from one cluster's RedisClients to
instead be handled by the wrong cluster.

This is bad, because it can mean processing bogus topology "updates"
when e.g. MOVED or ASKING responses get caught by the wrong client.

To fix this, we can add a middleware to our RedisClient instances
(through the documented middleware interface), and tag errors at the
source based on what config object they came from. This can then be used
inside the router to identify errors that it should and should not
handle.
@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/handle_own_errors branch from 64fb6ab to 65cef5a Compare January 8, 2024 05:52
@supercaracal supercaracal self-requested a review January 8, 2024 07:00
@supercaracal
Copy link
Member

supercaracal commented Jan 9, 2024

Thank you for your pull request.

I feel this use case isn't a bit general. I'd say that most of the users don't need the handling with the middleware. I'm wondering either these logic should be included to our gem or not. Please give me some time for the consideration.

@KJTsanaktsidis
Copy link
Contributor Author

If you’ll allow me to try and sell it a bit… this is needed for the gem to take responsibility for its own correctness, as it were. The underlying redis clients for the cluster clients are the property of the cluster client, and allowing user code “in between” the cluster client and the underlying redis client (like with #with, #scan etc) means that the cluster client can’t properly maintain its own invariants.

wrapping the raw clients is one way to fix this, but I think the solution in this PR (using middleware) is a more elegant and less invasive way.

I think this problem comes up practically in migration scenarios - like in a script to copy things from one cluster to another (like in the README), or perhaps during “dual writing” in applications where they might write to an old and new redis cluster at the same time.

@KJTsanaktsidis
Copy link
Contributor Author

@supercaracal any further thoughts on this? If you like I can have another try at implementing this fix through a custom client_implementation instead, and we can compare how it looks?

@supercaracal
Copy link
Member

Sorry for my delayed reviewing. As you said, I thought the middleware pattern is more elegant than wrapping the raw client. Although I thought it's not required and it seems like to be responsible for the user side, on second thought, I thought this safety net is nice-to-have. So I'll merge this pull request. Thank you for the improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants