Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handle Thread#raise and Thread#kill #963

Merged
merged 1 commit into from
May 30, 2023

Conversation

casperisfine
Copy link
Contributor

Fix: #956

While it's heavily discouraged, Timeout.timeout end up being relatively frequently used in production, so ideally it's better to try to handle it gracefully.

This patch is inspired from redis-rb/redis-client@5f82254 before sending a request we increment a counter, and once we fully read the response(s), we decrement it.

If the counter is not 0 when we start a request, we know the connection may have unread responses from a previously aborted request, and we automatically discard it.

@petergoldstein
Copy link
Owner

I need to give this a closer look, but conceptually the changes look good. Thanks @casperisfine .

Would you mind addressing the various lints? Six of them will autocorrect and I think most or all of the rest are suppressed exceptions, which you can override locally as part of the PR.

@casperisfine casperisfine force-pushed the handle-async-interrupt branch 3 times, most recently from fa5cf15 to 290159d Compare May 16, 2023 13:05
@casperisfine
Copy link
Contributor Author

Thanks for the review. I addressed the linter failures and updated the CHANGELOG.

Copy link

@cornu-ammonis cornu-ammonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @casperisfine - I think this is really close. It doesn't quite solve my original test case for multiget - if the interrupt occurs at this critical point, we will have already sent the getkq ops to memcached but @request_in_progress will be false, so the socket can still be subsequently re-used with an incomplete read.

@@ -66,9 +70,9 @@ def unlock!; end
# Returns nothing.
def pipeline_response_setup
verify_state(:getkq)
@connection_manager.start_request!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we'd have already called start_request! by the time we get here, since we've already sent the :getkq commands at this point.

maybe per the comment in def pipelined_get , we could move the noop along with start_request! and verify_state to that point? I tried that with my original PR and now that you've changed the write method to pull out start_request!, it can be even simpler to do, making the rest of what I had in that PR unnecessary.

@casperisfine casperisfine force-pushed the handle-async-interrupt branch from 290159d to 738bfd6 Compare May 17, 2023 06:59
@casperisfine
Copy link
Contributor Author

@cornu-ammonis yeah good catch 🤦 .

I refactored the code a bit to handle this, I'll add a couple comments on the diff.

response = send(opkey, *args)

# pipelined_get emit query but doesn't read the response(s)
unless opkey == :pipelined_get
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like having to single out that one opkey, but request is clearly defined as a choke point, so I assume going through another method isn't desirable as it may break various monitoring patches.

Also since it takes *args, I can't really have the caller pass an argument to tell the method that the request is incomplete.

But if you are ok with having a specialized send_request method or something like that, I'm happy to refactor this.

@casperisfine casperisfine force-pushed the handle-async-interrupt branch 2 times, most recently from 3555457 to 9e4f51d Compare May 17, 2023 07:34
Fix: petergoldstein#956

While it's heavily discouraged, `Timeout.timeout` end up being
relatively frequently used in production, so ideally it's better
to try to handle it gracefully.

This patch is inspired from redis-rb/redis-client@5f82254
before sending a request we flip a flag, and once we fully read the
response(s), we flip it back.

If the flag is not `false` when we start a request, we know the connection
may have unread responses from a previously aborted request, and we
automatically discard it.
@casperisfine casperisfine force-pushed the handle-async-interrupt branch from 9e4f51d to 0f2b374 Compare May 17, 2023 07:37
@cornu-ammonis
Copy link

@cornu-ammonis yeah good catch 🤦 .

I refactored the code a bit to handle this, I'll add a couple comments on the diff.

Great 👍 , I confirmed that this handles my multiget test case. It reconnects instead of returning an incorrect response. Thanks!

@petergoldstein
Copy link
Owner

Not sure why initially we had so many spec failures. Rerunning things got everything green.

Minor nit that verify_pipelined_state doesn't need an argument, since it's a new method, internal only, and the argument is unused. But that can be fixed post-merge.

And the other unrelated changes (consolidating on byroot in the CHANGELOG) are fine.

Thanks @byroot

@petergoldstein petergoldstein merged commit c01d410 into petergoldstein:main May 30, 2023
@casperisfine casperisfine deleted the handle-async-interrupt branch November 4, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dalli sometimes returns incorrect values
4 participants