Stop panicking when ClientRequests return an error #1531
Conversation
looks good though it's missing the MustUseSender change that we talked about. Are you planning on doing that as a follow-up? My preference would be to do it as part of this PR, or to implement the follow-up PR before merging this, just so we make sure to get that change done; I feel that it is important.
Previously, `tx` would be dropped before send if:
- the success case would have used `tx` to wait for further messages,
- but the response was actually an error.

Instead, send the error on `tx` and call `fail_with()` using the same error. To support this change, allow `fail_with()` to take a `PeerError` or a `SharedPeerError`.
The previous code would send a Nil message on the Sender, even if the result was actually an error.
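A minimal sketch of the changed error path, with hypothetical simplified types standing in for the real zebra-network ones (`Response`, `SharedPeerError`, and `fail_with` here are placeholders, not the actual definitions): instead of dropping `tx` or replying with `Ok(Response::Nil)`, the error is sent on `tx` and the connection is failed with the same error.

    use futures::channel::oneshot;

    // Hypothetical placeholder types; the real ones live in zebra-network.
    #[derive(Clone, Debug)]
    struct SharedPeerError(String);

    #[derive(Debug)]
    enum Response {
        Nil,
    }

    struct Connection {
        error: Option<SharedPeerError>,
    }

    impl Connection {
        /// Record the error that will close this connection.
        fn fail_with(&mut self, e: SharedPeerError) {
            self.error.get_or_insert(e);
        }

        /// Handle an error response for an in-flight request.
        ///
        /// Previously the code either dropped `tx` or sent `Ok(Response::Nil)`;
        /// now the error is sent on `tx` *and* the connection is failed with
        /// the same error.
        fn respond_with_error(
            &mut self,
            tx: oneshot::Sender<Result<Response, SharedPeerError>>,
            e: SharedPeerError,
        ) {
            // Ignore the send result: if the receiver is gone, there is no one to notify.
            let _ = tx.send(Err(e.clone()));
            self.fail_with(e);
        }
    }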
In case I miss you, I figured I should dump the design for the fix that I came up with.
This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error on the ClientRequest sender, so the invariant is broken
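A rough sketch of the first change, assuming the heartbeat task holds a futures `mpsc::Sender` for the request queue (the `ClientRequest` and error types here are hypothetical placeholders): a full queue now ends the heartbeat, and with it the connection, instead of waiting for space.

    use futures::channel::mpsc;

    // Hypothetical placeholders for the real zebra-network types.
    struct ClientRequest;

    #[derive(Debug)]
    enum HeartbeatError {
        QueueFull,
        ConnectionClosed,
    }

    /// Try to enqueue a heartbeat request without waiting.
    ///
    /// Previously the heartbeat task awaited `send`, blocking until the queue
    /// had space; now a full queue is treated as fatal for this connection.
    fn queue_heartbeat(
        server_tx: &mut mpsc::Sender<ClientRequest>,
        request: ClientRequest,
    ) -> Result<(), HeartbeatError> {
        match server_tx.try_send(request) {
            Ok(()) => Ok(()),
            // The queue is full: close the connection rather than waiting for space.
            Err(e) if e.is_full() => Err(HeartbeatError::QueueFull),
            // The receiver was dropped: the connection is already gone.
            Err(_) => Err(HeartbeatError::ConnectionClosed),
        }
    }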
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Iunno, I feel like the fix I suggested is less complicated, even if it requires more code, and it will be more maintainable going forward. With the whole …
How can we make sure we have the correct invariant? We need to make sure that:
I'm happy to implement a more complex design to make future changes easier. But we seem to have tried a few different alternative designs here, so I want to make sure we settle on something that works.
The latest commit seems to be a lot more stable, so I think we've identified almost all the places where a … I'll try implementing the …
The `peer::Client` translates `Request`s into `ClientRequest`s, which it sends to a background task. If the send is `Ok(())`, it will assume that it is safe to unconditionally poll the `Receiver` tied to the `Sender` used to create the `ClientRequest`.

We enforce this invariant via the type system, by converting `ClientRequest`s to `InProgressClientRequest`s when they are received by the background task. These conversions are implemented by `ClientRequestReceiver`.

Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`, but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`
* Add a new `ClientRequestReceiver` type that wraps a `mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`, converting the successful result of `inner.poll_next_unpin` into an `InProgressClientRequest`
* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection` with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
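A rough sketch of the wrapper types described above, using simplified hypothetical definitions (the actual zebra-network types carry more fields and checks), to show how the `From` conversions keep the invariant at the type level; the `Stream` impl for `ClientRequestReceiver` is in the diff quoted further down.

    use futures::channel::{mpsc, oneshot};

    // Hypothetical placeholders for the real zebra-network request/response types.
    struct Request;
    struct Response;
    #[derive(Clone, Debug)]
    struct SharedPeerError(String);

    /// Simplified stand-in for the real `MustUseOneshotSender`: a wrapper that
    /// the connection task must consume by sending a response or an error.
    #[must_use = "a response or error must be sent on this sender"]
    struct MustUseOneshotSender<T>(oneshot::Sender<T>);

    impl<T> From<oneshot::Sender<T>> for MustUseOneshotSender<T> {
        fn from(tx: oneshot::Sender<T>) -> Self {
            MustUseOneshotSender(tx)
        }
    }

    /// Sent by `peer::Client` to the background task.
    struct ClientRequest {
        tx: oneshot::Sender<Result<Response, SharedPeerError>>,
        request: Request,
    }

    /// The same request, as seen by the connection task.
    struct InProgressClientRequest {
        tx: MustUseOneshotSender<Result<Response, SharedPeerError>>,
        request: Request,
    }

    impl From<ClientRequest> for InProgressClientRequest {
        fn from(c: ClientRequest) -> Self {
            InProgressClientRequest {
                tx: c.tx.into(),
                request: c.request,
            }
        }
    }

    /// Wraps the request channel so the connection task only ever sees
    /// `InProgressClientRequest`s.
    struct ClientRequestReceiver {
        inner: mpsc::Receiver<ClientRequest>,
    }

    impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver {
        fn from(rx: mpsc::Receiver<ClientRequest>) -> Self {
            ClientRequestReceiver { inner: rx }
        }
    }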
Reverts most of "Instrument some functions to try to locate the panic"
It was a bit more complicated than I expected, because we call … As a consequence, I needed to implement … @yaahc if you're happy with this PR, let's rebase-merge it, in case we need to revert or inspect some of these changes.
looks great
impl Stream for ClientRequestReceiver {
    type Item = InProgressClientRequest;

    /// Converts the successful result of `inner.poll_next()` to an
    /// `InProgressClientRequest`.
    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        match self.inner.poll_next_unpin(cx) {
            Poll::Ready(client_request) => Poll::Ready(client_request.map(Into::into)),
            // `inner.poll_next_unpin` parks the task for this future
            Poll::Pending => Poll::Pending,
        }
    }

    /// Returns `inner.size_hint()`
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}
ooh yea, didn't see this one coming. Sorry about that and great work.
Motivation
Zebra's connection-handling code panics because it drops the ClientRequest oneshot sender in the error case (#1471).
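For illustration, a minimal stand-alone example (not zebra code) of this failure mode: dropping a futures `oneshot::Sender` without sending makes the awaiting receiver resolve to `Err(Canceled)`, which panics if the caller unwraps or `expect`s the response.

    use futures::channel::oneshot;
    use futures::executor::block_on;

    fn main() {
        let (tx, rx) = oneshot::channel::<Result<(), String>>();

        // The error path drops the sender without responding...
        drop(tx);

        // ...so the waiting side gets `Err(Canceled)` instead of a response.
        let result = block_on(rx);
        assert!(result.is_err()); // an `expect()`/`unwrap()` here would panic
    }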
Solution
The code in this pull request has:
Review
I need @yaahc's help with some ownership issues.
Related Issues
Closes #1471
Closes #1510 - fixes at least one known cause of this panic
Closes #1027 - matches are now exhaustive (some patterns contain partial wildcards, but those matches are panics)
Partial work on #1435 - fixes multiple potential hangs, but testing shows we still sometimes end up with an empty or very small peer set
Follow Up Work
Refactor zebra-network Bitcoin to Zebra protocol translation layer #1515
Test translation for zebra-network::{Request, Response} protocol #1048
Split up the large files and functions in zebra-network #1557
Check that these panics are resolved:
Make heartbeats timeout if the connection request queue stays full #1551