Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop panicking when ClientRequests return an error #1531

Merged
merged 13 commits into from
Jan 6, 2021

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Dec 16, 2020

Motivation

Zebra's connection-handling code panics because it drops the ClientRequest oneshot sender in the error case (#1471).

Solution

  • call ClientRequest.tx.send even if there is an error
  • document this invariant

The code in this pull request has:

Review

I need @yaahc's help with some ownership issues.

Related Issues

Closes #1471
Closes #1510 - fixes at least one known cause of this panic

Closes #1027 - matches are now exhaustive (some patterns contain partial wildcards, but those matches are panics)

Partial work on #1435 - fixes multiple potential hangs, but testing shows we still sometimes end up with an empty or very small peer set

Follow Up Work

Refactor zebra-network Bitcoin to Zebra protocol translation layer #1515
Test translation for zebra-network::{Request, Response} protocol #1048
Split up the large files and functions in zebra-network #1557

Check that these panics are resolved:

Make heartbeats timeout if the connection request queue stays full #1551

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code labels Dec 16, 2020
@teor2345 teor2345 added this to the v1.0.0-alpha.1 milestone Dec 16, 2020
@teor2345 teor2345 requested a review from yaahc December 16, 2020 06:55
@teor2345 teor2345 self-assigned this Dec 16, 2020
@teor2345 teor2345 changed the title Client request panic Stop panicking when ClientRequests return an error Dec 16, 2020
@teor2345 teor2345 marked this pull request as draft December 16, 2020 06:59
@teor2345 teor2345 changed the title Stop panicking when ClientRequests return an error WIP: Stop panicking when ClientRequests return an error Dec 16, 2020
@teor2345 teor2345 force-pushed the client-request-panic branch from 4a3d53f to e0dc7ac Compare December 17, 2020 08:52
@teor2345 teor2345 changed the title WIP: Stop panicking when ClientRequests return an error Stop panicking when ClientRequests return an error Dec 18, 2020
@teor2345 teor2345 force-pushed the client-request-panic branch from e0dc7ac to 9713725 Compare December 18, 2020 07:23
@teor2345 teor2345 marked this pull request as ready for review December 18, 2020 07:23
@teor2345 teor2345 changed the base branch from zcashd-missing-blocks to main December 18, 2020 07:35
@teor2345 teor2345 changed the base branch from main to zcashd-missing-blocks December 18, 2020 07:35
Copy link
Contributor

@yaahc yaahc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good though it's missing the MustUseSender change that we talked about. Are you planning on doing that as a follow up? My preference would be to do it as part of this PR or to implement the follow up PR before merging this just so we make sure to get that change done, I feel that it is important.

@teor2345

This comment has been minimized.

Previously, tx would be dropped before send if:
- the success case would have used tx to wait for further messages,
- but the response was actually an error.

Instead, send the error on `tx` and call `fail_with()` using the same
error.

To support this change, allow `fail_with()` to take a `PeerError` or a
`SharedPeerError`.
The previous code would send a Nil message on the Sender, even if the
result was actually an error.
@teor2345

This comment has been minimized.

@yaahc
Copy link
Contributor

yaahc commented Jan 5, 2021

In case I miss you I figured I should dump the design for the fix that I came up with.

  • Update ClientRequest to use a regular sender instead of a MustUseSender
  • Add a new type, InProgressClientRequest, that is the same as ClientRequest but has a MustUseSender
  • impl From<ClientRequest> for InProgressClientRequest
  • Add a new ClientRequestReceiver type that implements only one function, "async fn next(&self) -> Option" that wraps a mpsc::Receiver<ClientRequest>, calls next on it, and then calls into on it to convert it to the must use variant of a client request
  • Replace pub(super) client_rx: mpsc::Receiver<ClientRequest>, in Connection in connection.rs with the new ClientRequestReceiver type

This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender
  would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error
  on the ClientRequest sender, so the invariant is broken
@teor2345 teor2345 force-pushed the client-request-panic branch from bd397c1 to 2dfba5f Compare January 5, 2021 01:02
@teor2345

This comment has been minimized.

@teor2345

This comment has been minimized.

@teor2345

This comment has been minimized.

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
@yaahc
Copy link
Contributor

yaahc commented Jan 5, 2021

This seems like a good fix, but it's also a bit complicated, so I'd like to avoid as much of that complexity if we can.

Iunno, I feel like the fix I suggested is less complicated, even if it requires more code, and it will be more maintainable going forward. With the whole InProgressClientRequest design we end up encoding the exact shape of the invariant Client needs us to uphold. The fix you suggested only fixes this one specific instance of misuse, but if someone else accidentally creates and drops a ClientRequest they'll run into the exact same problem even though there was never actually an issue with them dropping that ClientRequest because it was never sent to the Connection.

@teor2345
Copy link
Contributor Author

teor2345 commented Jan 5, 2021

With the whole InProgressClientRequest design we end up encoding the exact shape of the invariant Client needs us to uphold

How can we make sure we have the correct invariant?

We need make sure that:

I'm happy to implement a more complex design to make future changes easier. But we seem to have tried a few different alternative designs here, so I want to make sure we settle on something that works.

@teor2345
Copy link
Contributor Author

teor2345 commented Jan 5, 2021

The latest commit seems to be a lot more stable. So I think we've identified almost all the places where a ClientRequest is dropped without sending anything.

I'll try implementing the InProgressClientRequest design, and see how we go.

@teor2345 teor2345 marked this pull request as ready for review January 6, 2021 01:16
The `peer::Client` translates `Request`s into `ClientRequest`s, which
it sends to a background task. If the send is `Ok(())`, it will assume
that it is safe to unconditionally poll the `Receiver` tied to the
`Sender` used to create the `ClientRequest`.

We enforce this invariant via the type system, by converting
`ClientRequest`s to `InProgressClientRequest`s when they are received by
the background task. These conversions are implemented by
`ClientRequestReceiver`.

Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`,
  but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`

* Add a new `ClientRequestReceiver` type that wraps a
  `mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`,
  converting the successful result of `inner.poll_next_unpin` into an
  `InProgressClientRequest`

* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection`
  with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
Reverts most of "Instrument some functions to try to locate the panic"
@teor2345
Copy link
Contributor Author

teor2345 commented Jan 6, 2021

I'll try implementing the InProgressClientRequest design, and see how we go.

It was a bit more complicated than I expected, because we call future::select on ClientRequestReceiver::next, which requires ClientRequestReceiver to impl Stream. (Or maybe impl Future?)

As a consequence, I needed to implement ClientRequestReceiver::poll_next rather than ClientRequestReceiver::next. It's a little bit more complicated, because it needs a pinning utility function and a match statement on the Poll result.

@yaahc if you're happy with this PR, let's rebase-merge it, in case we need to revert or inspect some of these changes.

Copy link
Contributor

@yaahc yaahc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

Comment on lines +108 to +125
impl Stream for ClientRequestReceiver {
type Item = InProgressClientRequest;

/// Converts the successful result of `inner.poll_next()` to an
/// `InProgressClientRequest`.
fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
match self.inner.poll_next_unpin(cx) {
Poll::Ready(client_request) => Poll::Ready(client_request.map(Into::into)),
// `inner.poll_next_unpin` parks the task for this future
Poll::Pending => Poll::Pending,
}
}

/// Returns `inner.size_hint()`
fn size_hint(&self) -> (usize, Option<usize>) {
self.inner.size_hint()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh yea, didn't see this one coming. Sorry about that and great work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
2 participants