Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite GetData handling to match the zcashd implementation #1518

Merged
merged 5 commits into from
Jan 4, 2021

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Dec 14, 2020

Motivation

When responding to GetData messages, zcashd:

This is significantly different to the behaviour expected by Zebra's current connection state machine.

Solution

  • update Zebra's connection state machine peer response mappings, to correctly handle the zcashd implementation's responses
  • make Zebra respond with a partial list of blocks if any blocks are missing, to match the zcashd implementation's expectations of our responses

The code in this pull request has:

Review

@hdevalence or @yaahc can help me work out if this code is correct, and how to resolve the remaining TODOs.

This fix would be good to merge so we can discover other causes of #1435. But it's not particularly urgent.

Related Issues

Closes #1307 - zcashd missing block handling
Resolves some possible causes of #1435 - temporary or permanent peer connection failures or hangs

Follow Up Work

Refactor zebra-network Bitcoin to Zebra protocol translation layer #1515 - edited to include TODOs from this PR
Test translation for zebra-network::{Request, Response} protocol #1048

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage labels Dec 14, 2020
@teor2345 teor2345 added this to the v1.0.0-alpha.1 milestone Dec 14, 2020
@teor2345 teor2345 requested a review from hdevalence December 14, 2020 09:04
@teor2345 teor2345 self-assigned this Dec 14, 2020
@teor2345 teor2345 added the S-blocked Status: Blocked on other tasks label Dec 14, 2020
@teor2345
Copy link
Contributor Author

Merge blocked on re-enabling post-Sapling sync tests, but we can still do initial code reviews.

`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
https://github.com/zcash/zcash/blob/e7b425298f6d9a54810cb7183f00be547e4d9415/src/main.cpp#L5497

This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.

Also change Zebra's GetData responses to peer request so they ignore
missing blocks.
@teor2345 teor2345 force-pushed the zcashd-missing-blocks branch from a836fc8 to 9823dbd Compare December 14, 2020 10:12
@teor2345 teor2345 requested a review from dconnolly December 14, 2020 21:03
@teor2345
Copy link
Contributor Author

Unblocked by #1529, closing and re-opening to ensure those tests pass.

@teor2345 teor2345 closed this Dec 16, 2020
@teor2345 teor2345 reopened this Dec 16, 2020
@teor2345 teor2345 removed S-blocked Status: Blocked on other tasks S-needs-triage Status: A bug report needs triage labels Dec 16, 2020
@teor2345
Copy link
Contributor Author

This PR blocks #1435, #1471, and #1510 - merging it might solve those bugs, or make it easier to diagnose and fix them. So I've triaged it into the next alpha.

@teor2345 teor2345 requested review from yaahc and removed request for dconnolly December 17, 2020 08:20
@teor2345
Copy link
Contributor Author

Hey @yaahc I added you as a reviewer on this PR, because you're reviewing #1531, which is based on this PR

@teor2345 teor2345 force-pushed the zcashd-missing-blocks branch from e3de693 to 93bfa16 Compare December 18, 2020 04:52
Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
   that were successfully received
2. if none of the expected blocks or transactions were received, return
   an error, and close the connection
@teor2345 teor2345 force-pushed the zcashd-missing-blocks branch from 221e1f8 to 35adfc4 Compare December 18, 2020 06:27
@hdevalence
Copy link
Contributor

Is it necessary to match the zcashd behavior exactly? It's not actually specified anywhere, so this wouldn't be going off-spec.

@teor2345
Copy link
Contributor Author

teor2345 commented Dec 18, 2020

Is it necessary to match the zcashd behavior exactly? It's not actually specified anywhere, so this wouldn't be going off-spec.

If a peer happens to be on a different fork, or slightly ahead of our state, it's more efficient to send it a partial response to its request:
#1307 (comment)

It doesn't actually matter that much whether the peer is Zebra or zcashd, except for the fact that zcashd expects partial responses - and only sends responses once it thinks it's fully synced. (After this PR, Zebra will handle a wide range of behaviour from its peers - including strict or partial responses. But partial responses are still more efficient.)

Here's one scenario where partial responses make a difference:

Someone launches a bunch of Zebra nodes that have good connectivity to each other. The nodes which have fewer blocks will get errors from nodes with more blocks - even if they're within a few hundred blocks of each other, and could benefit from partial responses.

If we launch a bunch of Zebra instances on testnet, partial responses will make a huge difference. It won't matter so much on mainnet, unless someone launches 50 or 100 nodes - which is an unlikely but plausible scenario.

There might also be some cases where strict responses would starve a side-chain node, but partial responses would help it get back on track. I don't think that can happen with Zebra. But I don't know enough about how zcashd handles multiple chains and header verification to be sure about its edge cases.

Edit: swap more and fewer in the multiple Zebra example

Copy link
Contributor

@yaahc yaahc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, got some comment tweeks I think might be worth but feel free to admin merge after applying if you like them.

@@ -91,82 +84,191 @@ impl Handler {
}
}
(Handler::Peers, Message::Addr(addrs)) => Handler::Finished(Ok(Response::Peers(addrs))),
// `zcashd` returns requested transactions in a single batch of messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// `zcashd` returns requested transactions in a single batch of messages.
// `zcashd` returns requested transactions in an uninterrupted batch of messages.

Copy link
Contributor Author

@teor2345 teor2345 Jan 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is an accurate description of zcashd's behaviour. zcashd queues the messages in a single batch, but I don't know if it allows other types of messages to preempt or interrupt that batch.

}
}
// `zcashd` returns requested blocks in a single batch of messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// `zcashd` returns requested blocks in a single batch of messages.
// `zcashd` returns requested blocks in an uninterrupted batch of messages.

@@ -193,7 +193,13 @@ pub enum Message {
/// content of a specific object, and is usually sent after
/// receiving an `inv` packet, after filtering known elements.
///
/// `zcashd` returns requested items in a single batch of messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `zcashd` returns requested items in a single batch of messages.
/// `zcashd` returns requested items in an uninterrupted batch of messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check how zcashd handles missing blocks?
3 participants