-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite GetData handling to match the zcashd implementation #1518
Conversation
Merge blocked on re-enabling post-Sapling sync tests, but we can still do initial code reviews. |
`zcashd` silently ignores missing blocks, but sends found transactions followed by a `NotFound` message: https://github.com/zcash/zcash/blob/e7b425298f6d9a54810cb7183f00be547e4d9415/src/main.cpp#L5497 This is significantly different to the behaviour expected by the old Zebra connection state machine, which expected `NotFound` for blocks. Also change Zebra's GetData responses to peer request so they ignore missing blocks.
a836fc8
to
9823dbd
Compare
Unblocked by #1529, closing and re-opening to ensure those tests pass. |
e3de693
to
93bfa16
Compare
Instead, if the peer sends an unexpected block, unexpected transaction, or NotFound message: 1. end the request, and return a partial response containing any items that were successfully received 2. if none of the expected blocks or transactions were received, return an error, and close the connection
221e1f8
to
35adfc4
Compare
Is it necessary to match the zcashd behavior exactly? It's not actually specified anywhere, so this wouldn't be going off-spec. |
If a peer happens to be on a different fork, or slightly ahead of our state, it's more efficient to send it a partial response to its request: It doesn't actually matter that much whether the peer is Zebra or zcashd, except for the fact that zcashd expects partial responses - and only sends responses once it thinks it's fully synced. (After this PR, Zebra will handle a wide range of behaviour from its peers - including strict or partial responses. But partial responses are still more efficient.) Here's one scenario where partial responses make a difference: Someone launches a bunch of Zebra nodes that have good connectivity to each other. The nodes which have fewer blocks will get errors from nodes with more blocks - even if they're within a few hundred blocks of each other, and could benefit from partial responses. If we launch a bunch of Zebra instances on testnet, partial responses will make a huge difference. It won't matter so much on mainnet, unless someone launches 50 or 100 nodes - which is an unlikely but plausible scenario. There might also be some cases where strict responses would starve a side-chain node, but partial responses would help it get back on track. I don't think that can happen with Zebra. But I don't know enough about how zcashd handles multiple chains and header verification to be sure about its edge cases. Edit: swap more and fewer in the multiple Zebra example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, got some comment tweeks I think might be worth but feel free to admin merge after applying if you like them.
@@ -91,82 +84,191 @@ impl Handler { | |||
} | |||
} | |||
(Handler::Peers, Message::Addr(addrs)) => Handler::Finished(Ok(Response::Peers(addrs))), | |||
// `zcashd` returns requested transactions in a single batch of messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// `zcashd` returns requested transactions in a single batch of messages. | |
// `zcashd` returns requested transactions in an uninterrupted batch of messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is an accurate description of zcashd's behaviour. zcashd queues the messages in a single batch, but I don't know if it allows other types of messages to preempt or interrupt that batch.
} | ||
} | ||
// `zcashd` returns requested blocks in a single batch of messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// `zcashd` returns requested blocks in a single batch of messages. | |
// `zcashd` returns requested blocks in an uninterrupted batch of messages. |
@@ -193,7 +193,13 @@ pub enum Message { | |||
/// content of a specific object, and is usually sent after | |||
/// receiving an `inv` packet, after filtering known elements. | |||
/// | |||
/// `zcashd` returns requested items in a single batch of messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `zcashd` returns requested items in a single batch of messages. | |
/// `zcashd` returns requested items in an uninterrupted batch of messages. |
Motivation
When responding to
GetData
messages,zcashd
:NotFound
message:https://github.com/zcash/zcash/blob/e7b425298f6d9a54810cb7183f00be547e4d9415/src/main.cpp#L5497
This is significantly different to the behaviour expected by Zebra's current connection state machine.
Solution
zcashd
implementation's responseszcashd
implementation's expectations of our responsesThe code in this pull request has:
Review
@hdevalence or @yaahc can help me work out if this code is correct, and how to resolve the remaining TODOs.
This fix would be good to merge so we can discover other causes of #1435. But it's not particularly urgent.
Related Issues
Closes #1307 -
zcashd
missing block handlingResolves some possible causes of #1435 - temporary or permanent peer connection failures or hangs
Follow Up Work
Refactor zebra-network Bitcoin to Zebra protocol translation layer #1515 - edited to include TODOs from this PR
Test translation for zebra-network::{Request, Response} protocol #1048