4. Avoid repeated requests to peers after partial responses or errors #3505

Merged: 34 commits, Feb 15, 2022

teor2345 (Contributor) commented Feb 10, 2022

Motivation

We want to avoid repeated requests to peers that don't have a block or transaction:

  1. zcashd doesn't respond to requests for missing blocks with notfound messages,
    so we have to check its responses ourselves and track the inventory that is missing from them.

  2. When Zebra cancels a request because a peer is slow, or when a peer times out, we don't want to try that peer again.
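
The two cases above can be sketched as a single helper. This is only an illustration, not the code in this PR: `InvHash`, `Outcome`, and `missing_inventory` are hypothetical names, and inventory hashes are simplified to integers. A partial response leaves the unanswered items missing, and a cancelled or timed-out request leaves every requested item missing.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a block or transaction inventory hash.
pub type InvHash = u64;

/// Hypothetical outcome of one request sent to one peer.
pub enum Outcome {
    /// The peer answered, possibly with only some of the requested items.
    Response(Vec<InvHash>),
    /// The request was cancelled, timed out, or failed in some other way.
    Failed,
}

/// Returns the requested items that the peer did not provide,
/// in the order they were requested.
pub fn missing_inventory(requested: &[InvHash], outcome: &Outcome) -> Vec<InvHash> {
    match outcome {
        // Partial (or full) response: anything requested but not received is missing.
        Outcome::Response(received) => {
            let received: HashSet<InvHash> = received.iter().copied().collect();
            requested
                .iter()
                .filter(|hash| !received.contains(*hash))
                .copied()
                .collect()
        }
        // Error or timeout: treat everything we asked for as missing from this peer.
        Outcome::Failed => requested.to_vec(),
    }
}
```

For example, requesting items 1, 2, and 3 and receiving only item 2 would report items 1 and 3 as missing from that peer.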

We want to retry missing blocks or transactions after a few minutes:

  1. Zebra's inventory rotation interval was a bit long for that.

Solution

  • register peer partial responses and errors as missing inventory
    • add an inventory collector to Client
    • split synthetic NotFoundRegistry errors from NotFoundResponse errors created from peer messages
    • add tests for partial missing and error inventory registration
  • make the inventory rotation interval a bit shorter, to improve performance
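
A minimal sketch of how missing-inventory markers and a rotation interval can interact, assuming a two-generation registry. The type `MissingInventory`, its methods, and the integer `PeerId` and `InvHash` aliases are hypothetical and are not taken from zebra-network. A marker survives between one and two rotation intervals, so a shorter interval lets a peer be retried sooner.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical identifiers, simplified to integers for illustration.
pub type PeerId = u32;
pub type InvHash = u64;

/// Tracks which peers are known to be missing which inventory.
/// Markers live in `current`, move to `previous` on rotation,
/// and are dropped on the rotation after that.
pub struct MissingInventory {
    current: HashMap<InvHash, HashSet<PeerId>>,
    previous: HashMap<InvHash, HashSet<PeerId>>,
    interval: Duration,
    last_rotation: Instant,
}

impl MissingInventory {
    pub fn new(interval: Duration, now: Instant) -> Self {
        Self {
            current: HashMap::new(),
            previous: HashMap::new(),
            interval,
            last_rotation: now,
        }
    }

    /// Records that `peer` did not provide `hash`.
    pub fn mark_missing(&mut self, peer: PeerId, hash: InvHash) {
        self.current.entry(hash).or_default().insert(peer);
    }

    /// Returns true if `peer` is marked as missing `hash` in either generation.
    pub fn is_missing(&self, peer: PeerId, hash: InvHash) -> bool {
        [&self.current, &self.previous].iter().any(|generation| {
            generation
                .get(&hash)
                .map_or(false, |peers| peers.contains(&peer))
        })
    }

    /// Rotates the generations if at least one interval has elapsed.
    pub fn rotate_if_due(&mut self, now: Instant) {
        if now.duration_since(self.last_rotation) >= self.interval {
            // The old `previous` generation is dropped here.
            self.previous = std::mem::take(&mut self.current);
            self.last_rotation = now;
        }
    }

    /// Filters `peers` down to those not marked as missing `hash`.
    pub fn candidates(&self, peers: &[PeerId], hash: InvHash) -> Vec<PeerId> {
        peers
            .iter()
            .copied()
            .filter(|peer| !self.is_missing(*peer, hash))
            .collect()
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the registry, keeps the rotation logic easy to test without waiting for real time to pass.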

Related changes (needed for tests):

  • add a proptest-impl feature to zebra-network
  • add a test-only connect_isolated_with_inbound function

Related Fixes:

Closes #2156.

Review

@oxarbitrage can review this PR. I'm happy to do a video review if that would help.

During reviews, I'd like to focus on any design issues or bugs in this PR.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

We're finally done with these network fixes!

teor2345 added the labels C-bug (Category: This is a bug), P-Medium ⚡, C-security (Category: Security issues), I-slow (Problems with performance or responsiveness), I-remote-node-overload (Zebra can overload other nodes on the network), and A-network (Area: Network protocol updates or fixes) on Feb 10, 2022
teor2345 self-assigned this on Feb 10, 2022
teor2345 changed the title from "4. Avoid repeated requests to peers with previous request errors or partial responses" to "4. Avoid repeated requests to peers after partial responses or errors" on Feb 10, 2022

codecov bot commented Feb 10, 2022

Codecov Report

Merging #3505 (de4ccc8) into main (499ae89) will increase coverage by 2.11%.
The diff coverage is 80.15%.

❗ Current head de4ccc8 differs from pull request most recent head f8474f2. Consider uploading reports for the commit f8474f2 to get more accurate results

@@            Coverage Diff             @@
##             main    #3505      +/-   ##
==========================================
+ Coverage   78.34%   80.45%   +2.11%     
==========================================
  Files         267      274       +7     
  Lines       31526    32229     +703     
==========================================
+ Hits        24698    25931    +1233     
+ Misses       6828     6298     -530     

teor2345 (Contributor, Author) commented:

The rest of the PR is:

Resolved review threads:

  • zebrad/src/components/sync/tests/timing.rs (2 threads, both outdated)
  • zebra-network/src/isolated/tor.rs (2 threads, 1 outdated)
  • zebra-network/src/isolated.rs (2 threads)

jvff previously approved these changes Feb 14, 2022
Resolved review thread: zebrad/src/components/inbound/tests/real_peer_set.rs (outdated)
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
teor2345 requested a review from jvff on February 15, 2022, 00:41
mergify bot merged commit a4dd3b7 into main on Feb 15, 2022
mergify bot deleted the notfound-error branch on February 15, 2022, 01:44
teor2345 (Contributor, Author) commented:

The deny.toml CI job failed due to temporary Alpine Linux Docker network errors.

We can just ignore them for this PR, because this PR doesn't change any dependencies.
(It only adds features that activate dependencies.)

mpguerra added a commit that referenced this pull request May 19, 2023
mergify bot pushed a commit that referenced this pull request May 23, 2023
* ZIPs were updated to remove ambiguity, this was tracked in #1267.

* #2105 was fixed by #3039 and #2379 was closed by #3069

* #2230 was a duplicate of #2231 which was closed by #2511

* #3235 was obsoleted by #2156 which was fixed by #3505

* #1850 was fixed by #2944, #1851 was fixed by #2961 and #2902 was fixed by #2969

* We migrated to Rust 2021 edition in Jan 2022 with #3332

* #1631 was closed as not needed

* #338 was fixed by #3040 and #1162 was fixed by #3067

* #2079 was fixed by #2445

* #4794 was fixed by #6122

* #1678 stopped being an issue

* #3151 was fixed by #3934

* #3204 was closed as not needed

* #1213 was fixed by #4586

* #1774 was closed as not needed

* #4633 was closed as not needed

* Clarify behaviour of difficulty spacing

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour when retrying block downloads

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify when we might want to fix

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify what we might want to change in future

Co-authored-by: teor <teor@riseup.net>

* Clarify benefits of how we do block verification

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt errors

---------

Co-authored-by: teor <teor@riseup.net>
Labels
  • A-network (Area: Network protocol updates or fixes)
  • C-bug (Category: This is a bug)
  • C-security (Category: Security issues)
  • I-remote-node-overload (Zebra can overload other nodes on the network)
  • I-slow (Problems with performance or responsiveness)
Development

Successfully merging this pull request may close these issues.

Security: Send notfound inv items to the inv collector, Credit: Equilibrium
2 participants