Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Consider all peers as potential candidates during pull-request in case of offline nodes #18333

Merged
merged 2 commits into from
Jul 1, 2021

Conversation

AshwinSekar
Copy link
Contributor

@AshwinSekar AshwinSekar commented Jun 30, 2021

Problem

When we receive contact info from an offline node in our list of peers, we assign it a much higher weight than other peers because we have not sent a pull request to it yet. This causes the sampling to always select only this peer to send pull requests to. However the ping check never allows us to send a pull request and update the timestamp, as the node is offline. Thus we never get a chance to send pull requests to other nodes.

Summary of Changes

After assigning the weights, instead of performing random sampling we use a weighted shuffle to ensure that if the higher weighted nodes fail the ping check we can still send pull requests to the other nodes.
Also reenables test_no_optimistic_confirmation_violation_with_tower and test_optimistic_confirmation_violation_without_tower

Fixes #18279

@sakridge
Copy link
Contributor

All peers will be all 500 or 1000 nodes?

@AshwinSekar AshwinSekar changed the title Try all peers during pull-request in case of offline nodes Consider all peers as potential candidates during pull-request in case of offline nodes Jun 30, 2021
@AshwinSekar
Copy link
Contributor Author

Sorry about the wording we are still only sending 1 pull request.

Instead of taking 2 * # nodes samples of the peers and finding the first available to send a pull request to, we do a weighted shuffle amongst all the peers to find the first available. The idea is that if offline nodes have a large weight, all the samples could result in offline nodes which prevents us from sending any pull requests.

@AshwinSekar AshwinSekar requested a review from behzadnouri June 30, 2021 19:35
behzadnouri
behzadnouri previously approved these changes Jun 30, 2021
Copy link
Contributor

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for debugging the local-cluster tests

@mergify mergify bot dismissed behzadnouri’s stale review June 30, 2021 20:21

Pull request has been modified.

@codecov
Copy link

codecov bot commented Jul 1, 2021

Codecov Report

Merging #18333 (3f43f09) into master (8d9a6de) will decrease coverage by 0.0%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           master   #18333     +/-   ##
=========================================
- Coverage    82.4%    82.3%   -0.1%     
=========================================
  Files         434      434             
  Lines      121329   121335      +6     
=========================================
- Hits        99978    99953     -25     
- Misses      21351    21382     +31     

@AshwinSekar AshwinSekar merged commit f4fb5de into solana-labs:master Jul 1, 2021
@CriesofCarrots
Copy link
Contributor

v1.7 backport?

mergify bot pushed a commit that referenced this pull request Jul 1, 2021
…e of offline nodes (#18333)

* Try all peers during pull-request in case of offline nodes

* fix clippy err

(cherry picked from commit f4fb5de)
mergify bot added a commit that referenced this pull request Jul 1, 2021
…e of offline nodes (#18333) (#18372)

* Try all peers during pull-request in case of offline nodes

* fix clippy err

(cherry picked from commit f4fb5de)

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-enable optimistic-confirmation local-cluster tests
4 participants