Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

adds metric for turbine retransmit tree mismatch #17351

Merged
merged 2 commits into from
May 21, 2021

Conversation

behzadnouri
Copy link
Contributor

Problem

In order to remove port-based forwarding logic in turbine, we need to first track how often the turbine retransmit/broadcast trees mismatch across nodes. One consistency condition is that if the node is on the critical path (i.e. the first node in each neighborhood), then we expect that the packet arrives at tvu socket as opposed to tvu-forwards.

Summary of Changes

  • 1st commit adds a metric to track how often above consistency condition is not met.
  • 2nd commit just removes the nested for loop. No code logic change.

@behzadnouri behzadnouri force-pushed the turbine-tree-mismatch branch from 125796e to ec4e94f Compare May 20, 2021 18:18
@codecov
Copy link

codecov bot commented May 20, 2021

Codecov Report

Merging #17351 (2134646) into master (6cba534) will decrease coverage by 0.0%.
The diff coverage is 92.1%.

@@            Coverage Diff            @@
##           master   #17351     +/-   ##
=========================================
- Coverage    82.7%    82.7%   -0.1%     
=========================================
  Files         421      424      +3     
  Lines      118048   118422    +374     
=========================================
+ Hits        97664    97969    +305     
- Misses      20384    20453     +69     

CriesofCarrots
CriesofCarrots previously approved these changes May 20, 2021
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems straightforward. Couple nits.

let r_lock = r.lock().unwrap();
let packets = r_lock.recv_timeout(timer)?;
let packets = r_lock.recv_timeout(RECV_TIMEOUT)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let packets = r_lock.recv_timeout(RECV_TIMEOUT)?;
let packets = r_lock.recv_timeout(Duration::from_secs(1))?;

Could this just be a one-liner? Or do you expect to use the const elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I picked up this habit from previous jobs, to use consts to avoid magic numbers in the code as a means of (mini) documentation and maybe more readability. I guess here it does not matter much either case.

// If the node is on the critical path (i.e. the first node in each
// neighborhood), then we expect that the packet arrives at tvu socket
// as opposed to tvu-forwards. If this is not the case, then the
// turbine broadcast/retransmit tree mismatch across nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// turbine broadcast/retransmit tree mismatch across nodes.
// turbine broadcast/retransmit tree is mismatched across nodes.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks

Comment on lines +427 to +428
// neighborhood), then we expect that the packet arrives at tvu socket
// as opposed to tvu-forwards. If this is not the case, then the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clever way to detect this, nice!

carllin
carllin previously approved these changes May 20, 2021
In order to remove port-based forwarding logic in turbine, we need to
first track how often the turbine retransmit/broadcast trees mismatch
across nodes.
One consistency condition is that if the node is on the critical path
(i.e. the first node in each neighborhood), then we expect that the
packet arrives at tvu socket as opposed to tvu-forwards.

This commit adds a metric to track how often above condition is not met.
The code can be simplified by just flattening the vector of packets.
@behzadnouri behzadnouri force-pushed the turbine-tree-mismatch branch from ec4e94f to 2134646 Compare May 21, 2021 14:47
@mergify mergify bot dismissed stale reviews from CriesofCarrots and carllin May 21, 2021 14:47

Pull request has been modified.

@behzadnouri behzadnouri merged commit ff0e623 into solana-labs:master May 21, 2021
@behzadnouri behzadnouri deleted the turbine-tree-mismatch branch May 21, 2021 17:10
mergify bot added a commit that referenced this pull request May 21, 2021
…17392)

* adds metric for turbine retransmit tree mismatch

In order to remove port-based forwarding logic in turbine, we need to
first track how often the turbine retransmit/broadcast trees mismatch
across nodes.
One consistency condition is that if the node is on the critical path
(i.e. the first node in each neighborhood), then we expect that the
packet arrives at tvu socket as opposed to tvu-forwards.

This commit adds a metric to track how often above condition is not met.

(cherry picked from commit 71de021)

* removes the nested for loop from retransmit-stage

The code can be simplified by just flattening the vector of packets.

(cherry picked from commit ff0e623)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 4, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
solana-labs#6672
solana-labs#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
solana-labs#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 7, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
solana-labs#6672
solana-labs#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
solana-labs#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 8, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
solana-labs#6672
solana-labs#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
solana-labs#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 11, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
solana-labs#6672
solana-labs#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
solana-labs#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 13, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
solana-labs#6672
solana-labs#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
solana-labs#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
behzadnouri added a commit that referenced this pull request Jun 15, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
#6672
#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
mergify bot pushed a commit that referenced this pull request Jun 15, 2021
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
#6672
#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.

(cherry picked from commit 1618386)
mergify bot added a commit that referenced this pull request Jun 15, 2021
…17973)

Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
#6672
#7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
#17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.

(cherry picked from commit 1618386)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants