This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

limits number of unique pubkeys in the crds table #15539

Merged

behzadnouri merged 3 commits into solana-labs:master from crds-pubkey-trim on Mar 10, 2021

Conversation

behzadnouri
Contributor

Problem

Summary of Changes

Fixes #

@leoluk
Contributor

leoluk commented Feb 25, 2021

What about unstaked RPC nodes? Those are quite important to the network, too

@mvines
Contributor

mvines commented Feb 25, 2021

> What about unstaked RPC nodes? Those are quite important to the network, too

Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

@@ -114,6 +114,8 @@ pub const DEFAULT_CONTACT_DEBUG_INTERVAL_MILLIS: u64 = 10_000;
 pub const DEFAULT_CONTACT_SAVE_INTERVAL_MILLIS: u64 = 60_000;
 /// Minimum serialized size of a Protocol::PullResponse packet.
 const PULL_RESPONSE_MIN_SERIALIZED_SIZE: usize = 167;
+// Limit number of unique pubkeys in the crds table.
+const CRDS_UNIQUE_PUBKEY_CAPACITY: usize = 1024;
Contributor

1k seems really low. I would expect like 10-20k is probably ok?

Contributor Author

Increased it to 4k, which is ~6x the current mainnet cluster (~4x current tds).
This is the number of unique pubkeys; the table itself can be 830x this value in size, so 10k might be a bit of a stretch performance-wise. We can bump this number as the code is further optimized to handle larger clusters.

@behzadnouri
Contributor Author

behzadnouri commented Feb 25, 2021

> What about unstaked RPC nodes? Those are quite important to the network, too

> Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

We may do some whitelisting too. The commit includes a keep: &[Pubkey] argument for pubkeys to never drop. Currently this is populated with the entrypoints and the node's own pubkey, but it can potentially be expanded.
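For illustration, here is a minimal sketch of what a capacity trim with such a keep list could look like. The function name `trim_pubkeys`, the flat `HashMap` layout, and the stake-ascending eviction order are assumptions for the sketch, not the actual cluster_info/crds code:

```rust
use std::collections::{HashMap, HashSet};

type Pubkey = [u8; 32];

/// Illustrative only: evict the lowest-staked pubkeys until the table fits
/// within `capacity`, but never evict anything listed in `keep`
/// (e.g. the entrypoints and the node's own pubkey).
fn trim_pubkeys(
    table: &mut HashMap<Pubkey, Vec<u64>>, // pubkey -> ids of its crds records
    stakes: &HashMap<Pubkey, u64>,
    keep: &[Pubkey],
    capacity: usize,
) {
    if table.len() <= capacity {
        return;
    }
    let keep: HashSet<&Pubkey> = keep.iter().collect();
    // Eviction candidates sorted by stake, lowest first, so unstaked
    // (e.g. spammed) pubkeys are dropped before staked validators.
    let mut candidates: Vec<Pubkey> = table
        .keys()
        .filter(|pk| !keep.contains(*pk))
        .copied()
        .collect();
    candidates.sort_by_key(|pk| stakes.get(pk).copied().unwrap_or(0));
    let excess = table.len() - capacity;
    for pk in candidates.into_iter().take(excess) {
        table.remove(&pk);
    }
}
```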

@leoluk
Contributor

leoluk commented Feb 25, 2021

> What about unstaked RPC nodes? Those are quite important to the network, too

> Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

We're running a whole bunch of private RPC-only nodes these days and having to stake them (and manage the keys) would be a big headache... not that I have a better idea on solving this

@sakridge
Contributor

> What about unstaked RPC nodes? Those are quite important to the network, too

> Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

> We're running a whole bunch of private RPC-only nodes these days and having to stake them (and manage the keys) would be a big headache... not that I have a better idea on solving this

Is it really a problem? The infos are not that expensive. I think we can make the limit high enough that, unless we are under constant DoS from nodes spamming ContactInfo, it will be ok.

@CherryWorm
Contributor

The limit can't be too high, as the DoS primarily depends on the contact infos staying in the table long enough to get pushed out via gossip (which is not long); they get evicted after 15 seconds anyway. I'm not sure what the number has to be, but 20k already seems very high; that should probably be tested first.

@leoluk
Contributor

leoluk commented Feb 25, 2021

> Is it really a problem? The infos are not that expensive. I think we can make the limit high enough that, unless we are under constant DoS from nodes spamming ContactInfo, it will be ok.

But not doing it would leave the network latently vulnerable to such DoS attacks, so I would assume that anyone running an important RPC node would want to stake it?

@behzadnouri
Contributor Author

behzadnouri commented Feb 25, 2021

I do not think ContactInfos are special. Any type can cause problems. The key is that the upper bound on the number of values is proportional to the number of unique pubkeys.
https://github.com/solana-labs/solana/blob/ebd43938a/core/src/crds_value.rs#L435-L446

So the limit is on the number of unique pubkeys. Each pubkey can have up to 830 values associated with it, so with a limit of 1000 unique pubkeys the table can potentially be as large as 830k records.
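As a back-of-the-envelope check of that bound (the constant names here are illustrative; only the numbers come from the discussion above):

```rust
// Worst case: every unique pubkey owns the maximum number of distinct
// crds values allowed per pubkey (the ~830 bound from crds_value.rs).
const MAX_CRDS_VALUES_PER_PUBKEY: usize = 830;
const CRDS_UNIQUE_PUBKEY_CAPACITY: usize = 1_000;

fn main() {
    let worst_case = CRDS_UNIQUE_PUBKEY_CAPACITY * MAX_CRDS_VALUES_PER_PUBKEY;
    assert_eq!(worst_case, 830_000); // the "830k records" figure above
}
```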

@sakridge
Contributor

> I do not think ContactInfos are special. Any type can cause problems. The key is that the upper bound on the number of values is proportional to the number of unique pubkeys.
> https://github.com/solana-labs/solana/blob/ebd43938a/core/src/crds_value.rs#L435-L446

> So the limit is on the number of unique pubkeys. Each pubkey can have up to 830 values associated with it, so with a limit of 1000 unique pubkeys the table can potentially be as large as 830k records.

Well for unstaked nodes, we don't need to propagate votes and the other labels.

@behzadnouri
Contributor Author

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

On the other hand, it only takes 1 sol to work around that, so it might not be enough.

@mvines
Contributor

mvines commented Feb 25, 2021

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

> That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

Oh yeah that'd be really nice. There are now probably hundreds of RPC nodes on mainnet-beta.

@codecov

codecov bot commented Feb 25, 2021

Codecov Report

Merging #15539 (f32e2e6) into master (61c7ce8) will decrease coverage by 0.0%.
The diff coverage is 63.5%.

@@            Coverage Diff            @@
##           master   #15539     +/-   ##
=========================================
- Coverage    80.1%    80.0%   -0.1%     
=========================================
  Files         407      407             
  Lines      105452   105514     +62     
=========================================
+ Hits        84487    84506     +19     
- Misses      20965    21008     +43     

@sakridge
Contributor

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

> That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

> On the other hand, it only takes 1 sol to work around that, so it might not be enough.

True, but it limits the attack surface; you cannot stake/unstake quickly. For the stalling issue, I think we might actually need to limit the outgoing unstaked entries even further and then, as came up in the thread, have the whitelist for setups like a staked node + unstaked RPC nodes. Since each node doesn't really know what the rest of the network is sending, the limits needed to stop a distributed reflection-type attack have to be pretty low.

@behzadnouri
Contributor Author

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

> That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

> Oh yeah that'd be really nice. There are now probably hundreds of RPC nodes on mainnet-beta.

sent out #15561

@behzadnouri marked this pull request as ready for review March 9, 2021 16:17
@aeyakovenko
Member

@behzadnouri once we have this, we can start optimizing for stake-weighted propagation in gossip as well.
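As a rough sketch of what stake-weighted propagation could mean in practice, here is a minimal cumulative-weight draw that picks a push peer with probability proportional to its stake. `pick_peer`, the flat `(name, stake)` peer list, and the caller-supplied `roll` are assumptions for the sketch, not the actual gossip code:

```rust
/// Illustrative only: select a peer with probability proportional to stake.
/// `roll` is assumed to be drawn uniformly at random by the caller; zero-stake
/// peers get weight 1 so they are still occasionally propagated to.
fn pick_peer<'a>(peers: &'a [(&'a str, u64)], roll: u64) -> Option<&'a str> {
    let total: u64 = peers.iter().map(|(_, stake)| (*stake).max(1)).sum();
    if total == 0 {
        return None; // no peers to pick from
    }
    let mut target = roll % total;
    for (name, stake) in peers {
        let weight = (*stake).max(1);
        if target < weight {
            return Some(*name);
        }
        target -= weight;
    }
    None
}

fn main() {
    let peers = [("staked-validator", 1_000_000u64), ("rpc-node", 0)];
    // With a uniform roll, the staked validator is picked ~1e6x more often
    // than the unstaked rpc node.
    println!("{:?}", pick_peer(&peers, 42));
}
```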

@aeyakovenko
Member

We might also want to take vote frequency into consideration: votes burn sol, which effectively pays for message access on the network. That might be a stronger spam filter than just stake-weighted filters.

@mvines
Contributor

mvines commented Mar 9, 2021

> We might also want to take vote frequency into consideration: votes burn sol, which effectively pays for message access on the network. That might be a stronger spam filter than just stake-weighted filters.

Oh I like this. We've been talking about giving RPC backend nodes a pinch of stake for network access, but that's annoying to manage. Easier to just get all the RPC backends to vote with no delegated stake.

@sakridge
Contributor

sakridge commented Mar 9, 2021

> We might also want to take vote frequency into consideration: votes burn sol, which effectively pays for message access on the network. That might be a stronger spam filter than just stake-weighted filters.

> Oh I like this. We've been talking about giving RPC backend nodes a pinch of stake for network access, but that's annoying to manage. Easier to just get all the RPC backends to vote with no delegated stake.

sounds like a nightmare for block sizes...

@mvines
Contributor

mvines commented Mar 9, 2021

> sounds like a nightmare for block sizes...

But if we can't deal with a couple thousand votes per second, how's 50k TPS ever going to work?

@behzadnouri requested a review from sakridge March 9, 2021 18:10
Contributor

@sakridge left a comment

lgtm

@behzadnouri merged commit 56923c9 into solana-labs:master Mar 10, 2021
@behzadnouri deleted the crds-pubkey-trim branch March 10, 2021 20:46
mergify bot pushed a commit that referenced this pull request Apr 11, 2021
(cherry picked from commit 56923c9)

# Conflicts:
#	core/src/cluster_info.rs
mergify bot added a commit that referenced this pull request Apr 12, 2021
* limits number of unique pubkeys in the crds table (#15539)

(cherry picked from commit 56923c9)

# Conflicts:
#	core/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>