This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

limits number of unique pubkeys in the crds table #15539

Merged

behzadnouri merged 3 commits into solana-labs:master from crds-pubkey-trim on Mar 10, 2021

Conversation

behzadnouri
Contributor

Problem

Summary of Changes

Fixes #

@leoluk
Contributor

leoluk commented Feb 25, 2021

What about unstaked RPC nodes? Those are quite important to the network, too

@mvines
Contributor

mvines commented Feb 25, 2021

> What about unstaked RPC nodes? Those are quite important to the network, too

Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

@@ -114,6 +114,8 @@ pub const DEFAULT_CONTACT_DEBUG_INTERVAL_MILLIS: u64 = 10_000;
 pub const DEFAULT_CONTACT_SAVE_INTERVAL_MILLIS: u64 = 60_000;
 /// Minimum serialized size of a Protocol::PullResponse packet.
 const PULL_RESPONSE_MIN_SERIALIZED_SIZE: usize = 167;
+// Limit number of unique pubkeys in the crds table.
+const CRDS_UNIQUE_PUBKEY_CAPACITY: usize = 1024;
Contributor

1k seems really low. I would expect like 10-20k is probably ok?

Contributor Author

Increased it to 4k, which is ~6x the current mainnet cluster (~4x current tds).
This is the number of unique pubkeys; the table itself can be 830x this value in size, so 10k might be a bit of a stretch performance-wise. We can bump this number as the code is further optimized to handle larger clusters.

@behzadnouri
Contributor Author

behzadnouri commented Feb 25, 2021

> What about unstaked RPC nodes? Those are quite important to the network, too

> Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

We may do some whitelisting too. The commit includes a keep: &[Pubkey] argument for pubkeys to never drop. Currently this is populated with the entrypoints and the node's own pubkey, but it can potentially be expanded.
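For illustration, here is a minimal sketch of what a capacity trim with such a keep list could look like. The function name `trim_pubkeys`, the flat `HashMap` layout, and the stake-ascending eviction order are assumptions for the sketch, not the actual cluster_info/crds code:

```rust
use std::collections::{HashMap, HashSet};

type Pubkey = [u8; 32];

/// Illustrative only: evict the lowest-staked pubkeys until the table fits
/// within `capacity`, but never evict anything listed in `keep`
/// (e.g. the entrypoints and the node's own pubkey).
fn trim_pubkeys(
    table: &mut HashMap<Pubkey, Vec<u64>>, // pubkey -> ids of its crds records
    stakes: &HashMap<Pubkey, u64>,
    keep: &[Pubkey],
    capacity: usize,
) {
    if table.len() <= capacity {
        return;
    }
    let keep: HashSet<&Pubkey> = keep.iter().collect();
    // Eviction candidates sorted by stake, lowest first, so unstaked
    // (e.g. spammed) pubkeys are dropped before staked validators.
    let mut candidates: Vec<Pubkey> = table
        .keys()
        .filter(|pk| !keep.contains(*pk))
        .copied()
        .collect();
    candidates.sort_by_key(|pk| stakes.get(pk).copied().unwrap_or(0));
    let excess = table.len() - capacity;
    for pk in candidates.into_iter().take(excess) {
        table.remove(&pk);
    }
}
```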

@leoluk
Contributor

leoluk commented Feb 25, 2021

> What about unstaked RPC nodes? Those are quite important to the network, too

> Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

We're running a whole bunch of private RPC-only nodes these days and having to stake them (and manage the keys) would be a big headache... not that I have a better idea on solving this

@sakridge
Contributor

> What about unstaked RPC nodes? Those are quite important to the network, too

> Yeah good point. I've been wondering if it makes sense to strongly recommend that RPC nodes have at least a couple SOL staked to them. Not enough to get in the leader schedule, but enough so that they actually get prioritized over the rest of the 0-stake nodes.

> We're running a whole bunch of private RPC-only nodes these days and having to stake them (and manage the keys) would be a big headache... not that I have a better idea on solving this

Is it really a problem? The infos are not that expensive. I think we can make the limit high enough that, unless we are under constant DoS from nodes spamming ContactInfo, it will be ok.

@CherryWorm
Contributor

The limit can't be too high, as the DoS primarily depends on the contact infos staying in the table long enough to get pushed out via gossip (which is not long); they get evicted after 15 seconds anyway. I'm not sure what the number has to be, but 20k already seems very high; that should probably be tested first.

@leoluk
Contributor

leoluk commented Feb 25, 2021

> Is it really a problem? The infos are not that expensive. I think we can make the limit high enough that, unless we are under constant DoS from nodes spamming ContactInfo, it will be ok.

But not doing it would leave the network latently vulnerable to such DoS attacks, so I would assume that anyone running an important RPC node would want to stake it?

@behzadnouri
Contributor Author

behzadnouri commented Feb 25, 2021

I do not think ContactInfos are special. Any type can cause problems. The key is that the upper bound on the number of values is proportional to the number of unique pubkeys.
https://github.com/solana-labs/solana/blob/ebd43938a/core/src/crds_value.rs#L435-L446

So the limit is on the number of unique pubkeys. Each pubkey can have up to 830 values associated with it, so with a limit of 1000 unique pubkeys the table can potentially be as large as 830k records.
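As a back-of-the-envelope check of that bound (the constant names here are illustrative; only the numbers come from the discussion above):

```rust
// Worst case: every unique pubkey owns the maximum number of distinct
// crds values allowed per pubkey (the ~830 bound from crds_value.rs).
const MAX_CRDS_VALUES_PER_PUBKEY: usize = 830;
const CRDS_UNIQUE_PUBKEY_CAPACITY: usize = 1_000;

fn main() {
    let worst_case = CRDS_UNIQUE_PUBKEY_CAPACITY * MAX_CRDS_VALUES_PER_PUBKEY;
    assert_eq!(worst_case, 830_000); // the "830k records" figure above
}
```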

@sakridge
Contributor

> I do not think ContactInfos are special. Any type can cause problems. The key is that the upper bound on the number of values is proportional to the number of unique pubkeys.
> https://github.com/solana-labs/solana/blob/ebd43938a/core/src/crds_value.rs#L435-L446

> So the limit is on the number of unique pubkeys. Each pubkey can have up to 830 values associated with it, so with a limit of 1000 unique pubkeys the table can potentially be as large as 830k records.

Well for unstaked nodes, we don't need to propagate votes and the other labels.

@behzadnouri
Contributor Author

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

On the other hand, it only takes 1 sol to work around that, so it might not be enough.

@mvines
Contributor

mvines commented Feb 25, 2021

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

> That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

Oh yeah that'd be really nice. There are now probably hundreds of RPC nodes on mainnet-beta.

@codecov

codecov bot commented Feb 25, 2021

Codecov Report

Merging #15539 (f32e2e6) into master (61c7ce8) will decrease coverage by 0.0%.
The diff coverage is 63.5%.

@@            Coverage Diff            @@
##           master   #15539     +/-   ##
=========================================
- Coverage    80.1%    80.0%   -0.1%     
=========================================
  Files         407      407             
  Lines      105452   105514     +62     
=========================================
+ Hits        84487    84506     +19     
- Misses      20965    21008     +43     

@sakridge
Contributor

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

> That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

> On the other hand, it only takes 1 sol to work around that, so it might not be enough.

True, but it limits the attack surface; you cannot stake/unstake quickly. For the stalling issue, I think we might actually need to limit the outgoing unstaked entries even further and then, as came up in the thread, have the whitelist for setups like a staked node + unstaked RPC nodes. Since each node doesn't really know what the rest of the network is sending, the limits needed to stop a distributed reflection-type attack have to be pretty low.

@behzadnouri
Contributor Author

> Well for unstaked nodes, we don't need to propagate votes and the other labels.

> That might be a good idea to implement. But currently the gossip propagation code does not look at the stakes, so this would be new logic in the code.

> Oh yeah that'd be really nice. There are now probably hundreds of RPC nodes on mainnet-beta.

sent out #15561

@behzadnouri marked this pull request as ready for review March 9, 2021 16:17
@aeyakovenko
Member

@behzadnouri once we have this, we can start optimizing for stake-weighted propagation in gossip as well.
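As a rough sketch of what stake-weighted propagation could mean in practice, here is a minimal cumulative-weight draw that picks a push peer with probability proportional to its stake. `pick_peer`, the flat `(name, stake)` peer list, and the caller-supplied `roll` are assumptions for the sketch, not the actual gossip code:

```rust
/// Illustrative only: select a peer with probability proportional to stake.
/// `roll` is assumed to be drawn uniformly at random by the caller; zero-stake
/// peers get weight 1 so they are still occasionally propagated to.
fn pick_peer<'a>(peers: &'a [(&'a str, u64)], roll: u64) -> Option<&'a str> {
    let total: u64 = peers.iter().map(|(_, stake)| (*stake).max(1)).sum();
    if total == 0 {
        return None; // no peers to pick from
    }
    let mut target = roll % total;
    for (name, stake) in peers {
        let weight = (*stake).max(1);
        if target < weight {
            return Some(*name);
        }
        target -= weight;
    }
    None
}

fn main() {
    let peers = [("staked-validator", 1_000_000u64), ("rpc-node", 0)];
    // With a uniform roll, the staked validator is picked ~1e6x more often
    // than the unstaked rpc node.
    println!("{:?}", pick_peer(&peers, 42));
}
```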

@aeyakovenko
Member

We might also want to take vote frequency into consideration: votes burn sol, which effectively pays for message access on the network. That might be a stronger spam filter than just stake-weighted filters.

@mvines
Contributor

mvines commented Mar 9, 2021

> We might also want to take vote frequency into consideration: votes burn sol, which effectively pays for message access on the network. That might be a stronger spam filter than just stake-weighted filters.

Oh I like this. We've been talking about giving RPC backend nodes a pinch of stake for network access, but that's annoying to manage. Easier to just get all the RPC backends to vote with no delegated stake.

@sakridge
Contributor

sakridge commented Mar 9, 2021

> We might also want to take vote frequency into consideration: votes burn sol, which effectively pays for message access on the network. That might be a stronger spam filter than just stake-weighted filters.

> Oh I like this. We've been talking about giving RPC backend nodes a pinch of stake for network access, but that's annoying to manage. Easier to just get all the RPC backends to vote with no delegated stake.

sounds like a nightmare for block sizes...

@mvines
Contributor

mvines commented Mar 9, 2021

> sounds like a nightmare for block sizes...

But if we can't deal with a couple thousand votes per second, how's 50k TPS ever going to work?

@behzadnouri requested a review from sakridge March 9, 2021 18:10
Contributor

@sakridge left a comment

lgtm

@behzadnouri merged commit 56923c9 into solana-labs:master Mar 10, 2021
@behzadnouri deleted the crds-pubkey-trim branch March 10, 2021 20:46
mergify bot pushed a commit that referenced this pull request Apr 11, 2021
(cherry picked from commit 56923c9)

# Conflicts:
#	core/src/cluster_info.rs
mergify bot added a commit that referenced this pull request Apr 12, 2021
* limits number of unique pubkeys in the crds table (#15539)

(cherry picked from commit 56923c9)

# Conflicts:
#	core/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>