-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add signature verification to gossip #1937
Conversation
src/cluster_info.rs
Outdated
@@ -60,13 +60,15 @@ pub enum ClusterInfoError { | |||
pub struct ClusterInfo { | |||
/// The network | |||
pub gossip: CrdsGossip, | |||
/// set the keypair that will be used to sign crds values generated. It is unset only in tests. | |||
keypair: Option<Arc<Keypair>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need for the Option
here. Instead of None
, can you use Pubkey::default()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, Rob was saying the same. I think it makes a little more sense for clarity sake. I'll explain and maybe you can suggest a better solution to the overall problem.
Cluster Info uses the pubkey from the node_info which gets passed in during construction. In fullnodes (most common production usecase), the node_info's pubkey is the same as the keypair that gets passed in optionally(via the new constructor I added).
In nearly all other uses (spy nodes, tests etc) cluster info just carrying around dummy nodes who's pubkeys are no longer associated with any keypair that cluster info can use for signature verification. If I init cluster's keypair to default, it will always fail verification during gossip. The same failure occurs if keypair isn't even initialized since the messages won't be signed and their signatures will not verify. So given that the end result is the same, I chose the Option to make it a little clearer why verification might be failing. Having a valid keypair makes it seem like the node's pubkey is the same as the keypair's but that's not necessary and that's the actual problem here. Cluster info can be constructed with nodes and not have a valid keypair to sign messages with.
Optionally, I can ignore the node's pubkey and just use the keypair always but then that becomes a special case for gossip where gossip messages are always signed with a random keypair whose pubkey doesn't tie into any of the other info blobs (nodes) in the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sagar-solana have ClusterInfo::new(..) take a keypair and get rid of default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aeyakovenko, yup I can do that but it doesn't solve the problem that most users of cluster info just give it a random node whose keypair is discarded. What keypair do I pass in those cases? In our primary usecase (fullnode's instance of cluster which gets passed to ncp) this isn't an issue but in a whole bunch of places we don't have a keypair to pass in.
Unless its totally fine to expect users of cluster_info to pass in random keypairs knowing that gossip will fail in that cluster_info since that's what will happen with how I have it right now if cluster_info is constructed without a keypair. In that case I'd be more than happy to get rid of default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it looking like a ton of work to refactor such that all tests are using legit keypairs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty much. Most tests don't even care about having gossip run so it would be sort of unnecessary as I see it right now. For now I've done what you asked and removed the option but left in the other constructor. All Fullnodes should work correctly with gossip verification. If anything else tries to run gossip the peers will drop their messages (like spynodes shouldn't be gossiping anyway, only listening) so that actually works out well.
src/contact_info.rs
Outdated
@@ -159,6 +164,24 @@ impl ContactInfo { | |||
pub fn is_valid_address(addr: &SocketAddr) -> bool { | |||
(addr.port() != 0) && Self::is_valid_ip(addr.ip()) | |||
} | |||
pub fn get_sign_data(&self) -> Vec<u8> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sagar-solana can you break this out into a singe signed serialized structure? @garious i am not a huge fan of this pattern from Transaction, because of how error prone it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Transaction should have a TransactionData struct that includes everything but its signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I can. I'll end up sticking signature into the Protocol enums instead of into the individual crdsvalues. That way I can just use default serialization for the values. Sounds good? Or should I do what @garious suggested with a data inner struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sagar-solana the protocol enums wont work because when you receive a list of CrdsValues you need to verify the signature of each one. The protocol messages are the p2p messages, but the values themselves propagate p2N
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's right! Okay then I need to use the inner data structs to avoid manually serializing. Thanks for pointing that out. Got carried away trying to simplify...
sdk/src/signature.rs
Outdated
} | ||
|
||
fn pubkey(&self) -> Pubkey; | ||
fn get_sign_data(&self) -> Vec<u8>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nit... getters don't have get_ on them in our conventions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't come up with that name. Txs already use it :P Suggestions? Just sign_data (could be confused with "please sign this data")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be "signed_data()" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the is_signed state of this data isn't known. It's the data that gets signed and compared against for verification. It's not already signed. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signable_data() to match the trait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, awesome. Thanks!
@@ -12,6 +12,9 @@ use crds_value::CrdsValue; | |||
use solana_sdk::hash::Hash; | |||
use solana_sdk::pubkey::Pubkey; | |||
|
|||
///The min size for bloom filters | |||
pub const CRDS_GOSSIP_BLOOM_SIZE: usize = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this in bits? does it make sense for it to be a multiple of 8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in bits. Num bits to use is figured out using this value (usually network_size is passed in). See https://hur.st/bloomfilter/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
- Assign default to keypair
- Needs tests
- Added a Signable trait that all signable values can implement. - CrdsValue is a Signable that contains Signables so it doesn't need to provide implementations for all the methods and can just override some defaults
- Spy nodes should be able to make pull requests
5f5e1f7
to
bab48f4
Compare
@aeyakovenko, if this looks ok to you, I'll merge. |
src/cluster_info.rs
Outdated
/// The Pubkey of the intended node/destination for this message | ||
pub destination: Pubkey, | ||
/// The source Addr this message should have been received from | ||
pub source: SocketAddr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source
shouldn't be necessary, and might not be possible to enforce because of NATs. I would add a wallclock
to the message so it expires if its replayed at a later time or by a random node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't think of that. The wall clock's a good idea though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aeyakovenko, I picked 500ms as the timeout. Does that sound reasonable since prunes are p2p?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
500ms might be tight, but we need to tests all the constants in the real world. Could you add a counter for expired prune messages? (A separate pr to track behavior is fine)
Bumps [typescript](https://github.com/Microsoft/TypeScript) from 4.3.3 to 4.3.4. - [Release notes](https://github.com/Microsoft/TypeScript/releases) - [Commits](microsoft/TypeScript@v4.3.3...v4.3.4) --- updated-dependencies: - dependency-name: typescript dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…port of solana-labs#1888) (solana-labs#1900) (solana-labs#1937) Revert "v2.0: Refactor and additional metrics for cost tracking (backport of solana-labs#1888) (solana-labs#1900)" This reverts commit 0aef62e.
Problem
Gossip lacks signature verification for messages, which means data can be spoofed.
Summary of Changes
Added Signatures to replicated gossip values and added verification of values are they get replicated across the network
Fixes #1855 and #1952