Signing policy + optional Signature, From and Seqno #359

Merged
merged 6 commits into from
Jul 23, 2020

Conversation

protolambda
Contributor

This PR proposes new non-breaking PubSub options to force stricter validation (avoiding a hypothetical network split) and to avoid privacy problems in Eth2.

Why

Privacy

The current gossip message ID is purely based on a hash of the contents, but it is still wrapped in a protobuf that carries From, Seqno and Signature.
The From and Seqno fields affect privacy: we don't need, or want, the original source of the message to be known. Currently, I believe that when messages are propagated rather than re-published,
these details remain in the gossip message, at least in the Go implementation.

While From is problematic (and previously known to be, just not fixed by anyone), Seqno alone is also problematic, since (in Go at least) it is initialized as nanosecond time of the node, and then only increments by 1. Because of the slow non-random increase on top of a big number, it's effectively a unique identifier of the origin, embedded in every message.
This could be used to quickly correlate messages, and narrow down which validators (based on message contents) run on which nodes.
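
To illustrate the correlation risk, here is a minimal sketch, assuming an observer who only records the Seqno and contents of messages seen on the wire. The `observed` type, the `clusterBySeqno` helper, the gap threshold, and the sample values are all hypothetical and not part of the library.

```go
package main

import (
	"fmt"
	"sort"
)

// observed is a hypothetical record of one gossip message seen on the wire.
type observed struct {
	Seqno   uint64 // sequence number carried in the message envelope
	Payload string // stand-in for the message contents
}

// clusterBySeqno sorts observations by Seqno and starts a new cluster whenever
// the gap to the previous Seqno exceeds maxGap. Because each origin starts at
// its own nanosecond timestamp and then increments by 1, messages from one
// origin end up close together.
func clusterBySeqno(msgs []observed, maxGap uint64) [][]observed {
	sorted := append([]observed(nil), msgs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Seqno < sorted[j].Seqno })

	var clusters [][]observed
	for i, m := range sorted {
		if i == 0 || m.Seqno-sorted[i-1].Seqno > maxGap {
			clusters = append(clusters, nil)
		}
		last := len(clusters) - 1
		clusters[last] = append(clusters[last], m)
	}
	return clusters
}

func main() {
	// Two hypothetical nodes that started an hour apart (nanosecond timestamps).
	const nodeA, nodeB = uint64(1595400000000000000), uint64(1595403600000000000)
	msgs := []observed{
		{nodeA + 1, "attestation by validator 12"},
		{nodeB + 1, "attestation by validator 40"},
		{nodeA + 2, "attestation by validator 77"},
		{nodeB + 2, "attestation by validator 41"},
		{nodeA + 3, "attestation by validator 12"},
	}
	for i, c := range clusterBySeqno(msgs, 1000) {
		fmt.Printf("origin %d: %d messages\n", i, len(c))
	}
}
```

Even without the From field, the clusters separate the two origins, and the payloads in each cluster then hint at which validators share a node.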

Network split

The "Signature" is not really used, and empty. However, the Go implementation seems to validate it anyway, if it is non-empty.
Now other gossip implementations don't use it at all, or have a stalled PR open that implements similar behavior.
In our case, the signature is dangerous, because it can make different nodes dislike each other:

  • attacker sends message to A with bad signature.
  • A doesn't verify signature
  • A propagates to B
  • B does verify the signature (since it's a non-empty field)
  • B recognizes it as bad
  • B decreases score of A, or outright bans/kicks A.
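
The steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the `msg` and `node` types, the `validSig` stand-in, and the integer score are hypothetical and do not mirror the real gossipsub scoring code.

```go
package main

import "fmt"

// msg is a hypothetical, simplified gossip message.
type msg struct {
	Data      string
	Signature []byte // may be empty, valid, or garbage
}

// node is a hypothetical peer. verifiesIfPresent models the behaviour
// described above: verify a signature only when the field is non-empty.
type node struct {
	name              string
	verifiesIfPresent bool
	score             map[string]int // score kept per sending peer
}

// validSig stands in for real signature verification.
// Assumption for this sketch: only the literal value "ok" is valid.
func validSig(sig []byte) bool { return string(sig) == "ok" }

// receive reports whether the message is accepted and would be forwarded.
func (n *node) receive(from string, m msg) bool {
	if n.verifiesIfPresent && len(m.Signature) > 0 && !validSig(m.Signature) {
		n.score[from]-- // the forwarder is penalised, not the original attacker
		return false
	}
	return true
}

func main() {
	a := &node{name: "A", score: map[string]int{}}
	b := &node{name: "B", verifiesIfPresent: true, score: map[string]int{}}

	bad := msg{Data: "block", Signature: []byte("garbage")}
	if a.receive("attacker", bad) { // A ignores the signature and forwards
		b.receive(a.name, bad) // B verifies, rejects, and blames A
	}
	fmt.Println("B's score for A:", b.score["A"])
}
```

A behaves honestly here, yet it is the one that loses score with B, which is the split this PR tries to rule out by making both sides strict.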

Changes

Loosely based on discussion with @raulk:

  • Introduce a MessageSignaturePolicy enum:
      // MessageSignaturePolicy describes if signatures are produced, expected, and/or verified.
      type MessageSignaturePolicy uint8
      
      // LaxSign and LaxNoSign are deprecated. In the future msgSigning and msgVerification can be unified.
      const (
      	// msgSigning is set when the locally produced messages must be signed
      	msgSigning MessageSignaturePolicy = 1 << iota
      	// msgVerification is set when external messages must be verified
      	msgVerification
      )
      
      const (
      	// StrictSign produces signatures and expects and verifies incoming signatures
      	StrictSign = msgSigning | msgVerification
      	// StrictNoSign does not produce signatures and drops and penalises incoming messages that carry one
      	StrictNoSign = msgVerification
      	// LaxSign produces signatures and validates incoming signatures iff one is present
      	// Deprecated: it is recommended to either strictly enable, or strictly disable, signatures.
      	LaxSign = msgSigning
      	// LaxNoSign does not produce signatures and validates incoming signatures iff one is present
      	// Deprecated: it is recommended to either strictly enable, or strictly disable, signatures.
      	LaxNoSign = 0
      )
    • This preserves the option for older "Lax" behavior (which we may just want to remove entirely instead, if nobody relies on it)
  • Update WithStrictSignatureVerification and WithMessageSigning to use the enum. This moves the logic out of the option functions and into the constructor (with minimal changes). It avoids an unnecessary peerstore private-key lookup (getting the host private key when it is not used as the signing key).
  • Introduce WithMessageSignaturePolicy to set the signing policy. I have doubts here: alternatively, we could leave WithMessageSigning undeprecated and eventually just say that the verification bool is always on. Not signing combined with verification means that signatures must be nil to be valid.
  • pushMsg now checks if the signature is nil, given the right circumstances (and added a trace for it)
    • It still defers signature verification till after the message-seen check. The nil check is cheap and simple enough to do immediately, mirroring the non-nil check if signing was turned on.
  • New WithNoAuthor option, to not sign any messages, and omit any origin data (seq no and signer identity)
    • TODO: unfamiliar with pb.Message Key attribute, but might need to be omitted or handled as well?
    • Whenever the signID is nil, the signing option is disabled: you can't be not signing while also requiring signatures. (This matches the previous "nonsensical option" check in the constructor.) Instead of returning an error I am disabling the signing now. But maybe it should just error instead?
  • Possible bugfix: Message.From should be set to the signer, not the current host (since they may be different, and potentially it is used for signature checking via key extraction, unless Key is set?).
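
As a rough illustration of the cheap presence check described in the list above, here is a minimal sketch. The enum is copied from this PR; the `precheck` helper, its return values, and the error variables are hypothetical names for this example and are not the actual pushMsg code. The sketch also treats an empty signature the same as a nil one, which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// MessageSignaturePolicy and its values are copied from the PR description.
type MessageSignaturePolicy uint8

const (
	msgSigning MessageSignaturePolicy = 1 << iota
	msgVerification
)

const (
	StrictSign   = msgSigning | msgVerification
	StrictNoSign = msgVerification
	LaxSign      = msgSigning
	LaxNoSign    = 0
)

// Hypothetical error values for this sketch.
var (
	errMissingSignature    = errors.New("missing signature")
	errUnexpectedSignature = errors.New("unexpected signature")
)

// precheck is a hypothetical helper modelling the presence check that can run
// before the message-seen lookup. It reports whether the (more expensive)
// cryptographic verification is still needed afterwards.
func precheck(p MessageSignaturePolicy, sig []byte) (needsVerify bool, err error) {
	switch {
	case p == StrictSign && len(sig) == 0:
		return false, errMissingSignature
	case p == StrictNoSign && len(sig) != 0:
		return false, errUnexpectedSignature
	}
	// Strict signing with a signature, or lax mode with one present:
	// the signature still has to be verified later.
	return len(sig) != 0, nil
}

func main() {
	sig := []byte{0x01}
	for _, p := range []MessageSignaturePolicy{StrictSign, StrictNoSign, LaxSign, LaxNoSign} {
		_, errUnsigned := precheck(p, nil)
		_, errSigned := precheck(p, sig)
		fmt.Printf("policy=%d unsigned: %v, signed: %v\n", p, errUnsigned, errSigned)
	}
}
```

Only the two strict policies can reject a message at this stage; the lax policies never fail the presence check and simply defer to verification when a signature happens to be there.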

Any feedback welcome, I can make changes, or change the approach.

Collaborator

@vyzo vyzo left a comment

Thank you! This looks great at first glance, I'll take a closer look tomorrow morning.

Collaborator

@vyzo vyzo left a comment

We need to handle the unexpected signature in the score tracer.
Also, it might make sense to enforce empty from/seqno when we operate in anonymous mode, to defend against an adversary stuffing garbage in those fields.

@protolambda
Contributor Author

Some concerns:

  • Testing: I would like some help/feedback here. There's one broken test locally TestGossipsubDirectPeers which I am not familiar with, maybe broken because of other reasons. And then the coverage etc. should be maintained.
  • Compatibility: for default options it's compatible. But if one chooses to use the "no author" option along with a custom message-ID (like in Eth2), it won't work with current other gossip implementations out of the box. Since those still send the "From" and "Seqno" fields. @jrhea is logging that data of different Eth2 clients (4 different gossipsub implementations, 5 if you count lodestar) on Altona testnet. I am curious what the current observed behavior tells us. Also cc @AgeManning who has a PR to Rust libp2p for signing open here: Gossipsub message signing rust-libp2p#1583
  • Configuration: the current option for "lax" signature behavior (i.e. don't sign, but verify if anything is present) is not very clean. Maybe we should just completely move away from that already, and have a single yes/no to the use of, production of, and verification of signatures.

@vyzo
Collaborator

vyzo commented Jul 22, 2020

Hrm, the test passes on travis and wfm; maybe there is some non-determinism.

@vyzo
Collaborator

vyzo commented Jul 22, 2020

cc @raulk

@protolambda
Contributor Author

protolambda commented Jul 22, 2020

The test that fails locally:

=== RUN   TestGossipsubDirectPeers
    TestGossipsubDirectPeers: gossipsub_test.go:1139: expected a connection between direct peers
--- FAIL: TestGossipsubDirectPeers (2.01s)

gossipsub_test.go:1139 and context:

	connect(t, h[0], h[1])
	connect(t, h[0], h[2])

	// verify that the direct peers connected
	time.Sleep(2 * time.Second)
	if len(h[1].Network().ConnsToPeer(h[2].ID())) == 0 {
		t.Fatal("expected a connection between direct peers")
	}

Looks like it's a timing thing that misses, and unrelated to this PR.

Edit: increasing the two sleep statements before expectations to 10s worked. Flaky test.

@protolambda
Contributor Author

protolambda commented Jul 22, 2020

I think this is where things go wrong with the flaky test:

go func() {

New goroutines are started to make connections, and the connections are not awaited (no waitgroup). At the same time, maybe that is desirable, to not halt the heartbeat loop. Waiting for it in a test is not ideal though. And I wonder what happens on the next heartbeat: does it just repeatedly try to connect? Is that what the ticking is for?

Edit: if 1 tick is 1 heartbeat is 1 second, then waiting 2 ticks for the 2nd connect attempt will be just enough or not, depending on goroutine scheduling order. And the first attempt (gs.heartbeatTicks%gs.directConnectTicks == 0 with heartbeatTicks = 0) may be missed for other reasons, requiring the 2nd attempt to succeed for the test to pass.
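
To make the timing argument concrete, here is a small sketch of the modulo condition quoted above. The `connectTicks` helper is hypothetical, and the value 2 for directConnectTicks is only an assumption taken from the "2 ticks" reasoning in this comment, not a confirmed test setting.

```go
package main

import "fmt"

// connectTicks returns the heartbeat ticks in [0, total) at which a
// direct-peer connection attempt would be spawned, using the condition
// heartbeatTicks%directConnectTicks == 0.
// directConnectTicks must be non-zero.
func connectTicks(total, directConnectTicks uint64) []uint64 {
	var out []uint64
	for tick := uint64(0); tick < total; tick++ {
		if tick%directConnectTicks == 0 {
			out = append(out, tick)
		}
	}
	return out
}

func main() {
	// With one tick per second and an attempt every 2 ticks, the second
	// attempt only starts at tick 2, right at the edge of a 2-second sleep.
	fmt.Println(connectTicks(5, 2)) // [0 2 4]
}
```

Under these assumptions a 2-second sleep leaves no slack for the second attempt to complete, which would be consistent with the test passing once the sleeps are raised to 10s.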

@vyzo
Collaborator

vyzo commented Jul 22, 2020

Yeah, we can't block the event loop. It retries every few ticks, with an initial spawn.

Collaborator

@vyzo vyzo left a comment

Ok, this looks good to me.

@vyzo
Collaborator

vyzo commented Jul 23, 2020

So, regardless of the very interesting issues you raise, codewise this is ready to be merged.

@vyzo
Collaborator

vyzo commented Jul 23, 2020

I am going to merge it but not yet tag a release.

@vyzo vyzo merged commit 9950710 into libp2p:master Jul 23, 2020
@jrhea

jrhea commented Jul 23, 2020

Since those still send the "From" and "Seqno" fields. @jrhea is logging that data of different Eth2 clients (4 different gossipsub implementations, 5 if you count lodestar) on Altona testnet. I am curious what the current observed behavior tells us.

With respect to the seqno...not only can it be used to fingerprint nodes, but the fact that it is incremented with each new gossip message authored allows attackers to approximate how many validators a node is running (in the case of eth2).

Member

@raulk raulk left a comment

Nice, just one tiny comment.

	}
	if t.p.signID != "" {
		m.From = []byte(t.p.signID)
		m.Seqno = t.p.nextSeqno()
Member

Note: I don't think whether we set the Seqno is correlated with whether we send the source peer ID; rather, it is correlated with whether we have a custom MessageIdFn.

Contributor Author

Yes, we can introduce another option for it, but the thought here is that seq-no makes little sense without a message author (since everyone can claim the seq-no for any message data), so it's just left out.

There's some polishing that can be done. I appreciate the quick PR merge, but would have welcomed more feedback/discussion. Reviewing the update path for eth2 clients would also help. Implementations all behave a little differently, and this is the chance to plan signing functionality (non-eth2, but relevant) for those that don't have it yet, make verification a strict yes/no, and align everything. I am tracking behavioral differences in a table in the eth2-specs issue here: ethereum/consensus-specs#1981 (comment)

Collaborator

Sure, that's why I haven't cut a release yet -- we can iterate on it.
