security/p2p: prevent peers who errored being added to the peer_set #9500

jmalicevic · 2022-09-30T07:20:30Z

This work is a fix for a bug in the P2P layer.

A node can be attacked via the p2p layer by saturating its incoming connection slots and not allowing the node to accept new conditions. This happens when an attacker continuously submits requests to connect with an erroneous message causing the incoming request to error before it has been accepted. The attacked node, tries to remove the peer from its peer set which silently fails (due to the peer not yet being in the peer set). The routine adding a peer into the peer set happens in parallel in the background and will add the peer after the error has been reported.

This fix resolves the issue in the following way:

We add a field removalAttemptFailed to the Peer datastructure.
If removal of this peer fails, we set it to true.
When adding a peer into the peer set, the Add function will return an ErrPeerRemoval error if this field was true and not add the peer.

Note. This attack does not work if the config flag allow_duplicate_ips is set to false.

PR checklist

Tests written/updated, or no tests needed
CHANGELOG_PENDING.md updated, or no changelog entry needed
Updated relevant documentation (docs/) and code comments, or no
documentation updates needed

p2p/switch.go

cmwaters · 2022-09-30T09:47:27Z

p2p/peer_set.go

+	if peer.GetRemovalFailed() {
+		return ErrPeerRemoval{}
+	}


Can we do this check before calling PeerSet.Add here. I would prefer to do it on line 813 of switch.go before initializing the peer to the individual reactors

Will reordering this as you've recommended @cmwaters not change the effectiveness of the fix?

What do you mean by effectiveness? As in it will no longer fix it?

If this check happens anywhere earlier we introduce the exact same race condition we already heaving.

We can absolutely do it before and if this flag happens to be set we don't do anything but we also do have to do it here as well to be sure we are not in the same position as w/o the fix.

If this check happens anywhere earlier we introduce the exact same race condition we already heaving.

I see. I wasn't aware of that. Need to look in more closely then

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

thanethomson · 2022-09-30T17:28:39Z

p2p/switch.go

+		// This is done to avoid the corner case of adding a peer when this error happens before the
+		// AddPeer routine has finished. Setting this flag will prevent a peer from being added
+		// to a node's peer set.
+		peer.SetRemovalFailed()


Is it safe to do this here without the peer set being locked in the same way it is when calling GetRemovalFailed here? (the GetRemovalFailed call happens while the peer set's mutex is locked)

I had intuitively implemented with a lock in a previous version (although by setting a lock on the peer itself, not on the entire set). But I did not notice the attack being more successful so I opted to not add a mutex into the peer structure itself even though I still do a small window of opportunity for the removal to happen first and then the Add to grab a lock and add the peer before we set the removal failed flag. Thinking about it now, I could actually set the flag inside the Remove function above the line that returns false and not do a lock on the peer itself. I will do a local test with this change to examine the behaviour unless you have objections.

Based on a local test I think it is better to put it inside the Remove. Will confirm on the testnet before pushing the change.

It does not change the behaviour much but is definetly more correct for this to be inside Remove. I moved it there now. Tested both locally and on DO.

thanethomson

I'd say we should go ahead and merge this. Next steps from here:

Backport to v0.37.x and v0.34.x, and add a CHANGELOG_PENDING.md entry to each backport PR.
Run a large-scale testnet (a single test) on v0.34.x with this fix and check that things work as expected.
Cut v0.34.22.
Keep the v0.37.x backport branch open until we've cut v0.37.0, then merge it.
Run another large-scale testnet on v0.37.x to check that things work before cutting v0.37.1.

Unless anyone has any objections there?

jmalicevic · 2022-10-05T19:55:17Z

Thanks! I'll create an issue with the items you listed, I think they all make sense are definitely worth doing. Do you think we should do a large testnet test of whether the attacked is prevented? If so, we can add this item to the list and I'll need to patch the testnet scripts to be able to execute the attack.

thanethomson · 2022-10-05T19:58:46Z

Do you think we should do a large testnet test of whether the attacked is prevented? If so, we can add this item to the list and I'll need to patch the testnet scripts to be able to execute the attack.

The main thing we'd need to know is whether the P2P network is still stable and that the fix doesn't meaningfully affect peer connectivity. I don't think it needs to be a long test. I assume that your previous tests showed that the fix worked in larger-scale testnets?

jmalicevic · 2022-10-05T20:01:37Z

I assume that your previous tests showed that the fix worked in larger-scale testnets?

I tested that a more aggressive attack does not bring the node down and that the attack is prevented. I did not run a 200node test with the attack. I also don;t think it would reveal additional information. But it was worth testing that when we multiple the attackers the node is still behaving correctly. So I agree with you, we should confirm that this fix does not break something else rather then attack a node within a bigger network.

…9500) * Mark failed removal of peer to address security bug Co-authored-by: Callum Waters <cmwaters19@gmail.com> (cherry picked from commit c0bdb24)

…backport #9500) (#9516) * security/p2p: prevent peers who errored being added to the peer_set (#9500) * Mark failed removal of peer to address security bug Co-authored-by: Callum Waters <cmwaters19@gmail.com> (cherry picked from commit c0bdb24) * Changelong entry and added missing functions for implementations of Peer Co-authored-by: Jasmina Malicevic <jasmina.dustinac@gmail.com>

…er_set (backport #9500) (#9516)" This reverts commit 430afb2.

cason · 2022-10-11T12:25:24Z

This is an elegant workaround. As Jasmina mentioned, while I agree we should assess the p2p layer as a whole in large testnet, I don't see how a larger environment would affect this form of attack or compromise the solution.

) **tl;dr DOS mitigation migrated from [tendermint/tendermint/pull/9500](tendermint/tendermint#9500 Validated use LocalNet instructions at [doc/guides/localnet.md](https://github.com/pokt-network/pocket-core/blob/staging/doc/guides/localnet.md) Original PR description: ``` This work is a fix for a bug in the P2P layer. A node can be attacked via the p2p layer by saturating its incoming connection slots and not allowing the node to accept new conditions. This happens when an attacker continuously submits requests to connect with an erroneous message causing the incoming request to error before it has been accepted. The attacked node, tries to remove the peer from its peer set which silently fails (due to the peer not yet being in the peer set). The routine adding a peer into the peer set happens in parallel in the background and will add the peer after the error has been reported. This fix resolves the issue in the following way: We add a field removalAttemptFailed to the Peer datastructure. If removal of this peer fails, we set it to true. When adding a peer into the peer set, the Add function will return an ErrPeerRemoval error if this field was true and not add the peer. Note. This attack does not work if the config flag allow_duplicate_ips is set to false. ```

…216) * security/p2p: prevent peers who errored being added to the peer_set (backport tendermint#9500) (#87) * security/p2p: prevent peers who errored being added to the peer_set (tendermint#9500) * Mark failed removal of peer to address security bug * Added p2p bugfix

jmalicevic added 5 commits September 28, 2022 10:46

Mark failed removal of peer to address security bug

e76f708

Renamed variable

9fb08e7

Added small test for error on removal

845a1f9

Merge remote-tracking branch 'origin/main' into jasmina/p2p-bugfix

6664c67

Changed function names

8432b81

jmalicevic added C:p2p Component: P2P pkg T:security Type: Security (specify priority) priority A high-priority, high-level item to be shown on the priorities project board labels Sep 30, 2022

jmalicevic self-assigned this Sep 30, 2022

jmalicevic requested a review from ebuchman as a code owner September 30, 2022 07:20

jmalicevic requested a review from a team September 30, 2022 07:20

jmalicevic changed the title ~~security/p2p: detect peers silently erroring and being added as peers after the fact~~ security/p2p: prevent peers who errored being added to the peer_set Sep 30, 2022

cmwaters reviewed Sep 30, 2022

View reviewed changes

thanethomson added S:backport-to-v0.34.x Tell mergify to backport the PR to v0.34.x S:backport-to-v0.37.x Tell mergify to backport the PR to v0.37.x and removed priority A high-priority, high-level item to be shown on the priorities project board labels Sep 30, 2022

jmalicevic and others added 2 commits September 30, 2022 13:02

Lower log level

c2f23b2

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

Applied PR comment

499804c

thanethomson reviewed Sep 30, 2022

View reviewed changes

p2p: moved setting flag to tbe thread safe

4c5faa7

thanethomson approved these changes Oct 5, 2022

View reviewed changes

Merge branch 'main' into jasmina/p2p-bugfix

4497a68

jmalicevic merged commit c0bdb24 into main Oct 6, 2022

jmalicevic deleted the jasmina/p2p-bugfix branch October 6, 2022 07:02

This was referenced Oct 6, 2022

security/p2p: prevent peers who errored being added to the peer_set (backport #9500) #9515

Draft

security/p2p: prevent peers who errored being added to the peer_set (backport #9500) #9516

Merged

jmalicevic mentioned this pull request Oct 6, 2022

backport: Add p2p security fix to v0.34 and v0.37 #9517

Closed

6 tasks

jmalicevic added a commit that referenced this pull request Oct 10, 2022

Revert "security/p2p: prevent peers who errored being added to the pe…

3b4d8e9

…er_set (backport #9500) (#9516)" This reverts commit 430afb2.

jmalicevic mentioned this pull request Dec 27, 2022

backport: Add p2p security fix to v0.34 and v0.37 #9517 cometbft/cometbft#70

Closed

6 tasks

jmalicevic mentioned this pull request Jan 9, 2023

security/p2p: prevent peers who errored being added to the peer_set (backport #9500) cometbft/cometbft#87

Merged

Olshansk mentioned this pull request Apr 5, 2023

security/p2p: prevent peers who errored being added to the peer_set pokt-network/tendermint#75

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security/p2p: prevent peers who errored being added to the peer_set #9500

security/p2p: prevent peers who errored being added to the peer_set #9500

jmalicevic commented Sep 30, 2022 •

edited by thanethomson

Loading

cmwaters Sep 30, 2022

thanethomson Sep 30, 2022 •

edited

Loading

cmwaters Sep 30, 2022

jmalicevic Sep 30, 2022

jmalicevic Sep 30, 2022

cmwaters Sep 30, 2022

thanethomson Sep 30, 2022

jmalicevic Sep 30, 2022

jmalicevic Sep 30, 2022

jmalicevic Oct 3, 2022

thanethomson left a comment

jmalicevic commented Oct 5, 2022

thanethomson commented Oct 5, 2022

jmalicevic commented Oct 5, 2022

cason commented Oct 11, 2022

security/p2p: prevent peers who errored being added to the peer_set #9500

security/p2p: prevent peers who errored being added to the peer_set #9500

Conversation

jmalicevic commented Sep 30, 2022 • edited by thanethomson Loading

PR checklist

Choose a reason for hiding this comment

thanethomson Sep 30, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thanethomson left a comment

Choose a reason for hiding this comment

jmalicevic commented Oct 5, 2022

thanethomson commented Oct 5, 2022

jmalicevic commented Oct 5, 2022

cason commented Oct 11, 2022

jmalicevic commented Sep 30, 2022 •

edited by thanethomson

Loading

thanethomson Sep 30, 2022 •

edited

Loading