-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit the number of pending un-funded inbound channel #1988
Limit the number of pending un-funded inbound channel #1988
Conversation
Codecov ReportBase: 90.89% // Head: 91.91% // Increases project coverage by
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #1988 +/- ##
==========================================
+ Coverage 90.89% 91.91% +1.02%
==========================================
Files 99 99
Lines 52507 60394 +7887
Branches 52507 60394 +7887
==========================================
+ Hits 47725 55513 +7788
- Misses 4782 4881 +99
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Just looks like some fuzz things need the new param/argument |
8f3fe98
to
dd8215d
Compare
A test for both checks would be nice to add as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If understanding correctly the DoS strategy adopted with this PR, I think we could be more conservative with global MAX_OUTBOUND_PEERS
/ MAX_INBOUND_PEERS
/ MAX_UNFUNDED_ALL_CHANNEL_PER_PEER
indifferently of existent channels state (or the type of them), especially in light of channel counterparties with privileged capabilities (e.g miners).
As a perspective, the top capacity nodes on the network have like ~2k of public channels (and probably few thousands more of private) so probably the same order of magnitude of peers. If a node operator would like to have more peers than the conservative default values, we can have a config setting, with the explicit warning to deploy additional DoS protection (at the system-level or at the application-level). The peers can be "allowlisted" and from then the number of channel opening unbounded.
(If this approach is evaluated as more sound, we can introduce the flexibility latter on).
dd8215d
to
f227010
Compare
Basically rewrote the whole thing, this time with tests. |
f227010
to
7f9ee5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this indeed solves #1889 but also introduces new problems.
- An adversary can take up all our active connection slots so that no one else can connect to us.
- An adversary can take up all our unfunded channel slots so that no one else can open any channels to us.
Both of these things would be trivial for the adversary to do, and with the current low limit of 50 unfunded channels/peers, it would only take a couple seconds.
There may not be a perfect solution here, but perhaps @t-bast's suggestion is an acceptable compromise -- once we hit our channel/peer limits we can still allow new connections and channels from nodes that already have another public channel (i.e. have paid onchain fees).
lightning/src/ln/channelmanager.rs
Outdated
} | ||
node.is_connected | ||
}); | ||
if inbound && !this_peer_has_funded_channels && connected_peers_with_unfunded_channels >= MAX_NO_CHANNEL_PEERS { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this enables an adversary to block any new connections to us by simply maintaining 50 open connections with different node IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, but that's true no matter what we do, with an fd limit of 1024 that's just what we get. At least this way we get outbound connections and protect our channel peers.
lightning/src/ln/channelmanager.rs
Outdated
/// node has. | ||
/// | ||
/// If the peer in question has funded channels, however, zero will be returned. | ||
fn peers_with_unfunded_channels<Filter>(&self, mut filter: Filter) -> usize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is a little too complex with the filter. I found it difficult to follow the code, so I'm afraid bugs may hide in here (if not now, in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, though I really struggle to clean it up and still accomplish the goals - we want to protect our channel and outbound peers, accept connections if we have spare slots, but reject otherwise. On the open channel end we shouldn't let a bunch of inbound no-channel peers slow us down, but we also shouldn't let a peer open an arbitrary number of channels just because they have one. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think peers_with_unfunded_channels
is better or worst than few more critical code paths we have in onchaintx.rs
/ package.rs
and other locations. Code quality sounds good enough to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because other parts of our code are complicate doesn't mean we should make this part complicated too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could improve readability with a couple simple changes. We could create 2 new helper functions has_funded_channel(node_id)
and unfunded_channel_count(node_id)
and use those instead of extracting that information from the filter functions. This would then allow us to simplify the filters to one-line functions that only take a node
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, did basically this, but only added one additional helper, I'm not sure that has_funded_channels
is that much more readable than unfunded_channel_count != channel_by_id.len()
lightning/src/ln/channelmanager.rs
Outdated
|
||
/// The maximum number of peers which we do not have a (funded) channel with. Once we reach this | ||
/// many peers we reject new (inbound) connections. | ||
const MAX_NO_CHANNEL_PEERS: usize = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constant seems to be used in 2 different ways:
- the max number of connected peers with only unfunded channels OR with no channels
- the max number of peers with only unfunded channels
This is confusing to me. Perhaps there should be two constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, its now split out, and we even let more peers connect that aren't opening channels.
I think there is another complementary non-perfect solution here -- When we hit the limit of peers without funded channels, evict the peer either randomly or based on some decision criteria (e.g oldest peer, lowest capacity channel) to make room for new connections. Those new connections are expected to announce an inbound channel opening and the corresponding funding UTXOs before a given delay to avoid being bookmarked for eviction. An adversary could still make a dance of UTXOs without confirming transactions, however you can banlist those UTXOs. I think the t-bast solution can be also generalised to any peer announcing the ownership of a public channel. It doesn't have to be a public channel opened with us for the mitigation to give some DoS relief. |
I don't buy that any of these solutions are materially better than the current code - default open files limit on most machines is 1024, so at best we get maybe 500-750 connections before we start stomping on things. Even if we assume we're happy with that many unfounded channels that's still a trivial number. Creating 500 public nodes (with one transaction on-chain, even!) costs almost nothing, so it's not a big barrier. It's a fine trick to add a small barrier, but certainly not worth wiring the network graph up to unrelated parts of the code for. The only thing we can really do here is let the node operator figure it out, or create an outbound/dual-funded channel. |
I think here we have to dissociate concerns: a) potential DoS risks due to number of peers connections/outbound channels and b) an exhaustion of our channels/peers slots preventing material operations of the targeted Lightning node. Or to be more precise, the implementation of limit to address a) might introduce a risk of b). For addressing a) in the default deployment on most standard *NIX boxes, I think we can rely on cgroups/ulimit, and defer to the node operator fine-tuning in function of use-case. I agree with Matt (the blue one), the state of unfunded channels sounds trivial. For addressing b) the node operator can have peers allowlist strategy, or if we would like something more efficient for all implementations it should be thought at the spec-level. Will review code more soon. |
Notably, I'd very much like to bump these limits substantially, which would let us get away with things like accepting tens of thousands of channels at a time from nodes that are existing public nodes, but in order to do so I'd like to do #1987 first. After that we should revisit this and bump the limits substantially as there will be zero persistent state per channel until its funded. |
@ariard: If we can get by for now with peer/channel limits, I think some sort of UTXO proof scheme (as previously discussed for v2 funding) would make these kinds of DoS attacks expensive enough to not be a problem.
@TheBlueMatt: Seems like a reasonable plan to me. Limits in that ballpark along with a "proof of onchain fee" requirement should make DoS expensive enough to not be a problem. I guess the main question is whether that proof of fee would come from spec level changes or querying the network graph for public nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done a coverage, would be good to have another look on test coverage.
@morehouse Yes -- A proof of UTXO scheme could be a viable solution for this type of attacks (in the same way we have dual-funding slots exhaustion issue). However, I think the economic binding of the mitigation would have to be studied carefully as during period of low-fees, an adversary can recycle its UTXO set for a cost under your yield return on your channel slots.
lightning/src/ln/channelmanager.rs
Outdated
/// with unfunded channels. | ||
/// | ||
/// Because it is an indication of trust, inbound channels which we've accepted as 0conf are | ||
/// exempted from the count of unfunded channels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The 0conf bypass could be abused by those peers to inflate the per-peer state, as such you SHOULD monitor your number of accepted 0confs if you're accepting "low-trust" 0conf peers".
I think most of the time, or at least for today's style of deployment with your default LSP, there should not be issue with your 0conf peers. Still I think we should see "0conf" trust as a scale (e.g LSP accumulated years of activity) and not something as binary.
"If you're operating a JIT inbound channel services towards your peers, a service user could abuse this features to inflate the per-peer state, as such you SHOULD deployed additional resources control to prevent such abuses. In the future LDK might offer more fine-tune peers/channels resource control management".
Where the JIT inbound channel model is understood as: https://github.com/BitcoinAndLightningLayerSpecs/lsp/blob/main/channel-request.md
This should address #1995
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty strongly implied. To make it clearer I noted in the in the above paragraph that this is "to avoid DoS"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so certain the implication will be clear for users, and in any case with DoS-related issues, clearer we're with documentation better it is. That's okay, letting #1995 and I'll propose better documentation latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should leave an issue around for this, we already have enough random "improve docs on X" issues :(. I added one more sentence to the second paragraph here, let me know if you have further suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good with the new sentence, it is very clear the users should be careful there. Yeah yeah I know we have a lot of random "improve docs on X", at the same time we're a library and I still have in mind the "safe API", hard to misuses/shootgun yourself standard in mind. Closing #1995.
I think we'd want to do it based on the network graph. Its still a bit awkward because a peer can't open their first channel with us, which is pretty limiting for eg LSP use-cases, but proving existing on-chain channels is similarly limiting. I'm also very much not a fan of piping the network graph through to the |
83d0b1c
to
9a69609
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified the tests cover the newly added paths as expected. Feel free to squash.
d13e42a
to
a972510
Compare
Addressed the feedback and went ahead and squashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, just see with new test_0conf_limiting()
we might still have missing a small check.
lightning/src/ln/channelmanager.rs
Outdated
@@ -573,6 +577,15 @@ pub type SimpleRefChannelManager<'a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, M, T, F, L> = C | |||
/// offline for a full minute. In order to track this, you must call | |||
/// timer_tick_occurred roughly once per minute, though it doesn't have to be perfect. | |||
/// | |||
/// To avoid trivial DoS issues, ChannelManager limits the number of inbound connections and | |||
/// inbound channels which have not yet had their funding transaction confirmed. This may result in | |||
/// nodes which we do not have a channel with being unable to connect to us or open new channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"To avoid trivial DoS issues, ChannelManager limits the number of inbound peer connections and inbound channels without funding transaction confirmed.
This may result in nodes(with no channels to us) being unable to connect or open new channels if we currently have many peers with unfunded channels."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this is clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which have not yet had their funding transaction confirmed -> "without funding transaction confirmed"
nodes which we do not have a channel with being unable -> "nodes(with no channels to us) being unable"
I find it a bit shorter and needing less mental cycles. Was bit confused at first glance with current comment.
NBD, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the first suggestion (had missed it), but the second one I'm not sure makes sense - the "we dont have a channel with them" clause is key, not something to move to a parenthetical IMO.
lightning/src/ln/channelmanager.rs
Outdated
// Get the number of peers with channels, but without funded ones. We don't care too much | ||
// about peers that never open a channel, so we filter by peers that have at least one | ||
// channel, and then limit the number of those with unfunded channels. | ||
let peers_without_funded_channels = self.peers_without_funded_channels(|node| !node.channel_by_id.is_empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per this comment,
this variable and on L:4233 could be better named as peers_with_only_unfunded_channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it could be named better, but I'm not sure "with only unfunded channels" communicates clearer that it has a channel, but only unfunded ones, rather I'd expect "with only unfunded channels" to be like "without funded channels" - implemented as a loop that returns false if there's a funded channel, and true otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean maybe something like peers_with_channels_but_none_funded
, but might be verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried channeled_peers_without_funding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From range-diff, code review ACK 746d5f2
746d5f2
to
3833d70
Compare
Rebased |
3833d70
to
3622d55
Compare
Any chance you want to take another look at this @dunxen? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 3622d55
// Get the number of peers with channels, but without funded ones. We don't care too much | ||
// about peers that never open a channel, so we filter by peers that have at least one | ||
// channel, and then limit the number of those with unfunded channels. | ||
let channeled_peers_without_funding = self.peers_without_funded_channels(|node| !node.channel_by_id.is_empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, when we introduce all the abstractions for collaborative transactions, we might have pending transactions with contributed inputs (i.e funded) but not broadcast yet as we have not received tx_signatures
message. We might have to just to rename all the "funding" wording introduced by this PR, though yeah let's bikesheds when it does happen.
Ping @dunxen |
Hmm not sure how I missed the previous mention 👀 |
Its useful for the message handlers to know if a peer is inbound for DoS decision-making reasons.
3622d55
to
3f3daed
Compare
Rebased. |
Because we store some (not large, but not zero) state per-peer, it's useful to limit the number of peers we have connected, at least with some buffer. Much more importantly, each channel has a relatively large cost, especially around the `ChannelMonitor`s we have to build for each. Thus, here, we limit the number of channels per-peer which aren't (yet) on-chain, as well as limit the number of (inbound) peers which don't have a (funded-on-chain) channel. Fixes lightningdevkit#1889
3f3daed
to
d5fb804
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge Code Review ACK d5fb804
Nice if we can land quickly 0.0.114 with that.
0.0.114 - Mar 3, 2023 - "Faster Async BOLT12 Retries" API Updates =========== * `InvoicePayer` has been removed and its features moved directly into `ChannelManager`. As such it now requires a simplified `Router` and supports `send_payment_with_retry` (and friends). `ChannelManager::retry_payment` was removed in favor of the automated retries. Invoice payment utilities in `lightning-invoice` now call the new code (lightningdevkit#1812, lightningdevkit#1916, lightningdevkit#1929, lightningdevkit#2007, etc). * `Sign`/`BaseSign` has been renamed `ChannelSigner`, with `EcdsaChannelSigner` split out in anticipation of future schnorr/taproot support (lightningdevkit#1967). * The catch-all `KeysInterface` was split into `EntropySource`, `NodeSigner`, and `SignerProvider`. `KeysManager` implements all three (lightningdevkit#1910, lightningdevkit#1930). * `KeysInterface::get_node_secret` is now `KeysManager::get_node_secret_key` and is no longer required for external signers (lightningdevkit#1951, lightningdevkit#2070). * A `lightning-transaction-sync` crate has been added which implements keeping LDK in sync with the chain via an esplora server (lightningdevkit#1870). Note that it can only be used on nodes that *never* ran a previous version of LDK. * `Score` is updated in `BackgroundProcessor` instead of via `Router` (lightningdevkit#1996). * `ChainAccess::get_utxo` (now `UtxoAccess`) can now be resolved async (lightningdevkit#1980). * BOLT12 `Offer`, `InvoiceRequest`, `Invoice` and `Refund` structs as well as associated builders have been added. Such invoices cannot yet be paid due to missing support for blinded path payments (lightningdevkit#1927, lightningdevkit#1908, lightningdevkit#1926). * A `lightning-custom-message` crate has been added to make combining multiple custom messages into one enum/handler easier (lightningdevkit#1832). * `Event::PaymentPathFailure` is now generated for failure to send an HTLC over the first hop on our local channel (lightningdevkit#2014, lightningdevkit#2043). * `lightning-net-tokio` no longer requires an `Arc` on `PeerManager` (lightningdevkit#1968). * `ChannelManager::list_recent_payments` was added (lightningdevkit#1873). * `lightning-background-processor` `std` is now optional in async mode (lightningdevkit#1962). * `create_phantom_invoice` can now be used in `no-std` (lightningdevkit#1985). * The required final CLTV delta on inbound payments is now configurable (lightningdevkit#1878) * bitcoind RPC error code and message are now surfaced in `block-sync` (lightningdevkit#2057). * Get `historical_estimated_channel_liquidity_probabilities` was added (lightningdevkit#1961). * `ChannelManager::fail_htlc_backwards_with_reason` was added (lightningdevkit#1948). * Macros which implement serialization using TLVs or straight writing of struct fields are now public (lightningdevkit#1823, lightningdevkit#1976, lightningdevkit#1977). Backwards Compatibility ======================= * Any inbound payments with a custom final CLTV delta will be rejected by LDK if you downgrade prior to receipt (lightningdevkit#1878). * `Event::PaymentPathFailed::network_update` will always be `None` if an 0.0.114-generated event is read by a prior version of LDK (lightningdevkit#2043). * `Event::PaymentPathFailed::all_paths_removed` will always be false if an 0.0.114-generated event is read by a prior version of LDK. Users who rely on it to determine payment retries should migrate to `Event::PaymentFailed`, in a separate release prior to upgrading to LDK 0.0.114 if downgrading is supported (lightningdevkit#2043). Performance Improvements ======================== * Channel data is now stored per-peer and channel updates across multiple peers can be operated on simultaneously (lightningdevkit#1507). * Routefinding is roughly 1.5x faster (lightningdevkit#1799). * Deserializing a `NetworkGraph` is roughly 6x faster (lightningdevkit#2016). * Memory usage for a `NetworkGraph` has been reduced substantially (lightningdevkit#2040). * `KeysInterface::get_secure_random_bytes` is roughly 200x faster (lightningdevkit#1974). Bug Fixes ========= * Fixed a bug where a delay in processing a `PaymentSent` event longer than the time taken to persist a `ChannelMonitor` update, when occurring immediately prior to a crash, may result in the `PaymentSent` event being lost (lightningdevkit#2048). * Fixed spurious rejections of rapid gossip sync data when the graph has been updated by other means between gossip syncs (lightningdevkit#2046). * Fixed a panic in `KeysManager` when the high bit of `starting_time_nanos` is set (lightningdevkit#1935). * Resolved an issue where the `ChannelManager::get_persistable_update_future` future would fail to wake until a second notification occurs (lightningdevkit#2064). * Resolved a memory leak when using `ChannelManager::send_probe` (lightningdevkit#2037). * Fixed a deadlock on some platforms at least when using async `ChannelMonitor` updating (lightningdevkit#2006). * Removed debug-only assertions which were reachable in threaded code (lightningdevkit#1964). * In some cases when payment sending fails on our local channel retries no longer take the same path and thus never succeed (lightningdevkit#2014). * Retries for spontaneous payments have been fixed (lightningdevkit#2002). * Return an `Err` if `lightning-persister` fails to read the directory listing rather than panicing (lightningdevkit#1943). * `peer_disconnected` will now never be called without `peer_connected` (lightningdevkit#2035) Security ======== 0.0.114 fixes several denial-of-service vulnerabilities which are reachable from untrusted input from channel counterparties or in deployments accepting inbound connections or channels. It also fixes a denial-of-service vulnerability in rare cases in the route finding logic. * The number of pending un-funded channels as well as peers without funded channels is now limited to avoid denial of service (lightningdevkit#1988). * A second `channel_ready` message received immediately after the first could lead to a spurious panic (lightningdevkit#2071). This issue was introduced with 0conf support in LDK 0.0.107. * A division-by-zero issue was fixed in the `ProbabilisticScorer` if the amount being sent (including previous-hop fees) is equal to a channel's capacity while walking the graph (lightningdevkit#2072). The division-by-zero was introduced with historical data tracking in LDK 0.0.112. In total, this release features 130 files changed, 21457 insertions, 10113 deletions in 343 commits from 18 authors, in alphabetical order: * Alec Chen * Allan Douglas R. de Oliveira * Andrei * Arik Sosman * Daniel Granhão * Duncan Dean * Elias Rohrer * Jeffrey Czyz * John Cantrell * Kurtsley * Matt Corallo * Max Fang * Omer Yacine * Valentine Wallace * Viktor Tigerström * Wilmer Paulino * benthecarman * jurvis
Because we store some (not large, but not zero) state per-peer,
it's useful to limit the number of peers we have connected, at
least with some buffer.
Much more importantly, each channel has a relatively large cost,
especially around the
ChannelMonitor
s we have to build for each.Thus, here, we limit the number of channels per-peer which aren't
(yet) on-chain, as well as limit the number of (inbound) peers
which don't have a (funded-on-chain) channel.
Fixes #1889