-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[4/5]sweep: deepen Aggregator
interface
#8148
[4/5]sweep: deepen Aggregator
interface
#8148
Conversation
cbd7fb2
to
42627ed
Compare
cc: @Roasbeef for approach ack here |
IIUC, this PR does a few things:
Am I misunderstanding or missing anything? I would find this easier to review if it was broken into smaller PRs. I think we could peel off some straightforward refactoring PRs and leave this PR for the new stuff. |
For the RBF info, why load it from the DB when we can extract it directly from the spend tx in the mempool? I imagine this would be more resilient in situations where someone besides us could broadcast a competing transaction (e.g., HTLC sweeps, anchors). IIUC, the current code just "gives up" if someone else attempts a sweep of the inputs first. What's the reasoning here? |
@morehouse nice summary! Yeah can def create smaller PRs, but want to have a concept ACK first.
We def can, like a combination of |
Concept ACK.
Is this change needed? What alternative
Concept ACK.
Concept ACK.
Concept ACK. I'm not 100% sure about the best approach for implementing this yet -- will think some more.
I think the current implementation also doesn't work for neutrino then, since it relies on checking the mempool to decide whether to load the RBF info from the DB. |
I think we should try to adhere to RBF as closely as possible so we can ensure that we can get our sweep transactions in the mempool. There are some things missing in this PR, I think due to the sweeper's current architecture that prevent us from aligning with Core's RBF policy (https://github.com/bitcoin/bitcoin/blob/master/doc/policy/mempool-replacements.md). This might take several PRs and a bit of a rework to get right.
|
42627ed
to
2c76fc2
Compare
Important Auto Review SkippedAuto reviews are disabled on base/target branches other than the default branch. Please add the base/target branch pattern to the list of additional branches to be reviewed in the settings. Please check the settings in the CodeRabbit UI or the To trigger a single review, invoke the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Aggregator
interface
50d7bf7
to
1156e5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another banger. This helps clarify some of my confusion from some previous PRs. I like that we only have the InputSet now instead of both Cluster and InputSet. Had some questions and comments to look at but it's small.
@@ -86,7 +224,19 @@ func (s *SimpleAggregator) ClusterInputs(inputs pendingInputs) []inputCluster { | |||
// Since the inputs that we clustered by fee rate don't commit to a | |||
// specific locktime, we can try to merge a locktime cluster with a fee | |||
// cluster. | |||
return zipClusters(lockTimeClusters, feeClusters) | |||
clusters := zipClusters(lockTimeClusters, feeClusters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the assumed order of these cluster lists? It's not obvious why I should be able to zip these clusters together this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function could be renamed to make it a bit more clear, but what it does today is merged the two lists, preserving the fee rate ordering from the first. In other words, lockTimeClusters
will have merged into the items from feeClusters
that are as high as the element in the lock time cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be careful here as a zip operation (pairwise combination) needs to make sure there is proper index alignment.
Diving into the implementation details it seems that this zipClusters operation isn't actually a zip (at least with the terminology I'm familiar with) because it does two independent sorts by feerate on each vector of inputs. What concerns me further here is that since feerates are not enumerable it means that we can't really reason about the relative densities between the two slices. If we don't have some sort of guarantee on how dense each slice is wrt feerate gain, then a single advancement of the j
cursor per i
cursor advancement may not make sense. I'm still trying to wrap my head around what the cluster operations are. Maybe a high level pseudocode description without loops and cursors is needed.
My understanding here is that clusterByLocktime
allocates a bunch of locktime buckets, into which we are distributing the non-locktime inputs provided they meet some sort of fee distance criteria (still trying to figure out exactly what criteria that is).
that are as high as the element in the lock time cluster
I don't see any guarantee that there is a single element in the locktime cluster, so maybe it is if the input is at least as high as the cluster average feerate? That would make sense to me because we wouldn't want to drag down the feerate of a cluster that had time sensitivity.
@yyforyongyu can you confirm this interpretation for me and then maybe we can work on making it more obvious in code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the current clustering logic is unnecessarily complicated, I'm in favor of removing SimpleAggregator
once the BudgetAggregator
is in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but can we confirm or correct my above interpretation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I've learnt from TestZipClusters
, think once talked about this in a study group - my understanding is that, once clustered by locktimes and fee rate buckets, these clusters are merged if the timelock clusters have a smaller fee rate than the fee rate clusters.
The effect is, for locktime clusters, their fee rates will be increased if the non-locktime clusters have higher ones.
sweep/aggregator_test.go
Outdated
|
||
// TestCreateInputSets checks that ... | ||
func TestCreateInputSets(t *testing.T) { | ||
// TODO(yy): add it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙃
sweep/sweeper.go
Outdated
if err != nil { | ||
log.Errorf("input cluster sweep: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we ought to do more than logging here?
sweep/sweeper.go
Outdated
// Reschedule the inputs that we just tried to sweep. This is done in | ||
// case the following publish fails, we'd like to update the inputs' | ||
// publish attempts and rescue them in the next sweep. | ||
s.rescheduleInputs(tx.TxIn) | ||
err = s.markInputsPendingPublish(tr, tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that only the tx record is stored on disk (and not al information for all the inputs), what use is the pending vs published state? We don't attempt to recover the past published inputs from disk on start up, so knowing if something was pending published vs actually published (given if we fail, we'll mark the publish as fail) isn't necessarily an actionable item.
chainntnfs/mocks.go
Outdated
|
||
args := m.Called(op) | ||
|
||
if args.Get(0) == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can it be nil if a concrete outpoint is passed in?
1156e5c
to
a7cd955
Compare
421f82d
to
5a14b90
Compare
3b5fcfc
to
25605ac
Compare
5a14b90
to
3e70f2f
Compare
25605ac
to
448867b
Compare
3e70f2f
to
ae42569
Compare
448867b
to
7642d3a
Compare
ae42569
to
dba41a2
Compare
7642d3a
to
f4dc5f5
Compare
dba41a2
to
0f693c4
Compare
f4dc5f5
to
cfed3c0
Compare
This commit changes the source that drives the state changes in the sweeper. Previously we used a ticker with default interval of 30s to trigger sweepings periodically. The assumption is, within this 30s we'd batch multiple inputs into one transaction to maximize profits. However, the efficacy of this batch is questionable. At a high level, we can put our inputs into two categories - one that's forced, and one that's not. For forced inputs, we should sweep them immediately as the force flag indicates they are very urgent, eg, CPFPing the force closing tx. For non-forced inputs, such as anchors or HTLCs with CLTV that's far away, we can wait to sweep them till a new block comes in and triggers the sweeping process. Eventually, all inputs will be deadline-aware, and the sweeper will consult our fee bumper about the most economical fee rate to be used for a given deadline. Since the deadlines here are blockstamp, it's also easier to manage them if the sweeper is also using blockstamp instead of timestamp.
This commit adds a new interface `Cluster` to manage cluster-level inputs grouping. This new interface replaces the `inputCluster` and will be futher refactored so the sweeper can use a much smaller coin selection lock.
Previously the fee rate is tracked at cluster level, which may not be accurate as each cluster is then divided into input sets. And these sets are what's actually included in the final sweeping tx. To properly reflect the final fee rate used in the sweeping tx, `InputSet` is added so more customized clustering logic can be implemented in the future. For intance, atm it's clustered by fee rates, while we may also cluster by deadlines, urgencies, etc.
This commit makes the `ClusterInputs` directly returning the `InputSet` so the sweeper doesn't know about the existence of `Cluster` interface. This way we can have a deeper interface as the sweeper only needs to interact with `Aggregator` only to get the final input sets, leaving the implementation details being managed by `SimpleAggregator` and future aggregators.
This commit changes how `WithCoinSelectLock` is used - previously the lock is held when creating the input sets, now it's only be held after the input sets have been created and explicitly signal they need wallet inputs.
cfed3c0
to
7bb12c5
Compare
af9826c
into
lightningnetwork:elle-sweeper-rbf-2
@@ -320,7 +320,7 @@ func TestClusterByLockTime(t *testing.T) { | |||
) | |||
|
|||
// Create a test aggregator. | |||
s := NewSimpleUtxoAggregator(nil, maxFeeRate) | |||
s := NewSimpleUtxoAggregator(nil, maxFeeRate, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use constant here: DefaultMaxInputsPerTx ?
@@ -1609,8 +1609,9 @@ | |||
|
|||
[sweeper] | |||
|
|||
; Duration of the sweep batch window. The sweep is held back during the batch | |||
; window to allow more inputs to be added and thereby lower the fee per input. | |||
; DEPRECATED: Duration of the sweep batch window. The sweep is held back during |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would even delete it from the sample.conf because it has no effect. DEPRECATED in my understand suggests that it should not be used but will still work correctly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the linter won't pass if it's not removed, I think in 0.19 I'll remove lots of the deprecated options, just need to figure out how to not break user's scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a security concern about reducing the sweeping interval to once per block -- see inline comment.
Perhaps we should keep the 30s sweep ticker.
"sweeping %d inputs", epoch.Height, len(inputs)) | ||
|
||
// Attempt to sweep any pending inputs. | ||
s.sweepPendingInputs(inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By not broadcasting new inputs until the next block confirms, we effectively reduce all confirmation windows by 1. So if the deadline is set to 1 (next block), we will automatically miss it.
Obviously the goal is to offer the sweeper inputs much earlier, but if the node is late for some reason (e.g., was offline) this could be the difference between recovering funds and losing them.
This also may make it significantly easier to steal funds via DoS attacks -- after each restart the victim will wait ~10 minutes (1 block) before attempting to broadcast the sweep transaction instead of 30 seconds. This gives the attacker a lot of time to crash the victim again. Consider the following attack:
- Mallory routes an HTLC through Alice to Bob.
- Bob claims the HTLC from to Alice.
- Mallory ignores Alice's attempt to claim the HTLC.
- 10 blocks before the HTLC times out, Alice's contractcourt decides to go onchain and offers the anchor spend to the sweeper. The sweeper starts waiting for the next block to broadcast.
- Mallory crashes Alice using some unpatched DoS vector.
- Every time Alice comes back online, Mallory starts DoSing her again. As long as Mallory can keep crashing Alice before the next block confirms, Alice's sweeper will never broadcast any transactions.
- After 10 blocks, Mallory claims the HTLC timeout path on chain, successfully stealing the HTLC from Alice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I can crash you every block why couldn't I crash you every 30s?
I don't see an issue to moving to a timer, though. If you think that makes a material difference, we can probably change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I can crash you every block why couldn't I crash you every 30s?
Maybe, maybe not. Some DoS attacks cause crashes instantly (e.g., nil dereference, OOB), others overload the victim more slowly and take longer to cause crashes (e.g., OOM, disk-based DoS).
I still think our best defense against DoS is #8166. But even if we have that, we still need to ensure that sweep transactions get broadcast immediately-ish on startup. If we have to wait 10 minutes, #8166 doesn't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm don't we have the Force
Flag in place for time-sensitive sweeps which would let us sweep the inputs during restarts as soon as the resolvers register it with the sweeper ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should fix the DoS and prevent it from happening, rather than trying to build a system assuming there's one - I don't think there's much you can do in that case.
Some users don't use the default 30s, our docs recommend 10m, and some may choose hours. And in the given example, if I'm the attacker, I'd crash Alice's node before step 4, rendering her contractcourt
useless.
after each restart the victim will wait ~10 minutes (1 block) before attempting to broadcast the sweep transaction instead of 30 seconds.
I think what's missing from our sweeper is the ability to recover from restart - which has been discussed multiple times in these PR series. The sweeper is almost stateless atm, and it should pick up the existing pending inputs during restart, and offer them to the fee bumper immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Designing the system to protect funds under DoS is within reach, so let's do it.
Is it? I am not so sure I agree.
#8166 seems well within reach to me. I think significantly less work is needed than has been done on this set of sweeper PRs already.
The premise you're starting with is that there are DoS vectors that are abundant, and if DoS can take the node down, and there exist deadline based mechanisms in LN, funds will always be at stake. What am I missing?
Define abundant? All an attacker needs is 1 DoS vector to cause mayhem with current LND nodes. Are you confident there are 0 DoS vectors in LND today and forever into the future?
The goal is to keep funds safe even if the node is repeatedly crashed.
I definitely agree that we should begin sweeping on startup. There is no reason to delay arbitrarily. This seems like an orthogonal issue to whether subsequent sweep operations are block driven or some other heartbeat driven.
Yes, if we set up the contractcourt and sweep on startup before allowing any peer connections, then #8166 has essentially been accomplished. That's the goal.
Until then, a 30s delay before sweeping on startup seems better than 10 minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All an attacker needs is 1 DoS vector to cause mayhem with current LND nodes. Are you confident there are 0 DoS vectors in LND today and forever into the future?
Absolutely not :).
I'm also not really contesting this. What I'm contesting is what can be done about it. I think #8166 is a generally good idea but it ignores the risks incurred by being disconnected from our peers. I think the assumption that we have to be theoretically crash proof at all times is both unrealistic and a distraction from more pressing issues. I think if you want to make things more crash proof, then decoupling the subsystems is yet another way to handle it. This way crashing the peer interface doesn't take down the whole node, just that particular system.
My main question is that if we're starting from having failure assumptions "someone can crash my node", how broken is reasonable for us to design for? Why not design things in a way where each subsystem can't rely on any assumptions from any other system? It quickly gets into territory that doesn't make any sense.
I think fault tolerance is a good goal but I also think we need to be careful to not drive ourselves into spending lots of effort on the ever rarer tail events and not accomplish anything while we try to cover edge cases that don't materialize.
I'm open to having my mind changed here, I just generally don't find myself compelled by the argument. One thing that could help me sway is for you to explain why the line in the sand you've happened to draw is unique, and not just an arbitrary point on the continuum towards maximally fault-tolerant software that doesn't consider other dimensions of the problem. Otherwise the slope to me seems rather slippery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mallory crashes Alice using some unpatched DoS vector.
Every time Alice comes back online, Mallory starts DoSing her again. As long as Mallory can keep crashing Alice before the next block confirms, Alice's sweeper will never broadcast any transactions.
So let's say that Alice had a trigger to not accept any connections until all contracts have been resolved. How do we define "resolved" here? Will Alice wait until ultimate confirmation of her sweep? If so, then we also need to be ready for the possibility that there's some crazy fee spike, and confirmation takes much longer than anticipated. If we don't accept any connections during this interval, then it's possible that other active HTLCs we had with other peers, or even other channels with Alice also times out.
In short, we also need to weigh the opportunity costs of time outs and force closes of other channels if we don't reconnect to any peers until all on chain contracts have been resolved.
I think what we need to move forward here is a concrete proposal w.r.t what the trigger is to resume reconnections re #8166. I think in order to fully isolate we shouldn't even create the listener until we've resolved everything on chain. Of course, kernel level network DoS would still apply, but we only control the application layer in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can re-add the timer in addition to the block based model, but in itests, we'd essentially disbale the timer all together (as they were all updated to be block based after this change).
Before this series of changes, we had the timer as we would at times knowingly broadcast garbage. So we had to keep retrying in the hop that we actually broadcasted something that would confirm, which would then let us retarget to remove the other garbage. With the new approach, we'll no longer broadcast anything that wouldn't be accepted into our node's mempool, so the move to block based (the only time the UTXO state can change) made more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a more concrete explanation to #8166 (comment). Let's move that discussion over there.
Assuming the new sweeper will be merged before any work on #8166, I think it's worth adding the sweep timer back in for now. It can be removed once properly do sweeping on startup.
This PR prepares for #8424 by providing two main changes,
Force
flag andUtxoSweeper
will now sweep them immediately, which is also different as previously thisForce
flag would only trigger the ticker to start.Aggregator
interface is deepened and now return theInputSet
interface. ThisInputSet
gives a set of inputs to be included in the final sweeping transaction. It also tellsUtxoSweeper
whether the current set needs wallet inputs and, if so, provides a method to add them to the set. On the other hand, making this new interface makes it easier to write unit tests.In the final PR, new implementations of
Aggregator
andInputSet
are created to provide the final fee bumper.TODO
Fixes #8391