-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[5/5]sweep: introduce budget-based deadline aware fee bumper #8424
[5/5]sweep: introduce budget-based deadline aware fee bumper #8424
Conversation
Important Auto Review SkippedAuto reviews are disabled on base/target branches other than the default branch. Please add the base/target branch pattern to the list of additional branches to be reviewed in the settings. Please check the settings in the CodeRabbit UI or the To trigger a single review, invoke the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Looking for concept ACK for the last 9 commits, which are commits after |
cc: @Roasbeef for concept ack |
Fundamentally, how much do we save by doing batched sweeps? Other implementations don't do this at all. I imagine we could make RBFs much simpler and less error-prone without the batching. And there are other reasons to do less batching:
|
Yeah also had the same question - I think it only helps with an FC that has many HTLCs, or multiple FCs happening at the same time. Need more empirical data tho. If the scenario described in
I think the general idea here is we don't need to care about mempool rules anymore if |
But the reason that If we stop batch sweeping, these problems go away. |
Yeah that's the next step, to map all the errors.
So does the benefit? I guess we can test it out, like add a new implementation of On the other hand, I don't think the problems would just go away. As @Crypt-iQ mentioned here, it seems only CPFP-purpose anchor sweeping would cause violation of rule 2, if not aggregated right. Other than that, I think we are facing the same when RBFing a single input? |
If the peer is pinning the commitment transaction, and LND is trying to batch the anchor spend with another anchor spend, CPFP carve-out will not work, and rule 5 can be violated.
We wouldn't have to worry about RBF rules 2 or 5 if we stop batching. The other rules we wouldn't have to worry about either since we can blindly fee bump and try |
Jut to be sure, I'm referring rule numbers here and I think you mean rule 2 right? It seems to me it's the CPCP purpose anchor output is the most complicatedly one, and I think if we are concerned we can do the CPFP in |
Ah, sorry, it's not an RBF rule at all! What I was referring to is the descendant limit (25 transactions). A CPFP will be rejected if it exceeds the descendant limit, unless CPFP carve-out is used.
+1. I've created #8433 for further discussion about this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach seems fine, though I have a number of comments on implementation details. Some larger than others.
sweep/tx_input_set.go
Outdated
// txns, the same tx-level `nLockTime` must be the same in order for | ||
// them to be aggregated into the same sweeping tx. This `nLockTime` | ||
// value is implicitly expressed as the deadline height. | ||
if b.deadlineHeight != pi.params.DeadlineHeight { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be >
instead of !=
, even if we choose not to cluster things as aggressively we can safely add an input with a later deadline to a set with a specified earlier one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if its not a covenant where we already have the signature of the peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that includes the ss-htlc txs too... 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I came up with a way to solve the issue, with the help of fn.Option
. It goes like this,
- for ss-htlcs, they must specify deadlines to
fn.Some
, with theirnLocktime
values. - for non-time-sensitive outputs, they specify their deadlines to
fn.None
. - when aggregating, all
fn.Nonce
will be put in one set, allowing us to sweep them in one tx.
This can be extended to allow adding a later deadline input, as long as it's fn.None
.
// the event channel on the record. Any broadcast-related errors will not be | ||
// returned here, instead, they will be put inside the `BumpResult` and | ||
// returned to the caller. | ||
func (t *TxPublisher) broadcast(requestID uint64) (*BumpResult, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm noticing that this publisher doesn't really persist across restarts. Isn't that gonna be a problem? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sweeper doesn't really persist any states either. IIUC, the way this'll recover after restart is by using testmempoolaccept
to determine if a new fee rate step actually needs to happen or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe we should reload the pending sweep txns on restart and re-offer them to the publisher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the downside to doing so. Either they get rejected because they are already there and not sufficiently rbf'ed in which case we say "cool thanks 👍🏼" or it accepts it and we say "cool thanks 👍🏼".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe we should reload the pending sweep txns on restart and re-offer them to the publisher?
So today the it's the job of the resolvers to detect that after a restart their inputs weren't swept to re-offer.
I think the only gap in this would be user issued fee bumping requests.
// the event channel on the record. Any broadcast-related errors will not be | ||
// returned here, instead, they will be put inside the `BumpResult` and | ||
// returned to the caller. | ||
func (t *TxPublisher) broadcast(requestID uint64) (*BumpResult, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sweeper doesn't really persist any states either. IIUC, the way this'll recover after restart is by using testmempoolaccept
to determine if a new fee rate step actually needs to happen or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 🫶
So I see we introduced the BudgetAggregator
in this PR, are we planning to make it already available for to activate it in LND 18. It feels to me that we are not quite there yet.
So I think without being able to identify the inputs which cause a potential broadcast to fail we should not activate the new aggregator wdyt ? Moreover still open question how we specify the deadlines for the non-timesenstive inputs ?
Moreover I wonder whether there is a reliable way to get information why a transaction is cannot be broadcasted using bitcoind or will this new logic need to be integrated in LND?
I think maybe we should first try to introduce the non-aggregator
which might be not super efficient but removes a lot of side effects we must think about when aggregating inputs ?
sweep/tx_input_set.go
Outdated
// txns, the same tx-level `nLockTime` must be the same in order for | ||
// them to be aggregated into the same sweeping tx. This `nLockTime` | ||
// value is implicitly expressed as the deadline height. | ||
if b.deadlineHeight != pi.params.DeadlineHeight { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if its not a covenant where we already have the signature of the peer.
f959242
to
6f4dfaa
Compare
6f4dfaa
to
f941384
Compare
d44ac7b
to
a5af885
Compare
f941384
to
36e8bab
Compare
This way it's easier to pass values to this method in various callsites.
This commit changes `markInputsPendingPublish` to take `InputSet` only. This is needed for the following commits as we won't be able to know the tx being created beforehand, yet we still want to make sure these inputs won't be grouped to another input set as it complicates our RBF process.
This commit adds `BudgetInputSet` which implements `InputSet`. It handles the pending inputs based on the supplied budgets and will be used in the following commit.
This commit adds `BudgetAggregator` as a new implementation of `UtxoAggregator`. This aggregator will group inputs by their deadline heights and create input sets that can be used directly by the fee bumper for fee calculations.
This commit adds a new interface, `Bumper`, to handle RBF for a given input set. It's responsible for creating the sweeping tx using the input set, and monitors its confirmation status to decide whether a RBF should be attempted or not. We leave implementation details to future commits, and focus on mounting this `Bumper` interface to our sweeper in this commit.
As there will be dedicated new tests for them.
As shown in the following commit, fee rate calculation will now be handled by the fee bumper, hence there's no need to expose this on `InputSet` interface.
This commit adds a new interface, `FeeFunction`, to deal with calculating fee rates. In addition, a simple linear function is implemented, hence `LinearFeeFunction`, which will be used to calculate fee rates when bumping fees. Check lightningnetwork#4215 for other type of fee functions that can be implemented.
This commit adds the method `MaxFeeRateAllowed` to calculate the max fee rate. The caller may specify a large MaxFeeRate value, which cannot be cover by the budget. In that case, we default to use the max feerate calculated using `budget/weight`.
This commit adds `TxPublisher` which implements `Bumper` interface. This is part one of the implementation that focuses on implementing the `Broadcast` method which guarantees a tx can be published with RBF-compliant. It does so by leveraging the `testmempoolaccept` API, keep increasing the fee rate until an RBF-compliant tx is made and broadcasts it. This tx will then be monitored by the `TxPublisher` and in the following commit, the monitoring process will be added.
This commit finishes the implementation of `TxPublisher` by adding the monitor process. Whenever a new block arrives, the publisher will check all its monitored records and attempt fee bumping them if necessary.
This commit adds a private type `mSatPerKWeight` that expresses a given fee rate in millisatoshi per kw. This is needed to increase the precision of the fee function. When sweeping anchor inputs, if using a deadline delta of over 1000, it's likely the delta will be 0 sat/kw due to precision.
So these inputs can be retried by the sweeper.
e45ad10
to
4bf98e7
Compare
6d64a51
to
7295ca7
Compare
Detecting bad inputs was sort of a side effect of the old back off process. With the new approach, we'll use Agreed though that more precise bad input detection can be added in follow up PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 51 of 51 files at r8, 26 of 32 files at r10, 10 of 12 files at r11, 2 of 2 files at r12, 10 of 10 files at r13, 7 of 7 files at r14, all commit messages.
Reviewable status: all files reviewed, 138 unresolved discussions (waiting on @Crypt-iQ, @ProofOfKeags, @yyforyongyu, and @ziggie1984)
With the current set of PRs re-clustering will generally group the same inputs together again (or add more inputs!), so the bad input will block the others forever. If we never cluster bad inputs with other inputs, then there's no problem. But how confident are we that we can do that? |
Here's my current understanding:
So if we don't bisect, then unless those bad inputs are actually swept by a 3rd party somehow, they'll keep on being churned through. One alternative to bisecting is that we modify the clustering to put inputs that have failed in their own cluster. This has the down side of still consuming wallet inputs at time to help to sweep those at a target fee rate. So perhaps bisection really is just the best way forward. A sketch of bisection would look like:
|
This PR introduces a budget-based deadline-aware fee bumper that solves the following issues,
In addition, this PR and its dependencies have refactored the sweeper so it's easier to question its architecture and implement more advanced fee bumping strategies in the future. The dependencies,
UtxoAggregator
#8422Aggregator
interface #8148rpcclient
#8554Replaces #7549 and fixes #4215
Definitions
A budget is the amount of sats you are willing to pay when sweeping a given input. Often the case, it's proportional to the bitcoin value of the input. For instance, say you are willing to pay 50% to sweep an HTLC of 100,000 sats, the budget would be 50,000. On the other hand, the budget is dependent on the context, or the type of inputs,
Because we define the budget beforehand, it makes sure we won't end up paying more fees than the inputs value.
A deadline defines a time preference, expressed in blocks, that an input must be confirmed by, as otherwise we may lose money. Among all the outputs generated from an FC, only three of them are time-sensitive, detailed below,
to_local
output, which is currently swept using a conf target of 6. There's no time pressure on sweeping this UTXO as it can only be swept by us.nLockTime
field as the deadline, as the deadline defines how different inputs can be grouped together, and because theSINGLE|ANYONECANPAY
sighash flag, only the txns sharing the samenLockTime
can exist in the same tx, thus we use this field as the deadline for outgoing HTLCs.Design
On a high level, when inputs are added to the
UtxoSweeper
via the entry pointSweepInput
, the sweeper will periodically check their states and ask theUtxoAggregator
to group these inputs into different input sets. Each set is sent to theBumper
to construct an RBF-compliant tx, broadcast it, and monitor it and perform fee bumps if needed. TheBumper
will consult theFeeFunction
to get a fee rate to use. In details,UtxoAggregator
is an interface that defines aClusterInputs
method which returns a list ofInputSet
interfaces. There are two implementations ofUtxoAggregator
,SimpleAggregator
which retains the old clustering logic, and a newBudgetAggregator
which groups inputs based on budget and deadline.Bumper
is implemented byTxPublisher
, which takes care of tx creation and broadcast. It leveragesTestMempoolAccept
to make the broadcast RBF-aware.FeeFunction
is implemented byLinearFeeFunction
, which is a simple function that increases the fee rate linearly based on the blocks left till deadline. There are more advanced algorithms in sweeper+contractcourt: deadline aware HTLC claims & commitment confirmation #4215, which can be easily implemented in the future.RBF tx, not inputs
There's an alternative approach which quickly ditched due to its high level of complexity. In this PR, the
Bumper
will take the input set packaged byUtxoAggregator
and perform an RBF on this set. Suppose a few blocks pass by, theUtxoAggregator
may choose to package the set differently based on current mempool condition and the budget supplied. For instance, if a set of three inputs are made, but later onUtxoAggregator
decides to create three sets, each containing one of the inputs. It would then be very difficult for theBumper
to handle the RBF. However, I want to mention it here as this can be a future optimization task.No more repeated RBF logic
Previously we have implemented simple RBF logic and CPFP rules in
lnd
, while the actual rules can be very complicated. With the help ofTestMempoolAccept
, we no longer need to care about the implementation details about these rules as we can now use this RPC as a validator to guide us creating an RBF-compliant txns. The only downside here is for neutrino backends, we don't have this RPC available - we shouldn't give up tho, with the extra fee info stored in sweeper, we can still perform a native RBF with limited efficacy, since without a mempool, there's little we can do.TODO
BumpFee
RPCThis change is