Fail HTLC backwards before upstream claims on-chain #2457
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##             main    #2457      +/-  ##
==========================================
+ Coverage   90.54%   91.38%   +0.83%
==========================================
  Files         109      111       +2
  Lines       57188    68070   +10882
  Branches    57188    68070   +10882
==========================================
+ Hits        51780    62203   +10423
- Misses       5408     5867     +459

☔ View full report in Codecov by Sentry.
Force-pushed from 18617bb to 861baab.
Added one more to-do in the PR description; will make a comment when I complete it.
Force-pushed from 861baab to 011a41a.
Added the last test cases to handle getting update_fail/fulfill_htlc after failing back. Once we reach the point where we fail back on the backwards channel, we've already force-closed our forwards channel with our timeout, so we just handle these messages as if the channel doesn't exist, which makes sense to me.
Force-pushed from 011a41a to df0bdce.
Squashed + rebased.
let duplicate_event = self.pending_monitor_events.iter().any(
	|update| if let &MonitorEvent::HTLCEvent(ref upd) = update {
		upd.source == *source
	} else { false });
if !duplicate_event {
This doesn't seem enough? If we process events before the next block, the duplicate event will be gone and we'll queue another one.
I guess it doesn't matter though, as long as we can handle duplicate HTLCUpdate events at the ChannelManager level.
Yeah, I added this before I handled the duplicate HTLCUpdate events in ChannelManager. I guess maybe I should get rid of it, since it's not really doing anything meaningful now?
Yeah, I think we should. We should also limit the number of times these MonitorEvents will be queued somehow, as it seems we'll just continue to do so endlessly on every new block?
> Yeah, I think we should.
Similar case to #2457 (comment): am I just taking debug asserts too seriously? There's another debug assert that just prevents us from failing the same HTLC twice in one process-pending-htlc cycle (which is what this code block covered), so should I just remove it since we actually handle them fine?
> We should also limit the number of times these MonitorEvents will be queued somehow, as it seems we'll just continue to do so endlessly on every new block?
Oof, yeah. Added a set to keep track of HTLC IDs we've failed already.
Got rid of the logic preventing generation of a duplicate HTLCEvent (if we saw an on-chain claim after doing our early fail-back), especially now that we may not always fail back based on this event (given the new config flag/threshold); if we see the downstream claim, we should push that event to the channel manager.
> We should also limit the number of times these MonitorEvents will be queued somehow, as it seems we'll just continue to do so endlessly on every new block?
Added a simple vec to keep track of HTLC IDs we've failed already.
Though, now that whether we fail back an HTLC could change with the feerate (via the new threshold), should we still prevent regenerating these events on new blocks?
While I definitely agree we should do this, I do wonder if we should gate it on HTLCs only being below some threshold, or only behind some config knob. It's certainly surprising to users that we'd do this.
Force-pushed from df0bdce to f35e75c.
Just pushed because I was sitting on some changes waiting until I could address everything; just going to add to-dos in the PR description and get to them soon. Also, ignore 2aad324 pending me possibly just removing the debug asserts.
I was initially thinking we would always just want to keep the channel open, because if we've reached our upstream timeout we've lost that HTLC, and who knows how fast or slow their tx will confirm. But then again, we reached our own timeout downstream and we don't consider that our downstream counterparty has lost that HTLC just yet. And if we do want to risk the channel for any HTLC, we probably only want to for larger ones and not for smaller, so I'm leaning towards an amount threshold. Although determining the value at which a channel is worth closing depends on at least the feerate (if we're the funder, how much we're paying on the commitment, and then how much it might cost to open a new channel for this liquidity), so a fixed amount threshold might not be great?
Heh, duh, that's what I meant.
Yeah, I mean we could definitely poll the fee estimator when deciding; we'd make it a multiple on the high-priority feerate and default it to the size of a commitment transaction plus one HTLC plus an HTLC-tx claim with a P2WPKH output used as the anchor?
In previous commits, we added a config flag to allow a user to control a multiplier on a threshold for whether to fail back an HTLC early if we're waiting on a downstream confirmation, depending on how much we'd save if we ended up successfully claiming on-chain. This commit adds the logic to calculate this threshold, and adds the check when failing HTLCs back from monitor events (although this won't do anything until we actually generate new monitor events for this situation in upcoming commits).
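To make the idea concrete, here is a minimal sketch of what such a threshold check could look like. None of this is the PR's actual code: the `early_fail_multiplier` knob and the weight constants are illustrative placeholders, and real weights depend on the channel type.

```rust
// A minimal sketch of the threshold calculation described above, NOT the PR's
// actual code: `early_fail_multiplier` and all weight constants are
// illustrative placeholders (real weights depend on the channel type).

const COMMITMENT_TX_BASE_WEIGHT: u64 = 724; // non-anchor commitment, no HTLCs
const HTLC_SUCCESS_TX_WEIGHT: u64 = 703;    // pre-signed HTLC-success tx
const P2WPKH_OUTPUT_WEIGHT: u64 = 31 * 4;
const P2WPKH_INPUT_WEIGHT: u64 = 272;       // including witness

/// Returns true if the HTLC is worth less than (a multiple of) the fee we'd
/// pay to claim it on-chain, i.e. failing it back early is the better deal.
fn should_fail_back_early(
    htlc_amount_msat: u64, feerate_per_kw: u64, early_fail_multiplier: u64,
) -> bool {
    let total_weight = COMMITMENT_TX_BASE_WEIGHT
        + HTLC_SUCCESS_TX_WEIGHT
        + P2WPKH_OUTPUT_WEIGHT
        + P2WPKH_INPUT_WEIGHT;
    // Approximate fee (in sats) to get the commitment plus this HTLC confirmed.
    let onchain_fee_sat = feerate_per_kw * total_weight / 1000;
    htlc_amount_msat / 1000 <= onchain_fee_sat * early_fail_multiplier
}

fn main() {
    // Example: a 5_000-sat HTLC at 2_500 sat/kW with a 2x multiplier should be
    // failed back early rather than risked on-chain.
    assert!(should_fail_back_early(5_000_000, 2_500, 2));
}
```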
Force-pushed from f35e75c to a18b940.
Rebased because of conflicts, and added three new commits at the beginning to put this logic behind a config flag/threshold. Also simplified/added to tests a bit, and addressed the couple of things about the grace period, debug asserts, and duplicate events (see other comments).
Fail inbound HTLCs if they expire within a certain number of blocks from the current height. If we haven't seen the preimage for an HTLC by the time the previous hop's timeout expires, we've lost that HTLC, so we might as well fail it back instead of having our counterparty force-close the channel.
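As a rough illustration of that check (the struct and names below are made up for this sketch, not the PR's implementation), the idea is to compare each inbound HTLC's absolute CLTV expiry against the current height plus a grace period:

```rust
/// Illustrative only: a stand-in for the per-HTLC data the check needs.
struct InboundHtlc {
    htlc_id: u64,
    cltv_expiry: u32, // absolute block height at which the upstream HTLC times out
}

/// Collect the ids of HTLCs whose upstream timeout is within
/// `grace_period_blocks` of the current chain tip; these are the ones we
/// would fail back rather than letting the counterparty force-close.
fn htlcs_to_fail_back(
    htlcs: &[InboundHtlc], current_height: u32, grace_period_blocks: u32,
) -> Vec<u64> {
    htlcs
        .iter()
        .filter(|htlc| htlc.cltv_expiry <= current_height + grace_period_blocks)
        .map(|htlc| htlc.htlc_id)
        .collect()
}
```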
Force-pushed from a18b940 to e5a918d.
let htlc_weight = htlc_success_tx_weight(self.context.get_channel_type());
let p2wpkh_weight = 31 * 4;
let input_weight = 272; // Including witness, assuming it spends from a p2wpkh
FYI some of these are already defined in bump_transaction.rs.
let total_weight = commitment_weight + htlc_weight + p2wpkh_weight + input_weight;
let total_fee = feerate_per_kw as u64 * total_weight / 1000;
total_fee
}
Oh also, for now I've only calculated the total fee if we were to claim just the one HTLC tx, but we may also want to consider all the fees we'd pay if we were to claim other pending HTLCs (though it's not clear which ones we'll end up having to claim ourselves vs our counterparty).
> though it's not clear which ones we'll end up having to claim ourselves vs our counterparty
Ah, because there could be HTLCs that expire much later than the HTLC we're attempting to fail back, and the counterparty still has a chance to claim those on-chain?
* Explicitly assert no hanging messages after off-chain actions
* Get rid of manual user force-close for timeout
* Connect blocks on nodes[0] to make sure they don't go on-chain
* Add comment on syncing block height on nodes
* Simplify and wrap especially long lines
* Add more confirmations to onchain test cases
* Increase feerate to hit fail back code path now that we have a threshold for HTLC value
* Expect new fail events now that we don't prevent duplicate fail events
* Replace timeout buffer with grace period
Force-pushed from e5a918d to 597111b.
Small change: changed the collection that keeps track of already early-failed HTLCs in the channel monitor from a vec to a hashset. At first I thought a hashset might have some more memory overhead, but after some further reading it doesn't seem like it really would, and either way it shouldn't make a significant difference, so I figured I should just use the one that makes more sense.
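For reference, the vec-to-hashset change boils down to something like the following sketch; the struct, field, and method names are invented for illustration and are not the PR's code.

```rust
use std::collections::HashSet;

struct MonitorDedupState {
    // HTLC ids we have already generated an early fail-back event for, so we
    // don't queue another MonitorEvent for the same HTLC on every new block.
    failed_back_htlc_ids: HashSet<u64>,
}

impl MonitorDedupState {
    /// Returns true only the first time it is called for a given id;
    /// `HashSet::insert` returns false if the id was already present.
    fn should_queue_fail_back(&mut self, htlc_id: u64) -> bool {
        self.failed_back_htlc_ids.insert(htlc_id)
    }
}
```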
Feel free to squash
let commitment_weight = if self.context.is_outbound() {
	let keys = self.context.build_holder_transaction_keys(self.context.cur_holder_commitment_transaction_number);
	let commitment_stats = self.context.build_commitment_transaction(self.context.cur_holder_commitment_transaction_number, &keys, true, true, logger);
	commitment_stats.total_fee_sat
Should be returning the weight here, not the fee.
We should be able to compute the weight without re-building the transaction; you may need a few more consts than we have defined, and it'll need to depend on the channel type.
let p2wpkh_weight = 31 * 4;
let input_weight = 272; // Including witness, assuming it spends from a p2wpkh
These only apply with anchors; otherwise the HTLC success transaction has the same feerate as the commitment transaction.
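To make the channel-type dependence concrete, a sketch of gating the extra weight on anchors follows; the constants and helper are illustrative, not the PR's code.

```rust
// With anchors, the HTLC-success tx is broadcast separately and must bring its
// own fee, so we account for an extra input and change output; without anchors
// it is pre-signed at the commitment feerate. Values are illustrative.
const HTLC_SUCCESS_TX_WEIGHT: u64 = 703;
const HTLC_SUCCESS_ANCHOR_TX_WEIGHT: u64 = 706;
const P2WPKH_OUTPUT_WEIGHT: u64 = 31 * 4;
const P2WPKH_INPUT_WEIGHT: u64 = 272; // including witness

fn htlc_claim_weight(anchors: bool) -> u64 {
    if anchors {
        HTLC_SUCCESS_ANCHOR_TX_WEIGHT + P2WPKH_OUTPUT_WEIGHT + P2WPKH_INPUT_WEIGHT
    } else {
        HTLC_SUCCESS_TX_WEIGHT
    }
}
```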
Superseded by #3556.
Fail inbound HTLCs if they expire within a certain number of blocks from the current height. If we haven't seen the preimage for an HTLC by the time the previous hop's timeout expires, we've lost that inbound HTLC, so we might as well fail it back instead of having our counterparty force-close the channel.
In the case where we both 1) see our downstream timeout HTLC confirm and 2) pass the height where we want to fail back before the upstream expires, we add a check to avoid duplicate events that would lead to failing back the same HTLC twice.
To do:
* LATENCY_GRACE_PERIOD_BLOCKS
Addresses #2275.