Fail HTLC backwards before upstream claims on-chain #2457


Conversation

alecchendev (Contributor) commented Jul 27, 2023

Fail inbound HTLCs if they expire within a certain number of blocks from the current height. If we haven't seen the preimage for an HTLC by the time the previous hop's timeout expires, we've lost that inbound HTLC, so we might as well fail it back instead of having our counterparty force-close the channel.

In the case where we both 1) see our downstream HTLC-timeout confirm and 2) pass the height at which we want to fail back before the upstream expiry, we add a check to avoid duplicate events that would lead to failing back the same HTLC twice.
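
A minimal sketch of the check this describes, assuming a grace period of 3 blocks; the function name and constant value here are illustrative, not the PR's exact code:

```rust
// Illustrative only: fail the inbound HTLC back off-chain once the chain tip is
// within the grace period of its upstream cltv_expiry and we still have no
// preimage from the downstream hop.
const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3; // assumed value for illustration

fn should_fail_htlc_back(upstream_cltv_expiry: u32, best_block_height: u32, have_preimage: bool) -> bool {
    !have_preimage && best_block_height + LATENCY_GRACE_PERIOD_BLOCKS >= upstream_cltv_expiry
}
```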

To do

  • Handle what happens if we receive an update_fulfill_htlc/update_fail_htlc message after failing back ourselves
  • Check whether other implementations have a grace period on force-closing via timeouts -> if so, remove the buffer; if not, replace the buffer with LATENCY_GRACE_PERIOD_BLOCKS
  • Add a config knob or HTLC amount threshold for this?

Addresses #2275.

codecov-commenter commented Aug 1, 2023

Codecov Report

Attention: Patch coverage is 94.47236% with 11 lines in your changes missing coverage. Please review.

Project coverage is 91.38%. Comparing base (bbe20c3) to head (597111b).
Report is 2796 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channel.rs 82.14% 5 Missing ⚠️
lightning/src/ln/channelmanager.rs 89.36% 5 Missing ⚠️
lightning/src/chain/channelmonitor.rs 97.05% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2457      +/-   ##
==========================================
+ Coverage   90.54%   91.38%   +0.83%     
==========================================
  Files         109      111       +2     
  Lines       57188    68070   +10882     
  Branches    57188    68070   +10882     
==========================================
+ Hits        51780    62203   +10423     
- Misses       5408     5867     +459     


alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from 18617bb to 861baab on August 3, 2023 21:15
alecchendev (Contributor, Author)

Added one more to-do in the PR description; will make a comment when I complete it.

alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from 861baab to 011a41a on August 8, 2023 19:26
alecchendev (Contributor, Author)

Added the last test cases to handle receiving update_fail_htlc/update_fulfill_htlc after failing back ourselves. By the time we fail back on the backwards channel, we've already force-closed our forwards channel with our timeout, so we just handle these messages as if the channel doesn't exist, which makes sense to me.

alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from 011a41a to df0bdce on August 10, 2023 20:51
alecchendev (Contributor, Author)

Squashed + rebased

Comment on lines 3333 to 3344
let duplicate_event = self.pending_monitor_events.iter().any(
|update| if let &MonitorEvent::HTLCEvent(ref upd) = update {
upd.source == *source
} else { false });
if !duplicate_event {
Contributor

This doesn't seem enough? If we process events before the next block the duplicate event will be gone and we'll queue another one.

Contributor

I guess it doesn't matter though as long as we can handle duplicate HTLCUpdate events at the ChannelManager level.

Contributor (Author)

Yea, I added this before I handled the duplicate HTLCUpdate events in ChannelManager - I guess maybe I should get rid of it since it's not really doing anything meaningful now?

Contributor

Yeah I think we should. We should also limit the amount of times these MonitorEvents will be queued somehow, as it seems we'll just continue to do so endlessly on every new block?

alecchendev (Contributor, Author) commented Aug 16, 2023

> Yeah I think we should.

Similar case to #2457 (comment) - am I just taking debug asserts too seriously? There's another debug assert that just prevents us from failing the same HTLC twice in one process-pending-htlc cycle (which was what this code block covered), so should I just remove it since we actually handle duplicates fine?

> We should also limit the amount of times these MonitorEvents will be queued somehow, as it seems we'll just continue to do so endlessly on every new block?

Oof yea. Added just a set to keep track of HTLC ids we've failed already.

Contributor (Author)

Got rid of the logic preventing generation of a duplicate HTLCEvent (when we see an on-chain claim after doing our early fail back). Especially now that we may not always fail back based on this event (because of the new config flag/threshold), if we see the downstream claim we should still push that event to the channel manager.

> We should also limit the amount of times these MonitorEvents will be queued somehow, as it seems we'll just continue to do so endlessly on every new block?

Added just a simple vec to keep track of HTLC ids we've failed already.

Though, now that whether we fail back an HTLC could change with the feerate (based on the new threshold), should we still prevent regenerating these events on new blocks?

TheBlueMatt (Collaborator) left a comment

While I definitely agree we should do this, I do wonder if we should gate it on HTLCs only being below some threshold, or only with some config knob. It's certainly surprising to users that we'd do this.

alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from df0bdce to f35e75c on August 16, 2023 04:01
alecchendev (Contributor, Author) commented Aug 16, 2023

Just pushed because I was sitting on some changes waiting until I could address everything - just going to add to-dos in the PR description and get to them soon. Also ignore 2aad324, pending me possibly just removing the debug asserts.

alecchendev (Contributor, Author)

> While I definitely agree we should do this, I do wonder if we should gate it on HTLCs only being below some threshold, or only with some config knob. It's certainly surprising to users that we'd do this.

I was initially thinking we would always just want to keep the channel open because if we've reached our upstream timeout, we've lost that HTLC, and who knows how fast/slow their tx will confirm. But then again...we reached our own timeout downstream and we don't consider that our downstream counterparty has lost that HTLC just yet.

And if we do want to risk the channel for any HTLC, we probably only want to for larger ones, and not for smaller, so I'm leaning towards an amount threshold. Although determining a value for which a channel is worth closing depends on at least feerate (if we're the funder, how much we're paying on the commitment, and then how much it might be to open a new channel for this liquidity) so a fixed amount threshold might not be great?

TheBlueMatt (Collaborator)

> And if we do want to risk the channel for any HTLC, we probably only want to for larger ones, and not for smaller, so I'm leaning towards an amount threshold.

Heh, duh, that's what I meant.

> Although determining a value for which a channel is worth closing depends on at least feerate (if we're the funder, how much we're paying on the commitment, and then how much it might be to open a new channel for this liquidity) so a fixed amount threshold might not be great?

Yea, I mean we could def poll the fee estimator when deciding, we'd make it a multiple on the high priority feerate and default it to the size of a commitment transaction plus one HTLC plus an HTLC tx claim with a P2WPKH output used as the anchor?
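
A hedged sketch of what that default could look like, using rough BOLT-3 weight estimates and the same P2WPKH numbers the PR itself uses below; the function and parameter names are illustrative, and a real version would pull the feerate from the node's fee estimator and the multiplier from the new config knob:

```rust
// Illustrative threshold: only risk closing the channel (failing back early) for
// HTLCs worth more than some multiple of what claiming on-chain would cost at a
// high-priority feerate. Weights are rough BOLT-3 estimates, not exact values.
fn early_fail_threshold_sat(high_priority_feerate_per_kw: u64, multiplier: u64) -> u64 {
    let commitment_weight: u64 = 724;        // non-anchor commitment tx with no HTLC outputs
    let htlc_output_weight: u64 = 172;       // one extra HTLC output on the commitment
    let htlc_success_tx_weight: u64 = 703;   // pre-signed HTLC-success tx (non-anchor)
    let p2wpkh_claim_weight: u64 = 272 + 31 * 4; // claim input (incl. witness) plus a P2WPKH output
    let total_weight = commitment_weight + htlc_output_weight + htlc_success_tx_weight + p2wpkh_claim_weight;
    multiplier * high_priority_feerate_per_kw * total_weight / 1000
}
```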

In previous commits, we added a config flag to allow a user to control a
multiplier on a threshold for whether to fail back an HTLC early if
we're waiting on a downstream confirmation, depending on how much we'd
save if we ended up successfully claimed on-chain. This commit adds the
logic to calculate this threshold, and adds the check when failing HTLCs
back from monitor events (although this won't do anything until we
actually generate new monitor events for this situation in upcoming
commits).
alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from f35e75c to a18b940 on August 28, 2023 17:20
alecchendev (Contributor, Author) commented Aug 28, 2023

Rebased because of conflicts, and added 3 new commits to the beginning to put this logic behind a config flag/threshold. Also simplified/added to tests a bit, and addressed the couple of things about the grace period, debug asserts, and duplicate events (see other comments).

Fail inbound HTLCs if they expire within a certain number of blocks from
the current height. If we haven't seen the preimage for an HTLC by the
time the previous hop's timeout expires, we've lost that HTLC, so we
might as well fail it back instead of having our counterparty
force-close the channel.
alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from a18b940 to e5a918d on August 28, 2023 17:44

let htlc_weight = htlc_success_tx_weight(self.context.get_channel_type());
let p2wpkh_weight = 31 * 4;
let input_weight = 272; // Including witness, assuming it spends from a p2wpkh
alecchendev (Contributor, Author) commented Aug 28, 2023

By the way I got this number from this post. Edit: oh it's in the spec too

Contributor

FYI some of these are already defined in bump_transaction.rs.

Comment on lines +2118 to +2121
let total_weight = commitment_weight + htlc_weight + p2wpkh_weight + input_weight;
let total_fee = feerate_per_kw as u64 * total_weight / 1000;
total_fee
}
Contributor (Author)

Oh also, for now I’ve only calculated the total fee if we were to claim just the one HTLC tx, but we may also want to consider all the fees we'd pay if we were to claim other pending HTLCs (though it's not clear which ones we'll end up having to claim ourselves vs our counterparty)

Contributor

> though it's not clear which ones we'll end up having to claim ourselves vs our counterparty

Ah because there could be HTLCs that expire much later than the HTLC we're attempting to fail back and the counterparty still has a chance to claim those onchain?

* Explicitly assert no hanging messages after off-chain actions
* Get rid of manual user force-close for timeout
* Connect blocks on nodes[0] to make sure they don't go on-chain
* Add comment on syncing block height on nodes
* Simplify and wrap especially long lines
* Add more confirmations to onchain test cases
* Increase feerate to hit fail back code path now that we have
a threshold for HTLC value
* Expect new fail events now that we don't prevent duplicate fail
events
* Replace timeout buffer with grace period
alecchendev force-pushed the 2023-07-fail-upstream-before-claim branch from e5a918d to 597111b on August 31, 2023 04:21
alecchendev (Contributor, Author)

Small change: changed the collection used to keep track of already early-failed HTLCs in the channel monitor from a Vec to a HashSet. At first I thought a HashSet might have some more memory overhead, but after some further reading it doesn't seem like it really would, and either way it shouldn't make a significant difference, so I figured I should just use the one that makes more sense.
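
For illustration, the dedup described in the last few comments could look roughly like this; the struct, field, and method names are made up, not the PR's actual code:

```rust
use std::collections::HashSet;

// Illustrative only: remember which HTLCs we've already queued an early-fail
// MonitorEvent for, so that each new connected block doesn't queue the same
// event again.
#[derive(Default)]
struct EarlyFailTracker {
    failed_back_htlc_ids: HashSet<u64>,
}

impl EarlyFailTracker {
    // Returns true only the first time a given HTLC id is seen: HashSet::insert
    // returns false when the id was already present.
    fn should_queue_fail_event(&mut self, htlc_id: u64) -> bool {
        self.failed_back_htlc_ids.insert(htlc_id)
    }
}
```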

wpaulino (Contributor) left a comment

Feel free to squash

let commitment_weight = if self.context.is_outbound() {
let keys = self.context.build_holder_transaction_keys(self.context.cur_holder_commitment_transaction_number);
let commitment_stats = self.context.build_commitment_transaction(self.context.cur_holder_commitment_transaction_number, &keys, true, true, logger);
commitment_stats.total_fee_sat
Contributor

Should be returning the weight here, not the fee.

Contributor

We should be able to compute the weight without re-building the transaction, you may need a few more consts than we have defined, and it'll need to depend on the channel type.
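
As a rough sketch of that suggestion, assuming the BOLT-3 estimates of 724 weight units for a non-anchor commitment, 1124 with anchors, and 172 per untrimmed HTLC output; a real version would reuse the constants already defined in channel.rs:

```rust
// Illustrative only: estimate the holder commitment weight from constants
// instead of rebuilding the transaction.
fn estimated_commitment_weight(has_anchors: bool, num_untrimmed_htlcs: u64) -> u64 {
    let base_weight: u64 = if has_anchors { 1124 } else { 724 };
    base_weight + 172 * num_untrimmed_htlcs
}
```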



Comment on lines +2115 to +2116
let p2wpkh_weight = 31 * 4;
let input_weight = 272; // Including witness, assuming it spends from a p2wpkh
Contributor

These only apply with anchors, otherwise the HTLC success transaction has the same feerate as the commitment transaction.
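
A sketch of the distinction being pointed out, using the same rough BOLT-3 numbers as above; illustrative only, not the PR's code:

```rust
// Illustrative only: without anchors the HTLC-success tx is pre-signed at the
// commitment feerate, so no extra fee-bumping weight applies; with anchors the
// HTLC tx pays zero fee and has to be bumped with an added input and a change
// output.
fn htlc_claim_weight(has_anchors: bool) -> u64 {
    let htlc_success_tx_weight: u64 = if has_anchors { 706 } else { 703 };
    if has_anchors {
        htlc_success_tx_weight + 272 + 31 * 4 // extra input (incl. witness) + P2WPKH change output
    } else {
        htlc_success_tx_weight
    }
}
```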


TheBlueMatt (Collaborator)

Superseded by #3556.
