Spurious Forwarding Failures in Async Monitor Update Clients #661

TheBlueMatt · 2020-07-30T16:04:44Z

We use Channel::is_live() for a few things that amount to "should we consider this channel available for forwarding HTLCs and sending payments". That is great, except that it creates races for clients which use async monitor updates. Such clients will always return a TemporaryFailure on monitor updates, leaving the channel in ChannelState::MonitorUpdateFailed until the monitor update completes. This implies !is_live(), which means such clients will refuse to send or forward HTLCs during monitor updates, which they likely should not. The likely fix would be to only treat a channel as !is_live() if the monitor update has been running for some time without completion, but this probably has implications for the channel state machine around placing such HTLCs in the holding cell in a new case.
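To make the race concrete, here is a minimal sketch of the two liveness policies discussed above. It is not the rust-lightning implementation: `ToyChannel`, `is_live_strict`, `is_live_with_grace`, and the `grace` parameter are all hypothetical names used only for illustration.

```rust
use std::time::{Duration, Instant};

// Hypothetical, simplified stand-in for channel state; this is NOT the real
// rust-lightning `Channel` struct.
struct ToyChannel {
    peer_connected: bool,
    // Set while a monitor update has returned a temporary failure and has not
    // completed yet; records when that update started.
    monitor_update_started: Option<Instant>,
}

impl ToyChannel {
    // Behavior described in the issue: any pending monitor update makes the
    // channel look unusable, so HTLCs are refused during the update.
    fn is_live_strict(&self) -> bool {
        self.peer_connected && self.monitor_update_started.is_none()
    }

    // Direction suggested in the issue: only treat the channel as not live
    // once the update has been pending for at least `grace` (an assumed,
    // illustrative parameter).
    fn is_live_with_grace(&self, now: Instant, grace: Duration) -> bool {
        if !self.peer_connected {
            return false;
        }
        match self.monitor_update_started {
            None => true,
            Some(started) => now.duration_since(started) < grace,
        }
    }
}

fn main() {
    let started = Instant::now();
    let chan = ToyChannel {
        peer_connected: true,
        monitor_update_started: Some(started),
    };
    let grace = Duration::from_secs(5);

    // An async client is always briefly in this state, so the strict check
    // rejects HTLCs that would have been fine a moment later.
    assert!(!chan.is_live_strict());
    // With a grace period, a short-lived update does not block forwarding...
    assert!(chan.is_live_with_grace(started + Duration::from_secs(1), grace));
    // ...but an update that has been stuck for a long time still does.
    assert!(!chan.is_live_with_grace(started + Duration::from_secs(10), grace));
}
```

Under the grace-period policy, an HTLC accepted while the update is still pending has to wait somewhere until the update completes, which is where the holding cell comes in.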


valentinewallace · 2020-11-28T20:47:45Z

Taking a look at #756 and wondering about:

> but this probably has implications for the channel state machine around placing such HTLCs in the holding cell in a new case.

Could you clarify what implications? (and is "a new case" = the case where we now only return !is_live() if updating has run some time without completion?)

TheBlueMatt · 2020-11-29T19:28:55Z

> Could you clarify what implications? (and is "a new case" = the case where we now only return !is_live() if updating has run some time without completion?)

Basically, any time we place something in the holding cell, we have to make sure we call free_holding_cell when the state transitions away from whatever condition caused it to go into the holding cell. You can see this in #756, where, once we stop dropping the HTLCs going into the holding cell, we now have to make sure we free_holding_cell when we get a channel_reestablish, as otherwise we have stuff sitting in the holding cell forever. It may be that after #756 all the relevant cases are handled, but I haven't done a careful scan.
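The invariant described in this comment can be sketched with a toy model. The types and method names below (`Blocker`, `ToyChannel`, `clear_blocker`) are hypothetical and do not mirror the real rust-lightning API; `free_holding_cell` only borrows the name used in the comment.

```rust
// Hypothetical reasons an HTLC cannot be sent right away.
#[derive(Debug, PartialEq)]
enum Blocker {
    Unblocked,
    PeerDisconnected,
    MonitorUpdatePending,
}

// Toy channel; NOT the real rust-lightning `Channel`.
struct ToyChannel {
    blocker: Blocker,
    holding_cell: Vec<u64>, // HTLC ids waiting to be sent
    sent: Vec<u64>,         // HTLC ids actually sent to the peer
}

impl ToyChannel {
    // Send immediately if possible, otherwise park the HTLC.
    fn forward_htlc(&mut self, id: u64) {
        if self.blocker == Blocker::Unblocked {
            self.sent.push(id);
        } else {
            self.holding_cell.push(id);
        }
    }

    // Move everything out of the holding cell, if nothing blocks sending.
    fn free_holding_cell(&mut self) {
        if self.blocker == Blocker::Unblocked {
            self.sent.append(&mut self.holding_cell);
        }
    }

    // The invariant: every transition that clears a blocker must also free
    // the holding cell, or parked HTLCs stay there indefinitely.
    fn clear_blocker(&mut self) {
        self.blocker = Blocker::Unblocked;
        self.free_holding_cell();
    }
}

fn main() {
    let mut chan = ToyChannel {
        blocker: Blocker::MonitorUpdatePending,
        holding_cell: vec![],
        sent: vec![],
    };
    chan.forward_htlc(1);
    chan.forward_htlc(2);
    assert_eq!(chan.holding_cell, vec![1, 2]);
    chan.clear_blocker();
    assert!(chan.holding_cell.is_empty());
    assert_eq!(chan.sent, vec![1, 2]);

    // The failure mode: a transition that forgets to free the holding cell.
    let mut stuck = ToyChannel {
        blocker: Blocker::PeerDisconnected,
        holding_cell: vec![],
        sent: vec![],
    };
    stuck.forward_htlc(3);
    stuck.blocker = Blocker::Unblocked; // state changed, nothing freed
    assert_eq!(stuck.holding_cell, vec![3]); // HTLC 3 is stranded
}
```

Each new reason for parking an HTLC adds another transition that has to remember to free the cell, which is why adding "a new case" needs a careful scan.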

Previously, if we got a temporary monitor update failure while there were HTLCs pending forwarding in the holding cell, we'd clear them and fail them all backwards. This makes sense if temporary failures are rare, but in an async environment, temporary monitor update failures may be the normal case. In such a world, this potentially results in a lot of spurious HTLC forwarding failures (which is the topic of lightningdevkit#661).

We use `Channel::is_live()` to gate inclusion of a channel in `ChannelManager::list_usable_channels()` and, when sending an HTLC, to select whether a channel is available for forwarding through/sending to. In both of these cases, we almost certainly want `Channel::is_live()` to include channels which are simply pending a monitor update, as some clients may update monitors asynchronously, and thus any rejection of HTLCs based on a monitor update still pending causes a race condition. After lightningdevkit#851, we always ensure any holding cells are freed when sending P2P messages, making this much more trivially correct - instead of having to ensure that we always have a matching holding cell free any time we add something to the holding cell, we can simply rely on the fact that it always happens. Fixes lightningdevkit#661.

We use `Channel::is_live()` to gate inclusion of a channel in `ChannelManager::list_usable_channels()` and, when sending an HTLC, to select whether a channel is available for forwarding through/sending to. In both of these cases, we should consider a channel `is_live()` when it is pending a monitor update. Some clients may update monitors asynchronously, thus we may simply be waiting a short duration for a monitor update to complete, and shouldn't fail all forwarding HTLCs during that time. After lightningdevkit#851, we always ensure any holding cells are freed when sending P2P messages, making this change much more trivially correct - instead of having to ensure that we always free the holding cell when a channel becomes live again after adding something to the holding cell, we can simply rely on the fact that it always happens. Fixes lightningdevkit#661.
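The approach in this commit message can be illustrated with a rough sketch, under the assumption that there is a single pass that generates outbound messages and always attempts to free holding cells. All names here (`ToyChannel`, `maybe_free_holding_cell`, `process_outbound_messages`) are hypothetical and are not the rust-lightning API.

```rust
// Toy channel; NOT the real rust-lightning `Channel`.
struct ToyChannel {
    peer_connected: bool,
    monitor_update_pending: bool,
    holding_cell: Vec<u64>, // HTLC ids parked until they can be sent
    sent: Vec<u64>,         // HTLC ids actually sent to the peer
}

impl ToyChannel {
    // A pending monitor update no longer makes the channel "not live".
    fn is_live(&self) -> bool {
        self.peer_connected
    }

    // Accept the HTLC whenever the channel is live; park it if a monitor
    // update is still in flight instead of failing it backwards.
    fn forward_htlc(&mut self, id: u64) -> Result<(), &'static str> {
        if !self.is_live() {
            return Err("channel not live");
        }
        if self.monitor_update_pending {
            self.holding_cell.push(id);
        } else {
            self.sent.push(id);
        }
        Ok(())
    }

    // Release parked HTLCs if nothing is blocking them anymore.
    fn maybe_free_holding_cell(&mut self) {
        if self.peer_connected && !self.monitor_update_pending {
            self.sent.append(&mut self.holding_cell);
        }
    }
}

// Assumed stand-in for the one place where outbound P2P messages are
// generated. Because it always tries to free every holding cell, individual
// state transitions do not each need a matching free call.
fn process_outbound_messages(channels: &mut [ToyChannel]) {
    for chan in channels.iter_mut() {
        chan.maybe_free_holding_cell();
    }
}

fn main() {
    let mut channels = vec![ToyChannel {
        peer_connected: true,
        monitor_update_pending: true,
        holding_cell: vec![],
        sent: vec![],
    }];

    // The HTLC is accepted rather than rejected during the monitor update.
    assert!(channels[0].forward_htlc(7).is_ok());
    assert_eq!(channels[0].holding_cell, vec![7]);

    // While the update is pending, the pass leaves the HTLC parked.
    process_outbound_messages(&mut channels);
    assert_eq!(channels[0].holding_cell, vec![7]);

    // Once the update completes, the next pass releases it.
    channels[0].monitor_update_pending = false;
    process_outbound_messages(&mut channels);
    assert!(channels[0].holding_cell.is_empty());
    assert_eq!(channels[0].sent, vec![7]);
}
```

The design trade-off is that freeing is attempted on every pass, including passes where nothing can be released, in exchange for not having to audit every state transition.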

TheBlueMatt mentioned this issue Jul 30, 2020

Fail back HTLCs that fail to be freed from the holding cell #640

Merged

TheBlueMatt mentioned this issue Nov 20, 2020

Fail holding-cell AddHTLCs on Channel deser to match disconnection #754

Closed

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Nov 20, 2020

Stop failing back on monitor update fails, moving towards addressing lightningdevkit#661

3f2decb

TheBlueMatt mentioned this issue Nov 20, 2020

Clean up and more liberally free holding cell HTLCs #756

Closed

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Nov 23, 2020

Stop failing back on monitor update fails, moving towards addressing lightningdevkit#661

cee6221

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Nov 24, 2020

Stop failing back on monitor update fails, moving towards addressing lightningdevkit#661

54a601c

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Nov 24, 2020

Stop failing back on monitor update fails, moving towards addressing lightningdevkit#661

03d78ed

TheBlueMatt mentioned this issue Jun 17, 2021

Consider channels "live" even if they are awaiting a monitor update #954

Merged

TheBlueMatt closed this as completed in #954 Jul 1, 2021
