Skip to content

Spurious Forwarding Failures in Async Monitor Update Clients #661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TheBlueMatt opened this issue Jul 30, 2020 · 2 comments · Fixed by #954
Closed

Spurious Forwarding Failures in Async Monitor Update Clients #661

TheBlueMatt opened this issue Jul 30, 2020 · 2 comments · Fixed by #954

Comments

@TheBlueMatt
Copy link
Collaborator

We use Channel::is_live() for a few things that imply "should we consider this channel available for forwarding HTLCs and sending payments", which is great, except it implies races for clients which use async monitor updates. Such clients will always return a TemporaryFailure on monitor updates, leaving the channel in ChannelState::MonitorUpdateFailed until the monitor updates completes. This implies !is_live() which means such clients will refuse to send or forward HTLCs during monitor updates, which they likely should not. The likely fix would be to only !is_live() a channel if the monitor updating has been running for some time without completion, but this probably has implications for the channel state machine around placing such HTLCs in the holding cell in a new case.

@valentinewallace
Copy link
Contributor

Taking a look at #756 and wondering about

but this probably has implications for the channel state machine around placing such HTLCs in the holding cell in a new case.

Could you clarify what implications? (and is "a new case" = the case where we now only return !is_live() if updating has run some time without completion?)

@TheBlueMatt
Copy link
Collaborator Author

Could you clarify what implications? (and is "a new case" = the case where we now only return !is_live() if updating has run some time without completion?)

Basically, any time we place something in the holding cell, we have to make sure we call free_holding_cell when the state transitions away from whatever the requirement was that resulted in it going into the holding cell. You can see this in #756 where, once we stop dropping the HTLCs going into the holding cell we now have to make sure we free_holding_cell when we get a channel_reestablish, as otherwise we have stuff sitting in the holding cell forever. It may be that after 756 all the relevant cases are handled, but I haven't done a careful scan.

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Dec 15, 2020
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Dec 15, 2020
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jan 27, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Feb 9, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Feb 21, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Mar 1, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Mar 18, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Mar 25, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Mar 26, 2021
Previously, if we get a temporary monitor update failure while
there were HTLCs pending forwarding in the holding cell, we'd clear
them and fail them all backwards. This makes sense if temporary
failures are rare, but in an async environment, temporary monitor
update failures may be the normal case. In such a world, this
results in potentially a lot of spurious HTLC forwarding failures
(which is the topic of lightningdevkit#661).
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 16, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
`Channel::is_live()` to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

XXX

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 17, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
`Channel::is_live()` to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 17, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
`Channel::is_live()` to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 24, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
`Channel::is_live()` to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 29, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
`Channel::is_live()` to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 30, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
`Channel::is_live()` to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 30, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we should consider a channel `is_live()` when
they are pending a monitor update. Some clients may update monitors
asynchronously, thus we may simply be waiting a short duration for a
monitor update to complete, and shouldn't fail all forwarding HTLCs
during that time.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this change much more trivially
correct - instead of having to ensure that we always free the holding
cell when a channel becomes live again after adding something to the
holding cell, we can simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this issue Jun 30, 2021
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we should consider a channel `is_live()` when
they are pending a monitor update. Some clients may update monitors
asynchronously, thus we may simply be waiting a short duration for a
monitor update to complete, and shouldn't fail all forwarding HTLCs
during that time.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this change much more trivially
correct - instead of having to ensure that we always free the holding
cell when a channel becomes live again after adding something to the
holding cell, we can simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants