Spurious Forwarding Failures in Async Monitor Update Clients #661
Comments
Taking a look at #756 and wondering about the "implications for the channel state machine" mentioned in the issue description: could you clarify what those implications are? (And is "a new case" the case where we now only return …?)
Basically, any time we place something in the holding cell, we have to make sure we call `free_holding_cell` when the state transitions away from whatever requirement resulted in it going into the holding cell. You can see this in #756 where, once we stop dropping the HTLCs going into the holding cell, we now have to make sure we `free_holding_cell` when we get a `channel_reestablish`, as otherwise we have stuff sitting in the holding cell forever. It may be that after #756 all the relevant cases are handled, but I haven't done a careful scan.
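To make that invariant concrete, here is a minimal Rust sketch of the pattern being described, using hypothetical simplified types (`ChannelBlocker`, `PendingHtlc`, `queue_or_send`, and `clear_blocker` are illustrative names, not the actual rust-lightning `Channel` internals): anything queued while a blocking condition is active must be drained when the last such condition clears.

```rust
// Minimal sketch of the holding-cell invariant; NOT the real Channel API.

#[derive(PartialEq)]
enum ChannelBlocker {
    /// Peer is disconnected; cleared on channel_reestablish.
    PeerDisconnected,
    /// A monitor update returned a temporary failure and hasn't completed.
    MonitorUpdatePending,
}

struct PendingHtlc {
    amount_msat: u64,
}

struct Channel {
    blockers: Vec<ChannelBlocker>,
    holding_cell: Vec<PendingHtlc>,
}

impl Channel {
    /// Queue an HTLC instead of sending it while any blocker is active.
    fn queue_or_send(&mut self, htlc: PendingHtlc) {
        if self.blockers.is_empty() {
            // send_htlc(htlc) would go here
        } else {
            self.holding_cell.push(htlc);
        }
    }

    /// Must be called on *every* transition that clears a blocker
    /// (channel_reestablish, monitor update completion, ...), otherwise
    /// HTLCs sit in the holding cell forever.
    fn clear_blocker(&mut self, blocker: ChannelBlocker) {
        self.blockers.retain(|b| *b != blocker);
        if self.blockers.is_empty() {
            self.free_holding_cell();
        }
    }

    fn free_holding_cell(&mut self) {
        for _htlc in self.holding_cell.drain(..) {
            // send_htlc(_htlc) would go here
        }
    }
}
```

The key point from the comment above is the `clear_blocker` path: every state transition that removes a reason for queueing must also trigger the holding-cell free.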
Previously, if we got a temporary monitor update failure while there were HTLCs pending forwarding in the holding cell, we'd clear them and fail them all backwards. This makes sense if temporary failures are rare, but in an async environment, temporary monitor update failures may be the normal case. In such a world, this results in potentially a lot of spurious HTLC forwarding failures (which is the topic of lightningdevkit#661).
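As a rough illustration of that behavioral change, the sketch below contrasts draining-and-failing-back the holding cell on a temporary failure with leaving it queued until the update completes. The types and the return-value convention are hypothetical simplifications, not the real `Channel`/`ChannelManager` code.

```rust
// Hypothetical simplified sketch of the behavioral change described above.

enum MonitorUpdateResult {
    Ok,
    /// The persister could not complete the update yet (e.g. async storage).
    TemporaryFailure,
}

struct ForwardHtlc {
    payment_hash: [u8; 32],
}

struct Channel {
    holding_cell: Vec<ForwardHtlc>,
    monitor_update_pending: bool,
}

impl Channel {
    /// Returns the HTLCs which should be failed back to their previous hop.
    fn on_monitor_update_result(&mut self, res: MonitorUpdateResult) -> Vec<ForwardHtlc> {
        match res {
            MonitorUpdateResult::Ok => Vec::new(),
            MonitorUpdateResult::TemporaryFailure => {
                self.monitor_update_pending = true;
                // Old behavior: drain the holding cell and fail every HTLC
                // backwards, even though the failure is only temporary:
                //     return self.holding_cell.drain(..).collect();
                //
                // New behavior: keep the HTLCs queued; they are forwarded
                // once the monitor update completes and the holding cell
                // is freed.
                Vec::new()
            }
        }
    }
}
```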
We use `Channel::is_live()` to gate inclusion of a channel in `ChannelManager::list_usable_channels()` and, when sending an HTLC, to select whether a channel is available for forwarding through/sending to. In both of these cases, we should still consider a channel `is_live()` when it is merely pending a monitor update. Some clients may update monitors asynchronously, so we may simply be waiting a short duration for a monitor update to complete, and shouldn't fail all forwarding HTLCs during that time. After lightningdevkit#851, we always ensure any holding cells are freed when sending P2P messages, making this change much more trivially correct: instead of having to ensure that we always free the holding cell when a channel becomes live again after adding something to it, we can simply rely on the fact that this always happens. Fixes lightningdevkit#661.
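A minimal sketch of that gating change, with hypothetical boolean flags standing in for the real `ChannelState` bitmask (the flag names and both function names are illustrative):

```rust
// Hypothetical simplified sketch; the real Channel::is_live() operates on a
// ChannelState bitmask rather than these booleans.

struct ChannelStatus {
    funding_confirmed: bool,
    peer_connected: bool,
    shutting_down: bool,
    monitor_update_pending: bool,
}

impl ChannelStatus {
    /// Before: a pending monitor update made the channel "not live", so
    /// list_usable_channels() and HTLC forwarding both skipped it.
    fn is_live_old(&self) -> bool {
        self.funding_confirmed
            && self.peer_connected
            && !self.shutting_down
            && !self.monitor_update_pending
    }

    /// After: a pending (asynchronous) monitor update no longer makes the
    /// channel unusable; an HTLC simply waits in the holding cell until the
    /// update completes and the cell is freed.
    fn is_live_new(&self) -> bool {
        self.funding_confirmed && self.peer_connected && !self.shutting_down
    }
}
```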
We use `Channel::is_live()` for a few things that imply "should we consider this channel available for forwarding HTLCs and sending payments", which is great, except it implies races for clients which use async monitor updates. Such clients will always return a `TemporaryFailure` on monitor updates, leaving the channel in `ChannelState::MonitorUpdateFailed` until the monitor update completes. This implies `!is_live()`, which means such clients will refuse to send or forward HTLCs during monitor updates, which they likely should not. The likely fix would be to only `!is_live()` a channel if the monitor update has been running for some time without completion, but this probably has implications for the channel state machine around placing such HTLCs in the holding cell in a new case.
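For comparison, the "only treat the channel as not-live if the monitor update has been outstanding for a while" idea floated at the end of the issue could look roughly like the sketch below. This is a hypothetical grace-period approach: the constant, field names, and threshold are invented for illustration, and the referenced commits above instead simply keep such channels live.

```rust
use std::time::{Duration, Instant};

/// How long a monitor update may stay outstanding before we stop advertising
/// the channel as usable (illustrative threshold, not a real tunable).
const MONITOR_UPDATE_GRACE: Duration = Duration::from_secs(10);

struct Channel {
    /// Set when a monitor update returns a temporary failure, cleared when
    /// the update completes.
    monitor_update_started: Option<Instant>,
    /// All other liveness conditions (funding confirmed, peer connected, ...).
    otherwise_live: bool,
}

impl Channel {
    fn is_live(&self) -> bool {
        if !self.otherwise_live {
            return false;
        }
        match self.monitor_update_started {
            // No update outstanding: live.
            None => true,
            // Update outstanding: still consider the channel live for a short
            // grace period so async clients don't spuriously fail forwards;
            // only mark it not-live once the update has been stuck for a while.
            Some(started) => started.elapsed() < MONITOR_UPDATE_GRACE,
        }
    }
}
```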