multi: allow remote deletion of local payment attempt results #9289

calvinrzachman · 2024-11-20T20:22:21Z

Change Description

When lnd restarts, its ChannelRouter cleans up the results store maintained by the HTLCSwitch via the CleanStore method: any network results stored for attempts that are no longer IN-FLIGHT are removed from disk. This causes problems when the ChannelRouter is deployed in a separate process from the HTLCSwitch and the two communicate via RPC. Specifically, we may encounter the following scenario:

1. The remote ChannelRouter dispatches a payment attempt (HTLC) via some remote lnd backend.
2. The payment is successfully forwarded to the network, so the HTLC is in-flight from both the router's and lnd's perspectives.
3. The remote ChannelRouter goes down. This means that there is no goroutine waiting for the attempt result.
4. The lnd instance restarts AFTER receiving the pre-image (or failure result). This meets the condition under which we’ll clean the result from the store on (lnd) ChannelRouter startup.
5. The remote ChannelRouter comes back online. It restarts a goroutine to track the payment. The payment is considered settled by the recipient and the lnd backend, but from the perspective of the ChannelRouter the payment is failed/in-flight!

The issue is that the ChannelRouter relies on state kept by an external entity (lnd) to drive its payment status transitions, and that external entity cleans up that state automatically (at least with the way the SendOnion RPC works now). This opens a race in which lnd's automated cleanup deletes the state the ChannelRouter needs to track the payment to completion before the ChannelRouter has actually read it.

In the same way that lnd does not run CleanStore while payments are being processed, a remote ChannelRouter cannot allow lnd to run CleanStore while payments are being processed. There must be synchronization between ChannelRouter and lnd with respect to the maintenance of state in this store.
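The scenario above can be replayed against a small in-memory model. This is only a sketch: `resultStore`, its methods, and `simulateRace` are hypothetical stand-ins for the Switch's result store, and the empty keep set assumes that lnd's own view contains no in-flight attempts when it restarts.

```go
package main

import "fmt"

// resultStore is a hypothetical in-memory stand-in for the Switch's
// network result store, keyed by attempt ID.
type resultStore struct {
	results map[uint64]string
}

// storeResult records the outcome (pre-image or failure) of an attempt.
func (s *resultStore) storeResult(attemptID uint64, result string) {
	s.results[attemptID] = result
}

// cleanStore deletes every stored result whose attempt ID is not in keep.
// It models the cleanup behaviour described above, not lnd's actual code.
func (s *resultStore) cleanStore(keep map[uint64]struct{}) {
	for id := range s.results {
		if _, ok := keep[id]; !ok {
			delete(s.results, id)
		}
	}
}

// getResult returns the stored result for an attempt, if any.
func (s *resultStore) getResult(attemptID uint64) (string, bool) {
	r, ok := s.results[attemptID]
	return r, ok
}

// simulateRace replays the scenario and reports whether the remote router
// can still read the attempt result once it comes back online.
func simulateRace() bool {
	store := &resultStore{results: make(map[uint64]string)}
	const attemptID = 1

	// Steps 1-3: the attempt is dispatched and in-flight, then the
	// remote router goes down.

	// Step 4: lnd receives the pre-image and then restarts. Assumption:
	// lnd itself tracks no in-flight attempts, so the keep set is empty.
	store.storeResult(attemptID, "preimage")
	store.cleanStore(map[uint64]struct{}{})

	// Step 5: the remote router comes back and asks for the result.
	_, found := store.getResult(attemptID)
	return found
}

func main() {
	fmt.Println("result still available:", simulateRace())
}
```

In this model the result is gone by the time the router asks for it, which is the lost state transition the description is concerned with.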

Solution

For this we can allow lnd to disable automatic cleanup of the Switch result store and add a switchrpc sub-server which supports remote deletion from this store. Then the remote ChannelRouter can clean the result stores of its switches when it restarts by providing an implementation of CleanStore which leverages these RPCs.

A DeletePaymentResult RPC deletes the relevant information from the Switch store directly. If we eventually allow for the wholesale removal of the ChannelRouter from an lnd deployment, we will need a way to delete information from the Switch’s store remotely.
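A remote CleanStore built on these RPCs might look roughly like the following. This is a sketch under stated assumptions: the `switchClient` interface and its method signatures are hypothetical (only the RPC names FetchAttemptResults and DeleteAttemptResult come from this PR's commits), and `fakeSwitch` exists only to exercise the logic without a running lnd.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// switchClient is a hypothetical client-side view of the switchrpc
// sub-server. The signatures are assumptions made for this sketch.
type switchClient interface {
	FetchAttemptResults(ctx context.Context) ([]uint64, error)
	DeleteAttemptResult(ctx context.Context, attemptID uint64) error
}

// remoteCleanStore deletes every stored result whose attempt ID is not in
// keep, using one delete call per result.
func remoteCleanStore(ctx context.Context, c switchClient,
	keep map[uint64]struct{}) error {

	ids, err := c.FetchAttemptResults(ctx)
	if err != nil {
		return fmt.Errorf("fetch attempt results: %w", err)
	}
	for _, id := range ids {
		if _, ok := keep[id]; ok {
			continue
		}
		if err := c.DeleteAttemptResult(ctx, id); err != nil {
			return fmt.Errorf("delete attempt %d: %w", id, err)
		}
	}
	return nil
}

// fakeSwitch is an in-memory switchClient used only for this example.
type fakeSwitch struct {
	results map[uint64]struct{}
}

func (f *fakeSwitch) FetchAttemptResults(ctx context.Context) ([]uint64, error) {
	ids := make([]uint64, 0, len(f.results))
	for id := range f.results {
		ids = append(ids, id)
	}
	return ids, nil
}

func (f *fakeSwitch) DeleteAttemptResult(ctx context.Context, id uint64) error {
	delete(f.results, id)
	return nil
}

// remainingAfterClean runs the cleanup against three stored results while
// attempt 2 is still in-flight, and returns the IDs left in the store.
func remainingAfterClean() ([]uint64, error) {
	sw := &fakeSwitch{results: map[uint64]struct{}{1: {}, 2: {}, 3: {}}}
	keep := map[uint64]struct{}{2: {}}
	if err := remoteCleanStore(context.Background(), sw, keep); err != nil {
		return nil, err
	}
	ids, _ := sw.FetchAttemptResults(context.Background())
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids, nil
}

func main() {
	ids, err := remainingAfterClean()
	if err != nil {
		fmt.Println("clean failed:", err)
		return
	}
	fmt.Println("remaining attempt IDs:", ids)
}
```

The router would call something like `remoteCleanStore` on its own startup, passing the attempts it still considers in-flight, so that deletion only happens once the router has had a chance to read the results it needs.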

NOTE: This has an advantage over adding a "tracked remotely" flag to network results and skipping the deletion of any result with such a flag: I think that approach requires a DB migration, whereas suspending automated local deletion in favor of remote deletion via RPC does not.

The SendOnion/TrackOnion RPCs will probably be relocated to the switchrpc sub-server.

Steps to Test

make itest icase=switch_store_rpc

Questions

Would it be better to transport a toKeep-style map which specifies the results to keep, since the list of in-flight payments is likely to be shorter than the list of completed payments? This would mean that, instead of multiple calls to DeleteAttemptResult(s), a single call would delete multiple results; it could probably be named CleanResultStore.
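To make the trade-off in this question concrete, the sketch below counts calls for both approaches against an in-memory store. Everything here is hypothetical: `countingStore`, `deleteAttemptResult`, and `cleanResultStore` only model the two RPC shapes being discussed, and each method call stands in for one RPC round trip.

```go
package main

import "fmt"

// countingStore is a hypothetical result store that counts how many
// calls (stand-ins for RPC round trips) it receives.
type countingStore struct {
	results map[uint64]struct{}
	calls   int
}

// newCountingStore creates a store holding results for IDs 1..total.
func newCountingStore(total uint64) *countingStore {
	s := &countingStore{results: make(map[uint64]struct{})}
	for id := uint64(1); id <= total; id++ {
		s.results[id] = struct{}{}
	}
	return s
}

// deleteAttemptResult models a per-attempt delete: one call per result.
func (s *countingStore) deleteAttemptResult(id uint64) {
	s.calls++
	delete(s.results, id)
}

// cleanResultStore models the keep-list variant: one call that removes
// everything not listed in keep.
func (s *countingStore) cleanResultStore(keep []uint64) {
	s.calls++
	keepSet := make(map[uint64]struct{}, len(keep))
	for _, id := range keep {
		keepSet[id] = struct{}{}
	}
	for id := range s.results {
		if _, ok := keepSet[id]; !ok {
			delete(s.results, id)
		}
	}
}

// compareCalls returns the number of calls each approach needs to clean a
// store of total results while keeping the IDs in keep.
func compareCalls(total uint64, keep []uint64) (perAttempt, keepList int) {
	keepSet := make(map[uint64]struct{}, len(keep))
	for _, id := range keep {
		keepSet[id] = struct{}{}
	}

	a := newCountingStore(total)
	for id := uint64(1); id <= total; id++ {
		if _, ok := keepSet[id]; !ok {
			a.deleteAttemptResult(id)
		}
	}

	b := newCountingStore(total)
	b.cleanResultStore(keep)

	return a.calls, b.calls
}

func main() {
	perAttempt, keepList := compareCalls(1000, []uint64{41, 42})
	fmt.Printf("per-attempt deletes: %d calls, keep-list clean: %d call\n",
		perAttempt, keepList)
}
```

With 1000 stored results and two in-flight attempts, the per-attempt approach needs 998 calls and the keep-list approach needs one. The keep-list request grows with the number of in-flight attempts rather than with the number of completed ones.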

Allow node configuration to specify whether to treat HTLC attempts in the normal way or as being tracked by a remote entity. This has the practical effect of disabling the automatic cleanup of network result state within the Switch. Instead, state deletion must be completed via RPC. This is an all or nothing designation until we can determine whether it can be set on a per-HTLC attempt basis.
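As an illustration of this all-or-nothing switch, the sketch below gates startup cleanup on a single config flag. The field name `RemoteResultTracking`, the `switchConfig` type, and `startSwitch` are hypothetical and are not the option or code introduced by this PR.

```go
package main

import "fmt"

// switchConfig is a hypothetical slice of node configuration.
type switchConfig struct {
	// RemoteResultTracking, when set, treats all HTLC attempts as being
	// tracked by a remote entity and disables automatic cleanup.
	RemoteResultTracking bool
}

// resultStore is a hypothetical in-memory stand-in for the result store.
type resultStore struct {
	results map[uint64]string
}

// cleanStore removes every result whose attempt ID is not in keep.
func (s *resultStore) cleanStore(keep map[uint64]struct{}) {
	for id := range s.results {
		if _, ok := keep[id]; !ok {
			delete(s.results, id)
		}
	}
}

// startSwitch runs the startup cleanup unless remote tracking is enabled,
// in which case deletion is left to the remote caller.
func startSwitch(cfg switchConfig, store *resultStore,
	inFlight map[uint64]struct{}) {

	if cfg.RemoteResultTracking {
		return
	}
	store.cleanStore(inFlight)
}

// resultsAfterStartup returns how many results survive startup when no
// attempts are in-flight locally.
func resultsAfterStartup(remote bool) int {
	store := &resultStore{results: map[uint64]string{
		1: "preimage",
		2: "failure",
	}}
	cfg := switchConfig{RemoteResultTracking: remote}
	startSwitch(cfg, store, map[uint64]struct{}{})
	return len(store.results)
}

func main() {
	fmt.Println("default mode, results kept:", resultsAfterStartup(false))
	fmt.Println("remote tracking, results kept:", resultsAfterStartup(true))
}
```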

The SwitchRPC server will be hidden behind a build tag.

This will facilitate the coordinated deletion of results from local payment attempts in scenarios in which the ChannelRouter runs remotely from the Switch.

Add FetchAttemptResults and DeleteAttemptResult to harness.

coderabbitai · 2024-11-20T20:22:29Z

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)

llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


calvinrzachman added 8 commits November 20, 2024 12:38

switchrpc: add and implement switchrpc server

582611d

The SwitchRPC server will be hidden behind a build tag.

switchrpc: add new FetchAttemptResults rpc proto

ee0a26c

switchrpc: add new DeleteAttemptResult rpc proto

6a04581

This will facilitate the coordinated deletion of results from local payment attempts in scenarios in which the ChannelRouter runs remotely from the Switch.

switchrpc: add DeleteAttemptResult rpc

4187df9

switchrpc: add FetchAttemptResults rpc

8980016

lntest: add switchrpc methods

f413265

Add FetchAttemptResults and DeleteAttemptResult to harness.

itest: add test for new switchrpc methods

1729c28
