-
Notifications
You must be signed in to change notification settings - Fork 116
Parallelize init further #747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
TheBlueMatt
wants to merge
7
commits into
lightningdevkit:main
Choose a base branch
from
TheBlueMatt:2026-01-parallel-monitors
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Parallelize init further #747
TheBlueMatt
wants to merge
7
commits into
lightningdevkit:main
from
TheBlueMatt:2026-01-parallel-monitors
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
`LiquiditySource` takes a reference to our `PeerManager` but the `PeerManager` holds an indirect reference to the `LiquiditySource`. As a result, after our `Node` instance is `stop`ped and the `Node` `drop`ped, much of the node's memory will stick around, including the `NetworkGraph`. Here we fix this issue by using `Weak` pointers, though note that there is another issue caused by LDK's gossip validation API.
In added logic to use the HRN resolver from `bitcoin-payment-instructions`, we created a circular `Arc` reference - the `LDKOnionMessageDNSSECHrnResolver` is used as a handler for the `OnionMessenger` but we also set a post-queue-action which holds a reference to the `PeerManager`. As a result, after our `Node` instance is `stop`ped and the `Node` `drop`ped, much of the node's memory will stick around, including the `NetworkGraph`. Here we fix this issue by using `Weak` pointers.
LDK's gossip validation API basically forced us to have a circular `Arc` reference, leading to memory leaks after `drop`ping an instance of `Node`. This is fixed upstream in LDK PR #4294 which we update to here.
Due to two circular `Arc` references, after `stop`ping and `drop`ping the `Node` instance the bulk of ldk-node's memory (in the form of the `NetworkGraph`) would hang around. Here we add a test for this in our integration tests, checking if the `NetworkGraph` (as a proxy for other objects held in reference by the `PeerManager`) hangs around after `Node`s are `drop`ped.
Upstream LDK added the ability to read `ChannelMonitor`s from storage in parallel, which we switch to here.
Since I was editing the init logic anyway I couldn't resist going ahead and parallelizing various read calls. Since we added support for an async `KVStore` in LDK 0.2/ldk-node 0.7, we can now practically do initialization reads in parallel. Thus, rather than making a long series of read calls in `build`, we use `tokio::join` to reduce the number of round-trips to our backing store, which should be a very large win for initialization cost on those using remote storage (e.g. VSS). Sadly we can't trivially do all our reads in one go, we need the payment history to initialize the BDK wallet, which is used in the `Walet` object which is referenced in our `KeysManager`. Thus we first read the payment store and node metrics before moving on. Then, we need a reference to the `NetworkGraph` when we build the scorer. While we could/eventually should move to reading the *bytes* for the scorer while reading the graph and only building the scorer later, that's a larger refactor we leave for later. In the end, we end up with: * 1 round-trip to load the payment history and node metrics, * 2 round-trips to load ChannelMonitors and NetworkGraph (where there's an internal extra round-trip after listing the monitor updates for a monitor), * 1 round-trip to validate bitcoind RPC/REST access for those using bitcoind as a chain source, * 1 round-trip to load various smaller LDK and ldk-node objects, * and 1 additional round-trip to drop the rgs snapshot timestamp for nodes using P2P network gossip syncing for a total of 4 round-trips in the common case and 6 for nodes using less common chain source and gossip sync sources. We then have additional round-trips to our storage and chain source during node start, but those are in many cases already async.
|
I've assigned @tnull as a reviewer! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Based on #736, this uses the freshly-merged lightningdevkit/rust-lightning#4147 to parallelize
ChannelMonitorloading, and while we're here because I couldn't avoid taking the fun part (sorry @tnull) I went ahead andtokio::joined some additional reading during init.