kvdb: add postgres #5366

Merged: 3 commits into lightningnetwork:master on Sep 21, 2021

Conversation

@joostjager (Contributor) commented Jun 8, 2021

The Bottlepay team has been working on Postgres storage backend support in lnd.

Advantages of using postgres over the default bbolt storage backend include:

  • Postgres is widely used and has probably survived more battles than bbolt
  • Inspect data while lnd is running
  • On-the-fly database compaction (VACUUM)
  • Database replication
  • Usage of industry-standard tools to manage the stored data

However, in our opinion the greatest advantage of switching to Postgres is that it allows data structures to be migrated to structured SQL tables gradually. The initial version will consist of just a single key-value table that holds all of the data that is currently in bbolt. Once on the Postgres platform, migrations can be carried out that move specific data to dedicated tables when the need arises.

Structured SQL tables reduce the need for custom serialize/deserialize code in lnd itself, freeing up precious developer resources for less mundane logic. They also allow adding indexes and constraints to protect data integrity and improve performance.

One particular bottleneck in the current bbolt implementation is the global write lock. There can only be a single writer active at any time. Postgres offers elaborate locking models that hold great promise to reduce lock contention in high-performance scenarios.

Running

To run with Postgres, add the following command line flags. The specified database needs to be present, but tables will be created if they don't exist already.

lnd --db.backend=postgres --db.postgres.dsn=postgres://lnd:lnd@localhost:45432/lnd?sslmode=disable
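
The same settings can also be placed in lnd.conf instead of being passed on the command line; a minimal sketch, assuming the flags map one-to-one to config entries as in sample-lnd.conf:

```
[db]
db.backend=postgres

[postgres]
db.postgres.dsn=postgres://lnd:lnd@localhost:45432/lnd?sslmode=disable
```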

Implementation

Thanks to the already abstracted kvdb interface, the change set can remain small and no application-level changes are needed. The key-value table has the following format:

id | key | value | parent_id | sequence

Rows representing a bucket have NULL in the value column. Items reference their parent bucket via parent_id. The sequence column tracks the current sequence number for bucket rows.
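
For illustration, a table with this shape could be created roughly as follows; the DDL and helper below are assumptions, not necessarily the exact schema or code in this PR:

```go
package kvsketch // hypothetical package name, for illustration only

import (
	"context"
	"database/sql"
)

// createKVTable creates a key-value table shaped like the layout described
// above. This is an assumed DDL for illustration, not necessarily the exact
// schema created by the PR.
func createKVTable(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, `
CREATE TABLE IF NOT EXISTS kv (
    id        BIGSERIAL PRIMARY KEY,
    key       BYTEA NOT NULL,
    value     BYTEA,                     -- NULL marks a bucket row
    parent_id BIGINT REFERENCES kv (id), -- NULL for top-level buckets
    sequence  BIGINT,                    -- sequence number for bucket rows
    UNIQUE (parent_id, key)
);`)
	return err
}
```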

Performance

We ran this branch on the Lightning benchmark rig that we released earlier. The benchmark application opens 10 channels between a pair of nodes and continuously executes keysend payments in 100 parallel threads.

Previous runs resulted in 35 tps for bbolt. With postgres, we measured 15 tps. This is a reduction that may or may not be relevant depending on how busy your node is. There is potential for considerable optimization still. Two main areas to look into are:

  • Global write lock: either move to structured SQL tables with granular locking, or switch over to Postgres' serializable locking model for the kv table (#5366 (comment)); a sketch of the latter follows below the list.
  • Round trips: network communication is slow and lnd sometimes needs many queries to reach certain data. By batching those queries, the number of round trips can be reduced greatly (#5366 (comment)).
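
To illustrate the first point, the global lock could in principle be replaced by running each kv transaction at the SERIALIZABLE isolation level and retrying on conflicts. A minimal sketch using Go's database/sql; the helper name is an assumption and error handling is simplified (a real version would only retry on SQLSTATE 40001 serialization failures):

```go
package kvsketch // hypothetical package name, for illustration only

import (
	"context"
	"database/sql"
)

// withSerializableTx runs f inside a SERIALIZABLE transaction and retries a
// few times if the commit fails. A real implementation would inspect the
// error and only retry on serialization failures (SQLSTATE 40001).
func withSerializableTx(ctx context.Context, db *sql.DB,
	f func(tx *sql.Tx) error) error {

	var lastErr error
	for attempt := 0; attempt < 5; attempt++ {
		tx, err := db.BeginTx(ctx, &sql.TxOptions{
			Isolation: sql.LevelSerializable,
		})
		if err != nil {
			return err
		}

		if err := f(tx); err != nil {
			_ = tx.Rollback()
			return err
		}

		lastErr = tx.Commit()
		if lastErr == nil {
			return nil
		}
	}
	return lastErr
}
```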

Update 2021-09-20: updated PR desc to reflect latest state of this PR and fresh performance numbers.

@guggero (Collaborator) commented Jun 8, 2021

Awesome work, this is really cool! Wouldn't have imagined this could be possible with such a small diff.
This definitely gets a concept ACK from me 💯

Two thoughts after a quick scan of the code:

  • My main concern here is the one single large table. Did you think about dynamically creating named tables for each CreateTopLevelBucket() call? That way we'd probably get much better write performance for smaller tables (such as the channel state) if that doesn't contain all payment/forwarding history as well.
  • Integration testing should be as easy as spinning up a docker image running PostgreSQL from the scripts/itest_part.sh file (so each parallel tranche can have their own instance).

@joostjager (Contributor, Author) commented:

My main concern here is the one single large table. Did you think about dynamically creating named tables for each CreateTopLevelBucket() call? That way we'd probably get much better write performance for smaller tables (such as the channel state) if that doesn't contain all payment/forwarding history as well.

Yes, I described that in the PR description as the 'hybrid approach'. Will be interesting to see what the actual gain is with that.

One hard choice will be to eventually drop bbolt and etcd support. Otherwise it still isn't possible to reap all the benefits. Interested to hear what the Lightning Labs stance is on this.

@guggero (Collaborator) commented Jun 8, 2021

I described that in the PR description

Oh, sorry, I must've skipped that. I think that hybrid approach is probably worth looking into, especially performance-wise. But having data spread out over multiple tables might also have other benefits.

@Crypt-iQ Crypt-iQ added the database Related to the database/storage of LND label Jun 8, 2021
@yaslama (Contributor) commented Jun 10, 2021

  • My main concern here is the one single large table. Did you think about dynamically creating named tables for each CreateTopLevelBucket() call? That way we'd probably get much better write performance for smaller tables (such as the channel state) if that doesn't contain all payment/forwarding history as well.

Indexes on expressions could perhaps be used to reach the same performance as one table per top-level bucket (a sketch follows below).
I also think it's better to clearly isolate DDL queries to initialization rather than issue them from the app itself.
Another advantage of using only one table to replace bbolt is that it makes clear this is "legacy".
When I see the schema in https://github.com/ElementsProject/lightning/blob/76b8eb3afda4905dbeb059f88b2bbe994cbaa50a/wallet/db_postgres_sqlgen.c and compare it with the current bbolt structure, I really hope that one day lnd data will be as clean.
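
For illustration, indexes of that kind on a single kv table could look roughly like this; the table, column names and bucket id are assumptions based on the layout sketched in the PR description:

```go
package kvsketch // hypothetical package name, for illustration only

// exampleIndexes sketches how expression and partial indexes might stand in
// for per-bucket tables on a single kv table. Purely illustrative.
const exampleIndexes = `
-- Composite index for lookups within a bucket.
CREATE INDEX kv_parent_key_idx ON kv (parent_id, key);

-- Partial index covering a single (hypothetical) top-level bucket.
CREATE INDEX kv_graph_edges_idx ON kv (key) WHERE parent_id = 42;

-- Expression index on a key prefix.
CREATE INDEX kv_key_prefix_idx ON kv (substring(key from 1 for 8));
`
```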

@yaslama (Contributor) commented Jun 10, 2021

Performance

We ran this branch on the Lightning benchmark rig that we released earlier. The benchmark application opens 10 channels between a pair of nodes and continuously executes keysend payments in 100 parallel threads.

Previous runs resulted in 35 tps for bbolt and 4 tps for etcd. With postgres, we measured 25 tps. To us this looks like an excellent result given the added overhead for network communication. To take your own measurement, run https://github.com/bottlepay/lightning-benchmark/tree/postgres with ./run.sh lnd-postgres.

This is an excellent result! Using sqlite (for small clients) could perhaps reach the same performance level as bbolt.

@joostjager (Contributor, Author) commented Jun 10, 2021

@guggero I pushed commits that split the top level buckets into distinct tables. Ran the benchmark and observed that there is no additional performance increase with split tables.


I think I do favor this approach though, because when the data is already in separate tables it may be easier/faster to upgrade to a structured table. Not sure if it is a real advantage in practice.

This definitely gets a concept ACK from me

That is good to hear. We are willing to push forward with this PR, but is there sufficient review capacity available to get this landed in 0.14?

@joostjager (Contributor, Author) commented Jun 14, 2021

Downside of split tables is that there are some nasty top-level buckets like chainHash || chanPoint (briefcase.go) that exceed Postgres' maximum table name length of 63 characters.

Some options:

  • Truncate table name and hope for no collisions
  • Hash the table name and encode it in a format that can store 32 bytes in no more than 63 bytes (see the sketch after this list)
  • Hash table name and truncate
  • Use a special single table that stores all top-level buckets with too-long keys; the other buckets each get their own table
  • Perform a migration first to move long top-level buckets into a sub-bucket
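
For the hash-and-encode option, a sketch of mapping bucket names to identifiers that fit within the 63-character limit; all names and the exact scheme are assumptions, purely illustrative:

```go
package kvsketch // hypothetical package name, for illustration only

import (
	"crypto/sha256"
	"encoding/base32"
	"strings"
)

// maxIdentLen is Postgres' identifier length limit (NAMEDATALEN - 1).
const maxIdentLen = 63

// tableName maps a top-level bucket name to a valid table name. Simple names
// that fit are used as-is; long or binary names are replaced by a
// base32-encoded sha256 hash, which fits well within 63 characters.
func tableName(bucket []byte) string {
	name := "kv_" + string(bucket)
	if len(name) <= maxIdentLen && isIdent(bucket) {
		return name
	}

	hash := sha256.Sum256(bucket)
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)

	// 3 + 52 characters, always below the limit and collision resistant.
	return "kv_" + strings.ToLower(enc.EncodeToString(hash[:]))
}

// isIdent reports whether the bucket name only uses characters that are safe
// in an unquoted identifier.
func isIdent(b []byte) bool {
	for _, c := range b {
		if (c < 'a' || c > 'z') && (c < '0' || c > '9') && c != '_' {
			return false
		}
	}
	return true
}
```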

@joostjager force-pushed the postgres branch 4 times, most recently from 9c41c38 to 7f550a0 (June 15, 2021 09:08)
@bhandras self-requested a review (June 15, 2021 10:57)
@joostjager force-pushed the postgres branch 2 times, most recently from 706613b to 69cd6a5 (June 16, 2021 11:32)
@joostjager (Contributor, Author) commented:

Integration tests pass with postgres

@joostjager (Contributor, Author) commented:

Also tried out this branch with sqlite: https://github.com/bottlepay/lnd/tree/sqlite in the single-table model.

Ran the usual benchmark on it and got a stable 18 tps.

Bbolt is undeniably faster at 35 tps, but I doubt that it matters much. In a hypothetical future where lnd users can choose between sqlite and postgres as their backend, power users will likely choose postgres anyway. How many tps does a phone need to handle?

@joostjager (Contributor, Author) commented:

Tested pure-Go sqlite (https://pkg.go.dev/modernc.org/sqlite) - it works as well.

Performance is slightly below the C version of sqlite, at 14 tps.

@guggero self-requested a review (June 25, 2021 18:44)
@joostjager (Contributor, Author) commented Jun 28, 2021

Discussed a possible game plan with @guggero offline:

Steps:

  • Test performance of high-volume payments and graph operations with all data moved over to postgres (not just channel db). See if it holds up then as well. Code: https://github.com/guggero/lnd/tree/lndinit-migrate-db. (@joostjager)

  • Create fully-remote PR based on the branch above (@guggero)

  • Refactor postgres PR for the following database schema: (@joostjager)

    • Split human-readable top level keys into separate tables
    • Aggregate all non-human-readable top level keys (such as chainhash|chanpt) in a single sql table
    • Prefix each table name with its namespace (channeldb, graphdb, macaroon, etc)

    Also:

    • Use the default golang sql infra (database/sql)
    • Abstract backend-specifics behind an interface, use sqlite as the test case to test the abstraction.
  • Create a migration tool to move from bbolt over to postgres. (@guggero + @joostjager)

Timeline: All merged on Jul 23 at the latest. (reminder to never put dates in PRs!)

@joostjager (Contributor, Author) commented:

Ran the bottlepay benchmark with all data moved over to postgres (https://github.com/guggero/lnd/tree/lndinit-migrate-db) and got 22 tps. No surprise that moving walletdb to postgres has some impact on performance, because it is used heavily to retrieve private keys.

@joostjager (Contributor, Author) commented:

Did some rough benchmarking on graph operations for bbolt and postgres.

Setup: lnd 0.13.0-beta connecting to yalls.org on testnet.

Initial graph download: Received 2413 channels from peer. This takes about 2 minutes on my machine. No noticeable difference between bbolt and postgres. I suppose that makes sense because network is the bottleneck here.

Pathfinding: Modified lnd to skip the local liquidity check. If a QueryRoutes command is then issued to a random node on the network (I used yalls.org), it will explore a large part of the graph before it concludes that there is no route. In the case of the test, lnd reports exploring 63 nodes and 774 edges. Here the difference between bbolt and postgres is huge: 25 ms for bbolt vs 800 ms for postgres.

Looks unacceptable.

@joostjager (Contributor, Author) commented Jun 29, 2021

For pathfinding, there doesn't seem to be a way around caching data to improve performance. Already with bbolt, pathfinding is pretty slow on low-end devices, so a cache would be an improvement for all backends.

Going to explore the option to provide a write-through cache on the kvdb level. A sort of adapter that wraps a backend with the same interface and caches (some) data in memory.

A write-through cache on the kvdb level is probably going to be difficult because it needs to support rolling back (large) transactions.

Perhaps there is a simple way to just optimize this part (similar to etcd key prefetching for example).
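
For illustration, such an adapter could look roughly like this; the Getter interface below is a hypothetical stand-in for the read side of a kvdb backend, not the actual kvdb API:

```go
package kvsketch // hypothetical package name, for illustration only

import "sync"

// Getter is a hypothetical stand-in for the read side of a kvdb backend.
type Getter interface {
	Get(key []byte) ([]byte, error)
}

// cachedGetter wraps a backend and serves repeated reads from memory. A real
// write-through cache would also need to update or invalidate entries on
// writes and respect transaction rollbacks, which is the hard part mentioned
// above.
type cachedGetter struct {
	mu      sync.RWMutex
	backend Getter
	cache   map[string][]byte
}

func newCachedGetter(backend Getter) *cachedGetter {
	return &cachedGetter{
		backend: backend,
		cache:   make(map[string][]byte),
	}
}

func (c *cachedGetter) Get(key []byte) ([]byte, error) {
	c.mu.RLock()
	v, ok := c.cache[string(key)]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}

	v, err := c.backend.Get(key)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.cache[string(key)] = v
	c.mu.Unlock()
	return v, nil
}
```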


@joostjager (Contributor, Author) commented Jun 30, 2021

Also ran a pathfinding test with postgres on mainnet with a larger graph (~50k channels). The 'path not found' test described above gave me a run time of 40s on my machine:

lnd-alice_1 | 2021-06-30 09:33:21.368 [DBG] CRTR: Pathfinding perf metrics: nodes=4831, edges=57105, time=40.880812351s

The same test with a bbolt instance:

lnd-alice_1 | 2021-06-30 09:31:51.674 [DBG] CRTR: Pathfinding perf metrics: nodes=4836, edges=57075, time=1.196043019s

Roughly the same 30x factor as above.

@joostjager (Contributor, Author) commented Jun 30, 2021

One simple way to fix this is to just retrieve the full graph from the database at the start of every pathfinding operation. An extreme version of prefetching. Pathfinding only uses ForEachNodeChannel. This method would need a second implementation that operates on the prefetched data.

At first this may sound bad, but one also must keep in mind that the pathfinding access pattern is very inefficient. It needs to retrieve keys from all over the database. Loading it all in memory in an efficient way at the start cuts out the repeated network and database overhead for all those lookups.

This prefetching would be optional and recommended for remote databases, so low-end devices (CPU, memory) running bbolt would be unaffected.

I did a benchmark on fetching the full graph (complete contents of the nodes, edges and edge-index buckets) on mainnet. This took 200 ms.

Doesn't seem unreasonable, especially in relation to the typical overall payment latency on lightning. For a 'route not found' scenario it is even faster than bbolt (1.2s, see above).

A proper write-through cache for the graph is definitely the preferred solution. But if complications arise (in-memory graph has been an elusive goal so far) or there isn't enough engineering capacity available, this could be a simpler alternative to bridge the gap.
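
For illustration, on the kv layout sketched earlier, the contents of one top-level bucket can be pulled into memory in a single round trip roughly like this; table and function names are assumptions, and nested buckets would need a recursive query:

```go
package kvsketch // hypothetical package name, for illustration only

import (
	"context"
	"database/sql"
)

// fetchBucket loads every key/value pair directly under the given bucket row
// in a single round trip. Nested buckets would need a recursive query (for
// example WITH RECURSIVE); this sketch only covers one level.
func fetchBucket(ctx context.Context, db *sql.DB,
	bucketID int64) (map[string][]byte, error) {

	rows, err := db.QueryContext(ctx,
		`SELECT key, value FROM kv WHERE parent_id = $1`, bucketID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	out := make(map[string][]byte)
	for rows.Next() {
		var key, value []byte
		if err := rows.Scan(&key, &value); err != nil {
			return nil, err
		}
		out[string(key)] = value
	}
	return out, rows.Err()
}
```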

@Roasbeef Roasbeef added this to the v0.14.0 milestone Jul 1, 2021
@guggero (Collaborator) left a review:

Did a high-level pass and left a few comments. Still a lot of work that needs to be done to get this into a mergeable state, but looks very promising!

Review threads (outdated, resolved): kvdb/go.sum, kvdb/postgres/db.go (2), kvdb/postgres/readwrite_bucket.go (2), kvdb/postgres/readwrite_tx.go (3), lntest/itest/lnd_test_list_on_test.go, scripts/itest_part.sh
@guggero (Collaborator) commented Sep 16, 2021

As discussed offline, I think using sub queries for implementing the prefetch in Postgres is a great idea. And seeing that a 4 level deep sub query doesn't even double the execution time but replaces the need for 3 round trips seems excellent.
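
For illustration, a nested sub query of that kind could look roughly like this on the kv layout sketched in the PR description; the names and exact shape are assumptions:

```go
package kvsketch // hypothetical package name, for illustration only

// nestedLookup sketches a single-round-trip lookup of a value at the path
// bucketA/bucketB/bucketC/key, resolving each bucket id with a sub query.
// Purely illustrative; the PR may structure this differently.
const nestedLookup = `
SELECT value FROM kv
WHERE key = $4 AND parent_id = (
    SELECT id FROM kv WHERE key = $3 AND parent_id = (
        SELECT id FROM kv WHERE key = $2 AND parent_id = (
            SELECT id FROM kv WHERE key = $1 AND parent_id IS NULL
        )
    )
);`
```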

Is the idea to add the prefetch right after merging this PR and #5640 and possibly getting it into 0.14.0 or more something for 0.14.1 as a performance boost after seeing things in action for a while?

@joostjager (Contributor, Author) commented:

Is the idea to add the prefetch right after merging this PR and #5640 and possibly getting it into 0.14.0 or more something for 0.14.1 as a performance boost after seeing things in action for a while?

I'd love to, but it doesn't seem to be possible with the kvdb extension that is proposed in #5640. More details in this thread #5640 (comment)

@joostjager force-pushed the postgres branch 3 times, most recently from 394b065 to db5fdd8 (September 17, 2021 13:59)
@bhandras (Collaborator) commented Sep 17, 2021

I am still undecided about adding a column for a hashed bucket id. Without it, it won't be easy to get a nested bucket value in a single roundtrip. This means that optimal performance cannot be achieved.

The direction in which #5640 (comment) goes is important too. If that results in an etcd-only interface, it will be more costly to implement a similar optimization for Postgres.

I think there's a small misunderstanding regarding #5640. Prefetch is optional and it's perfectly fine if the Postgres driver doesn't implement it. We can still add bucket.GetAll() as an additional API unlocking more performance gains for Postgres.

@joostjager (Contributor, Author) commented Sep 17, 2021

I think there's a small misunderstanding regarding #5640. Prefetch is optional and it's perfectly fine if the Postgres driver doesn't implement it. We can still add bucket.GetAll() as an additional API unlocking more performance gains for Postgres.

It was clear from the start that it is optional and that the pg driver doesn't need to implement it. But the thing is that I do want to implement it, to reach the full performance potential on pg as well. Only I can't, because the interface proposed in #5640 isn't compatible with server-side txes. GetAll would be an improvement, but still doesn't minimize round trips.

What I would have done for prefetch is implement a clean batch interface that allows single round-trip queries and is compatible with both etcd and postgres.
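
For illustration, such a batch interface could have roughly the following shape; all names are hypothetical and this is not the interface proposed in #5640:

```go
package kvsketch // hypothetical package name, for illustration only

// BatchReader is a sketch of a minimal batch interface: collect all keys a
// caller needs and resolve them in a single round trip. Names and shape are
// hypothetical.
type BatchReader interface {
	// GetBatch returns the values for the given keys under the given
	// bucket path, in order, with nil entries for keys that do not exist.
	// Implementations are free to resolve the whole batch in one query
	// (Postgres) or one ranged read (etcd).
	GetBatch(bucketPath [][]byte, keys [][]byte) ([][]byte, error)
}
```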

@joostjager force-pushed the postgres branch 2 times, most recently from 60e0fc9 to eb0245d (September 19, 2021 20:49)
@joostjager (Contributor, Author) commented:

It looks like the Postgres itest is finally passing. One last modification is that I increased the remote db async payments timeout from 2 to 3 minutes.

@guggero (Collaborator) left a review:

Great to see this PR pass the itests! Excited to get this in, thanks for your work on this 🎉

Only a few nits left and needs a rebase, but otherwise LGTM!

Review threads (outdated, resolved): docs/postgres.md, sample-lnd.conf, lntest/harness.go
@joostjager force-pushed the postgres branch 3 times, most recently from 2f0fadf to a112ccb (September 20, 2021 12:06)
@Roasbeef (Member) commented:

Coveralls is still down (which is why that unit test is failing): https://status.coveralls.io/incidents/8zbjrwpb4frv

@Roasbeef (Member) left a review:

LGTM 🦚

// cfg is the postgres connection config.
cfg *Config

// prefix is the table name prefix that is used to simulate namespaces.
@Roasbeef (Member) commented on the excerpt above:
Makes sense, had initially missed that!

@Roasbeef Roasbeef merged commit 7970ffc into lightningnetwork:master Sep 21, 2021
@joostjager (Contributor, Author) commented:

Thanks for your thoughts and reviews @guggero @bhandras @Roasbeef!

@djkazic (Contributor) commented Sep 29, 2021

Will the migration tool for etcd -> postgres be in a separate PR?

@joostjager (Contributor, Author) commented:

The migration tool PR is here: #5561

Labels: database (Related to the database/storage of LND), P2 (should be fixed if one has time)

8 participants