kvdb: add postgres #5366
Conversation
Awesome work, this is really cool! Wouldn't have imagined this could be possible with such a small diff. A couple of thoughts came up after a quick scan of the code.
Yes, I described that in the PR description as the 'hybrid approach'. Will be interesting to see what the actual gain is with that. One hard choice will be to eventually drop bbolt and etcd support. Otherwise it still isn't possible to reap all the benefits. Interested to hear what the Lightning Labs stance is on this.
Oh, sorry, I must've skipped that. I think that hybrid approach is probably worth looking into, especially performance-wise. But having data spread out over multiple tables might also have other benefits.
Indexes on Expressions can perhaps be used to reach the same performance as using one table per top-level bucket.
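For illustration, a minimal sketch of what such an expression index could look like on a single shared table, assuming keys embed the top-level bucket name as a prefix. The table name, index name and prefix length are made up; the DSN is the one from the PR description.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// DSN taken from the PR description; adjust to your setup.
	db, err := sql.Open("postgres",
		"postgres://lnd:lnd@localhost:45432/lnd?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Index only the leading bytes of the key. If keys start with the
	// top-level bucket name, the planner can narrow lookups to a single
	// bucket, similar to having one table per bucket.
	_, err = db.Exec(`CREATE INDEX IF NOT EXISTS kv_key_prefix_idx
		ON kv ((substring(key FROM 1 FOR 16)))`)
	if err != nil {
		log.Fatal(err)
	}
}
```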
This is an excellent result! Using sqlite (for small clients) can perhaps reach the same performance level as bbolt.
@guggero pushed commits that split the top level buckets into distinct tables. Ran the benchmark and observed that there is no additional performance increase with split tables. I think I do favor this approach though, because when the data is already in separate tables it may be easier/faster to upgrade to a structured table. Not sure if it is a real advantage in practice.
That is good to hear. We are willing to push forward with this PR, but is there sufficient review capacity available to get this landed in 0.14?
A downside of split tables is that there are some nasty top-level buckets; there are a few options for handling those.
Integration tests pass with postgres.
Also tried out this branch with sqlite (https://github.com/bottlepay/lnd/tree/sqlite) in the single-table model. Ran the usual benchmark on it and got a stable 18 tps. Bbolt is undeniably faster at 35 tps, but I doubt that it matters much. In a hypothetical future where lnd users can choose between sqlite and postgres as their backend, power users will likely choose postgres anyway. How many tps does a phone need to handle?
Tested the pure Go sqlite driver (https://pkg.go.dev/modernc.org/sqlite) and it works as well. Performance is slightly below the C version of sqlite, at 14 tps.
Discussed a possible game plan and concrete steps with @guggero offline.
Ran the bottlepay benchmark with all data moved over to postgres (https://github.com/guggero/lnd/tree/lndinit-migrate-db) and got 22 tps. No surprise that moving walletdb to postgres has some impact on performance, because it is used heavily to retrieve private keys.
Did some rough benchmarking on graph operations for bbolt and postgres. Setup: lnd 0.13.0-beta connecting to yalls.org on testnet. Initial graph download: received 2413 channels from the peer. This takes about 2 minutes on my machine, with no noticeable difference between bbolt and postgres. I suppose that makes sense, because the network is the bottleneck here. Pathfinding: modified lnd to skip the local liquidity check, so that a 'path not found' attempt has to search the whole graph. The resulting postgres run times look unacceptable.
For pathfinding, there doesn't seem to be a way around caching data to improve performance. Already with bbolt, pathfinding is pretty slow on low-end devices, so a cache would be an improvement for all backends.
A write-through cache on the kvdb level is probably going to be difficult because it needs to support rolling back (large) transactions. Perhaps there is a simple way to just optimize this part (similar to etcd key prefetching for example).
Also ran a pathfinding test with postgres on mainnet with a larger graph (~50k channels). The 'path not found' test described above gave me a run time of 40s on my machine, while the same test with a bbolt instance took about 1.2s. Roughly the same 30x factor as above.
One simple way to fix this is to just retrieve the full graph from the database at the start of every pathfinding operation, an extreme version of prefetching. Pathfinding only uses the graph data.

At first this may sound bad, but one must also keep in mind that the pathfinding access pattern is very inefficient: it needs to retrieve keys from all over the database. Loading it all into memory in an efficient way at the start cuts out the repeated network and database overhead for all those lookups. This prefetching would be optional and recommended for remote databases, so low-end devices (CPU, memory) running bbolt are unaffected by it.

I did a benchmark on fetching the full graph (the complete contents of the graph buckets). Doesn't seem unreasonable, especially in relation to the typical overall payment latency on lightning. For a 'route not found' scenario it is even faster than bbolt (1.2s, see above).

A proper write-through cache for the graph is definitely the preferred solution. But if complications arise (an in-memory graph has been an elusive goal so far) or there isn't enough engineering capacity available, this could be a simpler alternative to bridge the gap.
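For illustration, a hypothetical sketch of this "extreme prefetch": pull every row under one top-level bucket in a single round trip and serve pathfinding reads from memory. The table layout (id, key, value, parent_id) follows the PR description; the bucket name "graph-edges" and the helper are made up, not lnd's real identifiers.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

// row mirrors the kv table from the PR description. Bucket rows have a nil
// value; items point at their bucket via parentID.
type row struct {
	id       int64
	key      []byte
	value    []byte // nil for bucket rows
	parentID sql.NullInt64
}

// prefetchBucket fetches a top-level bucket and all of its descendants in a
// single round trip using a recursive CTE, instead of one query per cursor
// step during pathfinding.
func prefetchBucket(db *sql.DB, bucket []byte) ([]row, error) {
	rows, err := db.Query(`
		WITH RECURSIVE sub AS (
			SELECT id, key, value, parent_id
			FROM kv WHERE key = $1 AND parent_id IS NULL
			UNION ALL
			SELECT kv.id, kv.key, kv.value, kv.parent_id
			FROM kv JOIN sub ON kv.parent_id = sub.id
		)
		SELECT id, key, value, parent_id FROM sub`, bucket)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var result []row
	for rows.Next() {
		var r row
		if err := rows.Scan(&r.id, &r.key, &r.value, &r.parentID); err != nil {
			return nil, err
		}
		result = append(result, r)
	}
	return result, rows.Err()
}

func main() {
	db, err := sql.Open("postgres",
		"postgres://lnd:lnd@localhost:45432/lnd?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// "graph-edges" is hypothetical; substitute the real bucket name.
	graph, err := prefetchBucket(db, []byte("graph-edges"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("prefetched %d graph rows\n", len(graph))
}
```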
Did a high-level pass and left a few comments. Still a lot of work that needs to be done to get this into a mergeable state, but looks very promising!
As discussed offline, I think using sub-queries for implementing the prefetch in Postgres is a great idea. And seeing that a four-level-deep sub-query doesn't even double the execution time but replaces the need for three round trips seems excellent. Is the idea to add the prefetch right after merging this PR and #5640, possibly getting it into 0.14.0, or is it more something for 0.14.1 as a performance boost after seeing things in action for a while?
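(For illustration, a sketch of the query shape being discussed: three nested sub-queries resolve a bucket path so the innermost bucket's contents arrive in one round trip. The helper name and bucket arguments are hypothetical, not code from this PR.)

```go
package kvdbsketch

import (
	"database/sql"

	_ "github.com/lib/pq"
)

// fetchNestedBucket resolves a three-level bucket path in the kv table and
// returns the innermost bucket's contents, replacing three round trips with
// a single query. The caller must close the returned rows.
func fetchNestedBucket(db *sql.DB, b1, b2, b3 []byte) (*sql.Rows, error) {
	return db.Query(`
		SELECT key, value FROM kv WHERE parent_id = (
			SELECT id FROM kv WHERE key = $3 AND parent_id = (
				SELECT id FROM kv WHERE key = $2 AND parent_id = (
					SELECT id FROM kv WHERE key = $1 AND parent_id IS NULL
				)
			)
		)`, b1, b2, b3)
}
```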
I'd love to, but it doesn't seem to be possible with the kvdb extension that is proposed in #5640. More details in this thread #5640 (comment) |
I think there's a small misunderstanding regarding #5640.
It was clear from the start that it is optional and that the pg driver doesn't need to implement it. But the thing is that I do want to implement it, to reach the full performance potential on pg as well. Only I can't, because the interface proposed in #5640 isn't compatible with server-side transactions. What I would have done for prefetch is implement a clean batch interface that allows single round-trip queries and is compatible with both etcd and postgres.
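(A hypothetical sketch of the kind of batch interface described here: the caller declares all reads up front so a driver can satisfy them in one round trip, a single SQL query for postgres or one transaction for etcd. All names are made up; this is not the interface from #5640.)

```go
package kvdbsketch

// BatchRead describes one key or range to fetch.
type BatchRead struct {
	Bucket [][]byte // path of nested bucket keys
	Key    []byte   // single key, or nil to fetch the whole bucket
}

// BatchReader is implemented by drivers that can satisfy many reads in a
// single round trip to the database server. Results are returned in
// request order.
type BatchReader interface {
	ReadBatch(reads []BatchRead) ([][]byte, error)
}
```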
It looks like the Postgres itest is finally passing. One last modification is that I increased the remote db async payments timeout from 2 to 3 minutes.
Great to see this PR pass the itests! Excited to get this in, thanks for your work on this 🎉
Only a few nits left and needs a rebase, but otherwise LGTM!
Coveralls is still down (the reason why that unit test is failing): https://status.coveralls.io/incidents/8zbjrwpb4frv
LGTM 🦚
```go
// cfg is the postgres connection config.
cfg *Config

// prefix is the table name prefix that is used to simulate namespaces.
```
Makes sense, had initially missed that!
Will the migration tool for etcd -> postgres be in a separate PR?

The migration tool PR is here: #5561
The Bottlepay team has been working on Postgres storage backend support in `lnd`. Advantages of using postgres over the default bbolt storage backend include:

- The database can run on a different machine than the one `lnd` is running on.
- No offline compaction step is needed to reclaim free space (Postgres handles this via `VACUUM`).
However, in our opinion the greatest advantage of switching to Postgres is that it allows data structures to be migrated over to structured SQL tables in a gradual way. The initial version will only be a single key-value table that holds all of the data that is currently in bbolt. Once on the Postgres platform, migrations can be carried out that move specific data to dedicated tables when the need arises.
Structured SQL tables reduce the need for custom serialize/deserialize code in `lnd` itself, freeing up precious developer resources to work on less mundane logic. They also allow adding indexes and constraints to protect data integrity and improve performance.

One particular bottleneck in the current bbolt implementation is the global write lock: there can only be a single writer active at any time. Postgres offers more elaborate locking models that hold great promise for reducing lock contention in high-performance scenarios.
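To illustrate the difference, a minimal sketch (not code from this PR): under Postgres, a write like the one below locks only the affected row, so transactions touching different keys can commit concurrently, whereas bbolt serializes all of them behind its single write lock.

```go
package kvdbsketch

import "database/sql"

// updateValue writes one key inside its own transaction. Postgres locks only
// the updated row, so writers on different keys do not block each other.
func updateValue(db *sql.DB, key, value []byte) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if _, err := tx.Exec(
		`UPDATE kv SET value = $2 WHERE key = $1`, key, value,
	); err != nil {
		tx.Rollback()
		return err
	}
	return tx.Commit()
}
```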
Running
To run with Postgres, add the following command-line flags. The specified database needs to be present, but tables will be created if they don't exist already.
lnd --db.backend=postgres --db.postgres.dsn=postgres://lnd:lnd@localhost:45432/lnd?sslmode=disable
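If no Postgres instance is available yet, one way to bring up a database matching the DSN above locally is with Docker (assuming Docker is installed; the image tag is only an example):

docker run -d --name lnd-postgres -p 45432:5432 -e POSTGRES_USER=lnd -e POSTGRES_PASSWORD=lnd -e POSTGRES_DB=lnd postgres:13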
Implementation
Thanks to the already abstracted kvdb interface, the change set can remain small and no application-level changes are needed. The key-value table has the following format:
id | key | value | parent_id | sequence
Rows representing a bucket have nil in the `value` column. Items reference their parent bucket via `parent_id`. The `sequence` column tracks the current sequence number for bucket rows.
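For illustration, a minimal DDL sketch matching the columns described above; the actual schema created by this PR may differ in types, constraints and naming.

```go
package postgres

// createKVTable is a hypothetical sketch of the key-value table described
// above, not the exact schema from this PR.
const createKVTable = `
CREATE TABLE IF NOT EXISTS kv (
    id        BIGSERIAL PRIMARY KEY,
    key       BYTEA NOT NULL,
    value     BYTEA,  -- NULL marks a row that represents a bucket
    parent_id BIGINT REFERENCES kv (id),
    sequence  BIGINT, -- current sequence number, tracked on bucket rows
    UNIQUE (parent_id, key)
)`
```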
Performance
We ran this branch on the Lightning benchmark rig that we released earlier. The benchmark application opens 10 channels between a pair of nodes and continuously executes keysend payments in 100 parallel threads.
Previous runs resulted in 35 tps for bbolt. With postgres, we measured 15 tps. This is a reduction that may or may not be relevant depending on how busy your node is. There is still potential for considerable optimization, with two main areas to look into.
Update 2021-09-20: updated PR desc to reflect latest state of this PR and fresh performance numbers.