
channeldb: reduce DB size for node with large number of payments #3703

Closed
guggero opened this issue Nov 11, 2019 · 22 comments
Labels
channels database Related to the database/storage of LND P1 MUST be fixed or reviewed

@guggero
Collaborator

guggero commented Nov 11, 2019

We've observed that nodes that do many payments (which includes failed payments or probes) end up with very large channel.db files (in some cases more than 20 GB).
I created a fork of boltbrowser that can read our channel DB file and displays a nice human-readable memory usage.

As an example, I have a medium-sized DB file (183 MB) that @joostjager provided.
Here are some of the bigger buckets:

/graph-edge                   alloc: 18.2 MiB, use: 9.2 MiB
/network-result-store-bucket  alloc: 12.7 MiB, use: 5.7 MiB
/open-chan-bucket             alloc: 122.6 MiB, use: 89.5 MiB
/payments-root-bucket         alloc: 15.7 MiB, use: 10.2 MiB

If we expand the open-chan-bucket further, we notice that there is only one channel and that all of the data is in the sub-bucket revocation-log-key:

Path: open-chan-bucket -> 027xxxx -> 434xxxx -> revocation-log-key
Buckets: 0
Pairs: 32828
Leaf size allocated: 122.6 MiB
Leaf size in use: 89.5 MiB

Is there any data in the revocation-log-key that we can purge to reduce the database?

@halseth
Contributor

halseth commented Nov 11, 2019

PR for cleaning network-result-store-bucket (blocked by MPP changes): #3131

@Crypt-iQ
Collaborator

@guggero I think it may be possible to store some sort of diff between commitments so that revocation-log-key doesn't take up so much space. Unfortunately, even deleting records via a Delete call won't actually free up space, so a user would have to run bbolt compact.
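For reference, offline compaction with the bbolt CLI looks roughly like this (lnd must be stopped first; the paths below assume a default mainnet lnd directory):

```shell
# Install the bbolt CLI (ships with the bbolt repository):
go install go.etcd.io/bbolt/cmd/bbolt@latest

# Stop lnd, then rewrite the DB into a fresh file, dropping freelist garbage:
bbolt compact -o ~/.lnd/data/graph/mainnet/channel.db.compacted \
    ~/.lnd/data/graph/mainnet/channel.db

# Verify the new file, then swap it into place before restarting lnd.
```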

@wpaulino wpaulino added channels database Related to the database/storage of LND P3 might get fixed, nice to have labels Nov 11, 2019
@Roasbeef
Member

The biggest offender is the revocation log. It ends up redundantly storing all the HTLC blobs as long as they're present within a commitment transaction. At first, I thought we could just blank them all out, but it seems possible that in the future we have some additional redemption clauses that rely on information provided within the onion blob. In this case, if they were removed for a breached state, then we may not be able to sweep the funds. However, we have the revocation clause as is, which doesn't require any additional material to unlock, so perhaps we don't need to worry at all.

In that case, we can just blank out the HTLC onion blob before it's written to the revocation log; this should save a large amount of space.

Longer term, we could reverse the design of the revocation log itself, which was predicated on the assumption that it's safer to keep everything around until we know exactly what we don't need, even though we could reconstruct most of it at runtime. If we want to go that path, then we'd need to brute force certain HTLC+commitment details in order to sweep things properly.

@PVminer

PVminer commented Jul 16, 2021

In my node (channel.db 833 MiB after compaction), the biggest culprit is payments-root-bucket.

/graph-edge alloc: 61.6 MiB, use: 58.3 MiB
/network-result-store-bucket alloc: 10.4 MiB, use: 10.0 MiB
/open-chan-bucket alloc: 206.8 MiB, use: 178.6 MiB
/payments-root-bucket alloc: 534.8 MiB, use: 67.3 MiB

The last bucket is not compacting at all: it was 537 MiB allocated pre-compaction, while open-chan-bucket compacted from 364 MiB to 206.8 MiB.

The size of channel.db is starting to impact usability. My node (a Raspberry Pi 4) is more sluggish than it used to be.

Is there a way to reduce payments-root-bucket?

@dlaptev

dlaptev commented Jul 27, 2021

Great to see this prioritized to a milestone! My two cents to shed more light on the scale of the problem for nodes with a large number of channels.

My channel.db is now at 6GB and it is growing at ~200MB per day, which is clearly not sustainable. Compaction helps only temporarily; the database quickly beats its previous records.

A couple of months ago the node was routing the same number of payments per day (~1k), but the database was growing much slower. So my guess is that a recent spike in probes, or some other flow of unsettled HTLCs, is to blame.

While this is in the works for 0.14, is there any mitigation that can be applied?

@Crypt-iQ
Collaborator

@dlaptev perhaps deleting failed payments can help? There is an option to do that now with 0.13.1

@dlaptev

dlaptev commented Jul 28, 2021

Good idea, thanks @Crypt-iQ! I tried DeleteAllPayments(true, true) for the moment, to only clean up HTLC attempts for failed payments, and the database is indeed slightly smaller after compaction (4.6GB, while before it was 5.0GB after compaction). Will try DeleteAllPayments(true, false) for all failed payments next time.

@itmanagerro

My channel.db is at 27G now.

@dlaptev indeed, last year it grew slowly, but now it grows faster with each day ===> people have started to make use of automated rebalance tools, which produce a tremendous amount of failed transactions for each rebalance attempt (check out rebalance-lnd by Otto).

I've also noticed that my very first rebalances went super fast, probably 2-4 seconds to acknowledge a failure and switch to the next try, but now I'm somewhere close to 300 seconds between tries, and it goes higher each day (as it's computing all previous failed txs).

To be honest, I wasn't aware of this massive DB size until today, and at this point I strongly believe a lot of folks will face trouble soon enough. I have some TB of free SSD space, but some rPi builds for LND won't have enough disk space for the bitcoin + ln channels databases (e.g. NODL/RaspiBolt).

The next question is how bad it can get once lnd has no disk space left to write channel.db to. I strongly believe #5388 should get max priority.

@Roasbeef
Member

@itmanagerro this PR will serve to remove a lot of the garbage left around from old closed channels that had a lot of forwards: #4364

If you've never compacted your database, then you should definitely do so: https://github.com/lightningnetwork/lnd/blob/master/sample-lnd.conf#L11

I wager you have a ton of failed payments sitting around because of the rebalance tools; that's what causes the DB size to grow more quickly. You can delete failed payments, or just the HTLCs of failed payments, with this command: https://api.lightning.community/#deleteallpayments

@itmanagerro

Thanks @Roasbeef for #4364. I'm quite up to date with the LND configs and restart LND weekly to compact the DB, but it affects my node-rank uptime as the DB compaction takes more than 3 minutes.
Minimum uptime 99.9% --- 0.1% of a day is 1 minute and 26 seconds.

I saw the `lightning` npm package yesterday and already ran DeleteAllPayments, but I still have more than 250,000 invoices hogging the DB. What I've done so far:

  1. DB Compact - db.bolt.auto-compact-min-age=0
  2. GC Invoices - gc-canceled-invoices-on-startup=true and gc-canceled-invoices-on-the-fly=true
  3. DeleteAllPayments - all cleared successfully

Now I'm running with a 1.9GB channel.db and it's still a bit slow, though not as much as when it was 27GB.
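The three steps above map to these settings (lnd.conf syntax; values taken from the comment itself plus the documented auto-compact option):

```ini
[db]
db.bolt.auto-compact=true
db.bolt.auto-compact-min-age=0

[Application Options]
gc-canceled-invoices-on-startup=true
gc-canceled-invoices-on-the-fly=true
```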

@itmanagerro

itmanagerro commented Sep 25, 2021

Cleaned up a bit more junk and got to 1.6G, then I applied this patch, with no difference in disk space.

2021-09-25 21:29:47.764 [INF] CHDB: Checking for schema update: latest_version=24, db_version=22
2021-09-25 21:29:47.764 [INF] CHDB: Performing database schema migration
2021-09-25 21:29:47.764 [INF] CHDB: Applying migration #23
2021-09-25 21:29:47.831 [INF] CHDB: Applying migration #24
2021-09-25 21:29:47.831 [INF] CHDB: Deleting forwarding packages for closed channels
2021-09-25 21:29:47.850 [INF] CHDB: Deletion of forwarding packages of closed channels complete! DB compaction is recommended to free up the disk space.
2021-09-25 21:29:48.095 [INF] LTND: Database(s) now open (time_to_open=3m17.961710618s)!

Checked the DB file: 1.6G, same as before the migration. Then I saw the "recommendation" and restarted lnd once again; the DB is still at 1.6G.

I now installed bolt and these are the stats:

Aggregate statistics for 201 buckets

Page count statistics
Number of logical branch pages: 2225
Number of physical branch overflow pages: 0
Number of logical leaf pages: 338583
Number of physical leaf overflow pages: 73297

Tree statistics
Number of keys/value pairs: 1693111
Number of levels in B+tree: 10

Page size utilization
Bytes allocated for physical branch pages: 9113600
Bytes actually used for branch data: 8458500 (92%)
Bytes allocated for physical leaf pages: 1687060480
Bytes actually used for leaf data: 1369326127 (81%)

Bucket statistics
Total number of buckets: 6373
Total number on inlined buckets: 3980 (62%)
Bytes used for inlined buckets: 1376580 (0%)

@Roasbeef Roasbeef added this to the v0.15.0 milestone Sep 27, 2021
@dlaptev

dlaptev commented Oct 16, 2021

Just another datapoint. My channel.db is now at 7.1GB immediately after deleting all payments (all payments, not only HTLCs, not only failed ones) and compacting the database.

But an even larger pain point is the growth: within two days the DB grew to 11GB.

I have not yet tried #4364, but I do not think it will help, since this is clearly coming from active channels, not closed ones.

Any other tips on how to deal with this will be greatly appreciated!

@Roasbeef
Member

@dlaptev you need to compact in order to reclaim disk space. boltdb just moves things to a freelist by default.

@dlaptev

dlaptev commented Oct 19, 2021

@Roasbeef of course, 7.1GB is after compacting already.

@itmanagerro

itmanagerro commented Oct 19, 2021

@Roasbeef

Meanwhile my channel.db grew from 1.6GB to 3.8GB within 24 days, even after compacting and DeleteAllPayments(true, true), and the node is getting slower and slower.

I notice the same behaviour on many other nodes, and I'm pretty sure it all comes down to channel.db size. Some of these nodes have force-closed many channels (slowness generates long-pending HTLCs); some were decommissioned completely after getting slower and slower (perhaps their DB got so big they simply can't start the LND daemon back up).

Is there any way to migrate lnd from boltdb to a mariadb instance, for example? Maybe SQL knowledge would allow me to understand better where/what this data is.

Now, because the node is getting slower, each attempted tx takes a few seconds instead of milliseconds. From a security perspective this looks like a Slowloris kind of attack, creating many pending HTLCs and effectively depleting the available channel funds (I had a channel with citadel21 which had 12 long-pending HTLCs and only 10k sats available out of 2M; it took a few weeks to get all the utxos out).

@itmanagerro

Update: more and more nodes are going down due to channel.db corruption.

/r/TheLightningNetwork has many "channel.db" threads; it's not fun to watch.

https://www.reddit.com/r/TheLightningNetwork/search/?q=channel.db&restrict_sr=1&sr_nsfw=

@feikede

feikede commented Oct 25, 2021

My channel.db is at 20GB now - glad I am not on a Raspberry Pi anymore...

@daywalker90

My channel.db is at 13GB at the moment. I already used DeleteAllPayments(true, false) and DeleteAllPayments(false, true) and it helped a bit. Is it safe to do DeleteAllPayments(false, false) and delete all successful payments (I have 15k+ of those)?

@guggero
Collaborator Author

guggero commented Oct 28, 2021

Make sure to also run compaction after deleting a large amount of data, to actually reclaim the freed-up space.

Is it safe to do DeleteAllPayments(false, false) and delete all successfull payments (i have 15k+ of those)?

That fully depends on your business case or accounting. From lnd's point of view it is safe to do that, yes. But you'll lose that information. So if you need it for accounting purposes, I suggest exporting the payments first.
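Exporting before deleting can be as simple as dumping the RPC output to a file first. A sketch; the exact deletion flags vary by lnd version, so check `lncli deletepayments --help` on yours:

```shell
# Keep a record for accounting before anything is deleted:
lncli listpayments --include_incomplete > payments-backup.json

# Then remove the payment data (flag names are version-dependent):
lncli deletepayments --all
```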

@feikede

feikede commented Oct 28, 2021

I only have like 20 payments in my 20GB channel.db - don't think that's the pain point... (maybe I have a lot of failed payments... don't know)

Please, what is that DeleteAllPayments? Is it an lncli command?

@daywalker90

After upgrading to 0.14, deleting failed payments and then compacting my channel.db went from 19.1GB to 18.3GB. I currently have ~20 open channels, ~90+ closed channels, 6k+ forward events and 5k+ payments. At what point does this become a problem with regards to RAM?

@Roasbeef Roasbeef added P1 MUST be fixed or reviewed and removed P3 might get fixed, nice to have labels Feb 2, 2022
@Roasbeef
Member

Roasbeef commented May 5, 2022

Fixed by #6347

@Roasbeef Roasbeef closed this as completed May 5, 2022