
channeldb: reduce DB size for node with large number of payments #3703

Closed
guggero opened this issue Nov 11, 2019 · 22 comments
Labels
channels database Related to the database/storage of LND P1 MUST be fixed or reviewed

@guggero
Collaborator

guggero commented Nov 11, 2019

We've observed that nodes that do many payments (which includes failed payments or probes) end up with very large channel.db files (in some cases more than 20 GB).
I created a fork of boltbrowser that can read our channel DB file and displays a nice human-readable memory usage.

As an example, I have a medium-sized DB file (183 MB) that @joostjager provided.
Here are some of the bigger buckets:

/graph-edge                   alloc: 18.2 MiB, use: 9.2 MiB
/network-result-store-bucket  alloc: 12.7 MiB, use: 5.7 MiB
/open-chan-bucket             alloc: 122.6 MiB, use: 89.5 MiB
/payments-root-bucket         alloc: 15.7 MiB, use: 10.2 MiB

If we expand the open-chan-bucket further, we notice that there is only one channel and that all of the data is in the sub-bucket revocation-log-key:

Path: open-chan-bucket -> 027xxxx -> 434xxxx -> revocation-log-key
Buckets: 0
Pairs: 32828
Leaf size allocated: 122.6 MiB
Leaf size in use: 89.5 MiB

Is there any data in the revocation-log-key that we can purge to reduce the database?

@halseth
Contributor

halseth commented Nov 11, 2019

PR for cleaning network-result-store-bucket (blocked by MPP changes): #3131

@Crypt-iQ
Collaborator

@guggero I think it may be possible to store some sort of diff between commitments so that revocation-log-key doesn't take up so much space. Unfortunately, even deleting records via a Delete call won't actually free up space, so a user would have to run bbolt compact.
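For reference, offline compaction with the bbolt CLI looks roughly like this (lnd must be stopped first; the paths below assume a default mainnet lnd directory):

```shell
# Install the bbolt CLI (ships with the bbolt repository):
go install go.etcd.io/bbolt/cmd/bbolt@latest

# Stop lnd, then rewrite the DB into a fresh file, dropping freelist garbage:
bbolt compact -o ~/.lnd/data/graph/mainnet/channel.db.compacted \
    ~/.lnd/data/graph/mainnet/channel.db

# Verify the new file, then swap it into place before restarting lnd.
```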

@wpaulino wpaulino added channels database Related to the database/storage of LND P3 might get fixed, nice to have labels Nov 11, 2019
@Roasbeef
Member

The biggest offender is the revocation log. It ends up redundantly storing all the HTLC blobs as long as they're present within a commitment transaction. At first, I thought we could just blank them all out, but it seems possible that in the future we have some additional redemption clauses that rely on information provided within the onion blob. In this case, if they were removed for a breached state, then we may not be able to sweep the funds. However, we have the revocation clause as is, which doesn't require any additional material to unlock, so perhaps we don't need to worry at all.

In that case, we can just blank out the HTLC onion blob before it's written to the revocation log; this should save a large amount of space.

Longer term, we could reverse the design of the revocation log itself, which was predicated on the assumption that it's safer to keep everything around until we know exactly what we don't need, even though we could reconstruct most of it at runtime. If we want to go that path, then we'd need to brute force certain HTLC+commitment details in order to sweep things properly.

@PVminer

PVminer commented Jul 16, 2021

In my node (channel.db 833 MiB after compaction), the biggest culprit is payments-root-bucket.

/graph-edge alloc: 61.6 MiB, use: 58.3 MiB
/network-result-store-bucket alloc: 10.4 MiB, use: 10.0 MiB
/open-chan-bucket alloc: 206.8 MiB, use: 178.6 MiB
/payments-root-bucket alloc: 534.8 MiB, use: 67.3 MiB

The last bucket is not compacting at all: it was 537 MiB allocated pre-compaction, while open-chan-bucket compacted from 364 MiB to 206.8 MiB.

The size of channel.db is starting to impact usability. My node (a Raspberry Pi 4) is more sluggish than it used to be.

Is there a way to reduce payments-root-bucket?

@dlaptev

dlaptev commented Jul 27, 2021

Great to see this prioritized to a milestone! My two cents to shed more light on the scale of the problem for nodes with a large number of channels.

My channel.db is now at 6GB and it is growing at ~200MB per day, which is clearly not sustainable. Compaction helps only temporarily; the database quickly beats its previous records.

A couple of months ago the node was routing the same number of payments per day (~1k), but the database was growing much slower. So my guess is that a recent spike in probes, or some other flow of unsettled HTLCs, is to blame.

While this is in the works for 0.14, is there any mitigation that can be applied?

@Crypt-iQ
Collaborator

@dlaptev perhaps deleting failed payments can help? There is an option to do that now with 0.13.1

@dlaptev

dlaptev commented Jul 28, 2021

Good idea, thanks @Crypt-iQ! I tried DeleteAllPayments(true, true) for the moment, to only clean up HTLC attempts for failed payments, and the database is indeed slightly smaller after compaction (4.6GB, while before it was 5.0GB after compaction). Will try DeleteAllPayments(true, false) for all failed payments next time.

@itmanagerro

My channel.db is at 27G now.

@dlaptev indeed, last year it grew slowly, but now it grows faster with each day ===> people have started to make use of automated rebalance tools, which produce a tremendous amount of failed transactions for each rebalance attempt (check out rebalance-lnd by Otto).

I've also noticed that my very first rebalances went super fast, probably 2-4 seconds to acknowledge a failure and switch to the next try, but now I'm somewhere close to 300 seconds between tries, and it goes higher each day (as it's computing all previous failed txs).

To be honest, I wasn't aware of this massive DB size until today, and at this point I strongly believe a lot of folks will face trouble soon enough. I have some TB of free SSD space, but some rPi builds for LND won't have enough disk space for the bitcoin + ln channels databases (e.g. NODL/RaspiBolt).

The next question is how bad it can get once lnd has no disk space left to write channel.db to. I strongly believe #5388 should get max priority.

@Roasbeef
Member

@itmanagerro this PR will serve to remove a lot of the garbage left around from old closed channels that had a lot of forwards: #4364

If you've never compacted your database, then you should definitely do so: https://github.com/lightningnetwork/lnd/blob/master/sample-lnd.conf#L11

I wager you have a ton of failed payments sitting around because of the rebalance tools; that's what causes the DB size to grow more quickly. You can delete failed payments, or just the HTLCs of failed payments, with this command: https://api.lightning.community/#deleteallpayments

@itmanagerro

Thanks @Roasbeef for #4364. I'm quite up to date with the LND configs and restart LND weekly to compact the DB, but it affects my node-rank uptime as the DB compaction takes more than 3 minutes.
Minimum uptime 99.9% --- 0.1% of a day is 1 minute and 26 seconds.

I saw the `lightning` npm package yesterday and already ran DeleteAllPayments, but I still have more than 250,000 invoices hogging the DB. What I've done so far:

  1. DB Compact - db.bolt.auto-compact-min-age=0
  2. GC Invoices - gc-canceled-invoices-on-startup=true and gc-canceled-invoices-on-the-fly=true
  3. DeleteAllPayments - all cleared successfully

Now I'm running with a 1.9GB channel.db and it's still a bit slow, though not as much as when it was 27GB.
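The three steps above map to these settings (lnd.conf syntax; values taken from the comment itself plus the documented auto-compact option):

```ini
[db]
db.bolt.auto-compact=true
db.bolt.auto-compact-min-age=0

[Application Options]
gc-canceled-invoices-on-startup=true
gc-canceled-invoices-on-the-fly=true
```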

@itmanagerro

itmanagerro commented Sep 25, 2021

Cleaned up a bit more junk and got to 1.6G, then I applied this patch, with no difference in disk space.

2021-09-25 21:29:47.764 [INF] CHDB: Checking for schema update: latest_version=24, db_version=22
2021-09-25 21:29:47.764 [INF] CHDB: Performing database schema migration
2021-09-25 21:29:47.764 [INF] CHDB: Applying migration #23
2021-09-25 21:29:47.831 [INF] CHDB: Applying migration #24
2021-09-25 21:29:47.831 [INF] CHDB: Deleting forwarding packages for closed channels
2021-09-25 21:29:47.850 [INF] CHDB: Deletion of forwarding packages of closed channels complete! DB compaction is recommended to free up the disk space.
2021-09-25 21:29:48.095 [INF] LTND: Database(s) now open (time_to_open=3m17.961710618s)!

Checked the DB file: 1.6G, same as before the migration. Then I saw the "recommendation" and restarted lnd once again; the DB is still at 1.6G.

I now installed bolt and these are the stats:

Aggregate statistics for 201 buckets

Page count statistics
Number of logical branch pages: 2225
Number of physical branch overflow pages: 0
Number of logical leaf pages: 338583
Number of physical leaf overflow pages: 73297

Tree statistics
Number of keys/value pairs: 1693111
Number of levels in B+tree: 10

Page size utilization
Bytes allocated for physical branch pages: 9113600
Bytes actually used for branch data: 8458500 (92%)
Bytes allocated for physical leaf pages: 1687060480
Bytes actually used for leaf data: 1369326127 (81%)

Bucket statistics
Total number of buckets: 6373
Total number on inlined buckets: 3980 (62%)
Bytes used for inlined buckets: 1376580 (0%)

@Roasbeef Roasbeef added this to the v0.15.0 milestone Sep 27, 2021
@dlaptev

dlaptev commented Oct 16, 2021

Just another datapoint. My channel.db is now at 7.1GB immediately after deleting all payments (all payments, not only HTLCs, not only failed ones) and compacting the database.

But an even larger pain point is the growth: within two days the DB grew to 11GB.

I have not yet tried #4364, but I do not think it will help, since this is clearly coming from active channels, not closed ones.

Any other tips on how to deal with this will be greatly appreciated!

@Roasbeef
Member

@dlaptev you need to compact in order to reclaim disk space. boltdb just moves things to a freelist by default.

@dlaptev

dlaptev commented Oct 19, 2021

@Roasbeef of course, 7.1GB is after compacting already.

@itmanagerro

itmanagerro commented Oct 19, 2021

@Roasbeef

Meanwhile my channel.db grew from 1.6GB to 3.8GB within 24 days, even after compacting and DeleteAllPayments(true, true), and the node is getting slower and slower.

I notice the same behaviour on many other nodes, and I'm pretty sure it all comes down to channel.db size. Some of these nodes have force-closed many channels (slowness generates long-pending HTLCs); some were decommissioned completely after getting slower and slower (perhaps their DB got so big they simply can't start the LND daemon back up).

Is there any way to migrate lnd from boltdb to a mariadb instance, for example? Maybe SQL knowledge would allow me to understand better where/what this data is.

Now, because the node is getting slower, each attempted tx takes a few seconds instead of milliseconds. From a security perspective this looks like a Slowloris kind of attack, creating many pending HTLCs and effectively depleting the available channel funds (I had a channel with citadel21 which had 12 long-pending HTLCs and only 10k sats available out of 2M; it took a few weeks to get all the utxos out).

@itmanagerro

Update: more and more nodes are going down due to channel.db corruption.

/r/TheLightningNetwork has many "channel.db" threads; it's not fun to watch.

https://www.reddit.com/r/TheLightningNetwork/search/?q=channel.db&restrict_sr=1&sr_nsfw=

@feikede

feikede commented Oct 25, 2021

My channel.db is at 20GB now - glad I am not on a Raspberry Pi anymore...

@daywalker90

My channel.db is at 13GB at the moment. I already used DeleteAllPayments(true, false) and DeleteAllPayments(false, true) and it helped a bit. Is it safe to do DeleteAllPayments(false, false) and delete all successful payments (I have 15k+ of those)?

@guggero
Collaborator Author

guggero commented Oct 28, 2021

Make sure to also run compaction after deleting a large amount of data, to actually reclaim the freed-up space.

Is it safe to do DeleteAllPayments(false, false) and delete all successfull payments (i have 15k+ of those)?

That fully depends on your business case or accounting. From lnd's point of view it is safe to do that, yes. But you'll lose that information. So if you need it for accounting purposes, I suggest exporting the payments first.
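Exporting before deleting can be as simple as dumping the RPC output to a file first. A sketch; the exact deletion flags vary by lnd version, so check `lncli deletepayments --help` on yours:

```shell
# Keep a record for accounting before anything is deleted:
lncli listpayments --include_incomplete > payments-backup.json

# Then remove the payment data (flag names are version-dependent):
lncli deletepayments --all
```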

@feikede

feikede commented Oct 28, 2021

I only have like 20 payments in my 20GB channel.db - don't think that's the pain point... (maybe I have a lot of failed payments... don't know)

Please, what is that DeleteAllPayments? Is it an lncli command?

@daywalker90

After upgrading to 0.14, deleting failed payments and then compacting my channel.db went from 19.1GB to 18.3GB. I currently have ~20 open channels, ~90+ closed channels, 6k+ forward events and 5k+ payments. At what point does this become a problem with regards to RAM?

@Roasbeef Roasbeef added P1 MUST be fixed or reviewed and removed P3 might get fixed, nice to have labels Feb 2, 2022
@Roasbeef
Member

Roasbeef commented May 5, 2022

Fixed by #6347

@Roasbeef Roasbeef closed this as completed May 5, 2022