channeldb: reduce DB size for node with large number of payments #3703
PR for cleaning
@guggero I think it may be possible to store some sort of diff between commitments so that
The biggest offender is the revocation log. It ends up redundantly storing all the HTLC blobs as long as they're present within a commitment transaction. At first, I thought we could just blank them all out, but it seems possible that in the future we'll have additional redemption clauses that rely on information provided within the onion blob. In that case, if the blobs were removed for a breached state, we may not be able to sweep the funds. However, the revocation clause as it stands doesn't require any additional material to unlock, so perhaps we don't need to worry at all. If so, we can just blank out the HTLC onion blob before it's written to the revocation log, which should save a large amount of space.

Longer term, we could reverse the design pattern of the revocation log itself, which was predicated on the following assumption: we could reconstruct all of this at runtime, but it seemed safer to keep everything around until we know exactly what we don't need, and then start to phase it out. If we want to go down this path, we'd need to brute force certain HTLC+commitment details in order to sweep things properly.
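To make the "blank out the onion blob" idea concrete, here is a minimal sketch. The type and field names are hypothetical stand-ins, not the real channeldb structures: before a revoked state is serialized into the revocation log, each HTLC's onion blob is zeroed, since (as noted above) the revocation clause alone is enough to sweep a breached commitment.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real channeldb types.
type HTLC struct {
	Amt       uint64
	OnionBlob []byte // roughly 1366 bytes per HTLC in practice
}

type Commitment struct {
	Height uint64
	Htlcs  []HTLC
}

// blankOnionBlobs strips the onion payloads from a commitment before it is
// persisted to the revocation log. Sweeping a breached output only needs the
// revocation key, so the blobs are dead weight in this bucket.
func blankOnionBlobs(c Commitment) Commitment {
	stripped := c
	stripped.Htlcs = make([]HTLC, len(c.Htlcs))
	for i, h := range c.Htlcs {
		h.OnionBlob = nil
		stripped.Htlcs[i] = h
	}
	return stripped
}

func main() {
	c := Commitment{
		Height: 42,
		Htlcs:  []HTLC{{Amt: 1000, OnionBlob: make([]byte, 1366)}},
	}
	fmt.Println(len(blankOnionBlobs(c).Htlcs[0].OnionBlob)) // prints 0
}
```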
On my node (channel.db at 833 MiB after compaction), the biggest culprit is payments-root-bucket.
That bucket does not compact at all: it was 537 MiB allocated before compaction, while open-chan-bucket compacted from 364 MiB to 206.8 MiB. The size of channel.db is starting to impact usability; my node (a Raspberry Pi 4) is more sluggish than it used to be. Is there a way to reduce payments-root-bucket?
Great to see this prioritized to a milestone! My two cents to shed more light on the scale of the problem for nodes with a large number of channels. My channel.db is now at 6GB and it is growing at ~200MB per day, which is clearly not sustainable. Compaction helps temporarily, but the database quickly beats its previous record. A couple of months ago the node was routing the same number of payments per day (~1k), but the database was growing much more slowly, so my guess is that a recent spike in probes or some other flow of unsettled HTLCs is to blame. While this is in the works for 0.14, is there any mitigation that can be applied?
@dlaptev perhaps deleting failed payments can help? There is an option to do that now with 0.13.1.
Good idea, thanks @Crypt-iQ! I tried
My channel.db is at 27G now. @dlaptev indeed, last year it grew slowly, but now it grows faster with each day ===> people started making use of automated rebalance tools, which generate a tremendous amount of failed transactions for each rebalance attempt (check out rebalance-lnd by Otto). I've also noticed that my very first rebalances went super fast, probably 2-4 seconds to acknowledge a failure and switch to the next try... but now I'm somewhere close to 300 seconds between tries, and it goes higher each day (as it's computing all previous failed txs). To be honest, I wasn't aware until today of this massive DB size, and at this point I strongly believe a lot of folks will face trouble soon enough. I do have some TB of free SSD space, but some rPi builds for LND won't have enough disk space for the bitcoin + ln channel databases (e.g. NODL/RaspiBolt). The very next question is: how bad can it get once lnd has no disk space left to write channel.db to? I strongly believe #5388 should get max priority.
@itmanagerro this PR will serve to remove a lot of the garbage left around from old closed channels that had a lot of forwards: #4364. If you've never compacted your database, then you should definitely do so: https://github.com/lightningnetwork/lnd/blob/master/sample-lnd.conf#L11. I wager you have a ton of failed payments sitting around because of the rebalancing tools; that's what causes the DB size to grow more quickly. You can delete failed payments, or just the HTLCs of failed payments, with this RPC: https://api.lightning.community/#deleteallpayments
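As a hedged illustration of calling that RPC from Go (the request field names follow the linked DeleteAllPayments docs as I understand them, and the cert/macaroon paths are the common defaults; verify both against the lnrpc version and setup you actually use):

```go
package main

import (
	"context"
	"encoding/hex"
	"log"
	"os"

	"github.com/lightningnetwork/lnd/lnrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/metadata"
)

func main() {
	// Typical default paths on Linux; adjust for your node.
	tlsPath := os.ExpandEnv("$HOME/.lnd/tls.cert")
	macPath := os.ExpandEnv("$HOME/.lnd/data/chain/bitcoin/mainnet/admin.macaroon")

	creds, err := credentials.NewClientTLSFromFile(tlsPath, "")
	if err != nil {
		log.Fatal(err)
	}
	conn, err := grpc.Dial("localhost:10009", grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	macBytes, err := os.ReadFile(macPath)
	if err != nil {
		log.Fatal(err)
	}
	// lnd expects the macaroon hex-encoded in the "macaroon" metadata field.
	ctx := metadata.AppendToOutgoingContext(
		context.Background(), "macaroon", hex.EncodeToString(macBytes),
	)

	client := lnrpc.NewLightningClient(conn)

	// Delete only failed payments and their HTLC attempts.
	_, err = client.DeleteAllPayments(ctx, &lnrpc.DeleteAllPaymentsRequest{
		FailedPaymentsOnly: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("failed payments deleted; compact channel.db to reclaim the space")
}
```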
Thanks @Roasbeef for #4364. I'm quite up to date with the LND configs and restart LND weekly to compact the DB, but it affects my node-rank uptime as the DB compaction takes more than 3 minutes. I saw yesterday the
Now I'm running with a 1.9GB channel.db and it's still a bit slow, though not as much as when it was 27GB.
Cleaned a bit more junk and got to 1.6G, then I applied this patch, with no difference in disk space.
I checked the DB files: 1.6G, same as before the migration. Then I saw the "recommendation" and restarted lnd once again; the DB stayed at 1.6G. I've now installed bolt and these are the stats:
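The stats output didn't survive the copy above. For anyone who wants to reproduce it, here is a rough sketch, assuming the go.etcd.io/bbolt package, that opens a copy of channel.db read-only and prints per-bucket statistics similar to what the bolt tooling reports:

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Open a *copy* of channel.db read-only; never point tools at the live
	// file while lnd is running.
	db, err := bolt.Open("channel.db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		return tx.ForEach(func(name []byte, b *bolt.Bucket) error {
			// KeyN is the number of key/value pairs, LeafInuse the bytes
			// actually used for leaf data in this bucket.
			s := b.Stats()
			fmt.Printf("%-40s keys=%-8d leaf-in-use=%d bytes\n",
				string(name), s.KeyN, s.LeafInuse)
			return nil
		})
	})
	if err != nil {
		log.Fatal(err)
	}
}
```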
Just another datapoint. My channel.db is now at 7.1GB immediately after deleting all payments (all payments, not only HTLCs, not only failed ones) and compacting the database. But an even larger pain point is the growth: within two days the DB grew to 11GB. I have not yet tried #4364, but I do not think it will help, since this is clearly coming from active channels, not closed ones. Any other tips on how to deal with this will be greatly appreciated!
@dlaptev you need to compact in order to reclaim disk space. boltdb just moves freed pages to a freelist by default.
@Roasbeef of course, 7.1GB is after compacting already.
Meanwhile my channel.db went from 1.6GB to 3.8GB within 24 days, even after compacting and DeleteAllPayments(1,1), and the node is getting slower and slower. I notice the same behaviour on many other nodes and I'm pretty sure it all comes down to the channel.db size: some of these nodes have force-closed many channels (slowness generates long-pending HTLCs), and some nodes were decommissioned completely after getting slower and slower (perhaps their DB got so big they simply couldn't start the LND daemon again). Is there any way to migrate lnd from boltdb to a MariaDB instance, for example? Maybe SQL knowledge would let me better understand where/what this data is. Because the node is getting slower, each attempted tx now takes a few seconds instead of milliseconds... from a security perspective this looks like a "slowloris" kind of attack: it creates many pending HTLCs, basically depleting the available channel funds (I had a channel with citadel21 which had 12 long-pending HTLCs and only 10k sats available out of 2MM... it took a few weeks to get all the UTXOs out).
Update: more and more nodes are going down due to channel.db corruption. /r/TheLightningNetwork has many "channel.db" issues; not fun to watch. https://www.reddit.com/r/TheLightningNetwork/search/?q=channel.db&restrict_sr=1&sr_nsfw=
My channel.db is at 20GB now - glad I am not on a Raspberry Pi anymore...
My channel.db is at 13GB atm, I already used
Make sure to also run compaction after deleting a large amount of data to actually reclaim the freed-up space.
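To make the delete-then-compact step concrete, here is a minimal sketch using the Compact helper from go.etcd.io/bbolt. lnd ships its own compaction (enabled via config), so this is only to illustrate what compaction does: copy the live key/value pairs into a fresh file, which is the only way the on-disk size actually shrinks.

```go
package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Compaction copies only the live key/value pairs into a new file; deletes
	// alone just grow the freelist inside the existing file.
	src, err := bolt.Open("channel.db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := bolt.Open("channel.compacted.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	// Batch size for the copy; a value <= 0 would put everything in a single
	// transaction.
	if err := bolt.Compact(dst, src, 65536); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote compacted copy; swap it in only while lnd is stopped")
}
```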
That fully depends on your business case or accounting. From
I only have like 20 payments in my 20GB channel.db - I don't think that's the pain point... (maybe I have a lot of failed payments... don't know). Please, what is that DeleteAllPayments? Is it an lncli command?
After upgrading to 0.14, deleting failed payments, and then compacting, my channel.db went from 19.1GB to 18.3GB. I currently have ~20 open channels, ~90+ closed channels, 6k+ forward events and 5k+ payments. At what point does this become a problem with regard to RAM?
Fixed by #6347
We've observed that nodes that do many payments (which includes failed payments or probes) end up with very large channel.db files (in some cases more than 20 GB).

I created a fork of boltbrowser that can read our channel DB file and display a nice human-readable memory usage.

As an example, I have a medium-sized DB file (183 MB) that @joostjager provided. Here are some of the bigger buckets:

If we expand the open-chan-bucket further, we notice that there is only one channel and that all of the data is in the sub-bucket revocation-log-key:

Is there any data in the revocation-log-key that we can purge to reduce the database size?
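For readers without the boltbrowser fork, a rough equivalent breakdown can be produced with a short program (a sketch assuming go.etcd.io/bbolt and a copy of the DB file) that recursively sums key and value sizes per bucket; that is enough to see where the bulk of the open-chan-bucket data, such as the revocation-log-key sub-buckets, actually lives:

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

// bucketSize recursively sums the sizes of all keys and values stored under a
// bucket, descending into nested buckets, and prints one line per bucket.
func bucketSize(b *bolt.Bucket, path string) int64 {
	var total int64
	err := b.ForEach(func(k, v []byte) error {
		if v == nil {
			// A nil value means the key refers to a nested bucket.
			total += bucketSize(b.Bucket(k), path+"/"+string(k))
			return nil
		}
		total += int64(len(k) + len(v))
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%12d bytes  %s\n", total, path)
	return total
}

func main() {
	// Always run this against a copy, not the channel.db of a running node.
	db, err := bolt.Open("channel.db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		return tx.ForEach(func(name []byte, b *bolt.Bucket) error {
			bucketSize(b, string(name))
			return nil
		})
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

Note that some nested bucket names in channel.db are raw bytes (channel points, keys), so the printed paths may not be fully human-readable, but the per-bucket totals still show where the space goes.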