
channel.db is over 1.5gb after DeleteAllPayments and recompacting! Can I do anything else? #5705

Closed
wankerstar opened this issue Sep 9, 2021 · 29 comments

Comments

wankerstar commented Sep 9, 2021

Background

After running DeleteAllPayments, compacting the database with chantools on my workhorse machine, and moving it back to the node, my channel.db lists as follows:
ls -lAF ~/.lnd/data/graph/mainnet/
-rw------- 1 bitcoin bitcoin 1564844032 Sep 9 17:58 channel.db
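
For reference, the offline compaction workflow looks roughly like this (a sketch; I'm assuming chantools' compactdb subcommand and its --sourcedb/--destdb flags, and the host and service names are placeholders — check chantools compactdb --help for your version):

    # on the node: stop lnd so the database is not in use
    sudo systemctl stop lnd
    # copy channel.db to the stronger machine (here called "workhorse")
    scp ~/.lnd/data/graph/mainnet/channel.db workhorse:~/channel.db

    # on the workhorse: write a compacted copy into a new file
    chantools compactdb --sourcedb ~/channel.db --destdb ~/channel.compacted.db

    # copy the compacted file back over the original and restart lnd
    scp workhorse:~/channel.compacted.db ~/.lnd/data/graph/mainnet/channel.db
    sudo systemctl start lnd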

My channel.db is too large to compact on the node itself and must be moved to a stronger machine to run chantools. Ultimately this is not sustainable if the database can never be reduced enough to compact it on the node. When it swells past about 2.1 GB, lnd crashes with a paging error.

Is there anything else I can do? The only option to clear data through the API seems to be DeleteAllPayments, which I use liberally, but it doesn't seem to slow the growth or allow significant gains when compacting.

Is lnd simply not intended for light hardware? Do I need to migrate to C-Lightning if I intend to continue to run an RPi node, and are there any tools for such a migration?

Thanks in advance for any help!

Your environment

Raspberry Pi 4 running DietPi
lnd version 0.13.1-beta commit=v0.13.1-beta
Linux dietnode 5.10.17-v8+ #1403 SMP PREEMPT Mon Feb 22 11:37:54 GMT 2021 aarch64 GNU/Linux
Bitcoin Core version v0.21.1

Roasbeef (Member) commented Sep 9, 2021

This PR should help to clear out a lot of state: #4364

How old is your node? Number of channels as well?

Do you have automated GC of invoices activated?

      --gc-canceled-invoices-on-startup                       If true, we'll attempt to garbage collect canceled invoices upon start.
      --gc-canceled-invoices-on-the-fly                       If true, we'll delete newly canceled invoices on the fly.

If you have a lot of invoices, then these flags can be used to clear them out either on startup or on an ongoing basis.

Another flag you can use is:

      --routing.strictgraphpruning                            If true, then the graph will be pruned more aggressively for zombies. In practice this means that edges with a single stale edge will be considered a zombie.

With that, your node will keep a more compact graph that more closely reflects the live network.
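
As a sketch, the corresponding lnd.conf entries would be (the config keys mirror the flags above; the exact section placement in your config file may vary):

    gc-canceled-invoices-on-startup=true
    gc-canceled-invoices-on-the-fly=true
    routing.strictgraphpruning=true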

@wankerstar (Author)

@Roasbeef thanks for the response!

The node is 7 months old and has over 100 channels.
It routes fairly actively, and I've executed a lot of failed payment attempts while re-balancing my channels.

The PR appears (at a casual glance) to apply to closed channels, which will certainly help, but most of my channels are both old and still open.

I do already have both of the above invoice GC options enabled via config file.

I have not enabled strictgraphpruning. Do you expect that will have a very significant impact?

What is it that eats up so much space that can't be reclaimed? More importantly, does it scale with the number of payments? With the number of channels? It would be helpful to know whether I should discourage or reduce either to extend the lifetime of the node.

Thanks again for your time and patience!

@itmanagerro

#4364 #3703

@rafaelpac

My node is a few months old, with few channels... I have been monitoring the channel.db file and it had been sitting around 300 MB. A couple of days ago I noticed that it started to grow. In the last few days I haven't opened any new channels, but I did get one force closed by me (not on purpose; it just closed, and now it shows that I did the force close). In the last few days I have also activated the watchtower client with one tower and updated to 0.13.3. But I can't tell for sure whether the size increases happened before or after any of these changes... I am at 512 MB now, and it is growing ~50 MB per day.

feikede commented Oct 19, 2021

Mine is 16 GB for about 100 routings, 60 channels, 60 days uptime. v0.13.3 on a Ryzen 7, Ubuntu 20.

btc@xxx:~/.lnd$ ls -al ./data/graph/mainnet/channel.db
-rw------- 1 btc btc 15776518144 Oct 19 14:31 ./data/graph/mainnet/channel.db

Using settings from Alex's rundown here: https://github.com/alexbosworth/run-lnd

What needs to be done here? Is this a problem for my funds...?

feikede commented Oct 21, 2021

Ok, today I have
-rw------- 1 btc btc 16874332160 Oct 21 12:59 channel.db

This is 1 GB per two days. I don't think that's healthy. Any explanation? Thanks.

@alexbosworth (Contributor)

One thing you can do is look at the number of state updates on your channels and reduce those by closing/reopening them.
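
For example, a quick way to rank channels by state updates (a sketch, assuming jq is installed; num_updates is reported as a string, hence the tonumber):

    lncli listchannels | jq -r '.channels | sort_by(.num_updates | tonumber) | reverse | .[] | "\(.num_updates)\t\(.channel_point)"'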

VajraOfIndra added a commit to VajraOfIndra/RaspiBolt that referenced this issue Oct 28, 2021
Monitoring the channel.db size is critical, as a very large DB can lead to node crashes (and potentially DB corruption & SCB recovery) on low-resource devices like Raspberry Pis (e.g. lightningnetwork/lnd#5705).
This update adds a line to the LND section of the welcome script to show the channel.db size.

For now the size is obtained by: channeldb=$(du -h /mnt/ext/lnd/data/graph/mainnet/channel.db | head -c4)
It would be better to get a numerical value and then add some simple logical tests to colour the size green, orange or red when it reaches a certain size (e.g. 500 MB orange, 1 GB red). What would be the best way to do this? (i.e. get a numerical value for the DB size rather than a string like now)
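
One way to do that (a sketch, not part of the committed script; the path and thresholds are placeholders):

    # size in whole megabytes, as an integer that can be compared numerically
    channel_db_size_mb=$(du -m /mnt/ext/lnd/data/graph/mainnet/channel.db | awk '{print $1}')
    if [ "$channel_db_size_mb" -ge 1024 ]; then
      color="red"
    elif [ "$channel_db_size_mb" -ge 500 ]; then
      color="orange"
    else
      color="green"
    fi
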
@Roasbeef (Member)

This is 1 GB per two days. I don't think that's healthy. Any explanation? Thanks.

What type of activity is your node even doing? Is it all just failed rebalance attempts?

feikede commented Oct 29, 2021

Today I have
-rw------- 1 btc btc 25838387200 Oct 29 10:59 channel.db

Growth is now up to 1 GB per day... Thanks for your suggestions!

@Roasbeef No, I am not rebalancing, tried it about 10 times some weeks ago but stopped then.

@alexbosworth Can you give me some search terms for the log to look for?

Grepping for ERR gives just

2021-10-28 23:30:27.895 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:51944: read tcp 127.0.0.1:9735->127.0.0.1:51944: i/o timeout
2021-10-29 02:28:58.944 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:56308: read tcp 127.0.0.1:9735->127.0.0.1:56308: i/o timeout
2021-10-29 02:31:26.300 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:56366: read tcp 127.0.0.1:9735->127.0.0.1:56366: i/o timeout
2021-10-29 08:31:25.825 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:37288: read tcp 127.0.0.1:9735->127.0.0.1:37288: i/o timeout
2021-10-29 08:31:45.304 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:37296: read tcp 127.0.0.1:9735->127.0.0.1:37296: i/o timeout
2021-10-29 08:32:00.343 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:37308: read tcp 127.0.0.1:9735->127.0.0.1:37308: i/o timeout
2021-10-29 08:32:24.088 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:37312: read tcp 127.0.0.1:9735->127.0.0.1:37312: i/o timeout
2021-10-29 08:32:52.021 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:37326: read tcp 127.0.0.1:9735->127.0.0.1:37326: i/o timeout
2021-10-29 08:33:27.547 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:37336: read tcp 127.0.0.1:9735->127.0.0.1:37336: i/o timeout
2021-10-29 09:12:10.738 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:38440: read tcp 127.0.0.1:9735->127.0.0.1:38440: i/o timeout
2021-10-29 09:33:12.860 [ERR] BTCN: Can't accept connection: unable to accept connection from 127.0.0.1:39002: read tcp 127.0.0.1:9735->127.0.0.1:39002: i/o timeout

Looks like problems with Tor, I think.

Grepping for WRN gives just

2021-10-28 23:25:22.393 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-28 23:34:35.004 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 00:24:54.742 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 00:40:08.980 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 01:29:41.988 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 04:00:53.642 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 04:01:41.048 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 04:39:28.732 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 06:29:11.653 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel
2021-10-29 08:48:51.006 [WRN] DISC: ignoring remote ChannelAnnouncement for own channel

I have a home-made Python script that queries some data every 20 minutes (with a read-only macaroon). lnd logs this for it:

2021-10-29 11:02:01.817 [DBG] RPCS: [listchannels] fetched 56 channels from DB
2021-10-29 11:02:01.943 [DBG] RPCS: [feereport]

What I really have a lot of are log lines like this:

2021-10-29 11:01:36.836 [DBG] HSWC: ChannelLink(x:1782:0): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:03:13.797 [DBG] HSWC: ChannelLink(x:977:1): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:03:47.915 [DBG] HSWC: ChannelLink(x:2975:1): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:05:20.020 [DBG] HSWC: ChannelLink(x:2456:0): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:05:32.196 [DBG] HSWC: ChannelLink(x:991:0): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:05:36.318 [DBG] HSWC: ChannelLink(x:1069:0): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:07:27.723 [DBG] HSWC: ChannelLink(x:1157:1): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:07:56.844 [DBG] HSWC: ChannelLink(x:606:0): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:09:56.408 [DBG] HSWC: ChannelLink(x:2053:1): sampled fee rate for 3 block conf: 1055 sat/kw
2021-10-29 11:10:27.365 [DBG] HSWC: ChannelLink(x:1607:0): sampled fee rate for 3 block conf: 1055 sat/kw

Any help appreciated! Thanks!

guggero (Collaborator) commented Oct 29, 2021

@feikede did you ever compact your DB (add db.bolt.auto-compact=true to the config and restart your node)? If yes, was there no significant reduction in the size?
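
In lnd.conf form that is, as a sketch (the min-age line is an assumption on my part; I believe compaction is skipped if the DB was compacted recently):

    db.bolt.auto-compact=true
    # assumed optional knob: only compact if the last compaction is older than this
    db.bolt.auto-compact-min-age=168h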

feikede commented Oct 29, 2021

@guggero Thanks, I'll try it and report the result.

By the way, what's also in the log is this (channel points removed; they are all different):

2021-10-28 23:29:56.524 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(1:1)
2021-10-29 01:01:08.359 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(2:1)
2021-10-29 01:01:43.144 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(3:0)
2021-10-29 01:18:54.771 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(4:0)
2021-10-29 02:11:32.342 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(5:1)
2021-10-29 02:32:28.104 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(6:1)
2021-10-29 04:13:52.138 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(7:1)
2021-10-29 05:50:06.272 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(8:1)
2021-10-29 05:56:08.821 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(9:0)
2021-10-29 06:03:14.341 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 06:12:36.363 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 06:16:51.247 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 06:34:09.669 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 06:46:03.332 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 06:47:23.012 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 07:00:54.729 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 07:07:53.072 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 07:11:18.339 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 07:31:52.833 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 07:48:53.343 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 08:07:25.499 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 08:27:43.695 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 08:34:07.961 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 08:46:41.836 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 08:53:02.136 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 09:11:35.641 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 09:12:02.409 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 09:26:45.402 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)
2021-10-29 09:33:05.206 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 10:07:11.202 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 10:10:57.743 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:0)
2021-10-29 11:31:10.086 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(:1)

feikede commented Oct 29, 2021

@guggero Hm, I read about the compression. Maybe I'll do it another time, because it's more of a workaround than a solution...

guggero (Collaborator) commented Oct 29, 2021

It's not compression. Not sure what exactly you read, but it's the only way the underlying database format (bbolt) actually returns disk space to the file system. Otherwise your DB will never shrink, only grow.
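
For illustration, a manual offline compaction does essentially the same thing (a sketch, assuming the bbolt command-line tool from go.etcd.io/bbolt is installed and lnd is stopped): the live key/value pairs are copied into a fresh file and the free pages are left behind.

    # with lnd stopped
    bbolt compact -o channel.compacted.db channel.db
    ls -l channel.db channel.compacted.db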

feikede commented Oct 29, 2021

@alexbosworth is this the requested state update count?

lncli listchannels | grep num_updates
            "num_updates": "302",
            "num_updates": "68",
            "num_updates": "133",
            "num_updates": "334",
            "num_updates": "903",
            "num_updates": "313",
            "num_updates": "69",
            "num_updates": "786",
            "num_updates": "273",
            "num_updates": "219",
            "num_updates": "461",
            "num_updates": "548",
            "num_updates": "412",
            "num_updates": "521",
            "num_updates": "516",
            "num_updates": "360",
            "num_updates": "366",
            "num_updates": "300",
            "num_updates": "79",
            "num_updates": "404",
            "num_updates": "606",
            "num_updates": "489",
            "num_updates": "250",
            "num_updates": "267",
            "num_updates": "318",
            "num_updates": "443",
            "num_updates": "251",
            "num_updates": "173",
            "num_updates": "373",
            "num_updates": "515",
            "num_updates": "154",
            "num_updates": "203",
            "num_updates": "240",
            "num_updates": "305",
            "num_updates": "180",
            "num_updates": "231",
            "num_updates": "537",
            "num_updates": "1450",
            "num_updates": "59",
            "num_updates": "86",
            "num_updates": "835",
            "num_updates": "386",
            "num_updates": "313",
            "num_updates": "1173",
            "num_updates": "364",
            "num_updates": "44",
            "num_updates": "497",
            "num_updates": "437",
            "num_updates": "19",
            "num_updates": "377",
            "num_updates": "265",
            "num_updates": "242",
            "num_updates": "422",
            "num_updates": "296",
            "num_updates": "271",
            "num_updates": "291",

Are these high numbers or is it just normal?

feikede commented Oct 29, 2021

@guggero You're the man! It's down from 25 GB to 140 MB... That's cool!

Now I'll look if it still grows by 1 GB per day. Big Thanks!

@itmanagerro
So... it all comes back to the fact that channels are meant to be "short lived", since they accrue num_updates?

            "num_updates": "1417",
            "num_updates": "1512",
            "num_updates": "291",
            "num_updates": "463992",
            "num_updates": "1166",
            "num_updates": "2278",
            "num_updates": "2521",
            "num_updates": "3712",
            "num_updates": "167699",
            "num_updates": "122107",
            "num_updates": "49965",
            "num_updates": "1726",
            "num_updates": "1788",
            "num_updates": "511",
            "num_updates": "32220",
            "num_updates": "622",
            "num_updates": "103602",
            "num_updates": "1251",
            "num_updates": "502",
            "num_updates": "117928",
            "num_updates": "1435",
            "num_updates": "436",
            "num_updates": "5293",
            "num_updates": "1011",
            "num_updates": "528",
            "num_updates": "7821",
            "num_updates": "225842",
            "num_updates": "1155",
            "num_updates": "9828",
            "num_updates": "3589",
            "num_updates": "39596",
            "num_updates": "234474",
            "num_updates": "1502",
            "num_updates": "24552",
            "num_updates": "4734",
            "num_updates": "3925",
            "num_updates": "467",
            "num_updates": "32004",
            "num_updates": "7826",
            "num_updates": "1359",
            "num_updates": "2060",
            "num_updates": "681",
            "num_updates": "6560",
            "num_updates": "11951",
            "num_updates": "138116",
            "num_updates": "1593",
            "num_updates": "765",
            "num_updates": "1785",
            "num_updates": "15579",
            "num_updates": "16909",
            "num_updates": "1354",
            "num_updates": "8498",
            "num_updates": "39639",
            "num_updates": "3054",
            "num_updates": "2093",
            "num_updates": "16356",

guggero (Collaborator) commented Oct 29, 2021

So... it all comes back to the fact that channels are meant to be "short lived", since they accrue num_updates?

Oh wow, that's a lot of updates! I would say with the current way the forwarding log is stored, it basically comes down to the number of updates, yes. What #4364 did was remove that forwarding log for closed channels.
What we probably also should do (to address your specific case) is remove parts of the forwarding log for open channels, to make the growth issue less pronounced. The main question here will be: what parts must be kept for safety (e.g. being able to react to an old state being published, or just for accounting reasons) and what parts could be thrown away after a certain time.

itmanagerro commented Oct 29, 2021

Yeah, well... I have quite a good routing flow and I'm still at a 10M inbound deficit.
If anyone wants to help, you can always open channels to my node:
023662f1db3d0527dab0869e30f183021db7dc44f6f2e32ece42dd124846c89ca1 (min 2169420 sats)

Thank you @guggero for starting this discussion; I think we just got some drag here... To be honest, I foresaw this, as many peers went down gradually.

One concern I have is that out of the 8 peers I am in direct contact with, only 3 came back after the channel.db issues; most of them quit running a node... as they realised the risks involved.

So, if I understand correctly, we might have to introduce some "taproot"-like technology for the forwarding data of active channels, and then a similar "purge" of old data, perhaps like bitcoind's pruning does?

@alexbosworth (Contributor)

@alexbosworth is this the requested state update count?

Are these high numbers or is it just normal?

Those don't look too high, so the cause of the larger size is probably past payment failure states that are logged to the database.

@alexbosworth (Contributor)

So... it all comes back to the fact that channels are meant to be "short lived", since they accrue num_updates?

You would probably notice a difference in DB size by closing and reopening the channels with the largest num_updates.
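
For example, to pick out the channel with the most updates and close it cooperatively (a sketch, assuming jq; lncli closechannel takes the funding txid and output index that make up the channel point):

    chan_point=$(lncli listchannels | jq -r '.channels | max_by(.num_updates | tonumber) | .channel_point')
    lncli closechannel "${chan_point%:*}" "${chan_point#*:}"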

feikede commented Oct 31, 2021

Looks like my channel.db problem is solved for now. It grows by about 5 MB per day now, far below the 1 GB per day it was before. Lessons learned for me: add db.bolt.auto-compact=true to lnd.conf and don't run for 50 days without a restart (compaction only happens on startup).
Thanks for your help!

@rafaelpac
Mine is also under control with the compaction. It went from ~500MB to ~100MB and then stabilized around 150MB.

dlaptev commented Nov 1, 2021

What we probably also should do (to address your specific case) is remove parts of the forwarding log for open channels, to make the growth issue less pronounced.

That would be great, @guggero! My channel.db is at 9.3GB after compaction (and after DeleteAllPayments). I have more than 7 million updates cumulatively over ~800 active channels. I also have lots of closed channels, so I have high hopes for #4364, but I would appreciate it if you could indeed think about optimising open channels as well.

Stadicus pushed a commit to raspibolt/raspibolt that referenced this issue Dec 2, 2021
* Add channel.db size to Welcome script
Stadicus pushed a commit to Stadicus/raspibolt-dev that referenced this issue Dec 2, 2021
* Add channel.db size to Welcome script
Stadicus pushed a commit to Stadicus/raspibolt-dev that referenced this issue Dec 19, 2021
* Add channel.db size to Welcome script
Stadicus added a commit to raspibolt/raspibolt that referenced this issue Dec 21, 2021
* Monitoring channel.db size in welcome/motd script (#809)
Darth-Coin commented Feb 18, 2022

@guggero You're the man! It's down from 25 GB to 140 MB... That's cool!

Now I'll look if it still grows by 1 GB per day. Big Thanks!

I am trying the same, by adding that line to lnd.conf.
My current size is almost 2 GB.
How long did you wait for lnd to start with that line active?
Right now it has been about 2 hours and lnd still hasn't started.
Or did you just restart lnd again?

Update: worth it! I went from 2 GB down to 500 MB, after exactly 2h30m of waiting.

feikede commented Feb 19, 2022

@Darth-Coin Well waited! If you Ctrl-C that process you'd have to do an emergency restore of your channels and force close them all... never Ctrl-C that :-)

@Darth-Coin
@feikede Nooo, I didn't even touch it. I was just watching the logs as usual. I'm running an Umbrel node (Docker). I was tempted once to stop the startup process, but moved on immediately; I knew that as long as nothing else looked wrong, I should just wait.
Worth it. After 2+ hours of waiting, the compaction finished.

I had 2 channels force close after that, but I'm not sure what caused it. Maybe the long wait before starting, I don't know.
We will need better tools to monitor or debug these force closures.

kristapsk pushed a commit to kristapsk/raspibolt-pulse that referenced this issue Aug 29, 2022
* Monitoring channel.db size in welcome/motd script (#809)
guggero (Collaborator) commented Nov 3, 2022

This should be solved now after the update to 0.15.x, which includes the optional revocation log migration.
We got reports of massive reductions in DB size after running with --db.prune-revocation and then restarting with --db.bolt.auto-compact.
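
In lnd.conf form that is, as a sketch (run the pruning once on an 0.15.x node, then keep auto-compaction enabled so bbolt actually returns the freed space on the next restart):

    db.prune-revocation=true
    db.bolt.auto-compact=true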

guggero closed this as completed Nov 3, 2022
@itmanagerro
@guggero I confirm: from 27 GB, I am now running with 780 MB...

@hihouhou
101 GB -> 130 MB!!!! Thank you!!!!
