Feature: Lightweight lightningd.sqlite3 database #4824

Closed
jsarenik opened this issue Sep 27, 2021 · 21 comments

@jsarenik (Collaborator) commented Sep 27, 2021

Description

As discussed in one of the recent Monday meetings, I would like to know what the minimum data in the lightningd.sqlite3 file is in order to keep the basic functionality of the node and all its channels, while being able to forget at least the history of all the forwards.

My node is routing for free (0 base and 0 proportional) no matter what. So I think all that is needed in the DB is some channel states and balances. Anything else?

Currently I am experimenting with wiping out the contents of the forwarded_payments and channel_htlcs tables.

The above reduces the size of the lightningd.sqlite3 file significantly (from 380MB to 120MB). What other table contents would be safe to delete?

@cdecker (Member) commented Sep 28, 2021

⚠ A warning to users: never ever manually edit the database unless developers tell you to; you risk losing funds. ⚠

Note that in order to ensure consistency you'll have to run PRAGMA foreign_keys = ON; so that all foreign key constraints are respected, and make sure that you know which references may result in a cascade on delete.
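
For illustration, a minimal session on a throwaway copy (file names are just placeholders; always work on a copy, never the live file), enabling foreign keys and checking for existing violations before touching anything:

$ cp lightningd.sqlite3 /tmp/experiment.sqlite3   # with lightningd stopped
$ sqlite3 /tmp/experiment.sqlite3
sqlite> PRAGMA foreign_keys = ON;
sqlite> PRAGMA foreign_key_check;
sqlite> .quit

PRAGMA foreign_key_check reports any rows that violate declared foreign key constraints; an empty result means the copy is consistent before you start experimenting.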

In particular, deleting all of channel_htlcs will break your channels in case of an uncooperative peer: it contains the information needed to claim or time out HTLCs associated with a channel, and without that data you're trusting that the peer will not cheat (making LN kind of useless...). Please only ever delete entries from channel_htlcs that belong to a channel that has been fully closed (that includes any pending HTLCs that might still be waiting for their timeout to expire).

@jsarenik jsarenik changed the title Feature: A stateless use-case Feature: A(n almost) stateless use-case Sep 29, 2021
@jsarenik (Collaborator, Author):

@cdecker Thank you.

Can there be any in-flight HTLCs when the daemon has stopped? Would it be possible to add an option which would remove an obsolete entry as soon as it is known that, if it stayed in the database, it would only serve as a historical record?

@jsarenik (Collaborator, Author) commented Oct 4, 2021

Row counts for all tables (not taking row length into account yet) in my current lightningd.sqlite3 db on mainnet:

$ sqlite3 lightningd.sqlite.copy .dump | grep -o '^INSERT INTO [a-z_]\+' | uniq -c
      1 INSERT INTO version
     36 INSERT INTO outputs
      6 INSERT INTO vars
     10 INSERT INTO shachains
    135 INSERT INTO shachain_known
     10 INSERT INTO peers
     31 INSERT INTO channels
     62 INSERT INTO channel_configs
 110588 INSERT INTO channel_htlcs
     28 INSERT INTO invoices
     16 INSERT INTO db_upgrades
  78633 INSERT INTO blocks
 772995 INSERT INTO utxoset
     89 INSERT INTO transactions
  71679 INSERT INTO forwarded_payments
     75 INSERT INTO transaction_annotations
    207 INSERT INTO payments
     32 INSERT INTO channel_feerates
     29 INSERT INTO penalty_bases
    114 INSERT INTO channel_state_changes
      9 INSERT INTO offers
     31 INSERT INTO channel_blockheights
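
(For reference, an equivalent way to get exact per-table row counts straight from the copy, without dumping -- a small shell sketch, filename as above:)

$ db=lightningd.sqlite.copy
$ sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type='table';" |
    while read -r t; do
      printf '%8d %s\n' "$(sqlite3 "$db" "SELECT count(*) FROM \"$t\";")" "$t"
    done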

@jsarenik jsarenik changed the title Feature: A(n almost) stateless use-case Feature: Lightweight lightningd.sqlite3 database Oct 4, 2021
@ZmnSCPxj (Collaborator) commented Oct 5, 2021

Can there be any in-flight HTLCs when the daemon has stopped?

Yes, definitely; the daemon is written such that it expects to be SIGKILLed at any time, including states where HTLCs are in-flight.

In-flight HTLCs are a normal part of operation and that state can persist for days; we cannot delay daemon shutdown until there are no in-flight HTLCs when they can persist for days at a time.

Indeed, the only time we reliably "drain" all in-flight HTLCs on a channel is when we mutually close, and mutual closes can be delayed for days because of that, which is why the close command has a unilateraltimeout parameter; we just drop to chain unilaterally if the mutual close takes too long due to (among other reasons) in-flight HTLCs not getting drained.
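
(For reference, a hedged example of using that timeout -- the peer id is a placeholder and 3600 seconds is just an illustrative value:)

# attempt a mutual close, but drop to a unilateral close if it takes longer than an hour
$ lightning-cli close <peer-id-or-channel-id> 3600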

Would it be possible to add an option which would remove an obsolete entry as soon as it is known that, if it stayed in the database, it would only serve as a historical record?

There are no obsolete HTLCs until you close the channel, unless you trust your peer(s) not to hurt you EVER.

The attack goes as follows:

  • A peer can offer an HTLC to you that ultimately fails, involving almost all the channel capacity.
  • The peer can then arrange for most of the channel capacity to end up on your side afterwards.
  • You delete the old failed HTLC on the assumption that it is obsolete and useless and to save space.
  • The peer takes the old state with the old HTLC and uses that onchain.
    • The peer loses its "main" output on that state, but you had most of the channel capacity anyway so it loses little.
    • The HTLC details are unknown to you, so you cannot revoke it (you deleted the information!)

The attacker cannot steal from you (if it publishes the HTLC claim, you can revoke the claim, since the revocation keys are stored in an O(1) structure called the shachain), but if the attacker is willing to pay in order to hurt you financially, then the attacker simply never publishes the HTLC claim. Then the output is unspendable forever, and you lose access to part of your funds (equal to the value of the "obsolete" HTLC you deleted). They do not win, but you definitely lose, and potentially lose more than the attacker does.

Only when the channel is closed and done with can you safely delete old HTLCs involving that channel (and IIRC this is done automatically).

This will change with Decker-Russell-Osuntokun, but current channels require all old HTLCs to be retained for safety. If you want to delete them, be warned that you risk funds loss. Poon-Dryja can be modified to also not require retaining old HTLCs, but that modification requires an additional 1-input-1-output transaction on unilateral closes and is non-standard, and if you need to change the standard you might as well hold out for Decker-Russell-Osuntokun.

@jsarenik (Collaborator, Author) commented Oct 5, 2021

Thank you @ZmnSCPxj for this in-depth explanation. Greatly appreciated! I have a few more questions now.

Does the omission mean I can safely delete the contents of forwarded_payments?

Why does the table utxoset have so many entries? I thought addresses can be easily generated by descriptors using the incremental ids…

@ZmnSCPxj (Collaborator) commented Oct 5, 2021

Does the omission mean I can safely delete the contents of forwarded_payments?

Not sure; probably better to ask @rustyrussell, as the HTLC state machine is foreign to me and I believe that is what uses that table.

Why does the table utxoset have so many entries?

Because our design of UTXO tracking sucks? (^^.;;;;;)v See the confusion I have in #3858 (comment) (EDIT: or this one: #3858 (comment)). Our spending-ness trackers were all kinda tacked on in various bits and pieces. Off the top of my head, I think utxoset is needed to keep track of what outputs a gossiped channel can claim to have; if it is not in utxoset then the gossip is rejected and we will not have a full picture of the network topology. Or I might be misremembering which UTXO tracking mechanism does that; we have like 4 or 5 different ways to monitor UTXOs/addresses/whatever-onchain-thing. All of them kinda assume we can grab the entire contents of a Bitcoin block when we learn about it, too...

@ZmnSCPxj (Collaborator) commented Oct 6, 2021

Okay, looking over the code again, it looks like utxoset is indeed the table used to keep track of which UTXOs a gossiped channel can claim to be anchored with. This is not a complete UTXO set --- we only keep track of P2WSH outputs (and should probably start keeping track of P2TR, sigh) created after Lightning became cool, since those are the only ones that can be channels. What bitcoind should really do is expose a UTXO set query API, like the Electrum protocol does.

There are vague proposals to include UTXO set proofs in gossip (basically a Merkle proof of inclusion of the tx in a block, plus the tx itself so we can check the txo), but that requires two proofs --- proof the TXO was created, and proof it was spent, and liars have an incentive to suppress the propagation of the proof-it-was-spent. But if LN can switch to that then there is no need to track the UTXO set ourselves, and if everyone switches to P2TR then the UTXO set we have to keep track of is going to be very large, about as large as the full UTXO set of bitcoind. But that will take a long while, I think...

@ZmnSCPxj (Collaborator):

Re: utxoset, it seems to be a substantial part of the database.

So I got one of my regular db backups and did the following:

$ sqlite3 ${BACKUP}
sqlite> VACUUM INTO "tmp.sqlite3";
sqlite> .quit
$ sqlite3 tmp.sqlite3
sqlite> DROP TABLE 'utxoset';
sqlite> VACUUM INTO "tmp2.sqlite3";
sqlite> .quit
$ ls -l tmp.sqlite3 tmp2.sqlite3
-rw-r--r--  1 zmnscpxj zmnscpxj 102178816 Oct 12 02:54 tmp2.sqlite3
-rw-r--r--  1 zmnscpxj zmnscpxj 200220672 Oct 12 02:54 tmp.sqlite3

As you can see, that is a fairly large drop in size. (Note that, due to the way SQLITE3 manages storage, simply dropping the table without a VACUUM would not have changed the size of tmp.sqlite3 at all.) That is almost a 50% drop in uncompressed size.

(Do NOT do this in production; note above that this was a backup copy and that I made a separate copy of that copy before doing anything.)
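
(For anyone wanting to check which tables dominate their own file first: if the local sqlite3 build has the dbstat virtual table enabled, something like this on a copy shows per-table space usage, largest first -- just a sketch, dbstat is not available in every build:)

$ sqlite3 tmp.sqlite3
sqlite> SELECT name, sum(pgsize) AS bytes FROM dbstat GROUP BY name ORDER BY bytes DESC;
sqlite> .quit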

As noted, utxoset is used to filter out gossip. Gossip about channels that are not, in fact, channels (i.e. do not have an unspent funding outpoint) is filtered out by looking at this table.

However, we can observe that:

  • Only forwarding nodes want their channels gossiped (because for some reason unpublished channels are popular, in my opinion unpublished channels delenda est)
  • Forwarding nodes want their channels gossiped as soon as possible.
  • Most P2WSH outputs (and in the future, P2TR) are not going to be Lightning network published channels (well, it would be nice if that happened, but...).

Perhaps we can do something like:

  • Add a "known to be channel" column to the utxoset table (or make a new table foreign keyed ON DELETE CASCADE on the utxoset table like proper DB design).
  • If we get a gossip message about a channel and it is found in the utxoset table, set the above flag for that entry.
  • Entries with the flag cleared and whose blockheight is older than say 1000 blocks are pruned from the utxoset table.
    • This should currently be safe since no other table has a foreign key referencing utxoset.
    • blockheight is the first key in the short_channel_id index so this should be optimized reasonably by the SQLITE3 SQL optimizer. Indeed an experimental EXPLAIN SELECT * FROM 'utxoset' WHERE blockheight < 700000; suggests it does look at the short_channel_id index.
  • Similarly, entries with spendheight older than say 1000 blocks are also pruned from the utxoset table.
    • We also have a separate index utxoset_spend for the spendheight, so this deletion should be optimized as well.

(Note that the 50% reduction in size is an asymptote: we have to keep track of channels, so some utxoset entries must be retained and not pruned even as their blockheight falls deep into the chain, and if Lightning became so popular that nearly every P2WSH/P2TR output were a Lightning published channel, then this would earn no savings and would increase size due to the extra column; but note that P2TR is intended to hide more uses of Bitcoin in the future (including the currently very, very common singlesig case), thus it seems very unlikely that every P2TR output will be a Lightning published channel.)
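
A rough SQL sketch of the pruning idea above -- purely hypothetical: the is_channel flag and the txid/outnum column names are assumptions for illustration, not the actual schema or an actual implementation:

-- hypothetical flag marking entries that gossip has claimed as channel funding outputs
ALTER TABLE utxoset ADD COLUMN is_channel INTEGER NOT NULL DEFAULT 0;

-- set when a channel_announcement referencing this outpoint is accepted
-- UPDATE utxoset SET is_channel = 1 WHERE txid = :txid AND outnum = :outnum;

-- prune entries never claimed by gossip once they are ~1000 blocks deep
DELETE FROM utxoset WHERE is_channel = 0 AND blockheight < :tipheight - 1000;

-- prune entries whose output was spent more than ~1000 blocks ago
DELETE FROM utxoset WHERE spendheight IS NOT NULL AND spendheight < :tipheight - 1000;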

Thoughts @rustyrussell ?

@jsarenik (Collaborator, Author) commented Oct 12, 2021

Thank you @ZmnSCPxj ! I really appreciate your time and effort here.

The production-safe (VACUUM-only) procedure, which I now run on my backups (i.e. lightningd.sqlite3 files copied elsewhere after lightningd was shut down using lightning-cli stop and not restarted afterwards by any "process manager"), should be:

PRAGMA foreign_keys = ON;
VACUUM;

@cdecker (Member) commented Oct 15, 2021

Just wanted to mention that we merged #4850, which is a first step towards reducing the DB size. It drops the large encrypted blobs as soon as HTLCs are definitely settled. According to Rusty this already takes a huge chunk out of his database, without impacting the operation of the node.

@jsarenik (Collaborator, Author):

Thank you @cdecker !!! :-)

@jsarenik (Collaborator, Author):

And @rustyrussell of course! :-) Thank you!

@jsarenik jsarenik closed this as completed Nov 4, 2021
@jsarenik (Collaborator, Author) commented Dec 13, 2021

My lightningd.sqlite3 database is slowly getting fat again.

In backup I keep only the vacuumed version of it (179M according to ls -lh at the moment). The non-vacuumed file was originally reduced from 380M to around 150M after Rusty's changes. Now it is already 200M and growing.

There has recently been no visible activity (no noticeable balance changes), but probably many failed forwarding attempts; still, clearing the forwarded_payments table on a backup file releases merely 4M, making the resulting vacuumed file 173M.

Ideas:

  • is the partial index from 3433ff5 being cleaned up?

@jsarenik jsarenik reopened this Dec 13, 2021
@ZmnSCPxj (Collaborator) commented Dec 13, 2021

is the partial index from 3433ff5 being cleaned up?

This should be automatically managed by SQLITE3; if not, then that would be a bug in SQLITE3. Note that this is not the only index, either; there are other indexes as well, probably larger.

Possibly what is happening is fragmentation of usable space, which VACUUM should fix. Maybe the major effect of the 2bb13ba patch series was not to clean up the database but the VACUUM done afterwards. Hmm.

My snapshotted backups are smaller than my "main" file as well, by a similar degree as you quoted, and I do VACUUM on the snapshots before archiving, too. Maybe a once-a-day VACUUM would help?
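
(A once-a-day VACUUM could be scripted roughly like this -- a sketch only, assuming a systemd service called lightningd and with the database path as a placeholder; the node must be stopped while vacuuming:)

#!/bin/sh
# stop the node, compact the stopped database, start the node again
systemctl stop lightningd
sqlite3 /path/to/lightningd.sqlite3 'PRAGMA foreign_keys = ON; VACUUM;'
systemctl start lightningd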

@ZmnSCPxj (Collaborator):

Re growth, also remember the earlier discussion; utxoset takes up 50% of the storage. That is likely to just keep growing in the future despite our best efforts. We could probably prune it, as I described before, and that should help keep db sizes down when there is little activity on your channels. Otherwise even without channel activity it is likely that the utxoset table will just keep growing...

@jsarenik (Collaborator, Author):

Thank you @ZmnSCPxj! If it is just utxoset growing, I am fine with that. Just wanted to make sure it is growing and to add some follow-up here.

@ZmnSCPxj (Collaborator):

@jsarenik SQLITE3 has a built-in auto-vacuum -- you could try this on your lightningd.sqlite3 while lightningd is stopped. It should be safe, but caveat emptor: this is at your own risk, I am not liable for loss of funds, etc. etc.

$ systemctl stop lightningd # or whatever
$ sqlite3 /path/to/lightningd.sqlite3
sqlite> pragma auto_vacuum=1;
sqlite> vacuum;
sqlite> .quit

You could try that and see if it slows down the growth or not.

@jsarenik (Collaborator, Author):

@ZmnSCPxj Thank you for sharing! Everything works well for me.

@ZmnSCPxj (Collaborator):

Okay. Let us know if it seems to slow down the growth of the file; I am writing a doc/PERFORMANCE.md file right now about tuning your node for performance.

@ZmnSCPxj (Collaborator):

@jsarenik also note that auto-vacuum is not a "pure" optimization, it is a tradeoff: you get a smaller database file in general, but queries (both read-only ones and updates) get slower. SQLITE3 has to put back-references in each stored row so it can immediately track deleted rows, it has to vacuum periodically, and tables are now more likely to get fragmented, since freed space gets filled with data moved from the end of the file (i.e. rows of other tables get moved into the space of the deleted rows). But you were asking for a smaller db size, so...

@jsarenik (Collaborator, Author):

@ZmnSCPxj Thanks for the explanation. Yes, that is quite important. I will revert back to how it was originally (and do a manual vacuum on backup files).
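
(For reference, presumably the revert is just the inverse, again with lightningd stopped and the path as a placeholder -- an auto_vacuum change only takes effect after a subsequent VACUUM:)

$ sqlite3 /path/to/lightningd.sqlite3
sqlite> PRAGMA auto_vacuum = 0;
sqlite> VACUUM;
sqlite> .quit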
