sqldb: add invoice schema and sql queries #7354

Merged
merged 10 commits into from
Jul 13, 2023

Contributor

@positiveblue positiveblue commented Jan 24, 2023

This is the first PR for #6288

In this PR we define the tables, indexes and queries for our invoice schema.

We aim to support sqlite/postgres out of the box. The sql code in the migration/query files needs to work for both engines with minimal changes, and we want to avoid engine-specific functionality like postgres SEQUENCEs.

We will use sqlc to generate go code from SQL. The tool uses the config defined in sqlc.yaml. By default we will set the engine to postgres but the generated code also works for sqlite.

The invoice schema can be divided in three parts:

  • Bolt11 invoices: All the common tables to keep track of created invoices, their htlcs and their payments.
  • AMP invoices: Most of the logic is shared with BOLT11 invoices, but we add a couple of tables to be able to keep track of the different payments for each AMP invoice (identified by the set_id) and the extra data of each htlc.
  • Invoice Events: This aims to support new functionality that is not currently supported by the KV store. We are interested in keeping a record of the events that an invoice went through.

Because the queries are not generated at runtime, we need to write generic ones that allow us to filter/order using different parameters, each of which can be set to NULL when we don't want to apply it in a specific query.

For filtering, we use the pattern

(
    column = sqlc.narg('param_name') OR
    sqlc.narg('param_name') IS NULL
) AND (
    column = sqlc.narg('param_name_2') OR
    sqlc.narg('param_name_2') IS NULL
)

We can also use

    CASE
        WHEN sqlc.narg('param_name')=TRUE THEN (custom_logic)
        ELSE TRUE 
    END

Ordering is a bit more complicated, because we cannot add ASC/DESC in a CASE-WHEN-THEN-ELSE so we need to unroll the cases

ORDER BY
    CASE
        WHEN sqlc.narg('ordering_param') = FALSE THEN column1
        ELSE NULL
    END ASC,
    CASE
        WHEN sqlc.narg('ordering_param') = TRUE  THEN column1  
        ELSE NULL
    END DESC
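To see why the unrolled `ORDER BY` works, note that exactly one of the two `CASE` expressions yields `column1`; the other yields NULL for every row and therefore never decides the order. The sketch below emulates that in memory. All names are hypothetical, and a nil pointer stands in for SQL NULL.

```go
package main

import (
	"fmt"
	"sort"
)

// sortKey holds the values of the two ORDER BY expressions for one row.
type sortKey struct {
	asc  *int64 // CASE WHEN reverse = FALSE THEN column1 ELSE NULL END
	desc *int64 // CASE WHEN reverse = TRUE THEN column1 ELSE NULL END
}

// keyFor fills exactly one of the two expressions, like the query does.
func keyFor(v int64, reverse bool) sortKey {
	if reverse {
		return sortKey{desc: &v}
	}
	return sortKey{asc: &v}
}

// cmp compares two nullable values. Within one query an expression is
// NULL for all rows or for none, so any NULL is treated as a tie.
func cmp(a, b *int64) int {
	switch {
	case a == nil || b == nil:
		return 0
	case *a < *b:
		return -1
	case *a > *b:
		return 1
	}
	return 0
}

// orderBy sorts by the first expression ASC, then the second DESC.
func orderBy(values []int64, reverse bool) []int64 {
	out := append([]int64(nil), values...)
	sort.SliceStable(out, func(i, j int) bool {
		ki, kj := keyFor(out[i], reverse), keyFor(out[j], reverse)
		if c := cmp(ki.asc, kj.asc); c != 0 {
			return c < 0
		}
		return cmp(ki.desc, kj.desc) > 0
	})
	return out
}

func main() {
	fmt.Println(orderBy([]int64{3, 1, 2}, false)) // [1 2 3]
	fmt.Println(orderBy([]int64{3, 1, 2}, true))  // [3 2 1]
}
```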

A good example of a real query using these patterns is FilterInvoices.

There are a couple of things that I am considering adding, but I would like to know others' opinions.

  • Currently, row ids (INTEGER PRIMARY KEY) are mapped to int32. That means that we have a counter valid until 2,147,483,647 which is 1 invoice/second for the next ~68 years. In Go code we use uint64 for them. uint64 is not a supported sql type, but we may want to use the largest integer value (int64) for this.

  • Invoice payments are stored in their own table. Some invoice types can have a one invoice -> many payments relationship, so a JOIN makes sense in this case. However, it could be interesting to add information about the "latest" payment in the invoice table. In that way we would fetch the invoice and its payment information in one trip to the db, whenever it's possible.

  • Most of the invoices will have the same feature set, based on their type. Maybe we could add a "features_set_enum" instead of storing them for each invoice. That would also give us the flexibility of having a different feature set for the same kind of invoice in case we add it only to new invoices.
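The arithmetic behind the first point (int32 row ids lasting ~68 years at one invoice per second) can be checked with a short sketch. The helper names are hypothetical; `toRowID` only illustrates the range check that a uint64 to int64 mapping would need.

```go
package main

import (
	"fmt"
	"math"
)

const secondsPerYear = 365.25 * 24 * 60 * 60

// yearsUntilExhausted returns how long a counter with the given upper
// limit lasts when it grows by ratePerSecond ids every second.
func yearsUntilExhausted(limit, ratePerSecond float64) float64 {
	return limit / ratePerSecond / secondsPerYear
}

// toRowID converts an application-level uint64 index into an int64 row
// id and reports whether the value fits.
func toRowID(v uint64) (int64, bool) {
	if v > math.MaxInt64 {
		return 0, false
	}
	return int64(v), true
}

func main() {
	fmt.Printf("int32: ~%.0f years\n", yearsUntilExhausted(math.MaxInt32, 1))
	fmt.Printf("int64: ~%.1e years\n", yearsUntilExhausted(math.MaxInt64, 1))

	_, ok := toRowID(math.MaxUint64)
	fmt.Println("MaxUint64 fits in int64:", ok)
}
```

With int64 the counter is effectively inexhaustible, but only half of the uint64 range used in the Go code is representable, so a conversion check of this kind would still be needed.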

Here is a diagram of the proposed schema.
[Image: invoice-sql-schema]

NOTE: the structure of this PR has changed so some of the old comments apply now to other PRs. This one is only about the SQL schema.

@saubyk saubyk added invoices sql database Related to the database/storage of LND labels Jan 26, 2023
@saubyk saubyk added this to the v0.16.0 milestone Jan 26, 2023
@saubyk saubyk requested a review from sputn1ck January 26, 2023 18:25
Collaborator

@ziggie1984 ziggie1984 left a comment

Hi, impressive work! :)
Did a first pass-through in preparation of the review club next week.

sqldb/postgres_fixture.go (outdated, resolved)
sqldb/postgres.go (outdated, resolved)
sqldb/sqldb.go (outdated, resolved)
@@ -0,0 +1,33 @@
//go:build test_db_postgres
// +build test_db_postgres

Collaborator

question: Could you explain why we are naming the test files with the prefix test instead of the suffix test?

Contributor Author

@positiveblue positiveblue May 30, 2023

yes, comes from taproot_assets but as you noticed it would be good to change it to match Go's good practices

This feedback will be added in #7343

sqldb/sqlite.go Outdated
@@ -0,0 +1,112 @@
package sqldb
Collaborator

question: would it be reasonable to only include this file when building with, for example, the kvdb_sqlite package? This came to my mind because later in the test files you mark them with specific tags like test_db_postgres or test_db_sqlite, so I was wondering why in the test code but not here?

Collaborator

+1

@saubyk saubyk requested a review from bhandras January 31, 2023 18:26
@sputn1ck sputn1ck self-requested a review February 2, 2023 11:16
Collaborator

@sputn1ck sputn1ck left a comment

Great work @positiveblue . Added some qs and need to research the SEQUENCES stuff a bit

sqlc.yaml (resolved)
postgresFS := newReplacerFS(sqlSchemas, map[string]string{
"BLOB": "BYTEA",
"INTEGER PRIMARY KEY": "SERIAL PRIMARY KEY",
"TIMESTAMP": "TIMESTAMP WITHOUT TIME ZONE",
Collaborator

I'm not sure about the implications of storing with or without the time zone. E.g. assuming I'm running my node in my home in utc+0 and later migrating that to a remote server that is in a different timezone, my gut feeling is that there would be a conflict here?

Collaborator

I think it's ok to use no timezone until we make sure to display all printed timestamps as UTC or alternatively transform to the user's timezone (when printing).

Contributor Author

@sputn1ck your gut feeling is right here: time/timezone == software engineering headache

As Andras said, the idea is to not use timezone (and everything is still consistent) until you want to show it to the user

last_amount_paid_msat BIGINT NOT NULL,

is_amp BOOLEAN NOT NULL,
is_hodl BOOLEAN NOT NULL,
Collaborator

nit: hodl 😄

Contributor Author

@positiveblue positiveblue May 30, 2023

haha I checked it in our code and we call them hodl invoices

I think their real name is hold-invoices though...

sqldb/sqlc/queries/invoices.sql (outdated, resolved)
@sputn1ck sputn1ck self-requested a review February 2, 2023 11:59
Collaborator

@bhandras bhandras left a comment

Cool PR, great work so far @positiveblue 🚀 🚀

sqldb/sqlc/queries/sequences.sql (outdated, resolved)
sqldb/sqlc/migrations/000002_invoices.up.sql (outdated, resolved)
sqldb/sqlc/migrations/000002_invoices.up.sql (outdated, resolved)
@saubyk saubyk modified the milestones: v0.16.0, v0.16.1 Feb 14, 2023
@saubyk saubyk modified the milestones: v0.16.1, v0.17.0 Mar 13, 2023
@positiveblue positiveblue force-pushed the invoice-sql-schema branch 2 times, most recently from 0fd6898 to c9c2e47 Compare June 8, 2023 00:05
@positiveblue positiveblue force-pushed the invoice-sql-schema branch 2 times, most recently from c902354 to c9d9247 Compare June 10, 2023 00:11
@positiveblue positiveblue changed the title [WIP] sqldb: add schema for invoices (table and queries) sqldb: add invoice schema and sql queries Jun 10, 2023
@positiveblue positiveblue changed the base branch from master to 0-17-0-staging June 10, 2023 01:00
@positiveblue positiveblue changed the base branch from 0-17-0-staging to master June 10, 2023 01:00
@positiveblue positiveblue marked this pull request as ready for review June 10, 2023 01:01
Collaborator

@bhandras bhandras left a comment

Looking really good to me, awesome work @positiveblue 🥇

@positiveblue positiveblue force-pushed the invoice-sql-schema branch 2 times, most recently from ad5df9e to 3297bf7 Compare June 22, 2023 19:30
@lightninglabs-deploy
@bhandras: review reminder
@Roasbeef: review reminder
@sputn1ck: review reminder
@ziggie1984: review reminder

Collaborator

@ziggie1984 ziggie1984 left a comment

Great PR 🤓, learned a lot about how to cleanly set up sql schemas with sqlc.

Just had some nits and questions.

Regarding your remarks:

  1. I would prefer int64 for the primary key, especially because sql dbs can easily be deployed on other hosts and compacted on the fly, so I consider the additional space not as critical. On the other hand, a migration later on is not a big deal either?

  2. So regarding the payment information when fetching an invoice, do I understand it right that you propose just adding the latest successful payment rather than all when using MPP or AMP for example?

  3. Very much in favour of making the feature set space requirement more efficient but also keep the possibility to have it separate for each invoice just in case we want to be more flexible?

amount_msat BIGINT NOT NULL,

-- Delta to use for the time-lock of the CLTV extended to the final hop.
cltv_delta INTEGER NOT NULL,
Collaborator

Q: For Bolt12 invoices there will be a "cltv_delta" included in the blinded path, however there will be no specific final_hop delta as in bolt11 invoices, there might be still a receiver required delta but it will be handled differently I think. So my question is, maybe we should allow a NULL value then?

sqldb/sqlc/migrations/000001_invoices.up.sql (resolved)
sqldb/sqlc/migrations/000001_invoices.up.sql (outdated, resolved)
is_keysend BOOLEAN NOT NULL,

-- Timestamp of when this invoice was created.
created_at TIMESTAMP NOT NULL
Collaborator

so for every update there will be a corresponding invoice_event?

id INTEGER PRIMARY KEY,

-- The uint64 htlc id stored as text.
htlc_id TEXT NOT NULL,
Collaborator

wondering why the set_id is stored as bytes but for the htlc_id we go for text? Is text more efficient than BLOB?
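The column comment states that the uint64 htlc id is stored as text. One plausible motivation (an assumption, since the thread does not answer the question) is that the full uint64 range does not fit in a signed 64-bit SQL integer, while a decimal string round-trips losslessly. A sketch with hypothetical helper names:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// encodeHtlcID renders a uint64 htlc id as the decimal text that would
// be written to a TEXT column.
func encodeHtlcID(id uint64) string {
	return strconv.FormatUint(id, 10)
}

// decodeHtlcID parses the stored text back into a uint64.
func decodeHtlcID(s string) (uint64, error) {
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	id := uint64(math.MaxUint64)

	text := encodeHtlcID(id)
	back, err := decodeHtlcID(text)
	fmt.Println(text, back == id, err)

	// Casting the same value to int64 wraps around instead.
	fmt.Println(int64(id))
}
```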

-- not have leave this empty, like Keysends.
payment_request TEXT UNIQUE,

-- The invoice state.
Collaborator

in the next commit you filter for pending_only which only hits for 0 and 3, would it make sense then to define the states in the comments here as well?

Contributor Author

I think we can leave that for the application layer (Go) given that "pending_only"/"pending" is not a state by itself but an invoice is considered pending when its state is "open"/"accepted"
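Keeping "pending" in the application layer could look like the sketch below. The type and constant names are hypothetical. The thread only says that the pending filter matches states 0 and 3 and that pending means "open"/"accepted"; which of the two is 0, and the values assumed here for settled and canceled, are assumptions.

```go
package main

import "fmt"

// invoiceState mirrors the SMALLINT `state` column.
type invoiceState int16

// Assumed numbering, see the note above.
const (
	stateOpen     invoiceState = 0
	stateSettled  invoiceState = 1
	stateCanceled invoiceState = 2
	stateAccepted invoiceState = 3
)

// isPending derives "pending" from the stored state instead of storing
// it as a state of its own.
func isPending(s invoiceState) bool {
	return s == stateOpen || s == stateAccepted
}

func main() {
	states := []invoiceState{stateOpen, stateSettled, stateCanceled, stateAccepted}
	for _, s := range states {
		fmt.Println(s, isPending(s))
	}
}
```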


-- name: SelectAMPInvoicePayments :many
SELECT aip.*, ip.*
FROM amp_invoice_payments aip LEFT JOIN invoice_payments ip ON aip.settled_index = ip.id
Collaborator

Q: Could you elaborate why we want to show information from the invoice_payments table here? Will those not be equal to the ones in the amp_invoice_payments table?

Contributor Author

Yes, sqlc generates a struct where fields in both tables get a standard name, for example:

type SelectAMPInvoicePaymentsRow struct {
	SetID          []byte
	State          int16
	CreatedAt      time.Time
	SettledIndex   sql.NullInt32
	InvoiceID      int32
	ID             sql.NullInt32
	SettledAt      sql.NullTime
	AmountPaidMsat sql.NullInt64
	InvoiceID_2    sql.NullInt32
}

We could "avoid" the overlap and specify a .* for one of the tables and only the distinct columns in the other table. The problem is that next time that we add a column to the second table, we need to update this query too. Because it is not much data, I think it's better to leave it as is.
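Because the query is a LEFT JOIN, the columns from invoice_payments come back as nullable types in the struct above. A caller would check `Valid` before reading them, roughly as in this sketch. The `row` type copies a subset of the quoted fields and `describe` is a hypothetical helper.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// row copies a few fields of the generated struct quoted above.
type row struct {
	SetID          []byte
	SettledIndex   sql.NullInt32
	SettledAt      sql.NullTime
	AmountPaidMsat sql.NullInt64
}

// describe reads the joined columns only when the LEFT JOIN found a
// matching invoice_payments row.
func describe(r row) string {
	if !r.SettledIndex.Valid {
		return fmt.Sprintf("set %x: not settled", r.SetID)
	}
	return fmt.Sprintf(
		"set %x: settled index %d, %d msat at %s",
		r.SetID, r.SettledIndex.Int32, r.AmountPaidMsat.Int64,
		r.SettledAt.Time.UTC().Format(time.RFC3339),
	)
}

func main() {
	pending := row{SetID: []byte{0x01}}
	settled := row{
		SetID:        []byte{0x02},
		SettledIndex: sql.NullInt32{Int32: 7, Valid: true},
		SettledAt: sql.NullTime{
			Time:  time.Date(2023, time.July, 11, 0, 0, 0, 0, time.UTC),
			Valid: true,
		},
		AmountPaidMsat: sql.NullInt64{Int64: 5000, Valid: true},
	}

	fmt.Println(describe(pending))
	fmt.Println(describe(settled))
}
```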

Collaborator

@ziggie1984 ziggie1984 Jul 11, 2023

Sounds good, now I understand thanks.

sqldb/sqlc/migrations/000003_invoice_events.up.sql (outdated, resolved)
CREATE TABLE IF NOT EXISTS invoice_event_types(
id INTEGER PRIMARY KEY,

description TEXT NOT NULL
Collaborator

Q: We do not use ENUM types here because we would need to redefine them in the go code which makes things more complicated than having everything in one place ?

Collaborator

@bhandras bhandras left a comment

LGTM, pending @ziggie1984's comments.
Great to see this coming together, will be super nice to use SQL finally! 🚀 🚀

sqlc is a tool that generates fully type-safe idiomatic code from SQL.
The result is Go code that can then be used to execute the queries in the database.

The normal flow looks like:
- The developer writes some sql that will update the schema in the
  database: new tables, indices, etc
- The developer updates the set of queries that will use the new schema.
- `sqlc` generates type-safe interfaces to those queries.
- The developer can then write application code that calls the methods
  generated by sqlc.

The tool configuration needs to live in the repo's root and its name is
`sqlc.yaml`.

LND will support out of the box sqlite and postgres. The sql code needs to
be (almost) the same for both engines, so we cannot use custom functions
like `ANY` in postgres.

The SQLC config file needs to define what is the target engine, we will
set postgres but the generated code can be executed by sqlite too.

In some specific cases, we will `match and replace` some sql lines to be
sure the table definitions are valid for the targeted engine.
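A simplified sketch of that match-and-replace step, using the three replacements quoted in the review thread. The PR applies them through a `newReplacerFS` wrapper over the embedded schema files; the plain string version below, including the `toPostgres` name, is an illustration and not that implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// postgresReplacer rewrites the engine-neutral schema text into its
// postgres flavor. strings.Replacer works in a single pass, so the
// output of one replacement is never matched again.
var postgresReplacer = strings.NewReplacer(
	"BLOB", "BYTEA",
	"INTEGER PRIMARY KEY", "SERIAL PRIMARY KEY",
	"TIMESTAMP", "TIMESTAMP WITHOUT TIME ZONE",
)

// toPostgres applies all replacements to one schema file's contents.
func toPostgres(schema string) string {
	return postgresReplacer.Replace(schema)
}

func main() {
	schema := `CREATE TABLE IF NOT EXISTS example (
    id INTEGER PRIMARY KEY,
    hash BLOB NOT NULL,
    created_at TIMESTAMP NOT NULL
);`

	fmt.Println(toPostgres(schema))
}
```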

This is the schema for "ordinal" BOLT11 invoices.

The invoices table aims to keep an entry for each invoice, BOLT11 or not,
that will be supported.

Invoice related HTLCs will be stored in a separate table from forwarded
htlcs.

SQLite does not support `SEQUENCE`. We achieve atomic autoincrementals
using primary keys with autoincrement/serial. An invoice `AddIndex`
translates to `invoices(id)` while `SettleIndex` is `invoice_payments(id)`.

Set of queries to deal with invoices. A couple of things to take into
account:

    - Because the queries are not rewritten at runtime, we cannot have a
      generic `INSERT` with different tuples.
    - Because the queries are not rewritten at runtime, we cannot build
      one query with only the filters that matter for that query. The
      two options are a combinatorial approach (a new query for every
      permutation) or a generic query using the pattern

          ```
            SELECT *
            FROM table
            WHERE (
                -- Can be read as:
                -- Match the filter 1 value if filter_1 != nil
                column_1 >= sqlc.narg('filter_1') OR
                sqlc.narg('filter_1') IS NULL
            ) AND (
                column_2 >= sqlc.narg('filter_2') OR
                sqlc.narg('filter_2') IS NULL
            ) ...
          ```

Schema for AMP invoices.

AMP invoices can be paid multiple times and each payment to an AMP invoice
is identified by a `set_id`.

The A in AMP stands for `Atomic`. All the htlcs belonging to the same
AMP payment (share the same set_id) will be resolved at the same time
with the same result: settled/canceled.

AMP invoices do not have an "invoice preimage". Instead, each htlc has
its own hash/preimage. When a new htlc is added the hash for that htlc
is attached to it. When all the htlcs of a set_id have been received we
are able to compute the preimage for each one of them.

run `make sqlc`

All the code in this commit is auto-generated.
Collaborator

@sputn1ck sputn1ck left a comment

Awesome work @positiveblue 🥇

Collaborator

@ziggie1984 ziggie1984 left a comment

Great Job 🎉

Member

@Roasbeef Roasbeef left a comment

Excellent PR! This series brings a marked improvement for the general persistence situation within lnd.

Left some non-blocking comments that can be followed up on later. Some may end up changing the schema, but we can do so in a breaking manner, as people can't yet use the definitions, etc.

LGTM 🌪️


-- The encoded payment request for this invoice. Some invoice types may
-- not have leave this empty, like Keysends.
payment_request TEXT UNIQUE,
Member

I think we have a bit of room to further decouple things here. Eg: in the future a logical invoice will have both a BOLT 11 and BOLT 12 invoice format, along w/ w/e might arise in the future. So we could spin this out into another table that then references this main invoice table.

Similarly, not every invoice will have a 1:1 payment hash to pre-image relationship. Eg: AMP invoices only have a payment_addr as the hash+preimage are generated by the sender.

state SMALLINT NOT NULL,

-- The accumulated value of all the htlcs settled for this invoice.
amount_paid_msat BIGINT NOT NULL,
Member

This in theory could be derived via a view that sums over the set of payments to the invoice itself. Another factor here is the set of active HTLCs as well.

-- The accumulated value of all the htlcs settled for this invoice.
amount_paid_msat BIGINT NOT NULL,

-- This field will be true for AMP invoices.
Member

So then the general invoice feature table was dropped in favor of this?

CREATE INDEX IF NOT EXISTS invoice_htlc_custom_records_htlc_id_idx ON invoice_htlc_custom_records(htlc_id);

-- invoice_payments contains the information of a settled invoice payment.
CREATE TABLE IF NOT EXISTS invoice_payments (
Member

👍

In the future we can add another event table here for stuff like: cancel, accept, hodl, etc.

@@ -0,0 +1,54 @@
-- amp_invoices_payments
CREATE TABLE IF NOT EXISTS amp_invoice_payments (
Member

Perhaps also nice to have easy access to the amount here as well? Then can just read this out rather than needing to also fetch+sum over the set of invoice HTLCs.

SELECT aip.*, ip.*
FROM amp_invoice_payments aip LEFT JOIN invoice_payments ip ON aip.settled_index = ip.id
WHERE (
set_id = sqlc.narg('set_id') OR
Member

If you look at the way the queries exist as is, I think what you also want here is based on the payment_addr as well. Then given that, you can fetch them all for a payment addr w/o an intermediate look up for the invoice itself.

CREATE TABLE IF NOT EXISTS invoice_event_types(
id INTEGER PRIMARY KEY,

description TEXT NOT NULL
Member

In the future if we want very strict typing here, we can make a set "enum values" table, then point to that itself (all items unique, essentially pre-populated).

You can also do stuff like:

type CHECK(type in ('THIS', 'THAT'))

etc

@Roasbeef Roasbeef merged commit acecb12 into lightningnetwork:master Jul 13, 2023