Unified OUTPOINT_TO_UTXO_ENTRY table #3915

partialord · 2024-08-27T20:38:16Z

Replace the OUTPOINT_TO_TXOUT, OUTPOINT_TO_SAT_RANGES, and SATPOINT_TO_SEQUENCE_NUMBER tables with a new OUTPOINT_TO_UTXO_ENTRY table. This saves space by avoiding redundant key storage, and makes updates quicker since we can write all of an outpoint's data at once.

It also simplifies caching. We used to have utxo_cache for OUTPOINT_TO_TXOUT and range_cache for OUTPOINT_TO_SAT_RANGES, while SATPOINT_TO_SEQUENCE_NUMBER was uncached; now, there's a single unified utxo_cache which covers all three. Additionally, since SCRIPT_PUBKEY_TO_OUTPOINT and SEQUENCE_NUMBER_TO_SATPOINT are storing data which is already included in the cache (just with the keys and values swapped), we get to cache those tables too for free.

The combined effect is that an index with --index-sats and --index-addresses is now about 54 GB smaller and twice as fast to build.

As far as I know, no one has ever used this, and it would take extra work to support it with the new OUTPOINT_TO_UTXO_ENTRY table (which greatly improves performance). Just remove the option for now.

This code is unused, now that we have the newer (and much nicer) pointer-handling code on lines 190-194.

is_output_spent() was assuming that OUTPOINT_TO_TXOUT contains all utxos when --index-addresses is specified, but that's not correct; only --index-sats guarantees that. This was causing is_output_spent() to return incorrect values for outputs that were spent before block 767430, when using --index-addresses but not --index-sats.

Replace the OUTPOINT_TO_TXOUT, OUTPOINT_TO_SAT_RANGES, and SATPOINT_TO_SEQUENCE_NUMBER tables with a new OUTPOINT_TO_UTXO_ENTRY table. This saves space by avoiding redundant key storage, and makes updates quicker since we can write all of an outpoint's data at once. It also simplifies caching. We used to have utxo_cache for OUTPOINT_TO_TXOUT and range_cache for OUTPOINT_TO_SAT_RANGES, while SATPOINT_TO_SEQUENCE_NUMBER was uncached; now, there's a single unified utxo_cache which covers all three. Additionally, since SCRIPT_PUBKEY_TO_OUTPOINT and SEQUENCE_NUMBER_TO_SATPOINT are storing data which is already included in the cache (just with the keys and values swapped), we get to cache those tables too for free. The combined effect is that an index with --index-sats and --index-addresses is now about 54 GB smaller and twice as fast to build.

With the unified OUTPOINT_TO_UTXO_ENTRY table, these stats aren't useful; they're either identical to the stats in the regular "Committing at block height ..." message (if you're using --index-sats), or zero (if you're not).

I think no reason not to, since the schema version is changing anyways.

I think millis is probably too much precision unless doing performance work.

We have an `Entry` trait which uses "value" for the low-level value in redb, and `entry` for the high-level value parsed from the low-level value. Use that for UtxoEntry.

raphjaph

Very nice!

casey

I made small cosmetic changes to this until I convinced myself that it's fine.

partialord · 2024-08-29T22:06:43Z

Nice! These changes look good to me.

The new usage of "entry"/"value" doesn't quite seem consistent, though. A UtxoEntry isn't an entry in the sense of the Entry trait; it's an entry in the sense of InscriptionEntry or RuneEntry, whose low-level types are (quite logically) called InscriptionEntryValue and RuneEntryValue.

So if we want to be consistent, I think the types should be UtxoEntry, UtxoEntryValue, and UtxoEntryValueBuf. Or for shorter names, we could call this a Utxo rather than a UtxoEntry, and then the names would be Utxo, UtxoValue, and UtxoValueBuf. I think that might help with variable naming, too -- all the "utxo_entry" variables I had which are now "value" would instead become "utxo", which is a lot more informative.

casey · 2024-08-30T17:07:59Z

Yah, good point. Utxo feels a bit ambiguous though, since there are a bunch of things in the codebase that you might refer to as a "utxo". I'm tempted to go with UtxoEntry, UtxoEntryValue, and UtxoEntryBuilder, Just because UtxoEntryValueBuf is heinous T_T. What do you think?

partialord · 2024-08-31T00:14:34Z

Yeah, I agree that Utxo and UtxoValue are ambiguous.

Personally I feel like UtxoEntryValue is already kind of heinous, though. I think the "Value" convention works for the existing types mostly because the Value types are so rarely used: the runes code is entirely based on RuneEntry, and you never see RuneEntryValue except in the table definition. So even though RuneEntryValue is a relatively long and uninformative name, it successfully communicates "this is a variant version of RuneEntry", and that's good enough for its very minor usage.

But UtxoEntry isn't like that at all. The "high-level" UtxoEntry type isn't even usable by itself -- it just stores references into the low-level value! The low-level type is the one you actually use everywhere. And as we optimize ord, other types will probably start to follow this pattern too, because it's more efficient; converting all the data into a new format on every load and store is pretty expensive in terms of memory allocations and copies. In particular, keeping sat ranges in the low-level format is high on my list of future optimizations.

For type names, my general feeling is it's much more important to communicate what data the type contains than it is to describe exactly what format that data is stored in. This is why the "Value" convention works at all: the word "Value" doesn't communicate much, but that's fine, because a RuneEntry and a RuneEntryValue store the same data and we don't really care about the details of the representation.

So I think my preference is still to use UtxoEntry and UtxoEntryBuf for the low-level types. In particular, this successfully communicates that these are the types you're supposed to use: when you load a rune entry from the database, you want to store it as a RuneEntry, and when you load a UTXO entry, you want to store it as a UtxoEntry (or UtxoEntryBuf). The other versions of each type are just transient things you convert through but don't keep around.

I'd love to find a better name than ParsedUtxoEntry for the high-level type, but also I feel like that name isn't particularly important for exactly the same reason that "RuneEntryValue" isn't that important: it doesn't get used much.

partialord · 2024-08-31T01:15:29Z

Oh yeah, another option that might be worth considering: we could get rid of ParsedUtxoEntry entirely, and just have accessors on the low-level UtxoEntry to retrieve the individual fields. I suspect this is a wash in terms of performance -- it's slightly faster in some cases (because we don't have to parse the whole entry to retrieve a single field) and slightly slower in others (because we have to reparse the beginning to access the middle), but I'm guessing the difference is insignificant in both cases.

This was actually my original plan during development. It makes field access a little nicer for the user, since they don't have to call .parse(), and it avoids the need for separate high-level and low-level types entirely. I switched from this to ParsedUtxoEntry in order to make the parsing code a bit easier to read (because parsing the whole entry at once is simpler than parsing it incrementally), but I think I might actually be able to make an incremental version that's as easy to read as the current code.

Alternatively, if it turns out that the parser really needs to be stateful to be efficient, we could have UtxoEntry, UtxoEntryBuf, and UtxoEntryParser. This avoids the redundant parsing work, and from the user's point of view it would work just like the current code: you call .parse() to get a UtxoEntryParser, and then you can call accessors to get the individual fields. But the naming makes it very clear that the underlying UtxoEntry is the real type, and the parser is just a view into it.

casey · 2024-09-03T04:32:47Z

Yah, I think you're right, I find all that convincing. I pushed a commit going back tot he old names. UtxoEntryBuf and ParsedUtxoEntry are not great, but at least they're clear!

* Change test-bitcoincore-rpc to mockcore in README.md (ordinals#3842) * Commit twice to work around redb off-by-one bug (ordinals#3856) * Release 0.19.1 (ordinals#3864) - Bump version: 0.19.0 → 0.19.1 - Update changelog - Update changelog contributor credits - Update dependencies * Update Portuguese Translation pt.po (ordinals#3837) * Updated Chinese translation (ordinals#3881) * Fix rune links for runes with no symbol (ordinals#3849) * Suppress printing sat_ranges by default (ordinals#3867) * Re-enter beta (ordinals#3884) * Update pointer specification (ordinals#3861) * Clarify that unused runes tags should not be used (ordinals#3885) * Migrate object.rs to snafu error handling (ordinals#3858) * Make index settings harder to misuse (ordinals#3893) * Don't unnecessarily insert into utxo cache when indexing addresses (ordinals#3894) * Remove trailing space from runes specification (ordinals#3896) * Serve responses with cross origin isolation headers (ordinals#3898) * List all Bitcoin Core wallets (ordinals#3902) * Make first first and last sat in range clickable (ordinals#3903) * Migrate Outgoing to SnafuError (ordinals#3854) * Update Bitcoin Core deploy to 27.1 (ordinals#3912) * Add sat_balance to address API (ordinals#3905) Co-authored-by: raph <raphjaph@protonmail.com> * Add Dutch translation to Ordinals Handbook (ordinals#3907) * Migrate chain.rs to snafu error (ordinals#3904) * Bump version to 0.20.0-dev (ordinals#3916) * Revert "Serve responses with cross origin isolation headers" (ordinals#3920) * Remove inscription content type counts from /status page (ordinals#3922) * Unified OUTPOINT_TO_UTXO_ENTRY table (ordinals#3915) - Upgrade `redb` to 2.1.1 - Remove `--index-spent-sats` - Remove redundant pointer handling in `index_inscriptions()` - Fix incorrect `is_output_spent()` results when not using `--index-sats` - Unify UTXO index data in `OUTPOINT_TO_UTXO_ENTRY` table - Read addresses from index when exporting * Add address field to `/r/inscription/:id` (ordinals#3891) * Add inscriptions and runes details to address API endpoint (ordinals#3924) * Release 0.20.0 (ordinals#3928) - Bump ord version: 0.19.1 → 0.20.0 - Bump ordinals version: 0.0.9 → 0.0.10 - Update changelog - Update changelog contributor credits - Update dependencies --------- Co-authored-by: Anchor <49644098+TheHeBoy@users.noreply.github.com> Co-authored-by: Casey Rodarmor <casey@rodarmor.com> Co-authored-by: 0xArtur <metaverseartur@gmail.com> Co-authored-by: Dr.JingLee <95461562+DrJingLee@users.noreply.github.com> Co-authored-by: nine <118634361+cryptoni9n@users.noreply.github.com> Co-authored-by: Bohdan Cryptolions <37701692+ansigroup@users.noreply.github.com> Co-authored-by: raph <raphjaph@protonmail.com> Co-authored-by: Patrick Collins <patrick@collinatorstudios.com> Co-authored-by: Tibebtc <tibedekock@live.nl> Co-authored-by: partialord <178683722+partialord@users.noreply.github.com> Co-authored-by: Eloc <42568538+elocremarc@users.noreply.github.com> Co-authored-by: twosatsmaxi <vashalpesh@gmail.com>

* Change test-bitcoincore-rpc to mockcore in README.md (ordinals#3842) * Commit twice to work around redb off-by-one bug (ordinals#3856) * Release 0.19.1 (ordinals#3864) - Bump version: 0.19.0 → 0.19.1 - Update changelog - Update changelog contributor credits - Update dependencies * Update Portuguese Translation pt.po (ordinals#3837) * Updated Chinese translation (ordinals#3881) * Fix rune links for runes with no symbol (ordinals#3849) * Suppress printing sat_ranges by default (ordinals#3867) * Re-enter beta (ordinals#3884) * Update pointer specification (ordinals#3861) * Clarify that unused runes tags should not be used (ordinals#3885) * Migrate object.rs to snafu error handling (ordinals#3858) * Make index settings harder to misuse (ordinals#3893) * Don't unnecessarily insert into utxo cache when indexing addresses (ordinals#3894) * Remove trailing space from runes specification (ordinals#3896) * Serve responses with cross origin isolation headers (ordinals#3898) * List all Bitcoin Core wallets (ordinals#3902) * Make first first and last sat in range clickable (ordinals#3903) * Migrate Outgoing to SnafuError (ordinals#3854) * Update Bitcoin Core deploy to 27.1 (ordinals#3912) * Add sat_balance to address API (ordinals#3905) Co-authored-by: raph <raphjaph@protonmail.com> * Add Dutch translation to Ordinals Handbook (ordinals#3907) * Migrate chain.rs to snafu error (ordinals#3904) * Bump version to 0.20.0-dev (ordinals#3916) * Revert "Serve responses with cross origin isolation headers" (ordinals#3920) * Remove inscription content type counts from /status page (ordinals#3922) * Unified OUTPOINT_TO_UTXO_ENTRY table (ordinals#3915) - Upgrade `redb` to 2.1.1 - Remove `--index-spent-sats` - Remove redundant pointer handling in `index_inscriptions()` - Fix incorrect `is_output_spent()` results when not using `--index-sats` - Unify UTXO index data in `OUTPOINT_TO_UTXO_ENTRY` table - Read addresses from index when exporting * Add address field to `/r/inscription/:id` (ordinals#3891) * Add inscriptions and runes details to address API endpoint (ordinals#3924) * Release 0.20.0 (ordinals#3928) - Bump ord version: 0.19.1 → 0.20.0 - Bump ordinals version: 0.0.9 → 0.0.10 - Update changelog - Update changelog contributor credits - Update dependencies * Bump version to 0.20.0-dev (ordinals#3929) * Add test to remind us to fix the UtxoEntry redb type name (ordinals#3934) * Put AddressInfo into api module (ordinals#3933) * Fix clippy lint (ordinals#3937) * Add inscription index to /status (ordinals#3938) * Remove unnecessary symbols in docs/src/guides/testing.md (ordinals#3945) * Add inscription examples to handbook (ordinals#3769) * Allow scrolling in iframe (ordinals#3947) * Skip serializing None in batch::File (ordinals#3943) * Fix /output page (ordinals#3948) * Add `/satpoint/<SATPOINT>` endpoint (ordinals#3949) * Don't log RPC connections to bitcoind (ordinals#3952) * Start indexing at correct block height (ordinals#3956) * Fix output API struct (ordinals#3957) * Remove dependency on `ord-bitcoincore-rpc` crate (ordinals#3959) * Keep sat ranges in low-level format (ordinals#3963) * Implement burn for wallet command (ordinals#3437) * Add multi parent support to wallet (ordinals#3228) * Get parents using `as_slice` instead of converting to `Vec` (ordinals#3972) * Rename parents_values -> parent_values (ordinals#3973) * Fix non-existant output lookup (ordinals#3968) * Release 0.20.1 (ordinals#3975) * Refactor burn command (ordinals#3976) * Remove regtest.ordinals.net just recipes (ordinals#3978) * Add `ord verify` (ordinals#3906) * Release 0.21.0 (ordinals#3997) - Bump version: 0.20.1 → 0.21.0 - Update changelog - Update changelog contributor credits - Update dependencies * Remove /runes/balances API endpoint (ordinals#3980) * Update rust-bitcoin in ord (ordinals#3962) * Revert redb to 2.1.3 (ordinals#4003) * Release 0.21.1 (ordinals#4006) * Update Bitcoin Core install script (ordinals#4007) * Remove pre-alpha warning from ord help (ordinals#4011) * Change mint progress to `mints / terms.cap` (ordinals#4012) * Only show rune mint progress during mint (ordinals#4013) * Show if JSON API is enabled on /status (ordinals#4014) * Fix build error --------- Co-authored-by: Anchor <49644098+TheHeBoy@users.noreply.github.com> Co-authored-by: Casey Rodarmor <casey@rodarmor.com> Co-authored-by: 0xArtur <metaverseartur@gmail.com> Co-authored-by: Dr.JingLee <95461562+DrJingLee@users.noreply.github.com> Co-authored-by: nine <118634361+cryptoni9n@users.noreply.github.com> Co-authored-by: Bohdan Cryptolions <37701692+ansigroup@users.noreply.github.com> Co-authored-by: raph <raphjaph@protonmail.com> Co-authored-by: Patrick Collins <patrick@collinatorstudios.com> Co-authored-by: Tibebtc <tibedekock@live.nl> Co-authored-by: partialord <178683722+partialord@users.noreply.github.com> Co-authored-by: Eloc <42568538+elocremarc@users.noreply.github.com> Co-authored-by: twosatsmaxi <vashalpesh@gmail.com> Co-authored-by: tiaoxizhan <tiaoxizhan@outlook.com> Co-authored-by: onchainguy <1436535+onchainguy-btc@users.noreply.github.com> Co-authored-by: lifofifo <134870335+lifofifoX@users.noreply.github.com> Co-authored-by: dcorral <hi@dcorral.com>

partialord and others added 27 commits August 27, 2024 03:36

upgrade redb to 2.1.1

c21d4db

remove --index-spent-sats

816a4cd

As far as I know, no one has ever used this, and it would take extra work to support it with the new OUTPOINT_TO_UTXO_ENTRY table (which greatly improves performance). Just remove the option for now.

remove redundant pointer handling in index_inscriptions()

c274dcb

This code is unused, now that we have the newer (and much nicer) pointer-handling code on lines 190-194.

remove unneeded members from InscriptionUpdater

e033c70

remove redundant stats

ce6f199

With the unified OUTPOINT_TO_UTXO_ENTRY table, these stats aren't useful; they're either identical to the stats in the regular "Committing at block height ..." message (if you're using --index-sats), or zero (if you're not).

use the address index when exporting with addresses

8ab4c5a

index_transaction_output_script_pubkeys() can just return () now

a8e55f7

Alphabetize and renumber statistics

26c4535

I think no reason not to, since the schema version is changing anyways.

Combine use statements in utxo_entry.rs

962ae31

Alphebetize derives on UtxoEntry

ce0cd76

Go back to default logging timestamp

e98dedc

I think millis is probably too much precision unless doing performance work.

Add doc comment to Index::is_special_outpoint

162a155

Inline utxo_entry var

804f02a

Inline sat_ranges var

f2c4029

parsed_inscriptions -> inscriptions

7be3b8f

Import ParsedUtxoEntry and UtxoEntryBuf in index.rs

c17c489

inscriptions -> has_new_inscriptions

123ef99

parsed_inscriptions -> transferred_inscriptions

906da20

Tweak comment

a2c99f8

Use turbofish instead of type annotation

3353d94

SatRangesOrValue -> Sats

b073df7

Use entry/value terminology

22d6a55

We have an `Entry` trait which uses "value" for the low-level value in redb, and `entry` for the high-level value parsed from the low-level value. Use that for UtxoEntry.

Tweak comment

aa78319

NeedSatRangesOrValue -> NeedSats

597acd5

Merge branch 'master' into unified-table

f31ba3c

raphjaph approved these changes Aug 29, 2024

View reviewed changes

Merge remote-tracking branch 'upstream/master' into unified-table

e2a096c

casey approved these changes Aug 29, 2024

View reviewed changes

Go back to old names

ae60f7c

casey enabled auto-merge (squash) September 3, 2024 04:36

casey disabled auto-merge September 3, 2024 04:37

Make schema statistic key zero again

05ca1d5

casey enabled auto-merge (squash) September 3, 2024 04:41

casey merged commit 0ae0572 into ordinals:master Sep 3, 2024
5 checks passed

BrewTestBot mentioned this pull request Sep 3, 2024

ord 0.20.0 Homebrew/homebrew-core#183374

Merged

partialord deleted the unified-table branch September 20, 2024 06:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unified OUTPOINT_TO_UTXO_ENTRY table #3915

Unified OUTPOINT_TO_UTXO_ENTRY table #3915

partialord commented Aug 27, 2024

raphjaph left a comment

casey left a comment

partialord commented Aug 29, 2024

casey commented Aug 30, 2024

partialord commented Aug 31, 2024

partialord commented Aug 31, 2024

casey commented Sep 3, 2024

Unified OUTPOINT_TO_UTXO_ENTRY table #3915

Unified OUTPOINT_TO_UTXO_ENTRY table #3915

Conversation

partialord commented Aug 27, 2024

raphjaph left a comment

Choose a reason for hiding this comment

casey left a comment

Choose a reason for hiding this comment

partialord commented Aug 29, 2024

casey commented Aug 30, 2024

partialord commented Aug 31, 2024

partialord commented Aug 31, 2024

casey commented Sep 3, 2024