Skip to content

Conversation

@random-zebra
Copy link

@random-zebra random-zebra commented Jul 29, 2020

This backports the commits listed as "Preparation" in bitcoin#10195.
The rest will be included in a followup PR.

Built on top of:

Here we:

  • add in-memory size estimation of a LevelDB batch
  • remove the usage of error() in ApplyTxInUndo/DisconnectBlock
  • Introduce the new CHashVerifier class to hash read data
  • Add a specialized version of SipHash for tuples of 256 bits and 32 bits data
  • Remove the transaction version from utxo and undo data structures
  • Add on-disk size of the utxo set to gettxoutsetinfo
  • store tx metadata in all undo records, instead of records corresponding only to the last output of a tx being spent.

@furszy
Copy link

furszy commented Aug 5, 2020

Now that 1771 and 1772 has been merged, this can be rebased.

@random-zebra random-zebra force-pushed the 202007_pertxoutcache_1 branch from 22fdaec to c8e8e23 Compare August 5, 2020 15:36
@random-zebra
Copy link
Author

Rebased on master.

@random-zebra
Copy link
Author

Rebased on top of #1793, so that one should be reviewed/merged first.

@random-zebra random-zebra force-pushed the 202007_pertxoutcache_1 branch from b8e8bc1 to 4ee5fb5 Compare August 12, 2020 10:21
random-zebra added a commit that referenced this pull request Aug 18, 2020
cdbb1a6 Remove unused variable in test, fixing warning. (Russell Yanofsky)
03031a5 Check FRESH validity in CCoinsViewCache::BatchWrite (Russell Yanofsky)
3fb3e03 Fix dangerous condition in ModifyNewCoins. (random-zebra)
228d6bc [test] Add CCoinsViewCache Access/Modify/Write tests (random-zebra)

Pull request description:

  Going on with the coins view cache update work (started with #1774 and #1777).
  This backports the following upstream PRs (adapting them with the assumption that duplicate coinbase transactions are not possible):

  - bitcoin#9308 - [test] Add CCoinsViewCache Access/Modify/Write tests
  - bitcoin#9107 - Safer modify new coins
  - bitcoin#9310 - Assert FRESH validity in CCoinsViewCache::BatchWrite
  - bitcoin#9435 - Removed unused variable in test, fixing warning.

  Note: Will rebase #1773 on top of this one (to change `ApplyTxInUndo` return value in the unit test too).

ACKs for top commit:
  furszy:
    pretty nice one, ACK cdbb1a6
  Fuzzbawls:
    ACK cdbb1a6

Tree-SHA512: 69c534da1083c3c4a535923e98da7750474c6698bb45e042778b539a27fab0455d78aaf670bed9485522f9d2ee25e4ad57fa7d35632523fe7376fe5f5febb1e5
sipa and others added 7 commits August 18, 2020 15:17
This allows estimating the in-memory size of a LevelDB batch.
>>> backports bitcoin/bitcoin@f54580e

The error() function unconditionally reports an error. It should only
be used for actually exception situations, and not for the type of
inconsistencies that ApplyTxInUndo/DisconnectBlock can graciously deal
with.

This also makes a subtle semantics change: in ApplyTxInUndo, when a
record with metadata is encountered (indicating it is the last spend
from a tx), don't wipe the CCoins record if it wasn't empty at that
point. This makes sure that UTXO operations never affect any other
UTXOs (including those from the same tx).
>>> backports bitcoin/bitcoin@e484652

This is necessary later, when we drop the nVersion field from the undo
data. At that point deserializing and reserializing the data won't
roundtrip anymore, and thus that approach can't be used to verify
checksums anymore.

With this CHashVerifier approach, we can deserialize while hashing the
exact serialized form that was used. This is both more efficient and
more correct in that case.
We'll need a version of SipHash for tuples of 256 bits and 32 bits
data, when CCoinsViewCache switches from using txids to COutPoints as
keys.
>>> backports bitcoin/bitcoin@d342424

This makes the following changes:
* In undo data and the chainstate database, the transaction nVersion
  field is removed from the data structures, always written as 0, and
  ignored when reading.
* The definition of hash_serialized in gettxoutsetinfo is changed to no
  longer incude the nVersion field. It is renamed to hash_serialized_2
  to avoid confusion. The new definition also includes transaction
  height and coinbase information, as this information was missing
  before.

This depends on having a CHashVerifier-based undo data checksum
verifier.

Apart from changing the definition of serialized_hash, downgrading
after using this patch is supported, as no release ever used the value
of nVersion field in UTXO entries.
>>> backports bitcoin/bitcoin@7d991b5

Previously, transaction metadata (height, coinbase or not, and before
the previous commit also nVersion) was only stored for undo records
that correspond to the last output of a transaction being spent.

This only saves 2 bytes per undo record. Change this to storing this
information for every undo record, and stop complaining for having it
in non-last output spends. This means that undo dat written with
this patch won't be readable by older versions anymore.
@random-zebra random-zebra force-pushed the 202007_pertxoutcache_1 branch from 4ee5fb5 to abac479 Compare August 18, 2020 13:25
@random-zebra
Copy link
Author

Rebased on master. Ready for review.

@random-zebra random-zebra force-pushed the 202007_pertxoutcache_1 branch from abac479 to 4916233 Compare August 18, 2020 14:27
@random-zebra random-zebra force-pushed the 202007_pertxoutcache_1 branch from 4916233 to 83512d2 Compare August 18, 2020 14:33
Copy link

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, utACK 83512d2.
Will run it and play around with the disk size estimation for some time.

Copy link
Collaborator

@Fuzzbawls Fuzzbawls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookin good!

utACK 83512d2

@furszy furszy merged commit 90ffc66 into PIVX-Project:master Aug 19, 2020
random-zebra added a commit that referenced this pull request Sep 5, 2020
535d8e4 scripted-diff: various renames for per-utxo consistency (random-zebra)
60c73ad Rename CCoinsCacheEntry::coins to coin (Pieter Wuille)
1166f1f Merge CCoinsViewCache's GetOutputFor and AccessCoin (Pieter Wuille)
f25d3c6 [MOVEONLY] Move old CCoins class to txdb.cpp (random-zebra)
aab17b3 Upgrade from per-tx database to per-txout (Pieter Wuille)
a55eb98 Reduce reserved memory space for flushing (random-zebra)
7d637ea Remove unused CCoins methods (random-zebra)
4b7b1a3 Extend coins_tests (random-zebra)
8cf2dc0 Switch CCoinsView and chainstate db from per-txid to per-txout (random-zebra)
b20a662 [Cleanup] Remove unused CCoinsViewCache::IsOutputAvailable (random-zebra)
ea82855 Refactor GetUTXOStats in preparation for per-COutPoint iteration (Pieter Wuille)
0f9f2b6 [Core] Fix not-pruned-check in miner for zerocoin mint outputs (random-zebra)
3698d47 Replace CCoins-based CTxMemPool::pruneSpent with isSpent (Pieter Wuille)
11c728a Remove ModifyCoins/ModifyNewCoins (Pieter Wuille)
3c65936 Switch tests from ModifyCoins to AddCoin/SpendCoin (random-zebra)
b05fe55 Switch CScriptCheck to use Coin instead of CCoins (random-zebra)
9e5c2ae Only pass things committed to by tx's witness hash to CScriptCheck (random-zebra)
22b0b4a [Cleanup] fix warning in comparisons with nCoinbaseMaturity (random-zebra)
666081d Switch from per-tx to per-txout CCoinsViewCache methods in some places (random-zebra)
53fc2be [Core] PIVX: remove potential_overwrite: BIP34 has always been enforced (random-zebra)
16924aa Introduce new per-txout CCoinsViewCache functions (random-zebra)
2d74bc1 Optimization: Coin&& to ApplyTxInUndo (random-zebra)
3fcb092 Replace CTxInUndo with Coin (random-zebra)
42a8996 [Core] PIVX: Add fCoinStake boolean field to Coin class (random-zebra)
a01a9f2 Introduce Coin, a single unspent output (random-zebra)

Pull request description:

  The chainstate database, and its cache, rely on a per-tx model (i.e. it is a map from txids to *lists* of unspent outputs).
  This PR chages it to a per-txout model (i.e. a map from outpoints to *single* unspent outputs).

  Pros:
  - Simpler code.
  - Avoiding the CPU overhead of deserializing and serializing the unused outputs.
  - More predictable memory usage.
  - More easily adaptable to various cache flushing strategies.

  Cons:
  - Slightly larger on-disk representation, and sometimes larger in-memory representation (when there are multiple outputs for the same txid in the cache, which becomes optional).

  Adapted from bitcoin#10195 (adding a `fCoinStake` boolean memeber to the Coin class and considering BIP34 always in effect as per #1775)

  Based on top of:
  - [x] #1799
  - [x] #1800
  - [x] #1797
  - [x] #1796
  - [x] #1795
  - [x] #1793
  - [x] #1773

  The actual PR starts with "Introduce Coin, a single unspent output" (171bf0b8a55de0e3f94305b33e97bb1cf4a01b70)

  ⚠️  **Note for testers**: Do a backup of the chainstate before testing. With this PR, at startup, the database is automatically upgraded to the new format (so a reindex is not needed).

ACKs for top commit:
  furszy:
    Code review ACK 535d8e4. Need some live testing now.
  furszy:
    ACK 535d8e4
  Fuzzbawls:
    ACK 535d8e4

Tree-SHA512: 8648d31c343ff901f8568c14d0848241c9b8aca0b9a8f89f3e5dbe02d384ba35faf572757e84f979fba1f198d721a2516d0e784b5e0dc8d02b6e308b4278b484
@random-zebra random-zebra modified the milestones: 5.0.0, 4.3.0 Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants