3. refactor(db): add disk serialization types for transactions #3741

Merged 7 commits on Mar 9, 2022
27 changes: 14 additions & 13 deletions book/src/dev/rfcs/0005-state-updates.md
@@ -611,10 +611,10 @@ We use the following rocksdb column families:
 | `hash_by_tx_loc` | `TransactionLocation` | `transaction::Hash` | Never |
 | `tx_loc_by_hash` | `transaction::Hash` | `TransactionLocation` | Never |
 | *Transparent* | | | |
-| `utxo_by_out_loc` | `OutLocation` | `transparent::Output` | Delete |
-| `balance_by_transparent_addr` | `transparent::Address` | `Amount \|\| TransparentAddrLoc` | Update |
-| `utxo_by_transparent_addr_loc` | `TransparentAddrLoc` | `AtLeastOne<OutLocation>` | Up/Del |
-| `tx_by_transparent_addr_loc` | `TransparentAddrLoc` | `AtLeastOne<TransactionLocation>` | Append |
+| `utxo_by_out_loc` | `OutputLocation` | `transparent::Output` | Delete |
+| `balance_by_transparent_addr` | `transparent::Address` | `Amount \|\| AddressLocation` | Update |
+| `utxo_by_transparent_addr_loc` | `AddressLocation` | `AtLeastOne<OutputLocation>` | Up/Del |
+| `tx_by_transparent_addr_loc` | `AddressLocation` | `AtLeastOne<TransactionLocation>` | Append |
 | *Sprout* | | | |
 | `sprout_nullifiers` | `sprout::Nullifier` | `()` | Never |
 | `sprout_anchors` | `sprout::tree::Root` | `sprout::tree::NoteCommitmentTree` | Never |
@@ -640,12 +640,12 @@ Block and Transaction Data:
 - `TransactionCount`: same as `TransactionIndex`
 - `TransactionLocation`: `Height \|\| TransactionIndex`
 - `HeightTransactionCount`: `Height \|\| TransactionCount`
-- `TransparentOutputIndex`: 24 bits, big-endian, unsigned (max ~223,000 transfers in the 2 MB block limit)
+- `OutputIndex`: 24 bits, big-endian, unsigned (max ~223,000 transfers in the 2 MB block limit)
 - transparent and shielded input indexes, and shielded output indexes: 16 bits, big-endian, unsigned (max ~49,000 transfers in the 2 MB block limit)
-- `OutLocation`: `TransactionLocation \|\| TransparentOutputIndex`
-- `TransparentAddrLoc`: the first `OutLocation` used by a `transparent::Address`.
+- `OutputLocation`: `TransactionLocation \|\| OutputIndex`
+- `AddressLocation`: the first `OutputLocation` used by a `transparent::Address`.
   Always has the same value for each address, even if the first output is spent.
-- `Utxo`: `Output`, derives extra fields from the `OutLocation` key
+- `Utxo`: `Output`, derives extra fields from the `OutputLocation` key
 - `AtLeastOne<T>`: `[T; AtLeastOne::len()]` (for known-size `T`)

 We use big-endian encoding for keys, to allow database index prefix searches.
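The big-endian key layout described in this hunk can be illustrated with a short sketch. It assumes a 4-byte height and a 4-byte transaction index (consistent with the 8-byte `TransactionLocation` used later in this PR) followed by the 24-bit `OutputIndex` from the RFC. The helper name `output_location_key` is hypothetical and is not part of Zebra.

```rust
/// Hypothetical helper (not Zebra's API): builds an 11-byte key as
/// height (4 bytes) || transaction index (4 bytes) || output index (3 bytes),
/// with every field encoded big-endian.
fn output_location_key(height: u32, tx_index: u32, output_index: u32) -> [u8; 11] {
    assert!(output_index < (1 << 24), "output index must fit in 24 bits");

    let mut key = [0u8; 11];
    key[0..4].copy_from_slice(&height.to_be_bytes());
    key[4..8].copy_from_slice(&tx_index.to_be_bytes());
    // Keep only the low 3 bytes of the big-endian u32.
    key[8..11].copy_from_slice(&output_index.to_be_bytes()[1..4]);
    key
}

fn main() {
    // Byte-wise (lexicographic) key order matches chain order.
    let a = output_location_key(1, 255, 0);
    let b = output_location_key(1, 256, 0);
    let c = output_location_key(2, 0, 0);
    assert!(a < b && b < c);

    // Little-endian bytes would break that ordering: 256 sorts before 255.
    assert!(256u32.to_le_bytes() < 255u32.to_le_bytes());

    // All outputs of one transaction share the same 8-byte prefix,
    // which is what makes a prefix search possible.
    let d = output_location_key(1, 255, 7);
    assert_eq!(&a[..8], &d[..8]);
}
```

Because the most significant bytes come first, a database that orders keys byte-wise keeps entries in chain order, and every output of a transaction can be found under that transaction's location prefix.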
@@ -734,30 +734,31 @@ So they should not be used for consensus-critical checks.
   we store blocks by height, storing the height saves one level of indirection.
   Transaction hashes can be looked up using `hash_by_tx`.

-- Similarly, UTXOs are stored in `utxo_by_outpoint` by `OutLocation`,
+- Similarly, UTXOs are stored in `utxo_by_outpoint` by `OutputLocation`,
   rather than `OutPoint`. `OutPoint`s can be looked up using `tx_by_hash`,
   and reconstructed using `hash_by_tx`.

 - The `Utxo` type can be constructed from the `Output` data,
   `height: TransactionLocation.height`, and
-  `is_coinbase: OutLocation.output_index == 1`.
+  `is_coinbase: TransactionLocation.index == 0`
+  (coinbase transactions are always the first transaction in a block).

 - `balance_by_transparent_addr` is the sum of all `utxo_by_transparent_addr_loc`s
   that are still in `utxo_by_outpoint`. It is cached to improve performance for
-  addresses with large UTXO sets. It also stores the `TransparentAddrLoc` for each
+  addresses with large UTXO sets. It also stores the `AddressLocation` for each
   address, which allows for efficient lookups.

 - `utxo_by_transparent_addr_loc` stores unspent transparent output locations by address.
   UTXO locations are appended by each block. If an address lookup discovers a UTXO
   has been spent in `utxo_by_outpoint`, that UTXO location can be deleted from
   `utxo_by_transparent_addr_loc`. (We don't do these deletions every time a block is
   committed, because that requires an expensive full index search.)
-  This list includes the `TransparentAddrLoc`, if it has not been spent.
+  This list includes the `AddressLocation`, if it has not been spent.
   (This duplicate data is small, and helps simplify the code.)

 - `tx_by_transparent_addr_loc` stores transaction locations by address.
   This list includes transactions containing spent UTXOs.
-  It also includes the `TransactionLocation` from the `AddressLocation`.
+  It also includes the `TransactionLocation` from the `AddressLocation`.
   (This duplicate data is small, and helps simplify the code.)

 - Each `*_note_commitment_tree` stores the note commitment tree state
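The corrected `Utxo` rule in the hunk above (`is_coinbase` comes from the transaction index being 0, not from the output index) could be sketched as follows. All struct shapes here are simplified stand-ins chosen for illustration, and `utxo_from_location` is a hypothetical function; Zebra's real types have more fields.

```rust
/// Simplified stand-ins for the RFC's types (assumed shapes, not Zebra's definitions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TransactionLocation {
    height: u32,
    index: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OutputLocation {
    transaction_location: TransactionLocation,
    output_index: u32,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Output {
    value: u64,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Utxo {
    output: Output,
    height: u32,
    is_coinbase: bool,
}

/// Hypothetical helper: rebuilds the derived `Utxo` fields from the database key.
fn utxo_from_location(location: OutputLocation, output: Output) -> Utxo {
    Utxo {
        output,
        height: location.transaction_location.height,
        // The coinbase transaction is always the first transaction in a block,
        // so the transaction index decides this, whatever the output index is.
        is_coinbase: location.transaction_location.index == 0,
    }
}

fn main() {
    let coinbase_output = OutputLocation {
        transaction_location: TransactionLocation { height: 100, index: 0 },
        output_index: 1,
    };
    let later_output = OutputLocation {
        transaction_location: TransactionLocation { height: 100, index: 3 },
        output_index: 0,
    };

    assert!(utxo_from_location(coinbase_output, Output { value: 5 }).is_coinbase);
    assert!(!utxo_from_location(later_output, Output { value: 5 }).is_coinbase);
}
```

The second case shows why the old rule was wrong: an output index says nothing about which transaction in the block created the output.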
10 changes: 10 additions & 0 deletions zebra-state/proptest-regressions/service/check/tests/utxo.txt
@@ -0,0 +1,10 @@
+# Seeds for failure cases proptest has generated in the past. It is
+# automatically read and these particular cases re-run before any
+# novel cases are generated.
+#
+# It is recommended to check this file in to source control so that
+# everyone who runs the test benefits from these saved cases.
+cc e269485ce65fc50f093f8d979c5afb233709e0c18e56ab419afb065c2e0bf854 # shrinks to output = zebra_chain::transparent::Output, mut prevout_input = zebra_chain::transparent::Input, use_finalized_state = false
+cc 2639971d2f0cad4354fa6a4b00f8d588e04638c33d884f8d31ca6b09e43a31d9 # shrinks to output = zebra_chain::transparent::Output, mut prevout_input = zebra_chain::transparent::Input, use_finalized_state_output = false, mut use_finalized_state_spend = false
+cc 59045504569e389f48e0f8d1b7938e5fdfed84e1ba83af25c18df8300086788c # shrinks to unused_output = zebra_chain::transparent::Output, prevout_input = zebra_chain::transparent::Input
+cc 65bbd1a767ce94e046fbab250fc8b9c8f3acc52bf9d032c9f198347052b62775 # shrinks to output = zebra_chain::transparent::Output, mut prevout_input = zebra_chain::transparent::Input
@@ -0,0 +1,7 @@
+# Seeds for failure cases proptest has generated in the past. It is
+# automatically read and these particular cases re-run before any
+# novel cases are generated.
+#
+# It is recommended to check this file in to source control so that
+# everyone who runs the test benefits from these saved cases.
+cc 933c998cd42e62c9b80ceae375981200f1039e493262f7d931d973900c75812e # shrinks to (chain, count, network, _history_tree) = (alloc::vec::Vec<zebra_state::request::PreparedBlock><zebra_state::request::PreparedBlock>, len=104, 2, Mainnet, HistoryTree(None))
2 changes: 2 additions & 0 deletions zebra-state/proptest-regressions/service/tests.txt
@@ -5,3 +5,5 @@
 # It is recommended to check this file in to source control so that
 # everyone who runs the test benefits from these saved cases.
 cc 37aea4b0880d7d9029ea4fad0136bd8553f81eea0435122737ec513f4f6fb73c # shrinks to (network, nu_activation_height, chain) = (Mainnet, Height(1046400), alloc::vec::Vec<alloc::sync::Arc<zebra_chain::block::Block>><alloc::sync::Arc<zebra_chain::block::Block>>, len=101)
+cc 1a833b934966164ec7170c4bbdd7c48723ac0c873203af5f7880539ff1c095bf # shrinks to (network, finalized_blocks, non_finalized_blocks) = (Mainnet, alloc::vec::Vec<zebra_state::request::FinalizedBlock><zebra_state::request::FinalizedBlock>, len=2, alloc::vec::Vec<zebra_state::request::PreparedBlock><zebra_state::request::PreparedBlock>, len=9)
+cc 5fe3b32843194422a1ed411c7187c013d0cfd5c5f4a238643df1d5a7decd12c0 # shrinks to (network, finalized_blocks, non_finalized_blocks) = (Mainnet, alloc::vec::Vec<zebra_state::request::FinalizedBlock><zebra_state::request::FinalizedBlock>, len=2, alloc::vec::Vec<zebra_state::request::PreparedBlock><zebra_state::request::PreparedBlock>, len=9)
23 changes: 2 additions & 21 deletions zebra-state/src/service/finalized_state/arbitrary.rs
@@ -4,33 +4,14 @@

 use std::sync::Arc;

-use proptest::prelude::*;
-
-use zebra_chain::{
-    amount::NonNegative,
-    block::{self, Block},
-    sprout,
-    value_balance::ValueBalance,
-};
+use zebra_chain::{amount::NonNegative, block::Block, sprout, value_balance::ValueBalance};

 use crate::service::finalized_state::{
     disk_db::{DiskWriteBatch, WriteDisk},
-    disk_format::{FromDisk, IntoDisk, TransactionLocation},
+    disk_format::{FromDisk, IntoDisk},
     FinalizedState,
 };

-impl Arbitrary for TransactionLocation {
-    type Parameters = ();
-
-    fn arbitrary_with(_args: Self::Parameters) -> Self::Strategy {
-        (any::<block::Height>(), any::<u32>())
-            .prop_map(|(height, index)| Self { height, index })
-            .boxed()
-    }
-
-    type Strategy = BoxedStrategy<Self>;
-}
-
 pub fn round_trip<T>(input: T) -> T
 where
     T: IntoDisk + FromDisk,
32 changes: 29 additions & 3 deletions zebra-state/src/service/finalized_state/disk_format.rs
@@ -17,8 +17,8 @@ mod tests;

 pub use block::TransactionLocation;

-/// Helper trait for defining the exact format used to interact with disk per
-/// type.
+/// Helper trait for defining the exact format used to store to disk,
+/// for each type.
 pub trait IntoDisk {
     /// The type used to compare a value as a key to other keys stored in a
     /// database.
@@ -29,6 +29,14 @@ pub trait IntoDisk {
     fn as_bytes(&self) -> Self::Bytes;
 }

+/// Helper trait for types with fixed-length disk storage.
+///
+/// This trait must not be implemented for types with variable-length disk storage.
+pub trait IntoDiskFixedLen: IntoDisk {
+    /// Returns the fixed serialized length of `Bytes`.
+    fn fixed_byte_len() -> usize;
+}
+
 /// Helper type for retrieving types from the disk with the correct format.
 ///
 /// The ivec should be correctly encoded by IntoDisk.
@@ -41,7 +49,7 @@ pub trait FromDisk: Sized {
     fn from_bytes(bytes: impl AsRef<[u8]>) -> Self;
 }

-// Generic trait impls
+// Generic serialization impls

 impl<'a, T> IntoDisk for &'a T
 where
@@ -74,10 +82,28 @@ where
     }
 }

+// Commonly used serialization impls
+
 impl IntoDisk for () {
     type Bytes = [u8; 0];

     fn as_bytes(&self) -> Self::Bytes {
         []
     }
 }
+
+// Generic serialization length impls
+
+impl<T> IntoDiskFixedLen for T
+where
+    T: IntoDisk,
+    T::Bytes: Default + IntoIterator + Copy,
+{
+    /// Returns the fixed size of `Bytes`.
+    ///
+    /// Assumes that `Copy` types are fixed-sized byte arrays.
+    fn fixed_byte_len() -> usize {
+        // Bytes is probably a [u8; N]
+        Self::Bytes::default().into_iter().count()
+    }
+}
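To see how the blanket `IntoDiskFixedLen` impl behaves, here is a minimal self-contained sketch. The two traits are trimmed copies of the ones in this diff; the `Height` and `Script` types below are hypothetical stand-ins written for this example, not Zebra's definitions.

```rust
/// Trimmed copy of the trait from this diff.
trait IntoDisk {
    type Bytes;
    fn as_bytes(&self) -> Self::Bytes;
}

/// Trimmed copy of the new fixed-length trait.
trait IntoDiskFixedLen: IntoDisk {
    fn fixed_byte_len() -> usize;
}

// Same bounds as the diff: `Copy` is used as a proxy for "fixed-size byte array".
impl<T> IntoDiskFixedLen for T
where
    T: IntoDisk,
    T::Bytes: Default + IntoIterator + Copy,
{
    fn fixed_byte_len() -> usize {
        // For a `[u8; N]`, the default value has exactly N elements.
        Self::Bytes::default().into_iter().count()
    }
}

/// Hypothetical fixed-length type.
struct Height(u32);

impl IntoDisk for Height {
    type Bytes = [u8; 4];

    fn as_bytes(&self) -> Self::Bytes {
        self.0.to_be_bytes()
    }
}

impl IntoDisk for () {
    type Bytes = [u8; 0];

    fn as_bytes(&self) -> Self::Bytes {
        []
    }
}

/// Hypothetical variable-length type: `Vec<u8>` is not `Copy`,
/// so the blanket impl does not apply to it.
struct Script(Vec<u8>);

impl IntoDisk for Script {
    type Bytes = Vec<u8>;

    fn as_bytes(&self) -> Self::Bytes {
        self.0.clone()
    }
}

fn main() {
    assert_eq!(Height::fixed_byte_len(), 4);
    assert_eq!(<() as IntoDiskFixedLen>::fixed_byte_len(), 0);

    // `Script::fixed_byte_len()` would be a compile error here.
    assert_eq!(Script(vec![1, 2, 3]).as_bytes().len(), 3);
    assert_eq!(Height(1).as_bytes(), [0, 0, 0, 1]);
}
```

One consequence of the `Default` bound worth keeping in mind: the standard library only implements `Default` for arrays of up to 32 elements, so a type whose `Bytes` is a longer array would not pick up this blanket impl.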
89 changes: 63 additions & 26 deletions zebra-state/src/service/finalized_state/disk_format/block.rs
@@ -15,29 +15,58 @@ use zebra_chain::{
     transaction,
 };

-use crate::service::finalized_state::disk_format::{FromDisk, IntoDisk};
+use crate::service::finalized_state::disk_format::{FromDisk, IntoDisk, IntoDiskFixedLen};

+#[cfg(any(test, feature = "proptest-impl"))]
+use proptest_derive::Arbitrary;
+
+// Transaction types
+
+/// A transaction's index in its block.
+#[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd, Serialize, Deserialize)]
+#[cfg_attr(any(test, feature = "proptest-impl"), derive(Arbitrary))]
+pub struct TransactionIndex(u32);
+
+impl TransactionIndex {
+    /// Create a transaction index from the native index integer type.
+    #[allow(dead_code)]
+    pub fn from_usize(transaction_index: usize) -> TransactionIndex {
+        TransactionIndex(
+            transaction_index
+                .try_into()
+                .expect("the maximum valid index fits in the inner type"),
+        )
+    }
+
+    /// Return this index as the native index integer type.
+    #[allow(dead_code)]
+    pub fn as_usize(&self) -> usize {
+        self.0
+            .try_into()
+            .expect("the maximum valid index fits in usize")
+    }
+}
+
 /// A transaction's location in the chain, by block height and transaction index.
 ///
 /// This provides a chain-order list of transactions.
 #[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd, Serialize, Deserialize)]
+#[cfg_attr(any(test, feature = "proptest-impl"), derive(Arbitrary))]
 pub struct TransactionLocation {
     /// The block height of the transaction.
     pub height: Height,

     /// The index of the transaction in its block.
-    pub index: u32,
+    pub index: TransactionIndex,
 }

 impl TransactionLocation {
     /// Create a transaction location from a block height and index (as the native index integer type).
     #[allow(dead_code)]
-    pub fn from_usize(height: Height, index: usize) -> TransactionLocation {
+    pub fn from_usize(height: Height, transaction_index: usize) -> TransactionLocation {
         TransactionLocation {
             height,
-            index: index
-                .try_into()
-                .expect("all valid indexes are much lower than u32::MAX"),
+            index: TransactionIndex::from_usize(transaction_index),
         }
     }
 }
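The `TransactionIndex` newtype in this hunk wraps a `u32` and converts to and from `usize` at the edges. A minimal sketch of how such a type might be used when walking a block's transactions is below. The `locations_for_block` helper and the use of string slices as stand-in transactions are assumptions made for this example only.

```rust
use std::convert::TryInto;

/// Trimmed copy of the newtype from this diff (serde and proptest derives omitted).
#[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd)]
pub struct TransactionIndex(u32);

impl TransactionIndex {
    /// Panics if the index does not fit in a `u32`, as in the diff.
    pub fn from_usize(transaction_index: usize) -> TransactionIndex {
        TransactionIndex(
            transaction_index
                .try_into()
                .expect("the maximum valid index fits in the inner type"),
        )
    }

    pub fn as_usize(&self) -> usize {
        self.0
            .try_into()
            .expect("the maximum valid index fits in usize")
    }
}

/// Hypothetical helper: pairs each transaction in a block with its typed index.
fn locations_for_block<'a>(transactions: &[&'a str]) -> Vec<(TransactionIndex, &'a str)> {
    transactions
        .iter()
        .enumerate()
        .map(|(position, tx)| (TransactionIndex::from_usize(position), *tx))
        .collect()
}

fn main() {
    let block = ["coinbase", "tx-a", "tx-b"];
    let located = locations_for_block(&block);

    // The typed index converts back to a native index for slice access.
    for (index, tx) in &located {
        assert_eq!(block[index.as_usize()], *tx);
    }

    // Derived ordering follows the numeric index.
    assert!(located[0].0 < located[2].0);
}
```

Keeping the conversion in one place means callers never handle the raw `u32`, so a later change to the on-disk width would only touch the newtype.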
@@ -92,37 +121,39 @@ impl FromDisk for block::Hash {

 // Transaction trait impls

+impl IntoDisk for TransactionIndex {
+    type Bytes = [u8; 4];
+
+    fn as_bytes(&self) -> Self::Bytes {
+        self.0.to_be_bytes()
+    }
+}
+
+impl FromDisk for TransactionIndex {
+    fn from_bytes(disk_bytes: impl AsRef<[u8]>) -> Self {
+        TransactionIndex(u32::from_be_bytes(disk_bytes.as_ref().try_into().unwrap()))
+    }
+}
+
 impl IntoDisk for TransactionLocation {
     type Bytes = [u8; 8];

     fn as_bytes(&self) -> Self::Bytes {
         let height_bytes = self.height.as_bytes();
-        let index_bytes = self.index.to_be_bytes();
+        let index_bytes = self.index.as_bytes();

-        let mut bytes = [0; 8];
-
-        bytes[0..4].copy_from_slice(&height_bytes);
-        bytes[4..8].copy_from_slice(&index_bytes);
-
-        bytes
+        [height_bytes, index_bytes].concat().try_into().unwrap()
     }
 }

 impl FromDisk for TransactionLocation {
     fn from_bytes(disk_bytes: impl AsRef<[u8]>) -> Self {
-        let disk_bytes = disk_bytes.as_ref();
-        let height = {
-            let mut bytes = [0; 4];
-            bytes.copy_from_slice(&disk_bytes[0..4]);
-            let height = u32::from_be_bytes(bytes);
-            Height(height)
-        };
-
-        let index = {
-            let mut bytes = [0; 4];
-            bytes.copy_from_slice(&disk_bytes[4..8]);
-            u32::from_be_bytes(bytes)
-        };
+        let height_len = Height::fixed_byte_len();
+
+        let (height_bytes, index_bytes) = disk_bytes.as_ref().split_at(height_len);
+
+        let height = Height::from_bytes(height_bytes);
+        let index = TransactionIndex::from_bytes(index_bytes);

         TransactionLocation { height, index }
     }
@@ -135,3 +166,9 @@ impl IntoDisk for transaction::Hash {
         self.0
     }
 }
+
+impl FromDisk for transaction::Hash {
+    fn from_bytes(disk_bytes: impl AsRef<[u8]>) -> Self {
+        transaction::Hash(disk_bytes.as_ref().try_into().unwrap())
+    }
+}
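The new `TransactionLocation` serialization (concatenate the field bytes, then split at the height length when reading) can be checked with a standalone round-trip sketch. The types below are simplified stand-ins, and `HEIGHT_LEN` is a hypothetical constant that plays the role of `Height::fixed_byte_len()`; a 4-byte height is assumed, matching the 8-byte total in the diff.

```rust
use std::convert::TryInto;

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Height(u32);

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct TransactionIndex(u32);

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct TransactionLocation {
    height: Height,
    index: TransactionIndex,
}

/// Assumed serialized length of a height, standing in for `Height::fixed_byte_len()`.
const HEIGHT_LEN: usize = 4;

impl TransactionLocation {
    /// Serializes as `height || index`, both big-endian.
    fn as_bytes(&self) -> [u8; 8] {
        let height_bytes = self.height.0.to_be_bytes();
        let index_bytes = self.index.0.to_be_bytes();

        [height_bytes, index_bytes]
            .concat()
            .try_into()
            .expect("4 + 4 bytes always fit in 8 bytes")
    }

    /// Panics if `disk_bytes` is not exactly 8 bytes long.
    fn from_bytes(disk_bytes: impl AsRef<[u8]>) -> Self {
        let (height_bytes, index_bytes) = disk_bytes.as_ref().split_at(HEIGHT_LEN);

        let height = Height(u32::from_be_bytes(
            height_bytes.try_into().expect("height is 4 bytes"),
        ));
        let index = TransactionIndex(u32::from_be_bytes(
            index_bytes.try_into().expect("index is 4 bytes"),
        ));

        TransactionLocation { height, index }
    }
}

fn main() {
    let earlier = TransactionLocation { height: Height(5), index: TransactionIndex(300) };
    let later = TransactionLocation { height: Height(6), index: TransactionIndex(0) };

    // Round trip: deserializing the serialized bytes gives back the same value.
    assert_eq!(TransactionLocation::from_bytes(earlier.as_bytes()), earlier);

    // The serialized byte order agrees with the derived chain order.
    assert!(earlier < later);
    assert!(earlier.as_bytes() < later.as_bytes());
}
```

As in the diff, a wrong-length input is treated as a bug and panics rather than returning an error.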