Implement Bitcoin Merkle trees #1386

Merged: 8 commits, Dec 1, 2020
11 changes: 7 additions & 4 deletions zebra-chain/src/block/header.rs
@@ -30,11 +30,14 @@ pub struct Header {
/// block’s header.
pub previous_block_hash: Hash,

/// The root of the transaction Merkle tree.
/// The root of the Bitcoin-inherited transaction Merkle tree, binding the
/// block header to the transactions in the block.
///
/// The Merkle root is derived from the SHA256d hashes of all transactions
/// included in this block as assembled in a binary tree, ensuring that none
/// of those transactions can be modified without modifying the header.
/// Note that because of a flaw in Bitcoin's design, the `merkle_root` does
/// not always precisely bind the contents of the block (CVE-2012-2459). It
/// is sometimes possible for an attacker to create multiple distinct sets of
/// transactions with the same Merkle root, although only one set will be
/// valid.
pub merkle_root: merkle::Root,

/// Some kind of root hash.
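To illustrate the malleability described in the `merkle_root` doc comment, here is a minimal sketch of the duplicate-last-node tree construction. It is not code from this PR: `toy_hash` and `toy_merkle_root` are hypothetical names, and `toy_hash` uses the standard library's non-cryptographic `DefaultHasher` as a stand-in for SHA256d so the example runs without extra crates. Only the structural rule matters here: a level with an odd number of nodes pairs its last node with itself.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// ASSUMPTION: hypothetical stand-in for SHA256d over two concatenated
/// node values. `DefaultHasher` is NOT cryptographic; it only keeps the
/// sketch dependency-free.
fn toy_hash(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(left);
    hasher.write_u64(right);
    hasher.finish()
}

/// Bitcoin-style Merkle root over toy leaves: when a level has an odd
/// number of nodes, the last node is hashed with itself.
/// Returns `None` for an empty leaf list.
fn toy_merkle_root(leaves: &[u64]) -> Option<u64> {
    if leaves.is_empty() {
        return None;
    }
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => toy_hash(*left, *right),
                // Odd tail: duplicate the last node.
                [last] => toy_hash(*last, *last),
                _ => unreachable!("chunks(2) yields one or two items"),
            })
            .collect();
    }
    Some(level[0])
}
```

Under these assumptions, `toy_merkle_root(&[1, 2, 3, 4, 5, 6])` and `toy_merkle_root(&[1, 2, 3, 4, 5, 6, 5, 6])` return the same value, because both lists reduce to the same second level: the hash of `(1, 2)` and `(3, 4)` on the left, and the hash of `(5, 6)` paired with itself on the right.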
152 changes: 122 additions & 30 deletions zebra-chain/src/block/merkle.rs
@@ -1,51 +1,143 @@
//! The Bitcoin-inherited Merkle tree of transactions.
#![allow(clippy::unit_arg)]

use std::{fmt, io};
use std::{fmt, io::Write};

#[cfg(any(test, feature = "proptest-impl"))]
use proptest_derive::Arbitrary;

use crate::serialization::{sha256d, SerializationError, ZcashDeserialize, ZcashSerialize};
use crate::transaction::Transaction;
use crate::serialization::sha256d;
use crate::transaction::{self, Transaction};

/// A binary hash tree of SHA256d (two rounds of SHA256) hashes for
/// node values.
#[derive(Default)]
pub struct Tree<T> {
_leaves: Vec<T>,
}
/// The root of the Bitcoin-inherited transaction Merkle tree, binding the
/// block header to the transactions in the block.
///
/// Note that because of a flaw in Bitcoin's design, the `merkle_root` does
/// not always precisely bind the contents of the block (CVE-2012-2459). It
/// is sometimes possible for an attacker to create multiple distinct sets of
/// transactions with the same Merkle root, although only one set will be
/// valid.
///
/// # Malleability
///
/// The Bitcoin source code contains the following note:
///
/// > WARNING! If you're reading this because you're learning about crypto
/// > and/or designing a new system that will use merkle trees, keep in mind
/// > that the following merkle tree algorithm has a serious flaw related to
/// > duplicate txids, resulting in a vulnerability (CVE-2012-2459).
/// > The reason is that if the number of hashes in the list at a given time
/// > is odd, the last one is duplicated before computing the next level (which
/// > is unusual in Merkle trees). This results in certain sequences of
/// > transactions leading to the same merkle root. For example, these two
/// > trees:
/// >
/// > ```ascii
/// >                     A               A
/// >                   /  \            /   \
/// >                 B     C         B       C
/// >                / \    |        / \     / \
/// >               D   E   F       D   E   F   F
/// >              / \ / \ / \     / \ / \ / \ / \
/// >              1 2 3 4 5 6     1 2 3 4 5 6 5 6
/// > ```
/// >
/// > for transaction lists [1,2,3,4,5,6] and [1,2,3,4,5,6,5,6] (where 5 and
/// > 6 are repeated) result in the same root hash A (because the hash of both
/// > of (F) and (F,F) is C).
/// >
/// > The vulnerability results from being able to send a block with such a
/// > transaction list, with the same merkle root, and the same block hash as
/// > the original without duplication, resulting in failed validation. If the
/// > receiving node proceeds to mark that block as permanently invalid
/// > however, it will fail to accept further unmodified (and thus potentially
/// > valid) versions of the same block. We defend against this by detecting
/// > the case where we would hash two identical hashes at the end of the list
/// > together, and treating that identically to the block having an invalid
/// > merkle root. Assuming no double-SHA256 collisions, this will detect all
/// > known ways of changing the transactions without affecting the merkle
/// > root.
///
/// This vulnerability does not apply to Zebra, because it does not store invalid
/// data on disk, and because it does not permanently fail blocks or use an
/// aggressive anti-DoS mechanism.
#[derive(Clone, Copy, Eq, PartialEq, Serialize, Deserialize)]
#[cfg_attr(any(test, feature = "proptest-impl"), derive(Arbitrary))]
pub struct Root(pub [u8; 32]);

impl<Transaction> ZcashSerialize for Tree<Transaction> {
fn zcash_serialize<W: io::Write>(&self, _writer: W) -> Result<(), io::Error> {
unimplemented!();
impl fmt::Debug for Root {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.debug_tuple("Root").field(&hex::encode(&self.0)).finish()
}
}

impl<Transaction> ZcashDeserialize for Tree<Transaction> {
fn zcash_deserialize<R: io::Read>(_reader: R) -> Result<Self, SerializationError> {
unimplemented!();
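/// Hashes the concatenation of two node values with SHA256d, producing
/// the value of their parent node.
fn hash(h1: &[u8; 32], h2: &[u8; 32]) -> [u8; 32] {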
let mut w = sha256d::Writer::default();
w.write_all(h1).unwrap();
w.write_all(h2).unwrap();
w.finish()
}

impl<T> std::iter::FromIterator<T> for Root
where
T: std::convert::AsRef<Transaction>,
{
fn from_iter<I>(transactions: I) -> Self
where
I: IntoIterator<Item = T>,
{
transactions
.into_iter()
.map(|tx| tx.as_ref().hash())
.collect()
}
}

/// A SHA-256d hash of the root node of a merkle tree of SHA256-d
/// hashed transactions in a block.
#[derive(Clone, Copy, Eq, PartialEq, Serialize, Deserialize)]
#[cfg_attr(any(test, feature = "proptest-impl"), derive(Arbitrary))]
pub struct Root(pub [u8; 32]);
impl std::iter::FromIterator<transaction::Hash> for Root {
fn from_iter<I>(hashes: I) -> Self
where
I: IntoIterator<Item = transaction::Hash>,
{
let mut hashes = hashes.into_iter().map(|hash| hash.0).collect::<Vec<_>>();

impl From<Tree<Transaction>> for Root {
fn from(merkle_tree: Tree<Transaction>) -> Self {
let mut hash_writer = sha256d::Writer::default();
merkle_tree
.zcash_serialize(&mut hash_writer)
.expect("Sha256dWriter is infallible");
Self(hash_writer.finish())
while hashes.len() > 1 {
hashes = hashes
.chunks(2)
.map(|chunk| match chunk {
[h1, h2] => hash(h1, h2),
[h1] => hash(h1, h1),
_ => unreachable!("chunks(2)"),
})
.collect();
}

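// Indexing panics if the iterator was empty, so callers are expected
// to pass at least one hash.
Self(hashes[0])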
}
}

impl fmt::Debug for Root {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.debug_tuple("Root").field(&hex::encode(&self.0)).finish()
#[cfg(test)]
mod tests {
use super::*;

use crate::{block::Block, serialization::ZcashDeserialize};

#[test]
fn block_test_vectors() {
for block_bytes in zebra_test::vectors::BLOCKS.iter() {
let block = Block::zcash_deserialize(&**block_bytes).unwrap();
let merkle_root = block.transactions.iter().collect::<Root>();
assert_eq!(
merkle_root,
block.header.merkle_root,
"block: {:?} {:?} transaction hashes: {:?}",
block.coinbase_height().unwrap(),
block.hash(),
block
.transactions
.iter()
.map(|tx| tx.hash())
.collect::<Vec<_>>()
);
}
}
}
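The Bitcoin note quoted in the `Root` documentation describes a defense: detect when two identical hashes would be hashed together and treat the block as having an invalid Merkle root. This PR does not implement that check, since the doc comment argues that Zebra is not exposed to the issue. The sketch below only shows what such a check could look like. The names `toy_hash` and `toy_merkle_root_checked` are hypothetical, `DefaultHasher` is a non-cryptographic stand-in for SHA256d, and flagging every identical sibling pair (not only the last one) is a conservative assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// ASSUMPTION: hypothetical, non-cryptographic stand-in for SHA256d.
fn toy_hash(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(left);
    hasher.write_u64(right);
    hasher.finish()
}

/// Computes a Bitcoin-style Merkle root and reports whether any level
/// contained a full pair of identical siblings, which is the signature
/// of the duplicated-tail mutation.
/// Returns `None` for an empty leaf list.
fn toy_merkle_root_checked(leaves: &[u64]) -> Option<(u64, bool)> {
    if leaves.is_empty() {
        return None;
    }
    let mut level = leaves.to_vec();
    let mut mutated = false;
    while level.len() > 1 {
        // Only explicit pairs are inspected. A single odd tail node is
        // duplicated by the algorithm itself and is not suspicious.
        mutated |= level
            .chunks(2)
            .any(|pair| matches!(pair, [left, right] if left == right));
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => toy_hash(*left, *right),
                [last] => toy_hash(*last, *last),
                _ => unreachable!("chunks(2) yields one or two items"),
            })
            .collect();
    }
    Some((level[0], mutated))
}
```

With the two leaf lists from the Bitcoin note, the roots agree but only the longer list `[1, 2, 3, 4, 5, 6, 5, 6]` sets the flag, because its second level contains the same node twice in a row.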
19 changes: 17 additions & 2 deletions zebra-chain/src/block/tests/vectors.rs
@@ -80,13 +80,28 @@ fn deserialize_block() {
.zcash_deserialize_into::<Block>()
.expect("block test vector should deserialize");

for block in zebra_test::vectors::BLOCKS.iter() {
block
for block_bytes in zebra_test::vectors::BLOCKS.iter() {
let block = block_bytes
.zcash_deserialize_into::<Block>()
.expect("block is structurally valid");

let round_trip_bytes = block
.zcash_serialize_to_vec()
.expect("vec serialization is infallible");

assert_eq!(&round_trip_bytes[..], *block_bytes);
}
}

#[test]
fn coinbase_parsing_rejects_above_0x80() {
zebra_test::init();

zebra_test::vectors::BAD_BLOCK_MAINNET_202_BYTES
.zcash_deserialize_into::<Block>()
.expect_err("parsing fails");
}

#[test]
fn block_test_vectors_unique() {
zebra_test::init();
36 changes: 21 additions & 15 deletions zebra-chain/src/transparent/serialize.rs
@@ -58,17 +58,20 @@ fn parse_coinbase_height(
Height((op_n - 0x50) as u32),
CoinbaseData(data.split_off(1)),
)),
// Blocks 17 through 256 exclusive encode block height with the `0x01` opcode.
(Some(0x01), len) if len >= 2 => {
// Blocks 17 through 128 exclusive encode block height with the `0x01` opcode.
// The Bitcoin encoding requires that the most significant byte is below 0x80.
(Some(0x01), len) if len >= 2 && data[1] < 0x80 => {
Ok((Height(data[1] as u32), CoinbaseData(data.split_off(2))))
}
// Blocks 256 through 65536 exclusive encode block height with the `0x02` opcode.
(Some(0x02), len) if len >= 3 => Ok((
// Blocks 128 through 32768 exclusive encode block height with the `0x02` opcode.
// The Bitcoin encoding requires that the most significant byte is below 0x80.
(Some(0x02), len) if len >= 3 && data[2] < 0x80 => Ok((
Height(data[1] as u32 + ((data[2] as u32) << 8)),
CoinbaseData(data.split_off(3)),
)),
// Blocks 65536 through 2**24 exclusive encode block height with the `0x03` opcode.
(Some(0x03), len) if len >= 4 => Ok((
// Blocks 32768 through 2**23 exclusive encode block height with the `0x03` opcode.
// The Bitcoin encoding requires that the most significant byte is below 0x80.
(Some(0x03), len) if len >= 4 && data[3] < 0x80 => Ok((
Height(data[1] as u32 + ((data[2] as u32) << 8) + ((data[3] as u32) << 16)),
CoinbaseData(data.split_off(4)),
)),
@@ -82,7 +85,8 @@ fn parse_coinbase_height(
Ok((Height(0), CoinbaseData(data)))
}
// As noted above, this is included for completeness.
(Some(0x04), len) if len >= 5 => {
// The Bitcoin encoding requires that the most significant byte is below 0x80.
(Some(0x04), len) if len >= 5 && data[4] < 0x80 => {
let h = data[1] as u32
+ ((data[2] as u32) << 8)
+ ((data[3] as u32) << 16)
@@ -106,13 +110,13 @@ fn coinbase_height_len(height: block::Height) -> usize {
0
} else if let _h @ 1..=16 = height.0 {
1
} else if let _h @ 17..=255 = height.0 {
} else if let _h @ 17..=127 = height.0 {
2
} else if let _h @ 256..=65535 = height.0 {
} else if let _h @ 128..=32767 = height.0 {
3
} else if let _h @ 65536..=16_777_215 = height.0 {
} else if let _h @ 32768..=8_388_607 = height.0 {
4
} else if let _h @ 16_777_216..=block::Height::MAX_AS_U32 = height.0 {
} else if let _h @ 8_388_608..=block::Height::MAX_AS_U32 = height.0 {
5
} else {
panic!("Invalid coinbase height");
@@ -122,22 +126,24 @@ fn coinbase_height_len(height: block::Height) -> usize {
fn write_coinbase_height<W: io::Write>(height: block::Height, mut w: W) -> Result<(), io::Error> {
// We can't write this as a match statement on stable until exclusive range
// guards are stabilized.
// The Bitcoin encoding requires that the most significant byte is below 0x80,
// so the ranges run up to 2^{n-1} rather than 2^n.
if let 0 = height.0 {
// Genesis block does not include height.
} else if let h @ 1..=16 = height.0 {
w.write_u8(0x50 + (h as u8))?;
} else if let h @ 17..=255 = height.0 {
} else if let h @ 17..=127 = height.0 {
w.write_u8(0x01)?;
w.write_u8(h as u8)?;
} else if let h @ 256..=65535 = height.0 {
} else if let h @ 128..=32767 = height.0 {
w.write_u8(0x02)?;
w.write_u16::<LittleEndian>(h as u16)?;
} else if let h @ 65536..=16_777_215 = height.0 {
} else if let h @ 32768..=8_388_607 = height.0 {
w.write_u8(0x03)?;
w.write_u8(h as u8)?;
w.write_u8((h >> 8) as u8)?;
w.write_u8((h >> 16) as u8)?;
} else if let h @ 16_777_216..=block::Height::MAX_AS_U32 = height.0 {
} else if let h @ 8_388_608..=block::Height::MAX_AS_U32 = height.0 {
w.write_u8(0x04)?;
w.write_u32::<LittleEndian>(h)?;
} else {
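The range changes in `serialize.rs` all follow from one rule: the most significant byte of the little-endian height must stay below `0x80`, so an n-byte encoding covers heights below 2^(8n-1) instead of 2^(8n). The following standalone sketch restates that rule. The names `encode_height` and `decode_height` are hypothetical, the `0x7FFF_FFFF` upper bound is an assumption standing in for `block::Height::MAX_AS_U32`, and the genesis special case on the parsing side is left out.

```rust
/// Encodes a block height as a coinbase script prefix, using the
/// ranges from the updated `write_coinbase_height`.
/// ASSUMPTION: heights above 0x7FFF_FFFF are rejected so the top byte
/// of the 4-byte form also stays below 0x80.
fn encode_height(height: u32) -> Option<Vec<u8>> {
    let b = height.to_le_bytes();
    Some(match height {
        // Genesis block does not include a height.
        0 => Vec::new(),
        // Heights 1..=16 use the single opcodes 0x51..=0x60.
        1..=16 => vec![0x50 + b[0]],
        17..=127 => vec![0x01, b[0]],
        128..=32_767 => vec![0x02, b[0], b[1]],
        32_768..=8_388_607 => vec![0x03, b[0], b[1], b[2]],
        8_388_608..=0x7FFF_FFFF => vec![0x04, b[0], b[1], b[2], b[3]],
        _ => return None,
    })
}

/// Decodes the height prefix, rejecting encodings whose most
/// significant byte is 0x80 or above. Like the parser in the diff,
/// this does not check that the encoding is the shortest possible.
fn decode_height(data: &[u8]) -> Option<u32> {
    match data {
        [op @ 0x51..=0x60, ..] => Some(u32::from(*op - 0x50)),
        [0x01, b0, ..] if *b0 < 0x80 => Some(u32::from(*b0)),
        [0x02, b0, b1, ..] if *b1 < 0x80 => {
            Some(u32::from_le_bytes([*b0, *b1, 0, 0]))
        }
        [0x03, b0, b1, b2, ..] if *b2 < 0x80 => {
            Some(u32::from_le_bytes([*b0, *b1, *b2, 0]))
        }
        [0x04, b0, b1, b2, b3, ..] if *b3 < 0x80 => {
            Some(u32::from_le_bytes([*b0, *b1, *b2, *b3]))
        }
        _ => None,
    }
}
```

For example, height 127 fits the one-byte form `[0x01, 0x7f]`, while height 128 needs the two-byte form `[0x02, 0x80, 0x00]`, because a lone `0x80` byte would have its top bit set. This is the same condition that the new `coinbase_parsing_rejects_above_0x80` test exercises on the parsing side.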