Hexary trie prunes when it shouldn't #92

Closed
hyperevo opened this issue Aug 15, 2019 · 9 comments · Fixed by #93
hyperevo commented Aug 15, 2019

  • Version: 1.4.0
  • Python: 3.6
  • OS: Debian Linux

What was wrong?

When I try to build a trie from a list of identical hashes using make_trie_root_and_nodes from py-evm, I get the exception: trie.exceptions.MissingTrieNode: Trie database is missing hash.

This only occurs if the HexaryTrie has prune = True.

It looks like a leaf was pruned when it shouldn't have been.

The exception occurs on the 129th iteration of the loop.

What did you expect it to do?

Expected it to add the hashes to the trie database like normal so that I can get a root hash and nodes.

Code to reproduce the error

    import rlp
    from trie import HexaryTrie
    ZERO_HASH32 = 32 * b'\x00'
    BLANK_ROOT_HASH = b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n\x5bH\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
    
    
    def make_trie_root_and_nodes(items):
        return _make_trie_root_and_nodes(tuple(rlp.encode(item) for item in items))
    
    def _make_trie_root_and_nodes(items):
        kv_store = {}
        trie = HexaryTrie(kv_store, BLANK_ROOT_HASH, prune=True)
        memory_trie = trie
        for index, item in enumerate(items):
            index_key = rlp.encode(index, sedes=rlp.sedes.big_endian_int)
            memory_trie[index_key] = item
        return trie.root_hash, kv_store
    
    def test_trie_root():
        list_of_bytes = []
        for i in range(129):
            random_bytes = ZERO_HASH32
            list_of_bytes.append(random_bytes)
    
        print(_make_trie_root_and_nodes(list_of_bytes))
    
    test_trie_root()
hyperevo (Author) commented:
@carver

carver commented Aug 15, 2019

As a workaround, you should be able to get it working with squash_changes() instead, like:

import rlp
from trie import HexaryTrie
ALL_ZEROS = b'\x00' * 32

def add_values_to_trie(trie, values):
    for index, value in enumerate(values):
        index_key = rlp.encode(index, sedes=rlp.sedes.big_endian_int)
        trie[index_key] = value

def make_trie_root_and_nodes(values):
    kv_store = {}
    trie = HexaryTrie(kv_store)
    with trie.squash_changes() as memory_trie:
        add_values_to_trie(memory_trie, values)
    return trie.root_hash, kv_store

make_trie_root_and_nodes([ALL_ZEROS] * 128)

hyperevo commented Aug 15, 2019

Unfortunately that throws the same exception because squash_changes() also sets prune=True:

def squash_changes(self):
    scratch_db = ScratchDB(self.db)
    with scratch_db.batch_commit(do_deletes=self.is_pruning):
        memory_trie = type(self)(scratch_db, self.root_hash, prune=True)
        yield memory_trie
    try:
        self.root_node = memory_trie.root_node
    except MissingTrieNode:
        # if the new root node is missing,
        #   (or no changes happened in a squash trie where the old root node was missing),
        #   then we shouldn't crash here
        self.root_hash = memory_trie.root_hash

If you run your workaround, but with 129 hashes instead of 128, it throws the same exception:

make_trie_root_and_nodes([ALL_ZEROS] * 129)

carver commented Aug 15, 2019

Ah, great. Off-by-one will get you every time. I have it reproduced locally, and will take a look tomorrow.

carver commented Aug 16, 2019

Yup, when storing identical values, with pruning on, this is bound to happen. I don't see a clean, simple way around this problem. Roughly, it's that you can have multiple subtrees that are identical, so removing one kills the other. In the following tree, if different letters are different hashes:

A --- B --- C
|     |
|     |
|     D
|
E --- C
|
F

Removing one C (which can happen, for example, by adding a new value with the same prefix) causes the key C to be deleted when pruning is on. That makes the other subtree with the C hash inaccessible.
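The failure mode can be sketched without the trie at all: in a hash-keyed node store, identical nodes collapse into a single database key, so naively deleting a pruned node's key also destroys every other subtree that shares it. This is a toy illustration, not py-trie's internals, with sha256 standing in for the keccak hash the trie actually uses:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for the trie's node hash (keccak in py-trie).
    return hashlib.sha256(data).digest()

# A toy hash-keyed node store, like the trie's backing database:
# identical nodes hash to the same key, so they share one entry.
db = {}

def store(node: bytes) -> bytes:
    key = h(node)
    db[key] = node
    return key

# Two branches that both end in an identical leaf C.
c1 = store(b'leaf C')
c2 = store(b'leaf C')   # same bytes -> same key, no second entry
assert c1 == c2 and len(db) == 1

# Naive pruning: replacing the leaf under the first branch
# deletes the old node's key...
del db[c1]

# ...which also destroys the copy the second branch still references.
assert c2 not in db     # the other subtree is now unreachable
```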

Reference counting seems like the obvious solution to this. I don't think it needs to be an enormous change, although sometimes you don't know until you dig in. I don't expect it to be a tiny change. It will have a performance impact as well, especially if you want to enable pruning when starting from a non-empty trie (which means you have to store the reference counts in the database).
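A reference-counted store along those lines might look like the following hypothetical sketch (this is not py-trie's actual API): each insert of a node bumps its key's count, and pruning only deletes the key once the last subtree referencing it lets go.

```python
import hashlib
from collections import Counter

def h(data: bytes) -> bytes:
    # Stand-in for the trie's node hash (keccak in py-trie).
    return hashlib.sha256(data).digest()

class RefCountingDB:
    """Hypothetical pruning store: a node's key is only deleted
    once every subtree referencing it has released it."""

    def __init__(self):
        self.db = {}
        self.refs = Counter()

    def store(self, node: bytes) -> bytes:
        key = h(node)
        self.db[key] = node
        self.refs[key] += 1      # one more subtree uses this node
        return key

    def prune(self, key: bytes) -> None:
        self.refs[key] -= 1
        if self.refs[key] == 0:  # last reference gone: safe to delete
            del self.db[key]

db = RefCountingDB()
k1 = db.store(b'leaf C')
k2 = db.store(b'leaf C')   # identical node, refcount becomes 2
db.prune(k1)               # first subtree drops its copy
assert k2 in db.db         # still available to the other subtree
db.prune(k2)
assert k2 not in db.db     # now truly removed
```

The performance cost carver mentions shows up in the extra count update on every node write and delete, and in persisting the counts if pruning must survive restarts.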

You shouldn't run into this if you are always adding unique values, so we may add some kind of uniqueness option that would allow you to prune at the same speed as today by ignoring ref-counting.

carver commented Aug 16, 2019

Also @hyperevo you mentioned that this was happening in py-evm:

It actually came up when I was creating the trie root and nodes for a list of transactions in a block.

I am not sure how this could happen, because you aren't allowed to insert duplicates of the same transaction into the block. Was this happening even without a duplicate value?

carver commented Aug 16, 2019

I expect #93 to resolve the problem. It would be great if you could pull my branch to test it out.

hyperevo (Author) commented:
Thanks for such a quick response! I will try your fix asap.

hyperevo (Author) commented:
Sorry for the delay. I got caught up with other work. I tried your solution and it works. Thank you for that fix!

pacrob added a commit to pacrob/py-trie that referenced this issue May 12, 2023
minor formatting updates, remove additional docs to separate pr