Hexary trie prunes when it shouldn't #92

Closed
hyperevo opened this issue Aug 15, 2019 · 9 comments · Fixed by #93
hyperevo commented Aug 15, 2019

  • Version: 1.4.0
  • Python: 3.6
  • OS: Debian Linux

What was wrong?

When I try to build a trie from a list of identical hashes using make_trie_root_and_nodes from py-evm, I get the exception: trie.exceptions.MissingTrieNode: Trie database is missing hash.

This only occurs if the HexaryTrie has prune = True.

It looks like a leaf was pruned when it shouldn't have been.

The exception occurs on the 129th iteration of the loop.

What did you expect it to do?

Expected it to add the hashes to the trie database like normal so that I can get a root hash and nodes.

Code to reproduce the error

    import rlp
    from trie import HexaryTrie
    ZERO_HASH32 = 32 * b'\x00'
    BLANK_ROOT_HASH = b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n\x5bH\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
    
    
    def make_trie_root_and_nodes(items):
        return _make_trie_root_and_nodes(tuple(rlp.encode(item) for item in items))
    
    def _make_trie_root_and_nodes(items):
        kv_store = {}
        trie = HexaryTrie(kv_store, BLANK_ROOT_HASH, prune=True)
        memory_trie = trie
        for index, item in enumerate(items):
            index_key = rlp.encode(index, sedes=rlp.sedes.big_endian_int)
            memory_trie[index_key] = item
        return trie.root_hash, kv_store
    
    def test_trie_root():
        list_of_bytes = []
        for i in range(129):
            random_bytes = ZERO_HASH32
            list_of_bytes.append(random_bytes)
    
        print(_make_trie_root_and_nodes(list_of_bytes))
    
    test_trie_root()
hyperevo (Author) commented:
@carver

carver commented Aug 15, 2019

As a workaround, you should be able to get it working with squash_changes() instead, like:

import rlp
from trie import HexaryTrie
ALL_ZEROS = b'\x00' * 32

def add_values_to_trie(trie, values):
    for index, value in enumerate(values):
        index_key = rlp.encode(index, sedes=rlp.sedes.big_endian_int)
        trie[index_key] = value

def make_trie_root_and_nodes(values):
    kv_store = {}
    trie = HexaryTrie(kv_store)
    with trie.squash_changes() as memory_trie:
        add_values_to_trie(memory_trie, values)
    return trie.root_hash, kv_store

make_trie_root_and_nodes([ALL_ZEROS] * 128)

hyperevo commented Aug 15, 2019

Unfortunately that throws the same exception because squash_changes() also sets prune=True:

def squash_changes(self):
    scratch_db = ScratchDB(self.db)
    with scratch_db.batch_commit(do_deletes=self.is_pruning):
        memory_trie = type(self)(scratch_db, self.root_hash, prune=True)
        yield memory_trie
    try:
        self.root_node = memory_trie.root_node
    except MissingTrieNode:
        # if the new root node is missing,
        #   (or no changes happened in a squash trie where the old root node was missing),
        #   then we shouldn't crash here
        self.root_hash = memory_trie.root_hash

If you run your workaround, but with 129 hashes instead of 128, it throws the same exception:

make_trie_root_and_nodes([ALL_ZEROS] * 129)

carver commented Aug 15, 2019

Ah, great. Off-by-one will get you every time. I have it reproduced locally, and will take a look tomorrow.

carver commented Aug 16, 2019

Yup, when storing identical values, with pruning on, this is bound to happen. I don't see a clean, simple way around this problem. Roughly, it's that you can have multiple subtrees that are identical, so removing one kills the other. In the following tree, if different letters are different hashes:

A --- B --- C
|     |
|     |
|     D
|
E --- C
|
F

Removing one C (which can happen, for example, by adding a new value with the same prefix) causes the key C to be deleted when pruning is on. That makes the other subtree with the C hash inaccessible.
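The failure mode can be sketched without the trie at all: in a hash-keyed node store, identical nodes collapse into a single database key, so naively deleting a pruned node's key also destroys every other subtree that shares it. This is a toy illustration, not py-trie's internals, with sha256 standing in for the keccak hash the trie actually uses:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for the trie's node hash (keccak in py-trie).
    return hashlib.sha256(data).digest()

# A toy hash-keyed node store, like the trie's backing database:
# identical nodes hash to the same key, so they share one entry.
db = {}

def store(node: bytes) -> bytes:
    key = h(node)
    db[key] = node
    return key

# Two branches that both end in an identical leaf C.
c1 = store(b'leaf C')
c2 = store(b'leaf C')   # same bytes -> same key, no second entry
assert c1 == c2 and len(db) == 1

# Naive pruning: replacing the leaf under the first branch
# deletes the old node's key...
del db[c1]

# ...which also destroys the copy the second branch still references.
assert c2 not in db     # the other subtree is now unreachable
```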

Reference counting seems like the obvious solution to this. I don't think it needs to be an enormous change, although sometimes you don't know until you dig in. I don't expect it to be a tiny change. It will have a performance impact as well, especially if you want to enable pruning when starting from a non-empty trie (which means you have to store the reference counts in the database).
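A reference-counted store along those lines might look like the following hypothetical sketch (this is not py-trie's actual API): each insert of a node bumps its key's count, and pruning only deletes the key once the last subtree referencing it lets go.

```python
import hashlib
from collections import Counter

def h(data: bytes) -> bytes:
    # Stand-in for the trie's node hash (keccak in py-trie).
    return hashlib.sha256(data).digest()

class RefCountingDB:
    """Hypothetical pruning store: a node's key is only deleted
    once every subtree referencing it has released it."""

    def __init__(self):
        self.db = {}
        self.refs = Counter()

    def store(self, node: bytes) -> bytes:
        key = h(node)
        self.db[key] = node
        self.refs[key] += 1      # one more subtree uses this node
        return key

    def prune(self, key: bytes) -> None:
        self.refs[key] -= 1
        if self.refs[key] == 0:  # last reference gone: safe to delete
            del self.db[key]

db = RefCountingDB()
k1 = db.store(b'leaf C')
k2 = db.store(b'leaf C')   # identical node, refcount becomes 2
db.prune(k1)               # first subtree drops its copy
assert k2 in db.db         # still available to the other subtree
db.prune(k2)
assert k2 not in db.db     # now truly removed
```

The performance cost carver mentions shows up in the extra count update on every node write and delete, and in persisting the counts if pruning must survive restarts.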

You shouldn't run into this if you are always adding unique values, so we may add some kind of uniqueness option that would allow you to prune at the same speed as today by ignoring ref-counting.

carver commented Aug 16, 2019

Also @hyperevo you mentioned that this was happening in py-evm:

It actually came up when I was creating the trie root and nodes for a list of transactions in a block.

I am not sure how this could happen, because you aren't allowed to insert duplicates of the same transaction into the block. Was this happening even without a duplicate value?

carver commented Aug 16, 2019

I expect #93 to resolve the problem. It would be great if you could pull my branch to test it out.

hyperevo (Author) commented:
Thanks for such a quick response! I will try your fix asap.

hyperevo (Author) commented:
Sorry for the delay. I got caught up with other work. I tried your solution and it works. Thank you for that fix!

pacrob added a commit to pacrob/py-trie that referenced this issue May 12, 2023
minor formatting updates, remove additional docs to separate pr