Use reference counting to avoid over-pruning trie #93

carver · 2019-08-16T23:47:18Z

What was wrong?

Fixes #92

There is an example where the current pruner incorrectly removes a
necessary node hash because that node is duplicated (more than one place
in the trie refers to it). See the new test:
test_hexary_trie_avoid_over_pruning()

How was it fixed?

Added an incremental reference counter, which only prunes nodes when the
number of usages drops to zero. It only works on fresh databases. With
existing databases, it would still delete required nodes.

Also, it skips an unnecessary database write, which would incorrectly
increment the reference counter.
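A minimal sketch of the idea, assuming a dict stands in for the database and plain strings stand in for node hashes (`NaiveNodeStore`, `RefCountedNodeStore`, `persist` and `prune` are hypothetical names, not py-trie's API). The naive store deletes a node as soon as any one reference to it goes away; the ref-counted store only deletes it once the last reference is gone.

```python
from collections import Counter
from typing import Dict


class NaiveNodeStore:
    """Hypothetical pruner without reference counting."""

    def __init__(self) -> None:
        self.db: Dict[str, bytes] = {}

    def persist(self, node_hash: str, node: bytes) -> None:
        self.db[node_hash] = node

    def prune(self, node_hash: str) -> None:
        # Deletes the node even if another part of the trie still uses it.
        self.db.pop(node_hash, None)


class RefCountedNodeStore:
    """Hypothetical pruner that tracks how many times each node is used."""

    def __init__(self) -> None:
        self.db: Dict[str, bytes] = {}
        self.ref_count: Counter = Counter()

    def persist(self, node_hash: str, node: bytes) -> None:
        # Each call counts as one more reference, even for a node that is
        # already stored (a duplicated node).
        self.ref_count[node_hash] += 1
        self.db[node_hash] = node

    def prune(self, node_hash: str) -> None:
        # Drop one reference; delete only when no references remain.
        self.ref_count[node_hash] -= 1
        if self.ref_count[node_hash] == 0:
            del self.db[node_hash]
            del self.ref_count[node_hash]


# Two subtrees share an identical leaf, so its hash is stored "twice".
naive, counted = NaiveNodeStore(), RefCountedNodeStore()
for store in (naive, counted):
    store.persist("leaf", b"shared")
    store.persist("leaf", b"shared")
    store.prune("leaf")  # one subtree changes and drops its reference

print("leaf" in naive.db)    # False: the other subtree is now broken
print("leaf" in counted.db)  # True: one reference is still live
```

The sketch also shows why the redundant write matters: every `persist()` call counts as a new reference, so writing the same node twice for a single reference would leave its count one too high, and the node would never be pruned.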

pipermerriam

Am I correct that this will not work unless the database is empty when given to the Trie class? In other words, we cannot use this in Trinity with our LevelDB database?

carver · 2019-08-22T23:10:41Z

> Am I correct that this will not work unless the database is empty when given to the Trie class? In other words, we cannot use this in Trinity with our LevelDB database?

Exactly, that's what I was trying to get at with the docstring:

> Important note about Pruning:
>
> Pruning against an existing database, with duplicate values, will
> delete important data. Only turn on pruning with a fresh database.
> If working with an existing database, use the :meth:`squash_changes`
> context manager instead of turning pruning on directly.

Any idea how to say that more clearly?

FWIW, not being able to prune an on-disk DB was already an issue. It's not made worse by the current PR (hopefully made a little better by at least mentioning it in a docstring). In practice, we have only ever used pruning for a fresh in-memory database, or in a context like squash_changes() which will handle it correctly.
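To illustrate why this only holds for a fresh database, here is a minimal sketch assuming a counter that only sees the writes made through it (`RefCountedNodeStore` and its methods are hypothetical, not py-trie's API). A node that an older on-disk trie already references starts with a tracked count of zero, so the new trie's own add-then-remove sequence is enough to delete it.

```python
from collections import Counter
from typing import Dict


class RefCountedNodeStore:
    """Hypothetical ref-counting pruner wrapped around an existing db."""

    def __init__(self, db: Dict[str, bytes]) -> None:
        self.db = db
        # Starts empty: references that already exist on disk are unknown.
        self.ref_count: Counter = Counter()

    def persist(self, node_hash: str, node: bytes) -> None:
        self.ref_count[node_hash] += 1
        self.db[node_hash] = node

    def prune(self, node_hash: str) -> None:
        self.ref_count[node_hash] -= 1
        if self.ref_count[node_hash] == 0:
            del self.db[node_hash]
            del self.ref_count[node_hash]


# "leaf" is already referenced by a trie persisted earlier (true count: 1).
on_disk = {"leaf": b"shared"}
store = RefCountedNodeStore(on_disk)
store.persist("leaf", b"shared")  # new trie also uses it (true: 2, tracked: 1)
store.prune("leaf")               # new trie drops it (true: 1, tracked: 0)
print("leaf" in on_disk)          # False: the older trie lost a required node
```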

pipermerriam · 2019-08-22T23:13:30Z

I'm inclined to just deprecate and remove the pruning feature since:

1. Databases that need pruning are the big ones.
2. I don't see a compelling use case for pruning in-memory databases.

ergo... our pruning feature isn't useful and adding complexity to support it doesn't seem worth it.

carver · 2019-08-22T23:30:39Z

> 2. I don't see a compelling use case for pruning in-memory databases

Two come to mind:

1. It's used as part of squash_changes(). Since we don't have a batch-add, every time you add N values, you create N-1 state roots that you don't need anymore (plus all the downstream nodes that change). If we don't prune those out in memory, we would persist all of them to the database.
2. The tiny transaction and receipt tries are built in memory, using a pruning trie. For basically the same reason as 1, they would write a bunch of unnecessary trie nodes to disk if we didn't prune along the way.
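A minimal sketch of use case 1, assuming a toy trie whose only node is its root (`ToyTrie` and `squash_changes_sketch` are hypothetical names; this is not how py-trie's `squash_changes()` is actually implemented). Adding N values creates N roots; pruning inside a scratch dict means only the last one reaches the backing database.

```python
import hashlib
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class ToyTrie:
    """Hypothetical stand-in for a trie with a single root node.

    Every update re-encodes all items into a new root node, so each
    insert produces a new "state root", as a real trie would.
    """

    def __init__(self, db: Dict[str, bytes], prune: bool) -> None:
        self.db = db
        self.prune = prune
        self.items: Dict[bytes, bytes] = {}
        self.root: Optional[str] = None

    def __setitem__(self, key: bytes, value: bytes) -> None:
        self.items[key] = value
        encoded = repr(sorted(self.items.items())).encode()
        new_root = hashlib.sha256(encoded).hexdigest()
        self.db[new_root] = encoded
        # Re-setting an identical value yields the same root, which must
        # not be deleted; any other previous root is now obsolete.
        if self.prune and self.root is not None and self.root != new_root:
            del self.db[self.root]
        self.root = new_root


@contextmanager
def squash_changes_sketch(db: Dict[str, bytes]) -> Iterator[ToyTrie]:
    # Build in a fresh in-memory scratch dict with pruning on, then copy
    # only the surviving nodes into the real database.
    scratch: Dict[str, bytes] = {}
    trie = ToyTrie(scratch, prune=True)
    yield trie
    db.update(scratch)  # skipped if the with-block raises


unpruned_db: Dict[str, bytes] = {}
plain = ToyTrie(unpruned_db, prune=False)
for i in range(5):
    plain[bytes([i])] = b"value"

squashed_db: Dict[str, bytes] = {}
with squash_changes_sketch(squashed_db) as trie:
    for i in range(5):
        trie[bytes([i])] = b"value"

print(len(unpruned_db))  # 5: four obsolete roots were persisted
print(len(squashed_db))  # 1: only the final root
```

Because the scratch dict always starts empty, pruning here happens in exactly the "fresh database" situation where it is safe.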

It's possible that we might be able to hide away the prune keyword somehow and force squash_changes as the only approach to do that. I think we could cover both use cases that way, but I'm not sure how it would look to launch a pruning trie from inside squash_changes() (which uses the prune keyword internally right now).
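One possible shape for that, sketched under the assumption that `prune` simply stops being a constructor argument and `squash_changes()` reaches pruning through a private factory (`SketchTrie` and `_new_pruning` are hypothetical names; the eventual change may look quite different).

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class SketchTrie:
    """Hypothetical API where pruning has no public switch."""

    def __init__(self, db: Dict[bytes, bytes]) -> None:
        self.db = db
        self._prune = False  # not exposed as a constructor argument

    @classmethod
    def _new_pruning(cls, db: Dict[bytes, bytes]) -> "SketchTrie":
        # Internal-only factory used by squash_changes().
        trie = cls(db)
        trie._prune = True
        return trie

    @contextmanager
    def squash_changes(self) -> Iterator["SketchTrie"]:
        # Pruning is only ever enabled against a brand-new scratch db,
        # the one situation where it is known to be safe.
        scratch: Dict[bytes, bytes] = {}
        yield SketchTrie._new_pruning(scratch)
        self.db.update(scratch)


trie = SketchTrie({})
with trie.squash_changes() as scratch_trie:
    scratch_trie.db[b"node"] = b"data"  # stand-in for real trie writes
```

The design choice here is that callers can no longer turn pruning on against an arbitrary database; the only path to a pruning trie goes through a scratch store the trie creates itself.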

pipermerriam · 2019-08-23T00:27:19Z

> It's possible that we might be able to hide away the prune keyword somehow and force squash_changes as the only approach to do that.

This direction seems preferable.

carver · 2019-08-23T17:30:56Z

> > It's possible that we might be able to hide away the prune keyword somehow and force squash_changes as the only approach to do that.
>
> This direction seems preferable.

Cool, I noted it as not for external usage, and that it is likely to be deprecated. I also added an issue (#94) to change how pruning is enabled, to make it an internal API.

pipermerriam · 2019-08-23T17:33:19Z

trie/hexary.py

+            self._prune_key(prune_key)
+
+    def _prune_key(self, key):
+        self.ref_count[key] -= 1


Maybe this should have an existence check: if the key were not in the database, the refcount would end up negative.
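A minimal sketch of such a check, in the spirit of the later commit "Immediately catch if the reference count goes < 0". It assumes `ref_count` is a `collections.Counter`; `ref_count` and `_prune_key` come from the diff above, while `RefCountPruner`, `_persist_key`, and raising `ValueError` are hypothetical illustration choices, not necessarily what the merged code does.

```python
from collections import Counter
from typing import Dict


class RefCountPruner:
    """Hypothetical holder for a `_prune_key` that never goes negative."""

    def __init__(self) -> None:
        self.db: Dict[bytes, bytes] = {}
        self.ref_count: Counter = Counter()

    def _persist_key(self, key: bytes, value: bytes) -> None:
        self.ref_count[key] += 1
        self.db[key] = value

    def _prune_key(self, key: bytes) -> None:
        # Reading a missing key from a Counter returns 0 without inserting
        # it, so this also catches keys that were never counted.
        if self.ref_count[key] <= 0:
            raise ValueError(f"reference count for {key!r} would drop below zero")
        self.ref_count[key] -= 1
        if self.ref_count[key] == 0:
            del self.db[key]
            del self.ref_count[key]


pruner = RefCountPruner()
pruner._persist_key(b"node", b"data")
pruner._prune_key(b"node")  # count 1 -> 0, node deleted
```

Calling `pruner._prune_key(b"node")` a second time raises instead of silently recording a count of -1.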

carver mentioned this pull request Aug 16, 2019

Hexary trie prunes when it shouldn't #92 (closed)

carver requested a review from pipermerriam August 16, 2019 23:53

pipermerriam reviewed Aug 22, 2019

carver force-pushed the pruning-with-ref-counter branch from 61e6ec3 to ff9d79e August 23, 2019 17:26

carver mentioned this pull request Aug 23, 2019

Remove prune flag as an external API #94 (open)

pipermerriam approved these changes Aug 23, 2019

Immediately catch if the reference count goes < 0

22e21fd

carver force-pushed the pruning-with-ref-counter branch from 8f7a10b to 22e21fd August 23, 2019 20:56

carver merged commit 3be5646 into ethereum:master Aug 23, 2019

carver deleted the pruning-with-ref-counter branch August 23, 2019 21:09

pacrob added a commit to pacrob/py-trie that referenced this pull request May 12, 2023

Merge pull request ethereum#93 from pacrob/bump-sphinx-and-rtd-py-ver…

3b9f8c2

…sion bump sphinx version and set py version rtd uses to 3.8
