
Implements 'generational' GC by only writing out every nth trie root #14952

Closed · wants to merge 3 commits

Conversation
Conversation

@Arachnid (Contributor) commented Aug 9, 2017

No description provided.
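
For context, the idea in the title can be sketched as follows — a minimal, hypothetical illustration, not the PR's actual diff; commitInterval and maybeCommit are made-up names:

package main

// Minimal sketch of "only writing out every nth trie root"; commitInterval
// is a hypothetical knob, not a parameter taken from this PR.
const commitInterval = 100

// maybeCommit persists a trie root only on every commitInterval-th block;
// the roots of all other blocks stay in memory and can be garbage collected.
func maybeCommit(blockNumber uint64, commit func() error) error {
	if blockNumber%commitInterval != 0 {
		return nil // defer the database write for this block's root
	}
	return commit()
}

func main() {
	_ = maybeCommit(100, func() error { return nil }) // 100 % 100 == 0: commits
	_ = maybeCommit(101, func() error { return nil }) // skipped, kept in memory
}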

@@ -616,7 +616,13 @@ func (s *StateDB) CommitTo(dbw trie.DatabaseWriter, deleteEmptyObjects bool) (ro
 		delete(s.stateObjectsDirty, addr)
 	}
 	// Write trie changes.
+	s.db.SetBlockNumber(blockNumber)
Review comment (Member):
Wouldn't it be cleaner to pass blockNumber as a parameter to CommitTo instead of having that rely on a pre-set field?
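
A rough sketch of that suggestion, in the style of the diff below (hypothetical; the PR's actual CommitTo signature and call sites may differ):

-func (s *StateDB) CommitTo(dbw trie.DatabaseWriter, deleteEmptyObjects bool) (root common.Hash, err error)
+func (s *StateDB) CommitTo(dbw trie.DatabaseWriter, deleteEmptyObjects bool, blockNumber uint64) (root common.Hash, err error)

With the extra parameter, the s.db.SetBlockNumber(blockNumber) call inside CommitTo would no longer depend on a field set elsewhere beforehand.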

Review comment (Member):

Ah wait, there are so many CommitTo methods that I lost track of which calls which.

@karalabe (Member) commented:

Dump from Gitter so it doesn't get lost:

I think I found the reason why your PR gets slow. The trie caches two things for each node: the hash (which gets computed the first time it's needed) and a dirty flag, which tracks whether the node was already written to the database. In the current code, committing a trie writes it to the database, so all dirty nodes get saved and marked clean; after that, the trie is cached for later reuse (e.g. the next block).
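
Paraphrasing the two cached items just described — a sketch with illustrative field names, not necessarily the exact geth source:

// Each trie node carries a small cache: a lazily computed hash and a
// dirty flag that is cleared once the node has been written to the database.
type nodeFlag struct {
	hash  []byte // computed the first time it is needed
	dirty bool   // true until the node is persisted
}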

Your code, however, only hashes the trie and defers the commit to a later point, so all the dirty nodes are left dirty. The issue is that the trie is cached dirty, so even if the past trie is later written out to the database, all subsequent uses will still see the dirty cached one. This essentially means that every trie commit writes all past nodes to the database too, so over time the database writes compound to crazy amounts.
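
To see how the writes compound, here is a toy model of the behavior described above — purely illustrative numbers, assuming each commit re-writes every still-dirty node:

package main

import "fmt"

func main() {
	const newNodesPerBlock = 10 // hypothetical nodes dirtied per block
	dirty, total := 0, 0
	for block := 1; block <= 5; block++ {
		dirty += newNodesPerBlock // fresh dirty nodes from this block
		total += dirty            // a commit re-writes ALL still-dirty nodes
		fmt.Printf("block %d: wrote %d nodes (cumulative %d)\n", block, dirty, total)
	}
	// If the dirty flags were cleared on commit, each block would write only
	// newNodesPerBlock nodes; here the per-block writes grow linearly and the
	// total grows quadratically.
}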

This might explain why your import was so slow and why it used so much memory: it kept everything in a dirty state and wrote everything out over and over again. I'm not aware of an easy fix for this though, because the trie is immutable, whereas a deferred write needs to be able to update the nodes' dirty flags later, after new references to the nodes exist.

You can find the hack that proves the above hunch below, and here are the logs with the detailed import stats for the first 100K blocks on master, on your PR, and with my hack.

diff --git a/trie/hasher.go b/trie/hasher.go
index 672456e..297b116 100644
--- a/trie/hasher.go
+++ b/trie/hasher.go
@@ -49,6 +49,8 @@ func returnHasherToPool(h *hasher) {
        hasherPool.Put(h)
 }

+var clean = make(map[common.Hash]bool)
+
 // hash collapses a node down into a hash node, also returning a copy of the
 // original node initialized with the computed hash to replace the original one.
 func (h *hasher) hash(n node, db DatabaseWriter, force bool) (node, node, error) {
@@ -63,7 +65,7 @@ func (h *hasher) hash(n node, db DatabaseWriter, force bool) (node, node, error)
                        cacheUnloadCounter.Inc(1)
                        return hash, hash, nil
                }
-               if !dirty {
+               if !dirty || clean[common.BytesToHash(hash)] {
                        return hash, n, nil
                }
        }
@@ -167,6 +169,7 @@ func (h *hasher) store(n node, db DatabaseWriter, force bool) (node, error) {
                hash = hashNode(h.sha.Sum(nil))
        }
        if db != nil {
+               clean[common.BytesToHash(hash)] = true
                return hash, db.Put(hash, h.tmp.Bytes())
        }
        return hash, nil
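
In short, the hack keeps an out-of-band record of which node hashes have already been persisted: store marks each written hash as clean in the global map, and hash consults that map, so a node whose cached copy is still flagged dirty is not written again. A process-global map is of course only a diagnostic device to confirm the hunch, not a mergeable fix.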

@karalabe (Member) commented:

Closing this as it's problematic for fast sync and also has the issue with the commits. AFAIK there's new work being done on this front; let's focus on that.

@karalabe closed this on Jan 10, 2018.