Dynamic state snapshots #20152
Conversation
core/state/snapshot/disklayer.go
Outdated
}
// Cache doesn't contain account, pull from disk and cache for later
blob := rawdb.ReadAccountSnapshot(dl.db, hash)
dl.cache.Set(key, blob)
I'm torn on whether we really should cache nil-items here...
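For what it's worth, a sketch of what deliberate nil-caching could look like, assuming dl.cache is a fastcache-style cache whose HasGet distinguishes a stored empty value from a plain miss (the accountRLP method name is illustrative, not the PR's):

// Sketch only: caching nil/empty blobs so repeated lookups of missing
// accounts become cache hits instead of disk reads.
func (dl *diskLayer) accountRLP(hash common.Hash) []byte {
	key := hash[:]
	// HasGet reports a hit even for an empty stored value, so a cached
	// "account does not exist" is told apart from a cache miss.
	if blob, hit := dl.cache.HasGet(nil, key); hit {
		return blob
	}
	// Cache doesn't contain the account, pull from disk and cache for later,
	// including the nil case, so the negative result is remembered too.
	blob := rawdb.ReadAccountSnapshot(dl.db, hash)
	dl.cache.Set(key, blob)
	return blob
}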
core/state/snapshot/account.go
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package snapshot

import (
	"bytes"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/rlp"
)

// Account is a slim version of a state.Account, where the root and code hash
// are replaced with a nil byte slice for empty accounts.
type Account struct {
	Nonce uint64
Considering that there have only been 500M transactions made, do we really need uint64 here? A uint32 would be 'safe' through 4.2 billion transactions, at least...?
core/state/snapshot/difflayer.go
Outdated
dl.lock.Lock()
defer dl.lock.Unlock()

dl.parent = parent.flatten()
We might need some smarter locking here. If the parents have side branches, those don't get locked by the child's lock.
core/state/snapshot/snapshot.go
Outdated
// Snapshot represents the functionality supported by a snapshot storage layer.
type Snapshot interface {
	// Info returns the block number and root hash for which this snapshot was made.
	Info() (uint64, common.Hash)
It would be 'nicer' if Info returned the span of blocks that a snapshot represents, and not just the last block.
I'm actually thinking of nuking the whole number tracking. Clique gets messy because roots are not unique across blocks. We could make it root + number, but that would entail statedb needing to add block numbers to every single method, which gets messy fast.
Alternatively, we can just not care about block number and instead just track child -> parent hierarchies. This would mean that we could end up with a lot more state "cached" in memory than 128 if there are ranges of empty clique blocks, but those wouldn't add any new structs, just keep the existing ones around for longer, so should be fine.
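As a rough sketch, pure child -> parent tracking could look like the following, with layers keyed only by root hash (the layerTree name and newDiffLayer constructor are illustrative, not the PR's API):

// Illustrative only: layers indexed by state root, linked child -> parent.
// Clique's duplicate roots across empty blocks simply map to the same entry,
// so no block numbers are needed anywhere.
type layerTree struct {
	layers map[common.Hash]snapshot // state root -> layer
}

// Snapshot returns the layer for a given state root, if tracked.
func (t *layerTree) Snapshot(root common.Hash) snapshot {
	return t.layers[root]
}

// Update links a new diff layer to its parent purely by root hash.
func (t *layerTree) Update(parentRoot, root common.Hash, accounts map[common.Hash][]byte) error {
	parent := t.layers[parentRoot]
	if parent == nil {
		return fmt.Errorf("parent %x not found", parentRoot)
	}
	t.layers[root] = newDiffLayer(parent, root, accounts) // hypothetical constructor
	return nil
}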
func (dl *diffLayer) Journal() error {
	dl.lock.RLock()
	defer dl.lock.RUnlock()
Perhaps we should check dl.stale here and error out if set.
actually, probably better to do it in journal, since one of the parents might be stale, not just this level
core/state/snapshot/snapshot.go
Outdated
// If we still have diff layers below, recurse
if parent, ok := diff.parent.(*diffLayer); ok {
	return st.cap(parent, layers-1, memory)
Now that it's standalone and not internal to a difflayer, would be nicer to iterate instead of recurse, imo
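Something like this shape, as a sketch (return value and memory accounting elided, fields as in the PR):

// Iterative sketch of cap: walk down the parent chain for the allowed number
// of layers, then flatten whatever lies below, without recursing per layer.
func (st *Tree) cap(diff *diffLayer, layers int, memory uint64) {
	for ; layers > 0; layers-- {
		parent, ok := diff.parent.(*diffLayer)
		if !ok {
			return // bottomed out on the disk layer before the cap depth
		}
		diff = parent
	}
	// diff now sits at the cap boundary; flatten everything below it into
	// its parent, exactly as the recursive version's base case does.
}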
	}
	writer = file
}
// Everything below was journalled, persist this layer too
Suggested change:
if dl.stale {
	return nil, ErrSnapshotStale
}
// Everything below was journalled, persist this layer too
}
// If we haven't reached the bottom yet, journal the parent first
if writer == nil {
	file, err := dl.parent.(*diffLayer).journal()
Right now, we obtain the read lock in Journal -- but here we're calling the parent journal directly, bypassing the lock-taking. We should remove the locking from Journal and do it in this method instead, right after the parent is done writing.
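That is, roughly this shape (a sketch assuming the PR's fields; the file-opening plumbing is elided):

// Sketch: journal recurses to the parent first with no lock held, then takes
// this layer's read lock only around its own writes, so each level locks
// itself rather than relying on the caller.
func (dl *diffLayer) journal() (io.Writer, error) {
	var writer io.Writer
	if parent, ok := dl.parent.(*diffLayer); ok {
		w, err := parent.journal() // parent locks itself
		if err != nil {
			return nil, err
		}
		writer = w
	}
	dl.lock.RLock()
	defer dl.lock.RUnlock()

	if dl.stale {
		return nil, ErrSnapshotStale
	}
	// ... open the journal file if writer is still nil, then persist this
	// layer's accounts and storage to it ...
	return writer, nil
}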
// If the layer is being generated, ensure the requested hash has already been
// covered by the generator.
if dl.genMarker != nil && bytes.Compare(key, dl.genMarker) > 0 {
This seems to work, but would be more obvious with the following change:
- if dl.genMarker != nil && bytes.Compare(key, dl.genMarker) > 0 {
+ if dl.genMarker != nil && bytes.Compare(accountHash[:], dl.genMarker) > 0 {
Answering myself: the new diskLayer doesn't contain it, so the caller will later have to resolve it from the trie, which is fine... I think?
core/state/snapshot/generate.go
Outdated
speed := done/uint64(time.Since(gs.start)/time.Millisecond+1) + 1 // +1s to avoid division by zero
ctx = append(ctx, []interface{}{
	"eta", common.PrettyDuration(time.Duration(left/speed) * time.Millisecond),
Could you add a percentage too? I think that's a pretty user-friendly thing to have. Basically 100 * binary.BigEndian.Uint32(marker[:4]) / uint32(-1).
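In Go that would be roughly the following (note that uint32(-1) doesn't compile as a constant conversion, so math.MaxUint32 is the usable form; the helper name is illustrative):

import (
	"encoding/binary"
	"fmt"
	"math"
)

// generationProgress maps the first 4 bytes of the generation marker, which
// is a hash in a uniformly distributed keyspace, to a completion percentage.
func generationProgress(marker []byte) string {
	done := float64(binary.BigEndian.Uint32(marker[:4]))
	return fmt.Sprintf("%.2f%%", done*100/float64(math.MaxUint32))
}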
core/state/statedb.go
Outdated
@@ -468,6 +466,10 @@ func (s *StateDB) updateStateObject(obj *stateObject) {

// If state snapshotting is active, cache the data til commit
if s.snap != nil {
	// If the account is an empty resurrection, unmark the storage nil-ness
	if storage, ok := s.snapStorage[obj.addrHash]; storage == nil && ok {
I'm not convinced this is correct. There are a couple of things that can happen:

- Old code (before CREATE2): A contract with storage is selfdestructed in tx n. In tx n+1, someone sends a wei to the address, and the account is recreated. The desired end-state should be that the storage has become nil and the account exists.
- New code, with CREATE2: A contract is killed in tx n. In tx n+1, the contract is recreated, and the initcode sets new storage slots. So the old storage slots are all cleared, and there are now new storage slots set. We need to handle this (we don't currently).
LGTM, a couple comments here and there, as well as a question about future-proofing the PR.
Account(hash common.Hash) (*Account, error)

// AccountRLP directly retrieves the account RLP associated with a particular
// hash in the snapshot slim data format.
AccountRLP(hash common.Hash) ([]byte, error)

// Storage directly retrieves the storage data associated with a particular hash,
// within a particular account.
Storage(accountHash, storageHash common.Hash) ([]byte, error)
This isn't a showstopper for merging the PR, merely a question regarding future evolutions of Ethereum: there is an EIP (haven't found it yet) that suggests merging the account and storage tries. This kept recurring in stateless 1.x discussions. It would be a good idea to have a more generic method like GetBlobAtHash(f) interface{}, taking a function f to deserialize the blob.
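Hypothetically, something along these lines (the interface name and signature are the commenter's suggestion fleshed out, not anything in the PR):

// Illustrative only: a trie-agnostic accessor that would survive a merge of
// the account and storage tries. The caller supplies f to decode whatever
// blob lives at the hash, so no per-type methods are needed.
type blobSource interface {
	GetBlobAtHash(hash common.Hash, f func(blob []byte) (interface{}, error)) (interface{}, error)
}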
//
// Note, the method is an internal helper to avoid type switching between the
// disk and diff layers. There is no locking involved.
Parent() snapshot
if it's an internal helper, then it shouldn't be public
// The goal of a state snapshot is twofold: to allow direct access to account and
// storage data to avoid expensive multi-level trie lookups; and to allow sorted,
// cheap iteration of the account/storage tries for sync aid.
type Tree struct {
Change name to match comment, or vice-versa.
* core/state/snapshot/iterator: fix two disk iterator flaws
* core/rawdb: change SnapshotStoragePrefix to avoid prefix collision with preimagePrefix
As it's now released, wouldn't it make sense to update the Command Line Options wiki? The --snapshot switch is completely missing there.
@fruor No, the feature will be on by default soon-ish. We just don't yet want people to use it, as it might still change a bit. It's there if someone wants to test it (e.g. our benchmarker runs).
Note, this PR is semi-experimental work. All code included has been extensively tested on live nodes, but it is very very very sensitive code. As such, the PR hides the included logic behind --snapshot. We've decided to merge it to get the code onto master as it's closing in on the 6 month development mark already.

This PR creates a secondary data structure for storing the Ethereum state, called a snapshot. This snapshot is special as it dynamically follows the chain and can also handle small-ish reorgs:

- The snapshot is a <hash> -> <account> mapping for the account trie and a <account-hash><slot-hash> -> <slot-value> mapping for the storage tries. The layout permits fast iteration over the accounts and storage, which will be used for a new sync algorithm.
- The snapshot can be built fully online, during the live operation of a Geth node. This is harder than it seems, because rebuilding the snapshot for mainnet takes 9 hours, during which in-memory garbage collection has long since deleted the state needed for a single capture.
- The snapshot is maintained for HEAD-128 and is capable of suspending/resuming if a state is missing (a restart will only write out some tries, not all cached in memory).

The benefit of the snapshot is that it acts as an acceleration structure for state accesses:

- Instead of doing O(log N) disk reads (+leveldb overhead) to access an account / storage slot, the snapshot can provide direct, O(1) access time. This should be a small improvement in block processing and a huge improvement in eth_call evaluations.
- The snapshot supports iteration at O(1) complexity per entry + sequential disk access, which should enable remote nodes to retrieve state data significantly cheaper than before (the sort order is the state trie leaf order, so responses can directly be assembled into tries too).

The downside of the snapshot is that the raw account and storage data is essentially duplicated. In the case of mainnet, this means an extra 15GB of SSD space used.
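As a sketch of the slim encoding mentioned above, based on the Account type quoted earlier in the review (the Balance/Root/CodeHash fields and the emptyRoot/emptyCode constants are assumed from the description, not shown in the quoted diff):

// Sketch: convert a full state account into the slim snapshot form, where an
// empty storage root or code hash is stored as nil to save space on disk.
func slimAccount(nonce uint64, balance *big.Int, root common.Hash, codehash []byte) Account {
	slim := Account{Nonce: nonce, Balance: balance}
	if root != emptyRoot { // assumed constant: hash of the empty trie
		slim.Root = root[:]
	}
	if !bytes.Equal(codehash, emptyCode) { // assumed constant: keccak256 of nil code
		slim.CodeHash = codehash
	}
	return slim
}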