core/state: set-based journalling #30660

Open
wants to merge 4 commits into
base: master

holiman
Contributor

@holiman holiman commented Oct 23, 2024

This is a second attempt at #30500 .

This PR introduces set-based journalling, where journalling events are stored in per-scope maps.

Whenever we enter a new call scope, we create a new scoped journal. Changes within the same scoped journal overwrite each other. For example, if a scope updates a balance from 6 to 5, then 4, then 3, the journal holds only one preimage (6). The old journal (linear_journal), by contrast, would hold one entry per change: [prev: 6, prev: 5, prev: 4].

The linear / appending journal is wasteful on memory, and also slow on rollbacks, since each change is rolled back individually.
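
To illustrate the idea described above, here is a minimal sketch of a set-based journal over a plain balance map. It is not the code from this PR: the names `setJournal`, `scope`, and `setBalance` are hypothetical, and real state entries are far richer than an `int` balance. The point is only that a scope records a preimage the first time a key is touched, so repeated writes cost one entry and a revert restores each key once.

```go
package main

import "fmt"

// scope maps an account to the balance it had when the scope first touched it.
type scope map[string]int

// setJournal is a toy set-based journal (hypothetical, for illustration only).
type setJournal struct {
	balances map[string]int
	scopes   []scope // innermost scope last
}

func newSetJournal() *setJournal {
	return &setJournal{balances: map[string]int{}}
}

// snapshot opens a new scope.
func (j *setJournal) snapshot() {
	j.scopes = append(j.scopes, scope{})
}

// setBalance writes a balance. The preimage is journalled only on the
// first write to addr within the current scope.
func (j *setJournal) setBalance(addr string, v int) {
	if n := len(j.scopes); n > 0 {
		cur := j.scopes[n-1]
		if _, seen := cur[addr]; !seen {
			cur[addr] = j.balances[addr]
		}
	}
	j.balances[addr] = v
}

// revertSnapshot restores every preimage of the innermost scope and drops it.
// It is a no-op when no scope is open.
func (j *setJournal) revertSnapshot() {
	n := len(j.scopes)
	if n == 0 {
		return
	}
	for addr, prev := range j.scopes[n-1] {
		j.balances[addr] = prev
	}
	j.scopes = j.scopes[:n-1]
}

func main() {
	j := newSetJournal()
	j.setBalance("alice", 6) // outside any scope, so not journalled
	j.snapshot()
	for _, v := range []int{5, 4, 3} {
		j.setBalance("alice", v)
	}
	fmt.Println(len(j.scopes[0]), j.balances["alice"]) // prints "1 3"
	j.revertSnapshot()
	fmt.Println(j.balances["alice"]) // prints "6"
}
```

A linear journal would have appended three entries for the same three writes and replayed all of them on revert.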


@karalabe reminded me of the burntpix benchmark, on which this PR excels, so I thought it was worth another chance.

master:

[user@work go-ethereum]$ ./evm_master run --prestate ./burntpix.json --receiver 0x49206861766520746F6F206D7563682074696D65  --input 0xa4de9ab4000000000000000000000000000000000000000000000000000000000F1FD58E000000000000000000000000000000000000000000000000000000000007A120 --bench | tail -c +131 | sed 's/[0]*$//' | xxd -r -p > output.svg
EVM gas used:    5642735088
execution time:  49.349209526s
allocations:     915684
allocated bytes: 175333368

This PR:

[user@work go-ethereum]$ ./evm_setjournal run --prestate ./burntpix.json --receiver 0x49206861766520746F6F206D7563682074696D65  --input 0xa4de9ab4000000000000000000000000000000000000000000000000000000000F1FD58E000000000000000000000000000000000000000000000000000000000007A120 --bench | tail -c +131 | sed 's/[0]*$//' | xxd -r -p > output.svg
EVM gas used:    5642735088
execution time:  48.740463584s
allocations:     30198
allocated bytes: 30308272

Allocations: 915K -> 30K
Allocated bytes: 175M -> 30M

core/genesis.go Outdated
core/genesis.go Outdated
core/state/setjournal.go Outdated
Member

@karalabe karalabe left a comment

Random nits for now

core/state/setjournal.go Outdated
core/state/journal_api.go Outdated
core/state/journal_api.go Outdated
core/state/journal_api.go Outdated
core/state/journal_api.go Outdated
core/state/journal_api.go Outdated
@holiman
Contributor Author

holiman commented Nov 6, 2024

Lifting this to top level

revertToSnapshot and discardSnapshot seem asymmetric (one discards something, the other reverts up to something).

Would it make sense to unify these? I.e. have discard also take the parent until which to discard (if revert knows where to revert, discard should also know? shouldn't it be called in the same spot?).

Also, discard doesn't really need an id, does it, you will only ever discard the head?

So, I have now made the API cleaner (actually, I am still working on it):

journal:

	// snapshot starts a new journal scope which can be reverted or discarded.
	// The lifecycle of journalling is as follows:
	// - snapshot() starts a 'scope'.
	// - The method snapshot() may be called any number of times.
	// - For each call to snapshot, there should be a corresponding call to end
	//  the scope via either of:
	//   - revertSnapshot, which undoes the changes in the scope, or
	//   - discardSnapshot, which discards the ability to revert the changes in the scope.
	snapshot()

	// revertSnapshot reverts all state changes made since the last call to snapshot().
	revertSnapshot(s *StateDB)

	// discardSnapshot removes the latest snapshot; after calling this
	// method, it is no longer possible to revert to that particular snapshot, the
	// changes are considered part of the parent scope.
	discardSnapshot()

StateDB

	// Snapshot starts a new journalled scope.
	Snapshot()
	// RevertSnapshot reverts all state changes made in the most recent journalled scope.
	RevertSnapshot()
	// DiscardSnapshot removes the ability to roll back the changes in the most
	// recent journalled scope. After calling this method, the changes are considered
	// part of the parent scope.
	DiscardSnapshot()
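
The lifecycle described in these comments can be sketched as follows. This is an illustrative toy under stated assumptions, not the PR's implementation: `scopedJournal` and `set` are hypothetical names, state is a plain `map[string]int`, and a key that did not exist before a scope is restored to the zero value rather than deleted. The sketch shows the one non-obvious step, namely that discarding a scope folds its preimages into the parent scope, keeping the parent's (older) preimage when both scopes touched the same key.

```go
package main

import "fmt"

// scopedJournal is a toy journal with the snapshot/revert/discard lifecycle.
type scopedJournal struct {
	state  map[string]int
	scopes []map[string]int // per-scope preimages, innermost last
}

// snapshot starts a new scope.
func (j *scopedJournal) snapshot() {
	j.scopes = append(j.scopes, map[string]int{})
}

// set writes a value, recording the preimage on first touch in the scope.
func (j *scopedJournal) set(key string, v int) {
	if n := len(j.scopes); n > 0 {
		if _, ok := j.scopes[n-1][key]; !ok {
			j.scopes[n-1][key] = j.state[key]
		}
	}
	j.state[key] = v
}

// revertSnapshot undoes all changes made in the innermost scope.
func (j *scopedJournal) revertSnapshot() {
	n := len(j.scopes)
	if n == 0 {
		return
	}
	for k, prev := range j.scopes[n-1] {
		j.state[k] = prev
	}
	j.scopes = j.scopes[:n-1]
}

// discardSnapshot ends the innermost scope without undoing it. Its changes
// become part of the parent scope, so the parent can still revert them.
func (j *scopedJournal) discardSnapshot() {
	n := len(j.scopes)
	if n == 0 {
		return
	}
	child := j.scopes[n-1]
	j.scopes = j.scopes[:n-1]
	if n == 1 {
		return // no parent scope: the changes simply become permanent
	}
	parent := j.scopes[n-2]
	for k, prev := range child {
		if _, ok := parent[k]; !ok {
			parent[k] = prev // the parent had not touched k yet
		}
	}
}

func main() {
	j := &scopedJournal{state: map[string]int{"x": 1}}
	j.snapshot() // outer scope
	j.set("x", 2)
	j.snapshot() // inner scope
	j.set("x", 3)
	j.set("y", 9)
	j.discardSnapshot() // inner changes now belong to the outer scope
	j.revertSnapshot()  // undoes everything since the outer snapshot
	fmt.Println(j.state["x"], j.state["y"]) // prints "1 0"
}
```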

This had quite a large fallout, and also made for some large changes (simplifications) inside the existing linear journal. The result is that after these changes, I am not 100% confident that it won't blow up in some corner-case where the journal is empty / reset, and something calls revert or discard. I intentionally did not check for out of bounds, because I want to locate the source of such flaws.

One such source is when the state processor applies a tx. Either it becomes finalized, in which case the journal is reset (cross-tx reverting is not allowed), or it fails, in which case the caller (outside) needs to revert it. So that's one point where, although the API is symmetric, the call pattern is not.

So, my plan now: let the tests run, then put it on benchmarkers for a spin.

@holiman
Contributor Author

holiman commented Nov 7, 2024

I tried to deploy this on bench06 without wiping it. cc @rjl493456442 I guess you were running a future db format on it? Hope I didn't break any important run.. ?

Nov 07 13:25:36 bench06.ethdevops.io geth INFO [11-07|12:25:36.509] Using pebble as the backing database
Nov 07 13:25:36 bench06.ethdevops.io geth INFO [11-07|12:25:36.509] Allocated cache and file handles database=/datadir/geth/geth/chaindata cache=2.00GiB handles=524,288
Nov 07 13:25:39 bench06.ethdevops.io geth Chain metadata
Nov 07 13:25:39 bench06.ethdevops.io geth databaseVersion: 9 (0x9)
Nov 07 13:25:39 bench06.ethdevops.io geth headBlockHash: 0xe637caa29a2821cffeafd8db1906e765b0be7e4d8350123e156d757d3db62b2f
Nov 07 13:25:39 bench06.ethdevops.io geth headFastBlockHash: 0xe637caa29a2821cffeafd8db1906e765b0be7e4d8350123e156d757d3db62b2f
Nov 07 13:25:39 bench06.ethdevops.io geth headHeaderHash: 0xe637caa29a2821cffeafd8db1906e765b0be7e4d8350123e156d757d3db62b2f
Nov 07 13:25:39 bench06.ethdevops.io geth lastPivotNumber: 20977634 (0x14017e2)
Nov 07 13:25:39 bench06.ethdevops.io geth len(snapshotSyncStatus): 285 bytes
Nov 07 13:25:39 bench06.ethdevops.io geth snapshotDisabled: false
Nov 07 13:25:39 bench06.ethdevops.io geth snapshotJournal: 7938463 bytes
Nov 07 13:25:39 bench06.ethdevops.io geth snapshotRecoveryNumber: <nil>
Nov 07 13:25:39 bench06.ethdevops.io geth snapshotRoot: 0x047f51ef93470331dee55a068d094b27ccf97dc6e5ce104ae99587bfefb8ff12
Nov 07 13:25:39 bench06.ethdevops.io geth txIndexTail: 18785755 (0x11ea5db)
Nov 07 13:25:39 bench06.ethdevops.io geth
SkeletonSyncStatus: {"Subchains":[{"Head":21135734,"Tail":21135734,"Next":"0x36399eb652517a6a8618352d09c8d1cdfdf63a5bda68e3dc4a20c5eb290a27a8"}],"Finalized":21050420}
Nov 07 13:25:40 bench06.ethdevops.io geth Fatal: Failed to register the Ethereum service: rlp: input list has too many elements for rawdb.freezerTableMeta
Nov 07 13:25:40 bench06.ethdevops.io geth Fatal: Failed to register the Ethereum service: rlp: input list has too many elements for rawdb.freezerTableMeta

@holiman
Contributor Author

holiman commented Nov 7, 2024

Deploying now, so

  • bench05: this PR, but using linear_journal
  • bench06: this PR, but using setjournal

@karalabe
Member

karalabe commented Nov 7, 2024 via email

@holiman
Contributor Author

holiman commented Nov 13, 2024

Deploying now, so

* `bench05`: this PR, but using `linear_journal`

* `bench06`: this PR, but using `setjournal`

Has been running fine for ~5 days. Nothing noteworthy with regards to differences in charts on the two machines.

@rjl493456442
Member

I tried to deploy this on bench06 without wiping it. cc @rjl493456442 I guess you were running a future db format on it? Hope I didn't break any important run.. ?

Yes, the db version is 9, so it's confirmed!

core/state: add handling for DiscardSnapshot
core/state: use new journal
core/state, genesis: fix flaw re discard/commit.
	In case the state is committed, the journal is reset, thus it is not correct to Discard/Revert snapshots at that point.
core/state: fix nil defer in merge
core/state: fix bugs in setjournal
core/state: journal api changes
core/state: bugfixes in sparse journal
core/state: journal tests
core/state: improve post-state check in journal-fuzzing test
core/state: post-rebase fixups
miner: remove discard-snapshot call, it's not needed since journal will be reset in Finalize
core/state: fix tests
core/state: lint
core/state: supply origin-value when reverting storage change
Update core/genesis.go
core/state: fix erroneous comments
core/state: review-nits regarding the journal
// This is fine
return
}
j.revisions = j.revisions[:id]
Member

This can panic with runtime error: slice bounds out of range [:-1] if you try to discard on an empty journal. I think this was also the behavior before the change, but it was explicit then, while here it is not so easy to spot. Maybe we should be more explicit about it here.
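
One way to make the failure explicit, as suggested in the comment above, is sketched here. This is an assumption-laden illustration rather than the code under review: `revisionLog` is a hypothetical stand-in for the journal's revision stack, and `tryDiscard` exists only to demonstrate the panic without crashing the example. The check replaces the anonymous slice-bounds panic with a message that names the misuse.

```go
package main

import "fmt"

// revisionLog is a hypothetical stand-in for the journal's revision stack.
type revisionLog struct {
	revisions []int // journal index at which each snapshot started
}

// discardSnapshot drops the latest revision. On an empty stack it panics
// with a descriptive message instead of a bare slice-bounds error.
func (r *revisionLog) discardSnapshot() {
	if len(r.revisions) == 0 {
		panic("journal: discardSnapshot called without a matching snapshot")
	}
	r.revisions = r.revisions[:len(r.revisions)-1]
}

// tryDiscard calls discardSnapshot and converts a panic into an error,
// purely so the demonstration below can run to completion.
func tryDiscard(r *revisionLog) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("%v", rec)
		}
	}()
	r.discardSnapshot()
	return nil
}

func main() {
	r := &revisionLog{revisions: []int{0}}
	fmt.Println(tryDiscard(r)) // first discard succeeds
	fmt.Println(tryDiscard(r)) // second discard hits the explicit check
}
```

Whether to panic or to return silently is a design choice; the panic keeps misuse visible, which matches the stated wish to locate the source of such flaws.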


// revertSnapshot reverts all state changes made since the last call to snapshot().
func (j *linearJournal) revertSnapshot(s *StateDB) {
id := len(j.revisions) - 1
Member

Same here, this will panic if there was no previous call to snapshot()

Member

Interestingly enough, the set-based journalling will not panic on revertSnapshot or discardSnapshot.


// revertSnapshot reverts all state changes made since the last call to snapshot().
func (j *sparseJournal) revertSnapshot(s *StateDB) {
id := len(j.entries) - 1
Member

If you call revertSnapshot on the top-most snapshot, you will end up with a broken journal, since the initial (hidden) snapshot will be reverted. I don't think it's intended behavior, since all subsequent calls will panic.

Member

A fix would be to call s.snapshot() if id == 0

Contributor Author

That looks to me like a fix which just fixes it because it somehow works, but I don't think it makes sense semantically. Why not just no-op if caller tries to revert when there's nothing to revert?
That is, return if len(j.entries) == 0 ?

@MariusVanDerWijden
Member

I built a little fuzzer for differential fuzzing the two journals against each other here: https://github.com/MariusVanDerWijden/go-ethereum/tree/journal_on_heapfix_fuzz. It's in the first commit; I also added some things on top so that the fuzzer doesn't panic.

@holiman
Contributor Author

holiman commented Nov 21, 2024

I built a little fuzzer for differential fuzzing

Nice! We could add it to this PR, but please make it so that it's deterministic from the fuzzer input. So don't use rand, use the input as source (and read zeroes if it's depleted, which should be detected in next loop)
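
A minimal sketch of what "use the input as source" could look like, assuming a byte-oriented decoder. The names `inputSource`, `nextByte`, `depleted`, and `ops` are hypothetical and not taken from the linked fuzzer, and the operation set is invented for illustration. Every decision is derived from the fuzzer input, reads past the end yield zero, and the loop condition notices depletion on its next iteration.

```go
package main

import "fmt"

// inputSource hands out bytes from the fuzzer input. Once the input is
// depleted it yields zeroes, so decoding never depends on randomness.
type inputSource struct {
	data []byte
	pos  int
}

// nextByte returns the next input byte, or 0 if the input is depleted.
func (s *inputSource) nextByte() byte {
	if s.pos >= len(s.data) {
		return 0
	}
	b := s.data[s.pos]
	s.pos++
	return b
}

// depleted reports whether all input bytes have been consumed.
func (s *inputSource) depleted() bool {
	return s.pos >= len(s.data)
}

// ops decodes the input into a deterministic sequence of journal operations.
// The same input always produces the same sequence.
func ops(data []byte) []string {
	names := []string{"snapshot", "revert", "discard", "setBalance"}
	src := &inputSource{data: data}
	var out []string
	for !src.depleted() {
		op := names[int(src.nextByte())%len(names)]
		if op == "setBalance" {
			// The argument byte reads as zero if the input ran out.
			op = fmt.Sprintf("setBalance(%d)", src.nextByte())
		}
		out = append(out, op)
	}
	return out
}

func main() {
	fmt.Println(ops([]byte{0, 3, 7, 3})) // prints "[snapshot setBalance(7) setBalance(0)]"
}
```

In a differential setup, the decoded sequence would be applied to both journal implementations and their resulting states compared, so that any crashing input reproduces exactly.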

5 participants