
Change handling of dirty objects in state #15225

Merged: 4 commits merged into ethereum:master on Apr 10, 2018

Conversation

@holiman (Contributor) commented Oct 2, 2017

This PR makes some major changes to how we track dirty accounts before committing changes to the database.

At present (without this PR), dirty accounts are tracked through stateObjectsDirty. This tracking mechanism is not based on the journal, but is kept completely in parallel. Because of this, the state objects have an onDirty callback mechanism to set the dirtiness. It's quite complicated, to be honest.

It also has a quadratic overhead: the between-transaction processing iterates through the entire stateObjectsDirty collection and updates the objects. For a transaction towards the end of the block, this means that ~150 accounts can be processed when in fact only two should be affected by that transaction.

The PR changes that.

  • It introduces a method to 'flatten' a journal, which returns the list of accounts affected by a series of journal entries.
  • At tx level, only flattened journals are used to determine which objects are dirty (and processed).
  • The block-level cache, stateObjectsDirty, is generated from the flattened journals, which are inserted into the cache.
  • There is no more markDirty handling or callbacks.
  • There is no touched attribute on state objects (not needed anymore).
  • The special handling of ripeMD has been moved from the journal into the state_object, by explicitly inserting it into stateObjectsDirty. This means that even if a journal is reverted, it will still reside in the block-level cache. That could be a (historic) consensus error, since it may mean that ripeMD isn't completely cleaned until the end of the block rather than the end of the transaction.

Additionally, it happens to pass a couple more tests than before. In general, this is more robust; instead of keeping dirty-tracking "in parallel", it answers the question "what is dirty?" by looking at the canonical source of truth: the journal (a rough sketch follows below).
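
For illustration, a minimal sketch of what such a flatten could look like; the entry interface and the exact return shape are my assumptions, not the literal code of this PR, and the fragment assumes the core/state package context (common.Address and StateDB in scope):

// journalEntry is assumed to expose the account it touches (nil if none),
// mirroring the getAccount() method discussed further down in the review.
type journalEntry interface {
	undo(*StateDB)
	getAccount() *common.Address
}

type journal struct {
	entries []journalEntry
}

// flatten makes a single pass over the journal and collects every address
// that was modified, so "what is dirty?" is answered from the journal itself
// in O(len(entries)) time.
func (j *journal) flatten() map[common.Address]int {
	dirty := make(map[common.Address]int)
	for _, entry := range j.entries {
		if addr := entry.getAccount(); addr != nil {
			dirty[*addr]++ // count how many journal entries touched this account
		}
	}
	return dirty
}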

Note: this is a somewhat experimental PR.

@holiman (Contributor, Author) commented Oct 5, 2017

As I wrote above:

This could be a (historic) consensus error, since it may mean that ripeMD isn't completely cleaned until the end of the block rather than the end of the transaction.

It turned out that was indeed the case: it blew up at block 2675119. I was able to take some measurements of import times before that, however.

Both runs were executed with default cache settings and --fakepow. I measured blocks 2473231 to 2674999:

  • Geth master: 00:58:24
  • This PR: 00:50:09

I've changed the logic a bit, and will try a full import again.

@holiman (Contributor, Author) commented Oct 6, 2017

With the latest changes, I've successfully imported blocks 2M to 3M. That segment contains the entire Shanghai attacks and the subsequent state sweep: two hard forks and one unintentional fork due to a consensus issue.
The imports were made with default cache options (so a very low cache) and --fakepow.

With the master-code:

Import done in 26h47m47.774517906s.

With this PR:

Import done in 24h22m44.109765619s.

96467s vs 87764s => it's 9% faster.

@holiman (Contributor, Author) commented Oct 8, 2017

Some more experiments.

With --cache 2048 and --fakepow:

Master:

Import done in 23h16m48.74176876s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          4 |    1836.58919 |    1260.35082 |       0.00000 |   79468.84025
   1   |         56 |     108.52697 |    1940.85489 |   81491.96636 |   81153.34476
   2   |        580 |    1087.91604 |    4074.31101 |  131188.30989 |  131160.43501
   3   |       7207 |   10899.54581 |   20125.81019 |  675141.48754 |  675142.40703
   4   |      48290 |   93540.77628 |   35187.58902 | 1180787.74085 | 1180397.40719

Trie cache misses:  174745338
Trie cache unloads: 31743610

Object memory: 2574.312 MB current, 8212.398 MB peak
System memory: 10117.967 MB current, 10117.967 MB peak
Allocations:   45777.941 million
GC pause:      6.603554202s

With this PR:

Import done in 22h40m6.14112637s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          1 |     455.30492 |    1250.53102 |       0.00000 |   79744.52129
   1   |        140 |     265.69575 |    2000.89512 |   84784.69367 |   84473.38843
   2   |       1398 |    2677.86621 |    4453.98388 |  150051.81522 |  150029.54390
   3   |       8380 |   12299.58781 |   13889.01611 |  447246.55409 |  447223.28746
   4   |      47500 |   92062.27779 |   43177.72579 | 1485068.41720 | 1484681.04689

Trie cache misses:  174741501
Trie cache unloads: 31731703

Object memory: 2329.977 MB current, 8131.587 MB peak
System memory: 10400.729 MB current, 10400.729 MB peak
Allocations:   36408.961 million
GC pause:      23.292975431s

=> 83808 s vs 81606 s => a 2.6% improvement.

So with better cache options, the improvements are less tangible. Also worth noting is that allocations went down by around 20%.

@holiman (Contributor, Author) commented Oct 9, 2017

With this PR, importing blocks 3M to 4M with --fakepow and --cache 2048:

Import done in 12h38m9.720592597s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          2 |     926.93876 |    1287.50333 |       0.00000 |   80959.97332
   1   |         52 |     103.66826 |    2106.47193 |   84460.47817 |   83451.95941
   2   |        611 |    1036.40052 |    3831.22480 |  126264.75189 |  126177.00895
   3   |       7304 |   10375.71059 |   11249.37192 |  330391.57060 |  330112.27958
   4   |      51826 |  100938.49129 |   14379.99059 |  400161.66422 |  399955.03715
   5   |      37500 |   73805.25628 |     153.04139 |    4990.58212 |    4990.41243

Trie cache misses:  122443133
Trie cache unloads: 19956674

Object memory: 2814.558 MB current, 4738.710 MB peak
System memory: 7060.628 MB current, 7060.628 MB peak
Allocations:   29276.271 million
GC pause:      17.30536443s

Compacting entire database...
Compaction done in 5h28m56.871360459s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          0 |       0.00000 |    1288.43996 |       0.00000 |   81023.06263
   1   |          0 |       0.00000 |    2131.56375 |   85554.17450 |   84530.47110
   2   |          0 |       0.00000 |    3907.27510 |  128740.72071 |  128648.91455
   3   |          0 |       0.00000 |   11663.35599 |  344246.85950 |  343952.13564
   4   |          0 |       0.00000 |   17745.92485 |  516766.85356 |  516517.84837
   5   |      93862 |  187111.56640 |   13437.09847 |  214883.36016 |  214822.26067

@holiman (Contributor, Author) commented Oct 11, 2017

The import made it to block 4189257 before it ran out of disk, so I'd consider the full sync successful.

@holiman (Contributor, Author) commented Nov 10, 2017

I think this method may need to be amended, since stateObjectsDirty is no longer a fixed tracker but is generated on the fly from the journal (see the sketch after the method below).

func (self *StateDB) Copy() *StateDB {
	self.lock.Lock()
	defer self.lock.Unlock()

	// Copy all the basic fields, initialize the memory ones
	state := &StateDB{
		db:                self.db,
		trie:              self.trie,
		stateObjects:      make(map[common.Address]*stateObject, len(self.stateObjectsDirty)),
		stateObjectsDirty: make(map[common.Address]struct{}, len(self.stateObjectsDirty)),
		refund:            new(big.Int).Set(self.refund),
		logs:              make(map[common.Hash][]*types.Log, len(self.logs)),
		logSize:           self.logSize,
		preimages:         make(map[common.Hash][]byte),
	}
	// Copy the dirty states, logs, and preimages
	for addr := range self.stateObjectsDirty {
		state.stateObjects[addr] = self.stateObjects[addr].deepCopy(state)
		state.stateObjectsDirty[addr] = struct{}{}
	}
	for hash, logs := range self.logs {
		state.logs[hash] = make([]*types.Log, len(logs))
		copy(state.logs[hash], logs)
	}
	for hash, preimage := range self.preimages {
		state.preimages[hash] = preimage
	}
	return state
}
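
For illustration only, here is a sketch of how the copy loop could consume the flattened journal instead of the stateObjectsDirty tracker; the return shape of flatten is an assumption on my part, and the hunk from the actual change follows below.

	// Sketch (not the PR's literal code): derive the set of objects to
	// deep-copy from the journal rather than from stateObjectsDirty.
	dirtyObjects := self.journal.flatten()
	for addr := range dirtyObjects {
		state.stateObjects[addr] = self.stateObjects[addr].deepCopy(state)
		state.stateObjectsDirty[addr] = struct{}{}
	}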

@fjl added this to the 1.8.0 milestone Nov 13, 2017
@@ -450,20 +450,22 @@ func (self *StateDB) Copy() *StateDB {
self.lock.Lock()
defer self.lock.Unlock()

dirtyObjects := self.journal.flatten()
Member:

This seems dangerous to me because it iterates over all the previous opcodes. So we have an O(opcodes * calls) complexity here

Contributor Author (@holiman):

Not 'opcodes' technically, just the ones that modify the state in some manner. Why would it be O(n*m)?
AFAICT it's O(n), n being the length of the journal.

Contributor Author (@holiman):

But yes, I'm not 100% certain about how to view Copy, and what its semantics should be. The copy should (?) have a clean and fresh journal, but it needs a map of uncommitted changes.

@fjl (Contributor) commented Dec 1, 2017

Didn't have time to look into this yet. Will take a stab next week.

@karalabe modified the milestones: 1.8.0, 1.8.1, 1.8.2 Feb 13, 2018
@karalabe modified the milestones: 1.8.2, 1.8.3 Mar 2, 2018
@karalabe force-pushed the test_removefrom_dirtyset branch from 5297257 to 44fc96b on March 15, 2018 10:47
@@ -585,6 +591,12 @@ func (s *StateDB) clearJournalAndRefund() {
func (s *StateDB) Commit(deleteEmptyObjects bool) (root common.Hash, err error) {
defer s.clearJournalAndRefund()

dirtyObjects := s.journal.flatten()

for addr, v := range dirtyObjects {
Member:

Nitpick, you can replace dirtyObjects with s.journal.flatten(). Less code, cleaner.


}
//func (self *StateDB) MarkStateObjectDirty(addr common.Address) {
// self.stateObjectsDirty[addr] = struct{}{}
//}
Member:

Pls delete this.

}

type journal []journalEntry
//type journal []journalEntry
Member:

pls delete this


func (ch addPreimageChange) undo(s *StateDB) {
delete(s.preimages, ch.hash)
}
func (ch addPreimageChange) getAccount() *common.Address {
@karalabe (Member) commented Mar 15, 2018:

Please leave an empty line between all methods in this file.

@holiman force-pushed the test_removefrom_dirtyset branch from 7c528ad to c6dfe74 on March 15, 2018 12:56
@holiman (Contributor, Author) commented Mar 15, 2018

Addressed the concerns

@holiman force-pushed the test_removefrom_dirtyset branch from c6dfe74 to 513a55a on March 15, 2018 12:57
@holiman (Contributor, Author) commented Mar 25, 2018

[graph: block_processing]
Some stats about block processing times from the monitoring environment.

@holiman (Contributor, Author) commented Mar 25, 2018

[graph: block_processing_2]
This graph shows the ratio of PR to master block processing times. This PR seems to land at about 60-80%, which is a non-insignificant speedup.

@karalabe modified the milestones: 1.8.3, 1.8.4 Mar 27, 2018
@karalabe force-pushed the test_removefrom_dirtyset branch from 513a55a to 8500470 on March 27, 2018 12:18
@@ -488,7 +488,7 @@ func (self *StateDB) Copy() *StateDB {
func (self *StateDB) Snapshot() int {
id := self.nextRevisionId
self.nextRevisionId++
self.validRevisions = append(self.validRevisions, revision{id, self.journal.Length()})
self.validRevisions = append(self.validRevisions, revision{id, len(self.journal.entries)})
Contributor Author (@holiman):

Not sure why you removed Length? Is this really cleaner?

Member:

probably not

@karalabe force-pushed the test_removefrom_dirtyset branch from 8500470 to 2a2e368 on March 28, 2018 06:31
@karalabe force-pushed the test_removefrom_dirtyset branch from 2a2e368 to d985b90 on March 28, 2018 06:32
@holiman (Contributor, Author) commented Mar 28, 2018

Testing with these changes now:

#git diff
diff --git a/core/state/state_object.go b/core/state/state_object.go
index 3c40c20..fdac884 100644
--- a/core/state/state_object.go
+++ b/core/state/state_object.go
@@ -242,7 +242,7 @@ func (c *stateObject) AddBalance(amount *big.Int) {
        // EIP158: We must check emptiness for the objects such that the account
        // clearing (0,0,0 objects) can take effect.
        if amount.Sign() == 0 {
-               if c.empty() {
+               if c.empty() && !c.db.IgnoresTouchedness {
                        c.touch()
                }
 
diff --git a/core/state/statedb.go b/core/state/statedb.go
index c7ef49f..64b637b 100644
--- a/core/state/statedb.go
+++ b/core/state/statedb.go
@@ -79,7 +79,7 @@ type StateDB struct {
        journal        *journal
        validRevisions []revision
        nextRevisionId int
-
+       IgnoresTouchedness bool // transactions prior to EIP 158 do not have a notion of 'touched'
        lock sync.Mutex
 }
 
diff --git a/core/state_processor.go b/core/state_processor.go
index 4dc58b9..c1b334a 100644
--- a/core/state_processor.go
+++ b/core/state_processor.go
@@ -65,6 +65,9 @@ func (p *StateProcessor) Process(block *types.Block, statedb *state.StateDB, cfg
        if p.config.DAOForkSupport && p.config.DAOForkBlock != nil && p.config.DAOForkBlock.Cmp(block.Number()) == 0 {
                misc.ApplyDAOHardFork(statedb)
        }
+       if block.Number().Cmp(p.config.EIP158Block) < 0 {
+               statedb.IgnoresTouchedness = true
+       }
        // Iterate over and process the individual transactions
        for i, tx := range block.Transactions() {
                statedb.Prepare(tx.Hash(), block.Hash(), i)

@holiman (Contributor, Author) commented Mar 29, 2018

[graph: expr_vs_master-2]
This graph shows block number on the y-axis and time on the x-axis, for a full sync from zero to 3.9M. This PR is in blue; it performs non-insignificantly faster.

@holiman (Contributor, Author) commented Mar 29, 2018

For context: at block 1714175, in tx 0x1237f737031e40bcde4a8b7e717b2d15e3ecadfe49bb1bbc71ee9deb09c6fcf2, there's an (inner) CALL to 0x03 (ripemd), which eventually goes OOG (actually I'm not 100% sure whether it's the CALL that goes OOG or the transaction).

If we give 0x03 special treatment, that cannot be applied pre-EIP158.
An alternative fix would be to have different StateDBs for different forks, just like we have different opcode tables, and swap these state transition mechanisms at fork blocks.

That would make some things simpler; we'd encode the Homestead state transition in one implementation and never touch that code again in future hard forks, and we'd have a separate one for EIP158, which has the notion of touched. That is a much bigger rewrite, but if done right it could make our lives a lot simpler going forward.
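
Purely to illustrate that idea (nothing like this exists in the PR; the interface and method names below are hypothetical):

// Hypothetical sketch: fork-specific state handling behind an interface,
// selected per block, so pre- and post-EIP158 rules live in separate
// implementations.
type stateRules interface {
	// Touch records an account as touched: a no-op before EIP158,
	// journaled (and possibly swept at the end of the transaction) after it.
	Touch(addr common.Address)
	// DeleteEmptyObjects reports whether empty (0,0,0) accounts are removed.
	DeleteEmptyObjects() bool
}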

@holiman (Contributor, Author) commented Mar 30, 2018

[graph: block_processing_3]
After a couple of days, this PR is now 11 (ELEVEN) hours ahead of master; block-wise, it's about 160K blocks more advanced.

@holiman (Contributor, Author) commented Apr 5, 2018

@karalabe A 'cleaner' solution than my fix, which made the state processor rule-aware, is to simply do a nil-check and continue the loop if nil.

An object can be in the journal but not in stateObjectsDirty, in the specific case of a touch (which is journaled) before 'touched' actually existed as a concept. Thus, we could simply add a nil-check and a comment about when this can occur (see the sketch below).
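
A sketch of that check, assuming the loop ranges over the flattened journal and looks the account up in stateObjects, as the pushed fix quoted further down suggests; the surrounding finalisation code is elided:

	for addr := range s.journal.flatten() {
		if _, exist := s.stateObjects[addr]; !exist {
			// Journaled but never created: the pre-EIP158 'touch' of the ripeMD
			// precompile inside an out-of-gas call (block 1714175). Skip it.
			continue
		}
		// ... finalise the object as before (update the trie, or delete if empty) ...
	}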

@holiman (Contributor, Author) commented Apr 7, 2018

[graph: expr_vs_master-3]
This graph shows the time after both eventually got synced (yes, a slow sync from 0). This PR performs 0-100% better, i.e. sometimes processing in half the time and sometimes in the same time, so roughly a 50% improvement in block processing time.

I will now try out my cleaner solution, and force-push here if it works out.
Also, the times to sync from 0 to head were:

  • 137h on master
  • 94h with this PR (68%)

@holiman force-pushed the test_removefrom_dirtyset branch from 18247ab to a819c8f on April 9, 2018 08:30
@holiman (Contributor, Author) commented Apr 9, 2018

Pushed the new fix:

if !exist {
// ripeMD is 'touched' at block 1714175, in tx 0x1237f737031e40bcde4a8b7e717b2d15e3ecadfe49bb1bbc71ee9deb09c6fcf2
// That tx goes out of gas, and although the notion of 'touched' does not exist there, the
// touch-even will still be recorded in the journal. Since ripeMD is a special snowflake,
Member:

touch-evenT

// ripeMD is 'touched' at block 1714175, in tx 0x1237f737031e40bcde4a8b7e717b2d15e3ecadfe49bb1bbc71ee9deb09c6fcf2
// That tx goes out of gas, and although the notion of 'touched' does not exist there, the
// touch-even will still be recorded in the journal. Since ripeMD is a special snowflake,
// it will persist in the journal even though the journal is reverted. In this special circumstance ,
Member:

circumstance< no space >,

@holiman force-pushed the test_removefrom_dirtyset branch from a819c8f to 8c31d28 on April 10, 2018 09:20
@karalabe (Member) left a comment:

LGTM

@karalabe merged commit 39f4c80 into ethereum:master Apr 10, 2018
@wuestholz (Contributor) commented:
@holiman @karalabe @fjl I observed a weird difference in behavior that seems to be due to this PR. I wrote a small repro:

package main

import (
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/state"
	"github.com/ethereum/go-ethereum/ethdb"
)

func main() {
	db, _ := ethdb.NewMemDatabase()
	sdb1, _ := state.New(common.Hash{}, state.NewDatabase(db))
	addr := common.HexToAddress("aaaa")
	sdb1.SetBalance(addr, big.NewInt(42))
	sdb2 := sdb1.Copy()
	fmt.Printf("Balance before: %v\n", sdb2.GetBalance(addr))
	sdb3 := sdb2.Copy()
	fmt.Printf("Balance after: %v\n", sdb3.GetBalance(addr))

	// Output with PR #15225:
	// Balance before: 42
	// Balance after: 0

	// Output without PR #15225:
	// Balance before: 42
	// Balance after: 42
}

I would expect it to print 42 twice since copy should create a deep copy. Is this not the desired behavior anymore?

@holiman (Contributor, Author) commented Apr 10, 2018

Nice catch, thank you!

@holiman (Contributor, Author) commented Apr 10, 2018

I've made a PR which fixes this; I hope nothing else is br0ken... #16485

@wuestholz (Contributor) commented:
@holiman Great! Thank you for suggesting a fix this quickly!

@siong1987 commented:
@holiman how do you produce the graphs above? I am in the process of testing different DBs other than LevelDB. Instructions on how to do the benchmarking would be amazing and super helpful to me.

@holiman (Contributor, Author) commented Apr 18, 2018

@siong1987 The graphs are from Datadog, https://app.datadoghq.com/. Here is a part of the yaml config used to deploy the instance, which configures it to expose pprof data:

          "--datadir=/datadir/geth"
          "--syncmode=full"
          "--rpc" "--rpcaddr=0.0.0.0" "--rpccorsdomain=*"
          "--rpcapi=eth,net,web3,txpool,debug"
          "--cache=2048"
          "--nat=extip:{{ inventory_hostname }}"
          "--metrics" "--pprof" "--pprofaddr=0.0.0.0"

If you run it like that, you can visit http://127.0.0.1:6060/debug/metrics and get internal metrics from the instance.
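
As a quick sanity check that the endpoint is up, a throwaway Go snippet like this (my own sketch, assuming a local node started with the flags above) dumps the raw expvar JSON:

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// The same endpoint the dd-agent scrapes below, exposed by --metrics --pprof.
	resp, err := http.Get("http://127.0.0.1:6060/debug/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON with chain.*, trie.*, txpool.* counters, etc.
}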

The machine also runs a dd-agent, configured as follows (go-expvar.yaml):

init_config:
  
instances:
  - expvar_url: "http://127.0.0.1:6060/debug/metrics"
    namespace: "geth"
    tags:
      - "client:geth"
    metrics:
      - path: chain.*
        type: counter
      - path: trie.cachemiss
      - path: trie.cacheunload
      - path: txpool.*
      - path: p2p.*
      - path: les.*
      - path: eth.downloader.*

The dd-agent is also configured to pick up external metrics, such as Docker read IO stats, disk usage, CPU iowait, etc.

Then it's just a matter of composing charts with the available metrics inside Datadog. The Swarm team uses Grafana more, which seems like a nice alternative -- you can use whatever graphing tool you like, but pprof is a good (the only?) way to expose all the metrics that we have placed inside geth.
