
Change handling of dirty objects in state #15225

Merged: 4 commits merged into ethereum:master on Apr 10, 2018

Conversation

@holiman (Contributor) commented Oct 2, 2017

This PR makes some major changes to how we track dirty accounts before committing changes to the database.

At present (without this PR), dirty accounts are tracked through stateObjectsDirty. This tracking mechanism is not based on the journal, but is kept completely in parallel. Because of this, the state objects have an onDirty callback mechanism to set the dirtiness. It's quite complicated, to be honest.

It also has a quadratic overhead: the between-transaction processing iterates through the entire stateObjectsDirty collection and updates the objects. For a transaction towards the end of the block, this means that ~150 accounts can be processed when in fact only two should be affected by that transaction.

The PR changes that.

  • It introduces a method to 'flatten' a journal, which returns the list of accounts affected by a series of journal entries.
  • At tx level, only flattened journals are used to determine which objects are dirty (and processed).
  • The block-level cache, stateObjectsDirty, is generated from the flattened journals, which are inserted into the cache.
  • There is no more markDirty handling or callbacks.
  • There is no touched attribute on state objects (not needed anymore).
  • The special handling of ripeMD has been moved from the journal into the state_object, by explicitly inserting it into stateObjectsDirty. This means that even if a journal is reverted, it will still reside in the block-level cache. That could be a (historic) consensus error, since it may mean that ripeMD isn't completely cleaned until the end of the block rather than the end of the transaction.

Additionally, it happens to pass a couple more tests than before. In general, this is more robust; instead of keeping dirty-tracking "in parallel", it answers the question "what is dirty?" by looking at the canonical source of truth: the journal (a rough sketch follows below).
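
For illustration, a minimal sketch of what such a flatten could look like; the entry interface and the exact return shape are my assumptions, not the literal code of this PR, and the fragment assumes the core/state package context (common.Address and StateDB in scope):

// journalEntry is assumed to expose the account it touches (nil if none),
// mirroring the getAccount() method discussed further down in the review.
type journalEntry interface {
	undo(*StateDB)
	getAccount() *common.Address
}

type journal struct {
	entries []journalEntry
}

// flatten makes a single pass over the journal and collects every address
// that was modified, so "what is dirty?" is answered from the journal itself
// in O(len(entries)) time.
func (j *journal) flatten() map[common.Address]int {
	dirty := make(map[common.Address]int)
	for _, entry := range j.entries {
		if addr := entry.getAccount(); addr != nil {
			dirty[*addr]++ // count how many journal entries touched this account
		}
	}
	return dirty
}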

Note: this is a somewhat experimental PR.

@holiman (Contributor, Author) commented Oct 5, 2017

As I wrote above:

This could be a (historic) consensus error, since it may mean that ripeMD isn't completely cleaned until the end of the block rather than the end of the transaction.

It turned out that was indeed the case: it blew up at block 2675119. I was able to take some measurements of import times before that, however.

Both runs were executed with default cache settings and --fakepow. I measured blocks 2473231 to 2674999:

  • Geth master: 00:58:24
  • This PR: 00:50:09

I've changed the logic a bit, and will try a full import again.

@holiman (Contributor, Author) commented Oct 6, 2017

With the latest changes, I've successfully imported blocks 2M to 3M. That segment contains the entire Shanghai attacks and the subsequent state sweep: two hard forks and one unintentional fork due to a consensus issue.
The imports were made with default cache options (so a very low cache) and --fakepow.

With the master-code:

Import done in 26h47m47.774517906s.

With this PR:

Import done in 24h22m44.109765619s.

96467s vs 87764s => it's 9% faster.

@holiman (Contributor, Author) commented Oct 8, 2017

Some more experiments.

With --cache 2048 and --fakepow:

Master:

Import done in 23h16m48.74176876s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          4 |    1836.58919 |    1260.35082 |       0.00000 |   79468.84025
   1   |         56 |     108.52697 |    1940.85489 |   81491.96636 |   81153.34476
   2   |        580 |    1087.91604 |    4074.31101 |  131188.30989 |  131160.43501
   3   |       7207 |   10899.54581 |   20125.81019 |  675141.48754 |  675142.40703
   4   |      48290 |   93540.77628 |   35187.58902 | 1180787.74085 | 1180397.40719

Trie cache misses:  174745338
Trie cache unloads: 31743610

Object memory: 2574.312 MB current, 8212.398 MB peak
System memory: 10117.967 MB current, 10117.967 MB peak
Allocations:   45777.941 million
GC pause:      6.603554202s

With this PR:

Import done in 22h40m6.14112637s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          1 |     455.30492 |    1250.53102 |       0.00000 |   79744.52129
   1   |        140 |     265.69575 |    2000.89512 |   84784.69367 |   84473.38843
   2   |       1398 |    2677.86621 |    4453.98388 |  150051.81522 |  150029.54390
   3   |       8380 |   12299.58781 |   13889.01611 |  447246.55409 |  447223.28746
   4   |      47500 |   92062.27779 |   43177.72579 | 1485068.41720 | 1484681.04689

Trie cache misses:  174741501
Trie cache unloads: 31731703

Object memory: 2329.977 MB current, 8131.587 MB peak
System memory: 10400.729 MB current, 10400.729 MB peak
Allocations:   36408.961 million
GC pause:      23.292975431s

=> 83808 s vs 81606 s => a 2.6% improvement.

So with better cache options, the improvements are less tangible. Also worth noting is that allocations went down by around 20%.

@holiman (Contributor, Author) commented Oct 9, 2017

With this PR, importing blocks 3M to 4M with --fakepow and --cache 2048:

Import done in 12h38m9.720592597s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          2 |     926.93876 |    1287.50333 |       0.00000 |   80959.97332
   1   |         52 |     103.66826 |    2106.47193 |   84460.47817 |   83451.95941
   2   |        611 |    1036.40052 |    3831.22480 |  126264.75189 |  126177.00895
   3   |       7304 |   10375.71059 |   11249.37192 |  330391.57060 |  330112.27958
   4   |      51826 |  100938.49129 |   14379.99059 |  400161.66422 |  399955.03715
   5   |      37500 |   73805.25628 |     153.04139 |    4990.58212 |    4990.41243

Trie cache misses:  122443133
Trie cache unloads: 19956674

Object memory: 2814.558 MB current, 4738.710 MB peak
System memory: 7060.628 MB current, 7060.628 MB peak
Allocations:   29276.271 million
GC pause:      17.30536443s

Compacting entire database...
Compaction done in 5h28m56.871360459s.

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          0 |       0.00000 |    1288.43996 |       0.00000 |   81023.06263
   1   |          0 |       0.00000 |    2131.56375 |   85554.17450 |   84530.47110
   2   |          0 |       0.00000 |    3907.27510 |  128740.72071 |  128648.91455
   3   |          0 |       0.00000 |   11663.35599 |  344246.85950 |  343952.13564
   4   |          0 |       0.00000 |   17745.92485 |  516766.85356 |  516517.84837
   5   |      93862 |  187111.56640 |   13437.09847 |  214883.36016 |  214822.26067

@holiman (Contributor, Author) commented Oct 11, 2017

The import made it to block 4189257 before it ran out of disk, so I'd consider the full sync successful.

@holiman (Contributor, Author) commented Nov 10, 2017

I think this method may need to be amended, since stateObjectsDirty is no longer a fixed tracker but is generated on the fly from the journal (see the sketch after the method below).

func (self *StateDB) Copy() *StateDB {
	self.lock.Lock()
	defer self.lock.Unlock()

	// Copy all the basic fields, initialize the memory ones
	state := &StateDB{
		db:                self.db,
		trie:              self.trie,
		stateObjects:      make(map[common.Address]*stateObject, len(self.stateObjectsDirty)),
		stateObjectsDirty: make(map[common.Address]struct{}, len(self.stateObjectsDirty)),
		refund:            new(big.Int).Set(self.refund),
		logs:              make(map[common.Hash][]*types.Log, len(self.logs)),
		logSize:           self.logSize,
		preimages:         make(map[common.Hash][]byte),
	}
	// Copy the dirty states, logs, and preimages
	for addr := range self.stateObjectsDirty {
		state.stateObjects[addr] = self.stateObjects[addr].deepCopy(state)
		state.stateObjectsDirty[addr] = struct{}{}
	}
	for hash, logs := range self.logs {
		state.logs[hash] = make([]*types.Log, len(logs))
		copy(state.logs[hash], logs)
	}
	for hash, preimage := range self.preimages {
		state.preimages[hash] = preimage
	}
	return state
}
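
For illustration only, here is a sketch of how the copy loop could consume the flattened journal instead of the stateObjectsDirty tracker; the return shape of flatten is an assumption on my part, and the hunk from the actual change follows below.

	// Sketch (not the PR's literal code): derive the set of objects to
	// deep-copy from the journal rather than from stateObjectsDirty.
	dirtyObjects := self.journal.flatten()
	for addr := range dirtyObjects {
		state.stateObjects[addr] = self.stateObjects[addr].deepCopy(state)
		state.stateObjectsDirty[addr] = struct{}{}
	}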

@fjl added this to the 1.8.0 milestone Nov 13, 2017
@@ -450,20 +450,22 @@ func (self *StateDB) Copy() *StateDB {
self.lock.Lock()
defer self.lock.Unlock()

dirtyObjects := self.journal.flatten()
Member:

This seems dangerous to me because it iterates over all the previous opcodes. So we have an O(opcodes * calls) complexity here

Contributor Author (@holiman):

Not 'opcodes' technically, just the ones that modify the state in some manner. Why would it be O(n*m)?
AFAICT it's O(n), n being the length of the journal.

Contributor Author (@holiman):

But yes, I'm not 100% certain about how to view Copy, and what its semantics should be. The copy should (?) have a clean and fresh journal, but it needs a map of uncommitted changes.

@fjl (Contributor) commented Dec 1, 2017

Didn't have time to look into this yet. Will take a stab next week.

@karalabe modified the milestones: 1.8.0, 1.8.1, 1.8.2 Feb 13, 2018
@karalabe modified the milestones: 1.8.2, 1.8.3 Mar 2, 2018
@karalabe force-pushed the test_removefrom_dirtyset branch from 5297257 to 44fc96b on March 15, 2018 10:47
@@ -585,6 +591,12 @@ func (s *StateDB) clearJournalAndRefund() {
func (s *StateDB) Commit(deleteEmptyObjects bool) (root common.Hash, err error) {
defer s.clearJournalAndRefund()

dirtyObjects := s.journal.flatten()

for addr, v := range dirtyObjects {
Member:

Nitpick, you can replace dirtyObjects with s.journal.flatten(). Less code, cleaner.


}
//func (self *StateDB) MarkStateObjectDirty(addr common.Address) {
// self.stateObjectsDirty[addr] = struct{}{}
//}
Member:

Pls delete this.

}

type journal []journalEntry
//type journal []journalEntry
Member:

pls delete this


func (ch addPreimageChange) undo(s *StateDB) {
delete(s.preimages, ch.hash)
}
func (ch addPreimageChange) getAccount() *common.Address {
@karalabe (Member) commented Mar 15, 2018:

Please leave an empty line between all methods in this file.

@holiman force-pushed the test_removefrom_dirtyset branch from 7c528ad to c6dfe74 on March 15, 2018 12:56
@holiman (Contributor, Author) commented Mar 15, 2018

Addressed the concerns

@holiman force-pushed the test_removefrom_dirtyset branch from c6dfe74 to 513a55a on March 15, 2018 12:57
@holiman (Contributor, Author) commented Mar 25, 2018

[graph: block_processing]
Some stats about block processing times from the monitoring environment.

@holiman (Contributor, Author) commented Mar 25, 2018

[graph: block_processing_2]
This graph shows the ratio of PR to master block processing times. This PR seems to land at about 60-80%, which is a non-insignificant speedup.

@karalabe modified the milestones: 1.8.3, 1.8.4 Mar 27, 2018
@karalabe force-pushed the test_removefrom_dirtyset branch from 513a55a to 8500470 on March 27, 2018 12:18
@@ -488,7 +488,7 @@ func (self *StateDB) Copy() *StateDB {
func (self *StateDB) Snapshot() int {
id := self.nextRevisionId
self.nextRevisionId++
self.validRevisions = append(self.validRevisions, revision{id, self.journal.Length()})
self.validRevisions = append(self.validRevisions, revision{id, len(self.journal.entries)})
Contributor Author (@holiman):

Not sure why you removed Length? Is this really cleaner?

Member:

probably not

@karalabe force-pushed the test_removefrom_dirtyset branch from 8500470 to 2a2e368 on March 28, 2018 06:31
@karalabe force-pushed the test_removefrom_dirtyset branch from 2a2e368 to d985b90 on March 28, 2018 06:32
@holiman (Contributor, Author) commented Mar 28, 2018

Testing with these changes now:

#git diff
diff --git a/core/state/state_object.go b/core/state/state_object.go
index 3c40c20..fdac884 100644
--- a/core/state/state_object.go
+++ b/core/state/state_object.go
@@ -242,7 +242,7 @@ func (c *stateObject) AddBalance(amount *big.Int) {
        // EIP158: We must check emptiness for the objects such that the account
        // clearing (0,0,0 objects) can take effect.
        if amount.Sign() == 0 {
-               if c.empty() {
+               if c.empty() && !c.db.IgnoresTouchedness {
                        c.touch()
                }
 
diff --git a/core/state/statedb.go b/core/state/statedb.go
index c7ef49f..64b637b 100644
--- a/core/state/statedb.go
+++ b/core/state/statedb.go
@@ -79,7 +79,7 @@ type StateDB struct {
        journal        *journal
        validRevisions []revision
        nextRevisionId int
-
+       IgnoresTouchedness bool // transactions prior to EIP 158 do not have a notion of 'touched'
        lock sync.Mutex
 }
 
diff --git a/core/state_processor.go b/core/state_processor.go
index 4dc58b9..c1b334a 100644
--- a/core/state_processor.go
+++ b/core/state_processor.go
@@ -65,6 +65,9 @@ func (p *StateProcessor) Process(block *types.Block, statedb *state.StateDB, cfg
        if p.config.DAOForkSupport && p.config.DAOForkBlock != nil && p.config.DAOForkBlock.Cmp(block.Number()) == 0 {
                misc.ApplyDAOHardFork(statedb)
        }
+       if block.Number().Cmp(p.config.EIP158Block) < 0 {
+               statedb.IgnoresTouchedness = true
+       }
        // Iterate over and process the individual transactions
        for i, tx := range block.Transactions() {
                statedb.Prepare(tx.Hash(), block.Hash(), i)

@holiman (Contributor, Author) commented Mar 29, 2018

[graph: expr_vs_master-2]
This graph shows block number on the y-axis and time on the x-axis, for a full sync from zero to 3.9M. This PR is in blue; it performs non-insignificantly faster.

@holiman (Contributor, Author) commented Mar 29, 2018

For context: at block 1714175, in tx 0x1237f737031e40bcde4a8b7e717b2d15e3ecadfe49bb1bbc71ee9deb09c6fcf2, there's an (inner) CALL to 0x03 (ripemd), which eventually goes OOG (actually I'm not 100% sure whether it's the CALL that goes OOG or the transaction).

If we give 0x03 special treatment, that cannot be applied pre-EIP158.
An alternative fix would be to have different StateDBs for different forks, just like we have different opcode tables, and swap these state transition mechanisms at fork blocks.

That would make some things simpler; we'd encode the Homestead state transition in one implementation and never touch that code again in future hard forks, and we'd have a separate one for EIP158, which has the notion of touched. That is a much bigger rewrite, but if done right it could make our lives a lot simpler going forward.
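
Purely to illustrate that idea (nothing like this exists in the PR; the interface and method names below are hypothetical):

// Hypothetical sketch: fork-specific state handling behind an interface,
// selected per block, so pre- and post-EIP158 rules live in separate
// implementations.
type stateRules interface {
	// Touch records an account as touched: a no-op before EIP158,
	// journaled (and possibly swept at the end of the transaction) after it.
	Touch(addr common.Address)
	// DeleteEmptyObjects reports whether empty (0,0,0) accounts are removed.
	DeleteEmptyObjects() bool
}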

@holiman (Contributor, Author) commented Mar 30, 2018

[graph: block_processing_3]
After a couple of days, this PR is now 11 (ELEVEN) hours ahead of master; block-wise, it's about 160K blocks more advanced.

@holiman (Contributor, Author) commented Apr 5, 2018

@karalabe A 'cleaner' solution than my fix, which made the state processor rule-aware, is to simply do a nil-check and continue the loop if nil.

An object can be in the journal but not in stateObjectsDirty, in the specific case of a touch (which is journaled) before 'touched' actually existed as a concept. Thus, we could simply add a nil-check and a comment about when this can occur (see the sketch below).
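
A sketch of that check, assuming the loop ranges over the flattened journal and looks the account up in stateObjects, as the pushed fix quoted further down suggests; the surrounding finalisation code is elided:

	for addr := range s.journal.flatten() {
		if _, exist := s.stateObjects[addr]; !exist {
			// Journaled but never created: the pre-EIP158 'touch' of the ripeMD
			// precompile inside an out-of-gas call (block 1714175). Skip it.
			continue
		}
		// ... finalise the object as before (update the trie, or delete if empty) ...
	}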

@holiman (Contributor, Author) commented Apr 7, 2018

[graph: expr_vs_master-3]
This graph shows the time after both eventually got synced (yes, a slow sync from 0). This PR performs 0-100% better, i.e. sometimes processing in half the time and sometimes in the same time, so roughly a 50% improvement in block processing time.

I will now try out my cleaner solution, and force-push here if it works out.
Also, the times to sync from 0 to head were:

  • 137h on master
  • 94h with this PR (68%)

@holiman force-pushed the test_removefrom_dirtyset branch from 18247ab to a819c8f on April 9, 2018 08:30
@holiman (Contributor, Author) commented Apr 9, 2018

Pushed the new fix:

if !exist {
// ripeMD is 'touched' at block 1714175, in tx 0x1237f737031e40bcde4a8b7e717b2d15e3ecadfe49bb1bbc71ee9deb09c6fcf2
// That tx goes out of gas, and although the notion of 'touched' does not exist there, the
// touch-even will still be recorded in the journal. Since ripeMD is a special snowflake,
Member:

touch-evenT

// ripeMD is 'touched' at block 1714175, in tx 0x1237f737031e40bcde4a8b7e717b2d15e3ecadfe49bb1bbc71ee9deb09c6fcf2
// That tx goes out of gas, and although the notion of 'touched' does not exist there, the
// touch-even will still be recorded in the journal. Since ripeMD is a special snowflake,
// it will persist in the journal even though the journal is reverted. In this special circumstance ,
Member:

circumstance< no space >,

@holiman force-pushed the test_removefrom_dirtyset branch from a819c8f to 8c31d28 on April 10, 2018 09:20
@karalabe (Member) left a comment:

LGTM

@karalabe merged commit 39f4c80 into ethereum:master Apr 10, 2018
@wuestholz (Contributor) commented:
@holiman @karalabe @fjl I observed a weird difference in behavior that seems to be due to this PR. I wrote a small repro:

package main

import (
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/state"
	"github.com/ethereum/go-ethereum/ethdb"
)

func main() {
	db, _ := ethdb.NewMemDatabase()
	sdb1, _ := state.New(common.Hash{}, state.NewDatabase(db))
	addr := common.HexToAddress("aaaa")
	sdb1.SetBalance(addr, big.NewInt(42))
	sdb2 := sdb1.Copy()
	fmt.Printf("Balance before: %v\n", sdb2.GetBalance(addr))
	sdb3 := sdb2.Copy()
	fmt.Printf("Balance after: %v\n", sdb3.GetBalance(addr))

	// Output with PR #15225:
	// Balance before: 42
	// Balance after: 0

	// Output without PR #15225:
	// Balance before: 42
	// Balance after: 42
}

I would expect it to print 42 twice since copy should create a deep copy. Is this not the desired behavior anymore?

@holiman (Contributor, Author) commented Apr 10, 2018

Nice catch, thank you!

@holiman (Contributor, Author) commented Apr 10, 2018

I've made a PR which fixes this; I hope nothing else is br0ken... #16485

@wuestholz (Contributor) commented:
@holiman Great! Thank you for suggesting a fix this quickly!

@siong1987 commented:
@holiman how do you produce the graphs above? I am in the process of testing different DBs other than LevelDB. Instructions on how to do the benchmarking would be amazing and super helpful to me.

@holiman (Contributor, Author) commented Apr 18, 2018

@siong1987 The graphs are from Datadog, https://app.datadoghq.com/. Here is a part of the yaml config used to deploy the instance, which configures it to expose pprof data:

          "--datadir=/datadir/geth"
          "--syncmode=full"
          "--rpc" "--rpcaddr=0.0.0.0" "--rpccorsdomain=*"
          "--rpcapi=eth,net,web3,txpool,debug"
          "--cache=2048"
          "--nat=extip:{{ inventory_hostname }}"
          "--metrics" "--pprof" "--pprofaddr=0.0.0.0"

If you run it like that, you can visit http://127.0.0.1:6060/debug/metrics and get internal metrics from the instance.
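
As a quick sanity check that the endpoint is up, a throwaway Go snippet like this (my own sketch, assuming a local node started with the flags above) dumps the raw expvar JSON:

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// The same endpoint the dd-agent scrapes below, exposed by --metrics --pprof.
	resp, err := http.Get("http://127.0.0.1:6060/debug/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON with chain.*, trie.*, txpool.* counters, etc.
}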

The machine also runs a dd-agent, configured as follows (go-expvar.yaml):

init_config:
  
instances:
  - expvar_url: "http://127.0.0.1:6060/debug/metrics"
    namespace: "geth"
    tags:
      - "client:geth"
    metrics:
      - path: chain.*
        type: counter
      - path: trie.cachemiss
      - path: trie.cacheunload
      - path: txpool.*
      - path: p2p.*
      - path: les.*
      - path: eth.downloader.*

The dd-agent is also configured to pick up external metrics, such as Docker read IO stats, disk usage, CPU iowait, etc.

Then it's just a matter of composing charts with the available metrics inside Datadog. The Swarm team uses Grafana more, which seems like a nice alternative -- you can use whatever graphing tool you like, but pprof is a good (the only?) way to expose all the metrics that we have placed inside geth.
