Client: add VM execution (old) #1017

Closed
wants to merge 22 commits

Conversation

holgerd77 (Member)

Ok, I did some back-and-forth experimentation on where/how to integrate VM execution in the client and I think this should finally do it.

This PR first removes the HeaderFetcher -> BlockFetcher inheritance dependency. This frees the Fetcher classes to now have a clean object relationship to their respective synchronizers (so BlockFetcher <-> FullSynchronizer, HeaderFetcher <-> LightSynchronizer) and subsequently allows for a direct integration of the VM execution into the BlockFetcher; the VM has (another time 😄 ) also been moved further into this class.

In the BlockFetcher.store() function the functionality from Chain.putBlocks() is then drawn in and decomposed (this should be no problem since this class is going away anyhow with the wrapper class removal @jochem-brouwer is planning).
This allows for atomic alternating block execution and blockchain storage (I've also drawn in the three-line Blockchain.putBlocks() functionality) and ensures that only blocks which have successfully been executed upon are stored in the chain.
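To illustrate the pattern, here is a minimal sketch of such an interleaved store/execute loop. This is illustrative only, not the exact diff (the actual ordering — putBlock followed by vm.runBlockchain() — is visible in the review excerpts further down); the class wrapper and the any-typed chain/vm members are assumptions:

```ts
import { Block } from '@ethereumjs/block'

// Illustrative sketch only (not the exact diff): store a block and let the VM
// execute right after, so a failing execution stops the chunk early.
class BlockFetcherSketch {
  constructor(private chain: any, private vm: any) {}

  async store(blocks: Block[]): Promise<void> {
    await this.chain.blockchain.initPromise
    for (const block of blocks) {
      await this.chain.blockchain.putBlock(block) // persist the block
      await this.vm.runBlockchain() // execute up to the new chain head
    }
  }
}
```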

PR is not running yet. Unit tests are passing, but integration tests need some modification since the mock setup now causes some validation checks to fail during the vm.runBlock() run; not sure how to fix this yet.

The client run is also triggering the following error at the moment once a block with transactions is hit: ERROR [12-16|00:31:58] Error: sender doesn't have enough funds to send tx. The upfront cost is: 996205183388591000 and the sender's account only has: 0. (Another error, ERROR [12-16|00:30:46] TypeError: Cannot read property 'map' of undefined at BlockFetcher.request (/EthereumJS/ethereumjs-vm/packages/client/dist/lib/sync/fetcher/blockfetcher.js:41:62), seems unrelated to this PR and might rather have been introduced along the type improvements from @ryanio in some previous PRs, at least that's my current unproven suspicion.)

So this is rather open to be picked up and further continued.

codecov bot commented Dec 15, 2020

Codecov Report

Merging #1017 (4811439) into master (582b4ef) will increase coverage by 0.43%.
The diff coverage is 93.13%.

Impacted file tree graph

Flag        Coverage Δ
block       77.92% <ø> (ø)
blockchain  77.92% <ø> (ø)
client      88.06% <91.78%> (+0.92%) ⬆️
common      91.87% <ø> (ø)
devp2p      82.78% <100.00%> (+0.33%) ⬆️
ethash      82.08% <ø> (ø)
tx          86.25% <ø> (+0.21%) ⬆️
vm          83.19% <95.00%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown.

@holgerd77 (Member Author) commented Dec 15, 2020

Ok, but that's at least at the center of what we want to test (currently testing at block 76511 of mainnet): 😄

ERROR [12-16|00:53:59] Error: sender doesn't have enough funds to send tx. The upfront cost is: 996205183388591000 and the sender's account only has: 0
    at VM._runTx (/EthereumJS/ethereumjs-vm/packages/vm/dist/runTx.js:67:19)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:94:5)
    at async VM.runTx (/EthereumJS/ethereumjs-vm/packages/vm/dist/runTx.js:29:24)
    at async VM.applyTransactions (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlock.js:161:23)
    at async VM.applyBlock (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlock.js:131:23)
    at async VM.runBlock (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlock.js:70:18)
    at async BlockFetcher.store (/EthereumJS/ethereumjs-vm/packages/client/dist/lib/sync/fetcher/blockfetcher.js:69:13)
    at async Writable._write (/EthereumJS/ethereumjs-vm/packages/client/dist/lib/sync/fetcher/fetcher.js:176:17)

@holgerd77 force-pushed the client-add-vm-execution branch from 0c7adf1 to 14aa171 on December 16, 2020 00:00
@holgerd77 (Member Author)

(I could skip both the balance and the nonce checks in the VM with skipBalance and skipNonce set to `true`, but would this make sense? 🤔 )
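For reference, a minimal sketch of what those skips would look like on the runBlock() call (vm and block are assumed to be in scope; as discussed below, a fully validating client should not use these flags):

```ts
// Sketch only: bypass the balance/nonce checks during block execution.
// Not what a fully validating client should do (see the discussion below).
await vm.runBlock({
  block,
  skipBlockValidation: true, // already set in the current PR state
  skipBalance: true, // skip the "sender doesn't have enough funds" check
  skipNonce: true, // skip the tx nonce vs. account nonce check
})
```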

@holgerd77 (Member Author)

The initial error after deleting the database is:

ERROR [12-16|01:16:23] Error: invalid block stateRoot
    at VM.runBlock (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlock.js:97:19)
    at async BlockFetcher.store (/EthereumJS/ethereumjs-vm/packages/client/dist/lib/sync/fetcher/blockfetcher.js:69:13)
    at async Writable._write (/EthereumJS/ethereumjs-vm/packages/client/dist/lib/sync/fetcher/fetcher.js:176:17)

@jochem-brouwer (Member)

Okay, this is great! 😄 We should not use any of the skip flags; the client should fully validate each block. Note that block 46147 is the first block in which we have transactions. These probably pass because skipBlockValidation is set to true, which means things like the state root are not checked. We should fully validate each block so we can immediately detect if something is wrong.

@jochem-brouwer (Member)

OK, experimenting a bit here. I have removed skipBlockValidation here.

BlockFetcher either throws this Cannot read property 'map' of undefined error (peer dumps in no data at all?) or it throws on a wrong state root when running block 1.

@jochem-brouwer (Member)

OK, the root cause of the problem is that stateManager points to an empty trie by default instead of the genesis state trie. This should be fixed once we start using runBlockchain, although we might want to alter that functionality a little bit too.
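For reference, a hedged sketch of what seeding the state with the genesis allocations could look like (common and blockchain are assumed to be in scope; the wiring details are assumptions, and the cleaner path is indeed to let runBlockchain handle this):

```ts
import VM from '@ethereumjs/vm'
import { DefaultStateManager } from '@ethereumjs/vm/dist/state'

// Sketch (wiring assumed): seed the state trie with the genesis allocations
// before executing block 1, so the parent state root of block 1 matches.
const stateManager = new DefaultStateManager({ common })
await stateManager.generateCanonicalGenesis()
const vm = new VM({ common, blockchain, stateManager })
```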

@holgerd77 (Member Author)

Rebased this.

@holgerd77 (Member Author)

( @jochem-brouwer if you want to do work here you can of course directly push to this branch )

@holgerd77 force-pushed the client-add-vm-execution branch from 731b00d to 38be1be on December 21, 2020 12:14
@holgerd77 (Member Author)

Ok, this seems to work and execute the VM correctly; it also now persists the state (I actually wasn't aware that we weren't doing this yet 😛 ). 🎉

This needs some further clean-up and I will leave this open 1-2 days more, but otherwise I would cautiously say this is ready.

The following command gives a run with a clean slate:

rm -Rf [HOME_DIR]/Ethereum/ethereumjs/chaindata && rm -Rf ./statedir && tsc -p tsconfig.prod.json && node dist/bin/cli.js

@jochem-brouwer (Member) left a comment

Some comments and questions, great progress so far 😄

packages/client/lib/sync/fetcher/blockfetcher.ts (outdated, resolved)
packages/client/lib/sync/fetcher/blockfetcher.ts (outdated, resolved)
count: BN
}
import VM from '@ethereumjs/vm'
import { DefaultStateManager } from '@ethereumjs/vm/dist/state'
Member

I didn't know this was possible, nice that it is 😄

Member Author

Yeah, I have done this for the first time as well 😄. I think it's good that we are finally getting consumers of the StateManager; this will give us a better feeling here. The import is e.g. not very optimal going through dist; on the next major release we should likely expose it directly through the main index.ts file. I have added this as a breaking task within a new release v6 (+ friends 😄 ) planning issue #1024.
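What that could look like on the VM side (just a sketch; the exact export surface is to be decided in #1024):

```ts
// packages/vm/src/index.ts (sketch): re-export the state manager so consumers
// no longer need the deep '@ethereumjs/vm/dist/state' import
export { DefaultStateManager } from './state'

// consumer side, after such a release (sketch):
// import VM, { DefaultStateManager } from '@ethereumjs/vm'
```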

this.count = options.count
}
if (!this.config.vm) {
const db = level('./statedir')
Member

In theory you can just use a single directory to store state, but I find it cleaner to store state into the respective directory (mainnet/ropsten). Can we store it in the chaindata directory?

Member Author

Definitely, this was one of my clean-up tasks. I just added this since I needed to finish and wanted to have proof that this works at all.

Slightly modified suggestion: I would store this in a parallel statedata directory with the chain (e.g. ropsten) as the base directory; this would need some modifications (see the rough path sketch below):

  • datadir in Config should reflect the base directory (so without the chaindata or statedata part)
  • Config.getSyncDataDirectory() (used in bin/cli.ts) should be renamed to getSystemDefaultDataDirectory() (or something similar) and also return the (then somewhat system-specific) base directory
  • The other parts (chaindata or lightchaindata, statedata) should be added on the blockchain and trie instantiations

Does this make sense? If so feel free to just pick up.
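A rough path sketch of the proposal (everything here is part of the suggestion, not existing code: the datadir handling, config.common.chainName() as the network lookup, and the statedata directory name are assumptions):

```ts
import level from 'level'
import { SecureTrie as Trie } from 'merkle-patricia-tree'

// Proposed layout (sketch): <datadir>/<network>/{chaindata | lightchaindata, statedata}
const networkDir = `${config.datadir}/${config.common.chainName()}`

const chainDB = level(`${networkDir}/chaindata`) // or 'lightchaindata' for light sync
const stateDB = level(`${networkDir}/statedata`)
const trie = new Trie(stateDB)
```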

Member

So under datadir we would have a chaindata and statedata directory? It is currently configured that this datadir can only point to a single chain, so if you want to run both mainnet and ropsten then you'd need two datadirs.

Member Author

Ah, likely datadir should also be without the chain, right? 🤔 So just the uppermost-level base dir without any configuration selection.

const block: Block = blocks[i]

await this.chain.blockchain.putBlock(block)
await this.vm.runBlockchain()
Member

This is a nice trick to only run one block (at most). However, if we want to dump in a chain reorg (of multiple blocks), then the following will happen.

Let's say we dump 3 blocks of reorg in the chain: A, B and C. If we put in block A, then this will (in most cases) not change the canonical chain. Therefore, blockchain still points to the old chain head. Same happens with B. When we put C, blockchain sees we have a reorg, and therefore rolls back the head pointers to the parent block hash of block A. Then, when we call runBlockchain, it will only run block A, while we expect that it runs up to block C. Can we keep running the blockchain until the head block of the VM pointer in the blockchain does not change after we've called runBlockchain? (Use getHead of blockchain)
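A sketch of that loop inside the store step (the getHead() call for the VM iterator head and its exact return shape are assumptions about the blockchain API surface):

```ts
// Sketch: keep executing until the VM head no longer advances, so a
// multi-block reorg (A, B, C) gets fully run and not just block A.
while (true) {
  const before = (await this.chain.blockchain.getHead()).hash() // VM iterator head
  await this.vm.runBlockchain()
  const after = (await this.chain.blockchain.getHead()).hash()
  if (after.equals(before)) break // no further progress: caught up
}
```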

Block.fromValuesArray(b.raw(), { common: this.config.common })
)
await this.chain.blockchain.initPromise
for (let i = 0; i < blocks.length; i++) {
Member

This works if we only download blocks from a single peer (which seems to be the case). I don't know if it is compatible when we switch to using multiple peers. It is OK for now, but we should make this compatible with multiple peers ASAP. What we probably want is to call a VM update method: this keeps running blocks until no new blocks exist. If this method is already running, return early; otherwise execute it.
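A sketch of such an update method with a simple re-entrancy guard (the class and member names are made up for illustration):

```ts
import VM from '@ethereumjs/vm'

// Sketch: run all blocks the VM hasn't executed yet; if an update is already
// in progress, return early instead of starting an overlapping run.
class VmUpdater {
  private running = false
  constructor(private vm: VM) {}

  async update(): Promise<void> {
    if (this.running) return // an update is already executing blocks
    this.running = true
    try {
      await this.vm.runBlockchain() // runs until no unexecuted blocks remain
    } finally {
      this.running = false
    }
  }
}
```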

Member Author

I am not sure if parallel execution should be prioritized that highly; from the runs I've done it seems to me that VM execution (and not block download) is our bottleneck by a large margin.

Member

Yeah, giving this some thought, you are right, parallel peers should not have such a high priority.

protected pool: PeerPool

protected first: BN
protected count: BN
Member

This makes the Fetcher only compatible with Block/BlockHeader downloads. We probably want to extend Fetcher in the future to also support downloading state types, which is not directly supported if we encapsulate Fetcher like this. We will then have to refactor this out later; I am OK with it for now (I will probably refactor it out in #1023, where I already moved some methods around).

this.timeout = options.timeout ?? 8000
this.interval = options.interval ?? 1000
this.banTime = options.banTime ?? 60000
this.maxQueue = options.maxQueue ?? 16
this.maxPerRequest = options.maxPerRequest ?? 128
this.maxPerRequest = options.maxPerRequest ?? 25
Member

Any reason why we change this to 25?

Member Author

Yes, since VM execution makes everything substantially slower, I reduced this to get chunks which terminate in a reasonable timeframe (something under 1 minute).

@@ -11,7 +11,7 @@ export interface HeaderFetcherOptions extends BlockFetcherOptions {
* Implements an les/1 based header fetcher
* @memberof module:sync/fetcher
*/
export class HeaderFetcher extends BlockFetcher {
export class HeaderFetcher extends Fetcher {
Member

Yep, this makes a lot of sense. I did something similar in #1023; it doesn't make a lot of sense that HeaderFetcher builds upon BlockFetcher. (In fact, it overrides all methods except tasks.)

trie,
})

this.vm = new VM({
Member

I don't think I like that if we use BlockFetcher, we also require that we start running blocks. This would probably lead to problems when we implement Beam Sync. I think it makes more sense in FullSync.

What we could do to ensure we only run one block at a time is to add an option in VM.runBlockchain which describes how many blocks we want to run (default: all), which sets maxBlocks of blockchain.iterator. When we putBlocks, in the Chain of Client, we can attach to the updated event and then start running these new blocks one-by-one.
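Roughly like this (a sketch only: the exact runBlockchain signature with a block limit and the getHead() return shape are assumptions; the updated event on the client Chain is the one mentioned above):

```ts
// Sketch: decouple execution from the BlockFetcher and run newly stored
// blocks one-by-one whenever the client Chain signals an update.
chain.on('updated', async () => {
  let headMoved = true
  while (headMoved) {
    const before = (await chain.blockchain.getHead()).hash()
    await vm.runBlockchain(chain.blockchain, 1) // limit to one block per call
    const after = (await chain.blockchain.getHead()).hash()
    headMoved = !after.equals(before)
  }
})
```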

Member Author

The reason why I did this is that I wanted to have this as close as possible to the blockchain.putBlock() call (I even thought that the VM execution might actually be best placed in the Blockchain class itself (yes - I know - that atm this would cause a circular dependency)). I think having this connected by an event (updated) is not enough; this needs to be connected in both directions. So if a VM execution fails, this should also stop block processing, otherwise it is a large waste of resources if blocks are continually put in the blockchain after some prior block failed in the VM.

I haven't thought about the beam sync implications though.

Would it be an option to leave this like this for now and keep it open for refactoring later? I am just happy that this is working right now and my fear is that we bring this back into a non-working state or at least introduce new problems if we move things here (you can prove me wrong though with a PR here).

Member Author

(so please always make sure via mainnet test runs that both initial sync from 0 and continued sync from blocks already stored are still working, including VM execution 😁 )

Member

Well, one thing we'd need is a lot of extra tests. Since we currently don't ban a peer if they feed us wrong blocks, the current (bad) peer is still the "best peer" and will thus be used as the (only) peer to sync these (bad) blocks. So I think it would keep re-running these bad blocks. It is a good point though, I have not thought about the mechanism for what to do if peers feed us bad blocks. You are right that there needs to be a two-way direction here: we want the VM to report bad peers, but then we'd indeed need to know which peer delivered that block.

@jochem-brouwer (Member)

I will continue here a bit (but will create a new branch) to propose some changes.

blocks = blocks.map((b: Block) =>
Block.fromValuesArray(b.raw(), { common: this.config.common })
)
await this.chain.blockchain.initPromise
Member

Note: internally, in blockchain, initPromise is always awaited on methods where this is necessary.

vm: add maxBlocks option to runBlockchain
@jochem-brouwer (Member)

Just checked this out, this is extremely cool 😄

@holgerd77 (Member Author)

The speed increase is due to the removed block checkpointing in the VM for blocks with zero transactions (see other messages) and has nothing to do with the maxPerRequest reduction. 😀

@holgerd77 (Member Author)

(might also be a hint that there might be something very wrong with the current checkpointing implementation in the MPT. I mean, creating a checkpoint on such a still tiny trie along these first blocks just can't take so much time. Likely something for a closer look early next year or so 🙂)

@holgerd77 (Member Author)

@jochem-brouwer ah, and just another updated note: it seems we were discussing two things at once here, or rather, you were referring to the larger maxPerRequest. When this was higher, there was no VM speed decrease or anything; there were just various additional (mostly peer-connectivity related) error messages and a sync was more or less not happening (I think I got 1 successful run out of 10 or so). So - yes - at some point this might also be worth investigating further to see why this is happening. The things you stated (pending promises) might play a role here I guess, but I think you have a much better feeling for real-world behavior on stuff like that.

Note that maxPerRequest refers to the number of blocks to download though and has nothing to do with the peer pool size (these kinds of constant parameters should be named more clearly; here e.g. maxBlocksPerRequest would already be a large improvement, something we should likely always do when we stumble upon ambiguous naming).

@holgerd77 force-pushed the client-add-vm-execution branch from 6bb5030 to 4000529 on January 1, 2021 16:38
@holgerd77 changed the title from "Client: add VM execution" to "Client: add VM execution (old)" on Jan 2, 2021
@holgerd77 (Member Author)

Continued in #1028

@holgerd77 closed this on Jan 2, 2021
@holgerd77 deleted the client-add-vm-execution branch on January 2, 2021 16:25