Stateless/pure transaction processing #195

pipermerriam · 2017-12-05T03:35:16Z

We need a pure function for applying transactions.

As inputs it would take:

- the current state (including state root, tx index, etc.)
- the transaction to be processed
- a witness proof for all of the parts of the state that the transaction can/will touch

It would output the resulting state.

The implementation is still up in the air, but the current leading idea is to process the transaction as normal, modifying the state as we go, then discarding the state changes and returning only the updated state. This assumes that our state database engine supports lazy fetching of parts of the overall state.
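
A minimal sketch of the shape such a function could take, assuming a toy model in which the state is a dict of balances and the witness is a dict holding the accounts the transaction touches. `ToyState`, `Tx`, and `apply_transaction_pure` are hypothetical names for illustration only, not py-evm APIs, and a real witness would be a set of trie proofs rather than plain values.

```python
from typing import Dict, NamedTuple


class Tx(NamedTuple):
    sender: str
    recipient: str
    value: int


class ToyState(NamedTuple):
    tx_index: int
    balances: Dict[str, int]  # stand-in for the data behind the state root


def apply_transaction_pure(state: ToyState, tx: Tx, witness: Dict[str, int]) -> ToyState:
    # Every account the transaction touches must be covered by the witness.
    for account in (tx.sender, tx.recipient):
        if account not in witness:
            raise KeyError(f"witness is missing account {account!r}")
    if witness[tx.sender] < tx.value:
        raise ValueError("insufficient balance")

    # Copy instead of mutating, so both inputs are left untouched.
    balances = dict(state.balances)
    balances.update(witness)
    balances[tx.sender] -= tx.value
    balances[tx.recipient] += tx.value
    return ToyState(tx_index=state.tx_index + 1, balances=balances)
```

Calling it twice with the same inputs gives the same output, and neither `state` nor `witness` is modified, which is the property the issue is after.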

Tasks

- Adding a new db wrapper - TrackedDB: #204
- Removing the transaction logic from the Block objects
- Implement VMState object: Implement State object #236
- Refactoring VM, VMState, and Computation


pipermerriam · 2017-12-06T17:34:15Z

Thoughts on implementation. I suspect that one of the first things this will need is a database wrapper which keeps track of all of the parts of the state trie which get touched as part of transaction processing. I think it would be good to do this as a subclass of evm.db.state.StateDB. It may also require some changes to the ChainDB class since it is the entry point for interacting with the state database.

The StateDB would be responsible for collecting all the touched keys.
The ChainDB would be responsible for persisting the set of all of the touched keys each time the statedb is used.
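
To make the idea concrete, here is a rough sketch of a key-tracking wrapper, assuming a plain dict as the backing store. The class name `KeyTrackingDB` and its methods are hypothetical and are not the actual TrackedDB interface.

```python
from typing import Dict, Set


class KeyTrackingDB:
    """Key-value wrapper that records which keys are read and written."""

    def __init__(self, backing: Dict[bytes, bytes]) -> None:
        self._db = backing
        self.reads: Set[bytes] = set()
        self.writes: Set[bytes] = set()

    def get(self, key: bytes) -> bytes:
        # The read is recorded even if the lookup then raises KeyError.
        self.reads.add(key)
        return self._db[key]

    def set(self, key: bytes, value: bytes) -> None:
        self.writes.add(key)
        self._db[key] = value

    def exists(self, key: bytes) -> bool:
        # An existence check also reveals state, so it counts as a read.
        self.reads.add(key)
        return key in self._db
```

After a transaction has run against the wrapper, `reads` and `writes` together describe the parts of the state a witness would need to cover.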

hwwhww · 2017-12-07T20:07:21Z

@pipermerriam

The ChainDB would be responsible for persisting the set of all of the touched keys each time the statedb is used.

Q1: Does `ChainDB` here mean the `BaseChainDB` class? Or do we have to abstract `BaseChainDB` and create a new `ChainDB` subclass?

Q2: Is the `pure function for applying transactions` feature for both the shard chain and the main chain?

If both types of chain get a purified add_transaction function, one difference is that it returns State on the main chain and StateObj on the shard chain, where StateObj only contains properties.

Q3: What would the purified function need?

In our previous Q&A, you mentioned minimal changes to the current apply_transaction API: "That statement was in reference to these pieces of functionality and anything that allows it to access external APIs directly like the chaindb. This would not be a significant architecture change."

Do you mean we modify evm.vm.base.VM.apply_transaction?
^^^^ I may be completely mistaken, but if so, that is for the non-stateless case:
evm.vm.base.VM.apply_transaction: py-evm/base.py at master · ethereum/py-evm · GitHub

+  def apply_transaction(self, prev_state, transaction, db):
+       self.chaindb = db

        # ....somehow update it to be pure function....

        computation = self.execute_transaction(transaction)
        self.clear_journal()
+       state = self.block.add_transaction(transaction, computation, db)
        return computation, state

# Usage: use VM as a disposable container object?
vm = <latest_fork_vm>(header=header, chaindb=chaindb)
db = chaindb.clone()
statedb = chaindb.get_state_db(vm.block.header.state_root, read_only=False, stateless=True)
prev_state = State(statedb, root_hash=prev_state_obj.root_hash, read_only=False)
# apply_transaction returns a (computation, state) pair
computation, state = vm.apply_transaction(prev_state, transaction, db)

On the other hand, note that @vbuterin suggested that:

A chain object should NOT be necessary to process a block
Recent headers and hashes should be part of the state object

I am trying to combine both of your suggestions and hope we can get a clearer picture of the ultimate goal - the true pure function:

Update evm.vm.rlp.block.BaseBlock
- Remove BaseBlock.add_transaction function
Update evm.vm.forks.frontier.blocks.FrontierBlock
- Remove FrontierBlock.add_transaction function
For main chain, add
- Pure evm.vm.base.apply_transaction function
- Pure evm.vm.forks.byzantium.blocks.add_transaction function
For shard chain, add
- Pure evm.vm.base.apply_shard_transaction function
- Pure evm.vm.sharding.collations.add_transaction function
- Pure evm.vm.base.apply_shard_transaction_stateless function
- Pure evm.vm.sharding.collations.add_transaction_stateless function

Description of add_transaction_stateless

Design of add_transaction:
- For shard chain, create new vm_class
- Pull out some logic of
  - evm.vm.base.VM.apply_transaction (py-evm/base.py at master · ethereum/py-evm · GitHub)
  - evm.vm.forks.frontier.FrontierBlock.make_receipt (py-evm/blocks.py at master · ethereum/py-evm · GitHub)

apply_transaction triggers add_transaction

def apply_transaction(prev_state_obj, chaindb, tx):
    # Open a tracked ("stateless") view of the state at the previous root.
    statedb = chaindb.get_state_db(prev_state_obj.root_hash, read_only=False, stateless=True)
    state = State(statedb, root_hash=prev_state_obj.root_hash, read_only=False)
    add_transaction(state, tx)
    state_obj = StateObj(root=state.root_hash.....)

    # Return the new state object plus the recorded read and write sets.
    return state_obj, statedb.get_reads(), statedb.get_writes()

How to call apply_transaction

# In apply_collation
# ......
for tx in block.transactions:
    # Thread the state object through; each call returns the next one.
    state_obj, _reads, _writes = apply_transaction(state_obj, db, tx)
    db = union(db, _writes)
# .....

Q4: I want to confirm when `ChainDB` would need to store the set of all of the touched keys.

Main chain client: does not need a witness to apply a transaction
Shard archival node:
- Caches recent tx / collation witness data
  - ^^^^^ Is this the only case where ChainDB stores the touched keys?
- [TBD] Does it need to cache visited account key-values?
Shard stateless client:
- The touched keys (read and write sets) would be unioned while processing txs, so the ChainDB may only need to store the latest union set.
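
For the stateless client case, the running union could look roughly like the following sketch. `union_touched_keys` is a hypothetical helper, and the per-transaction read and write sets are assumed to be plain Python sets of keys.

```python
from typing import Iterable, Set, Tuple

KeySets = Tuple[Set[bytes], Set[bytes]]


def union_touched_keys(per_tx: Iterable[KeySets]) -> KeySets:
    """Fold per-transaction (reads, writes) pairs into one combined pair."""
    all_reads: Set[bytes] = set()
    all_writes: Set[bytes] = set()
    for reads, writes in per_tx:
        all_reads |= reads
        all_writes |= writes
    return all_reads, all_writes
```

Only the combined pair would need to be kept between transactions; the per-transaction sets can be dropped once they are folded in.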

Q5: If the scenario in Q4 is right, then regarding updating `BaseChainDB`, my instinct for the direction is:

Updating BaseChainDB.get_state_db(state_root, read_only)

    def get_state_db(self, state_root, read_only, stateless=False):
        db = StateDB(self.db) if stateless else self.db
        return State(db=db, root_hash=state_root, read_only=read_only)

Only the shard client would call this function.

Sorry for so many questions! Thank you for your time.

pipermerriam · 2017-12-08T18:38:12Z

Q1: Does ChainDB here mean the BaseChainDB class?

My bad, anywhere you see ChainDB I mean BaseChainDB.

pipermerriam · 2017-12-08T18:39:49Z

Q2: Is the pure function for applying transactions feature for both the shard chain and the main chain?

This makes sense but I didn't see a question. If there is one can you clarify?

pipermerriam · 2017-12-08T18:42:56Z

Q3: What would the purified function need?

Everything you state under this section looks good and in line with my thinking. Removing the transaction logic from the Block objects seems like a nice isolated first step that can be done independently. I would suggest moving that API up into the VM class as VM.add_transaction_to_block(block, transaction). This could even be implemented in a pure form such that it doesn't mutate the block object but rather initializes a new one and returns it.
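
A minimal sketch of what the pure form might look like, assuming a toy immutable block type. `ToyBlock` is hypothetical and stands in for the real block RLP object; the real method would also have to deal with receipts and header fields.

```python
from typing import NamedTuple, Tuple


class ToyBlock(NamedTuple):
    number: int
    transactions: Tuple[str, ...] = ()


def add_transaction_to_block(block: ToyBlock, transaction: str) -> ToyBlock:
    # _replace builds a new instance; the block passed in is not modified.
    return block._replace(transactions=block.transactions + (transaction,))
```

The caller keeps the old block and receives a new one, so there is no hidden mutation to undo if transaction processing fails later.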

RE: vbuterin's comment: A chain object should NOT be necessary to process a block

Yes and No.

Yes in that the Chain class is largely just a convenience wrapper around the VM class when it comes to applying transactions.
No in that the list of previous headers may cross a fork boundary (early headers are in fork rules A, later headers are in fork rules B). In this case you'll need something above the VM to be able to retrieve the appropriate headers for the previous VM rules. This currently shouldn't be an issue since all of the VM classes share the same header RLP object, but I suspect that will change at some point, so we should be prepared for that.
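
To illustrate the fork boundary case, here is a rough sketch of the kind of lookup that would have to live above the VM. The schedule, the rule-set names, and both function names are made up for the example.

```python
from typing import Sequence, Tuple

# Hypothetical fork schedule: (first block number, rule-set name), ascending.
FORK_SCHEDULE: Sequence[Tuple[int, str]] = ((0, "rules-A"), (100, "rules-B"))


def rules_for_block_number(
    block_number: int,
    schedule: Sequence[Tuple[int, str]] = FORK_SCHEDULE,
) -> str:
    """Return the name of the rule set that governs `block_number`."""
    selected = None
    for first_block, name in schedule:
        if block_number < first_block:
            break
        selected = name
    if selected is None:
        raise ValueError(f"no rules configured for block {block_number}")
    return selected


def rules_for_headers(block_numbers: Sequence[int]) -> Tuple[str, ...]:
    # A run of previous headers may straddle the boundary, so resolve each one.
    return tuple(rules_for_block_number(n) for n in block_numbers)
```

A VM for rules B cannot answer this on its own for headers below block 100, which is why some chain-level object is still needed.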

pipermerriam · 2017-12-08T18:48:53Z

Q4: I want to confirm when ChainDB would need to store the set of all of the touched keys.

Everything you say here is in line with my understanding.

I'm not sure if this is the right approach, but it may be useful to

pipermerriam · 2017-12-08T18:51:45Z

Q5: If the scenario in Q4 is right, then regarding updating BaseChainDB, my instinct for the direction is:

This looks like a solid approach but I'll point the following out. The StateDB object is ephemeral in that it comes into existence as a context manager when accessing the state is necessary and then is discarded after. That means that the ChainDB will need to be responsible for persisting the touched keys. This may be fine if the db instance passed into the StateDB is where the tracking occurs, at which point the StateDB can be blissfully unaware that all of the keys it touches are being tracked.
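
A rough sketch of that arrangement, assuming dict-backed storage. `KeyRecorder` and `ToyChainDB` are hypothetical stand-ins: the recorder plays the role of the db instance handed to the StateDB, and the chain db merges the touched keys when the context exits.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Set


class KeyRecorder:
    """Mapping-style wrapper that remembers every key read or written."""

    def __init__(self, backing: Dict[bytes, bytes]) -> None:
        self._backing = backing
        self.touched: Set[bytes] = set()

    def __getitem__(self, key: bytes) -> bytes:
        self.touched.add(key)
        return self._backing[key]

    def __setitem__(self, key: bytes, value: bytes) -> None:
        self.touched.add(key)
        self._backing[key] = value


class ToyChainDB:
    def __init__(self) -> None:
        self.db: Dict[bytes, bytes] = {}
        self.touched_keys: Set[bytes] = set()  # outlives each state_db() use

    @contextmanager
    def state_db(self) -> Iterator[KeyRecorder]:
        recorder = KeyRecorder(self.db)
        try:
            # Stand-in for an ephemeral StateDB built on top of `recorder`.
            yield recorder
        finally:
            # The recorder is discarded here, so keep what it saw.
            self.touched_keys |= recorder.touched
```

The object used inside the `with` block never needs to know about tracking; the chain db collects the keys each time the context closes.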

pipermerriam · 2017-12-08T18:52:09Z

@hwwhww I think I answered everything you asked. Please follow up if you need clarification on anything.

hwwhww · 2018-01-13T06:20:08Z

Closed via #247

pipermerriam mentioned this issue Dec 5, 2017: Sharding Roadmap #190 (Closed, 15 tasks)

hwwhww added the eth2.0 label Dec 5, 2017

pipermerriam changed the title ~~STUB: Stateless/pure transaction processing~~ Stateless/pure transaction processing Dec 5, 2017

hwwhww self-assigned this Dec 6, 2017

hwwhww mentioned this issue Dec 7, 2017: Adding a new db wrapper - TrackedDB #204 (Merged)

hwwhww added the PR state: WIP label Dec 7, 2017

hwwhww mentioned this issue Dec 20, 2017: Pure apply_transaction #235 (Closed)

hwwhww removed the PR state: WIP label Dec 21, 2017

hwwhww mentioned this issue Jan 3, 2018: Refactored VM, VMState, and Block + new apply_transaction #247 (Merged)

hwwhww closed this as completed Jan 13, 2018
