TEVM Transpiled EVM: accelerate EVM improvement R&D, but learning from eWASM

TEVM

EVM bytecode may be transpiled into a different representation for more efficient execution and for other benefits. Perhaps the most important benefit relates to the change process: it is very hard to change the EVM gradually, with every step being both relatively small and backwards compatible.

Transpiling EVM into Web Assembly (for performance reasons) was the idea behind starting the eWASM (Ethereum Web Assembly) project. Lessons need to be learnt from it. For example, are the following statements true?

One EVM instruction gets translated into many Web Assembly instructions, thus creating additional overhead.
Executing many Web Assembly instructions takes longer than executing the corresponding single EVM instruction. Having higher-level instructions may be inconvenient (as discussed later), but it does have the benefit of amortising the cost of the interpreter loop.
Web Assembly code is harder to meter efficiently than EVM bytecode, again due to the finer granularity of operations. Because of this finer granularity, the relative overhead of metering for Web Assembly code can be much higher than for EVM bytecode.
The problems above are not unsolvable.
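
The granularity argument in the statements above can be made concrete with a toy cost model. This is only a sketch: the function name `relative_overhead` and all the numbers are assumptions chosen for illustration, not measurements of any EVM or Web Assembly implementation. It assumes that every interpreted instruction pays a fixed dispatch cost and a fixed metering cost, regardless of how much useful work it does.

```python
def relative_overhead(work_units, instructions, dispatch_cost, meter_cost):
    """Share of total cost spent on dispatch and metering instead of useful work."""
    overhead = instructions * (dispatch_cost + meter_cost)
    return overhead / (overhead + work_units)


# The same 20 units of useful work, expressed at two granularities
# (hypothetical numbers).
coarse = relative_overhead(work_units=20, instructions=1, dispatch_cost=1, meter_cost=1)
fine = relative_overhead(work_units=20, instructions=10, dispatch_cost=1, meter_cost=1)

print(f"coarse: {coarse:.2f}, fine: {fine:.2f}")
```

Under these assumptions, splitting one instruction into ten raises the overhead share from about 9% to 50%, even though the useful work is unchanged.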

EVM, despite being a fairly low-level virtual machine, has certain features that belong in a high-level one, and these features make analysis and optimisation of EVM code harder than they should be. TEVM is an attempt to design another virtual machine in such a way that EVM bytecode can be transpiled into TEVM code. The TEVM code can then be executed instead of the original bytecode, with the same result (perhaps up to some caveats). As mentioned above, splitting higher-level instructions into lower-level ones needs to be done with great care, because the overhead of interpreting and metering increases with the granularity of operations.

Here is a sample (not complete) list of features that EVM has, but TEVM would not have, and some ideas of how to transpile these features into TEVM.

Path-dependent I/O (state access). EVM lacks inter-frame communication facilities, like Transient Storage (https://eips.ethereum.org/EIPS/eip-1153), that would be useful for implementing things like mutexes. Instead, contract storage is used for such things, and the relatively high cost of SSTORE and SLOAD operations has led to a situation where the gas costs of these operations vary widely depending on what the EVM was doing before (assuming some form of caching is happening). This situation resembles "invisible" CPU caches and creates a lot of drawbacks for various analyses and optimisations. In TEVM, there would instead be state-accessing operations that have a constant gas cost and never assume any caching. In addition, there would be instructions for associative memory (more complex than EIP-1153, of course, to allow for namespaces and write protection where required) that can be used for caching.
Precompiles. Precompiles were introduced as shortcuts, mostly for compute-intensive cryptographic operations. Although they solve an important problem, the solution is far from elegant. The current idea, inspired by the work done on EVM384 and its generalisation EVMn (where n is the length of the word), is to implement cryptography in TEVM.

Features that TEVM would have and EVM does not may include:

Operation immediates, to allow more optimal implementation of cryptography, as well as static jumps.
Privileged operations (only available in the "system/kernel" mode), with the ability to perform state modifications that are forbidden in EVM, for example, changing the balances of accounts. This may allow expressing things like the mining reward in TEVM instead of hard-coding it into the consensus engine.

Path forward

The most immediate use case for TEVM is the separation of the Consensus Engine. Here is the problem: if the consensus engine is separated from the Core, we can imagine how functions such as VerifyHeader, ChooseBestHeader, or SealHeader would work. But there is another important function, let's call it FinaliseBlock. For EtHash, for example, this is where miner rewards get added. This needs to happen for every single block, and it is not clear how to make it work across the interface, because giving out the mining reward requires write access to the state. The current idea for a solution (not the easiest thing to do, though) is to express "FinaliseBlock" as a piece of code. The first thought was to express it in EVM, but EVM does not have an instruction to simply AddBalance to an account. Using privileged TEVM code would help. Solving this use case does not require the full functionality of EVM, but it might be an interesting experiment to replace the EVM interpreter with a TEVM interpreter, adding a transpilation step at deploy time and, for already deployed contracts, at sync time.
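
The interface split described above could look roughly like the following. This is a minimal sketch under stated assumptions: the Python names (`ConsensusEngine`, `ToyProofOfWork`, `apply_finalisation`), the `ADDBALANCE` opcode, the tuple encoding of instructions, and the reward value are all hypothetical. The point it illustrates is that the engine returns code for the Core to run rather than writing to the state itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

# A hypothetical privileged TEVM instruction: (opcode name, address, amount).
Instruction = Tuple[str, str, int]


@dataclass
class Header:
    number: int
    coinbase: str  # address that receives the block reward


class ConsensusEngine(Protocol):
    def verify_header(self, header: Header) -> bool: ...
    def finalise_block(self, header: Header) -> List[Instruction]: ...


class ToyProofOfWork:
    """Illustrative engine; the reward value is made up."""

    BLOCK_REWARD = 2

    def verify_header(self, header: Header) -> bool:
        return header.number >= 0

    def finalise_block(self, header: Header) -> List[Instruction]:
        # Return code for the Core to run, instead of touching state directly.
        return [("ADDBALANCE", header.coinbase, self.BLOCK_REWARD)]


def apply_finalisation(balances: Dict[str, int], engine: ConsensusEngine, header: Header) -> None:
    """Core side: run the engine-supplied code with system privileges."""
    for op, address, amount in engine.finalise_block(header):
        if op != "ADDBALANCE":
            raise ValueError(f"unknown instruction {op}")
        balances[address] = balances.get(address, 0) + amount


balances: Dict[str, int] = {}
apply_finalisation(balances, ToyProofOfWork(), Header(number=1, coinbase="0xminer"))
print(balances)
```

With this shape, the engine never needs write access to the state. Only the Core, which interprets the returned instructions, does.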

Once the first use case is solved, and the necessary infrastructure for the TEVM interpreter and the EVM->TEVM transpiler is created, multiple further experiments can be started:

Converting existing and future (BLS curve) precompiles to TEVM (with extended word instructions) and tailoring TEVM architecture and instructions for optimal result.
Unrolling SSTORE and SLOAD instructions into TEVM code that uses new instructions for associative memory and some control flow. Also, to match the gas cost of SSTORE and SLOAD, the transpiler may need to emit gas balancing instructions (privileged).
Turning static analysis that builds up Control Flow Graph into TEVM code that directly supports static jumps and branching via specialised instructions.
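
The SLOAD unrolling mentioned in the list above can be sketched as follows. Everything here is an assumption made for illustration: the `Machine` class, the operation names, the namespaced key, and in particular all gas numbers are placeholders, not real EVM or TEVM costs. The sketch shows constant-cost state access, associative memory used as an explicit cache, and a privileged gas adjustment that tops up the charge to the cost the original EVM instruction would have had.

```python
# All names and gas numbers below are hypothetical placeholders.
TEVM_STATE_LOAD_GAS = 50   # constant cost, no caching assumed
TEVM_AMEM_GAS = 5          # cost of one associative-memory read or write
EVM_SLOAD_FIRST_GAS = 200  # EVM cost to reproduce on first access
EVM_SLOAD_REPEAT_GAS = 20  # EVM cost to reproduce on repeated access


class Machine:
    def __init__(self, state, gas):
        self.state = state  # persistent contract storage
        self.amem = {}      # associative memory, used as an explicit cache
        self.gas = gas

    def charge(self, amount):
        if amount > self.gas:
            raise RuntimeError("out of gas")
        self.gas -= amount

    def amem_get(self, key):
        self.charge(TEVM_AMEM_GAS)
        return self.amem.get(key)

    def amem_put(self, key, value):
        self.charge(TEVM_AMEM_GAS)
        self.amem[key] = value

    def state_load(self, key):
        self.charge(TEVM_STATE_LOAD_GAS)
        return self.state.get(key, 0)

    def gas_adjust(self, delta):
        # Privileged: positive delta charges extra gas, negative delta gives gas back.
        if delta > 0:
            self.charge(delta)
        else:
            self.gas += -delta


def unrolled_sload(m, key):
    """What a transpiler might emit in place of a single EVM SLOAD."""
    cached = m.amem_get(("storage", key))  # namespaced key
    if cached is not None:
        m.gas_adjust(EVM_SLOAD_REPEAT_GAS - TEVM_AMEM_GAS)
        return cached
    value = m.state_load(key)
    m.amem_put(("storage", key), value)
    m.gas_adjust(EVM_SLOAD_FIRST_GAS - (2 * TEVM_AMEM_GAS + TEVM_STATE_LOAD_GAS))
    return value


m = Machine(state={1: 42}, gas=1000)
first = unrolled_sload(m, 1)
gas_after_first = m.gas
second = unrolled_sload(m, 1)
gas_after_second = m.gas
```

Each individual TEVM operation has a fixed cost here. The path dependence of the EVM gas schedule is confined to the transpiler-emitted control flow and the gas balancing calls.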

On privileged instructions

So far, a couple of use cases for privileged instructions in TEVM have appeared. Firstly, adding balance to an account to emulate mining rewards. Secondly, gas balancing instructions that can be added to the TEVM instruction set, to add to or subtract from the current gas counter so that the unrolling of SLOAD, SSTORE and precompiles matches their EVM gas costs. But what are privileged instructions? TEVM can run in "user" mode or in "system" (privileged) mode. Privileged instructions can only be executed in "system" mode. When does TEVM work in "system" mode? So far the following cases make sense:

When executing TEVM code generated by the EVM->TEVM transpiler.
When executing TEVM code given by consensus engine to be added at the block finalisation (mining rewards).
When executing irregular state transitions (e.g. DAO hard fork).
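
The user/system distinction can be illustrated with a small mode check. This is a sketch, not a proposed design: the `Mode` enum, the `ADDBALANCE` opcode name, and the tuple encoding of instructions are hypothetical, since the text does not define an instruction set.

```python
from enum import Enum


class Mode(Enum):
    USER = "user"
    SYSTEM = "system"


# Hypothetical opcode name; only this one instruction is modelled.
PRIVILEGED_OPS = {"ADDBALANCE"}


class PrivilegeError(Exception):
    pass


def execute(program, balances, mode):
    """Run a list of (opcode, address, amount) tuples against a balance map."""
    for op, address, amount in program:
        if op in PRIVILEGED_OPS and mode is not Mode.SYSTEM:
            raise PrivilegeError(f"{op} requires system mode")
        if op == "ADDBALANCE":
            balances[address] = balances.get(address, 0) + amount
        else:
            raise ValueError(f"unsupported instruction {op}")
    return balances


reward_code = [("ADDBALANCE", "0xminer", 2)]

# Code supplied by the consensus engine runs in system mode.
system_result = execute(reward_code, {}, Mode.SYSTEM)

# The same code submitted in user mode is rejected.
try:
    execute(reward_code, {}, Mode.USER)
    user_rejected = False
except PrivilegeError:
    user_rejected = True
```

The check happens before the instruction has any effect, so code running in user mode cannot alter balances even partially.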

Potential endgame

If the development of TEVM, driven by the use cases of modularity, optimisation and static analysis, goes well, it may reach a state where it is a virtual machine without precompiles, with fixed-cost instructions, and potentially some other nice things, like superior interpreter speed. At that point the question may arise of replacing EVM with TEVM, with Ethereum supporting TEVM natively, and then perhaps sunsetting EVM. Given the current change process in Ethereum, it is unlikely that serious improvements can make it into EVM in reasonable time. Instead, we can expect more and more modifications that try to balance backwards compatibility against pressing needs of application developers, with the resulting compromise being sub-optimal for the EVM architecture. Given the above, TEVM might be a much more realistic way to make real improvements to EVM at reasonable speed.
