-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EIP-615: Subroutines and Static Jumps for the EVM #615
Comments
This is a very useful proposal, especially for anything that involves decompilation of bytecode. I have a few questions regarding the variable access section. |
Yeah, it's to support local variables on the stack that are too far away to reach with DUP or SWAP. So more efficient bytecode, and also a better target for transpiling from eWASM. |
Thanks. What is the semantics of GETLOCAL n, when n is greater than the number of arguments specified in BEGINSUB? I'd suspect an exception is thrown in that case? It seems that the number of return variables specified in BEGINSUB, isn't used anywhere. |
Could you please clarify if GETLOCAL n, where n is positive, is accessing an argument or local variable? It seems that the former is true, but it would be good to be more explicit about this in the proposal. |
Actually, the following is really confusing to me:
Shouldn't this be rather "set FP to SP" or "set FP to SP - n_args" because the arguments have already been placed on the stack at this point. |
Perhaps, Seed. It confuses me too. It's been a long while since I wrote and implemented this, so it will take a little study. Getting near bedtime here, and I've got 100 miles of driving to get home tomorrow, so please be patient. It's on my stack to do a cleanup pass and new PR for this and 616, so I much appreciate your review. (If you read C++ your questions might be answered in the code at https://github.com/ethereum/cpp-ethereum/tree/develop/libevm) |
We started specifying the semantics of new instructions in Lem. One of the things we noticed is that it makes more sense to setup the stack-frame with the BEGINSUB instruction because we have the number of function arguments at hand. It also makes sense when compared native assembly like x86, where the stack-frame is created in the body of the function, not by the CALL instruction. I'll be continuing the specification next week and will probably have more comments to make. |
You ask
It pushes FP[-n] on the stack.
Close. |
As for setting up the stack frame, the semantics is given as: Execution of a subroutine begins with
I'm not sure it matters where or in what order anything in between encountering the JUMPSUB and executing the BEGINSUB is described as happening, so long as the program winds up executing the instruction after the BEGINSUB with the correct stack. So get the semantics right and we can get the English right. |
@seed I've edited to make clear that SP is mu-sub-s in the yellow paper, and removed the reference in the text to SP being the stack size. The rest of the pointer arithmetic in the text is (I think) now consistent with the stack growing towards lower addresses, so that positive offsets off the stack pointer and negative offsets off the frame pointer both reach into the stack frame. The validation pseudocode I have not fixed. |
Thanks for updating the proposal. It does make a lot more sense. Note that μs growing towards lower addresses doesn't make much sense to me in the context of the yellow paper because the EVM doesn't expose stack addresses in any way. Let me try to confirm my understanding of the proposal with an example: where μs[0] = local2, μs[1] = local1, μs[2] = arg2, ... Few questions: |
@seed The EVM doesn't expose stack addresses, but in 9.5 the Yellow Paper says, "Stack items are added or removed from the left-most, lower-indexed portion of the series." And throughout Appendix H we have notation like
the stack looks like
and |
@gcolvin for some reason I had assumed FP = SP at entry of subroutine. Never mind, with the example it all makes sense now, thanks. I would actually suggest adding the example to the proposal. |
The proposal uses the minus sign (-) instead of the em-dash (—) symbols in a few places which can lead to confusion.
I believe the first - should be a —. Also the syntax introduced, FP[-n], doesn't seem to be used elsewhere.
Here again em-dashes should be used. |
You are right on the punctuation. I was too lazy to crawl through layers of menus to find the em-dash. I took the array syntax from the yellow paper (e.g. µs[0]). It would be nice if markdown wasn't too primitive (that is, more primitive than what we used 40 years ago) to support subscripts, but might we still do better to switch from SP and FP to µs and µf? You are also right about the example diagram. |
We finished the Lem implementation of JUMPIF, JUMPTO, JUMPSUB, RETURNSUB, BEGINSUB, GETLOCAL, PUTLOCAL. Few questions/comments resulting from this:
To properly test both the Lem code and the cpp-Ethereum client at this point we need the Solidity compiler to output subroutine instructions. |
It can compile "Solidity inline assembly" programs to EVM 1.5:
using results in the following bytecode:
|
@axic thanks, I'll look into it. |
@seed it is not tested since the testing client (cpp-ethereum) build we have didn't support it. It could be added if some submits a PR. |
@seed Are you talking about initializing the return stack in the cpp-ethereum code or in the validation pseudocode? No JUMPDEST is involved. Just a JUMPSUB to a BEGINSUB, and a RETURNSUB back to the JUMPSUB. The x86 CALL instruction pushes the PC on the stack and RET pops it and jumps to it. Then the PC gets incremented in the usual way. That was the model I had in mind, it's how the C++ code is written for JUMP, and I think it comports with (136) in the Yellow Paper. (The C++ code is not written to match this model for RETURNSUB, although the behavior is the same.) |
So far as a formal spec (or implementation) goes I think you can specify it however you like, so long as the PC winds up in the right place. It could be changed here too if it helps, but I find it more clear the way it is. |
For formal analysis purposes, all of this is implementable right now as an intermediate language that's easily compilable to evm. PUTLOCAL and GETLOCAL are indeed very useful but there's no need for separate opcodes - stack could be mapped to negative memory and accessed with MLOAD/MSTORE (ie. the stack starts at (-32) and every push goes downwards). Space of one-byte opcodes is limited. Why not add it as a precompiled contract? Ie. the very first instruction (in EVM) calls it with code as an argument. The gas cost of that particular call could be set to zero, in effect working only as a costless (except storage) marking for a different vm version. This would totally avoid problems due to deprecating opcodes and backwards compatibility, allowing for a much more fundamental changes. In principle I very much dislike the idea of breaking backwards compatibility. What if someone has a compiled contract on some private blockchain? It would be impossible to deploy on the public one without recompilation. What if there's some private or just unknown language that compiles to evm? Its compiler would have to be rewritten. Even if one doesn't exist at the moment (most likely), potential ones that may exist may not be written at all, because the risk of compatibility-breaking changes would be judged as too high. For all these reasons modern processors still support old 16 bit code - even including undocumented accidental instructions (ie. ffreep)! |
The difference with PUTLOCAL and GETLOCAL is that they have to respect frame boundaries. This proposal was meant to break compatibility less than eWasm, which breaks it completely. |
Respect how? You replied previously that they don't throw "It pushes FP[-n] on the stack.". So they act as memory access just with fp as the offset. |
Validity condition 3. Programs that try to get or put a stack slot outside of the local frame are invalid. Memory access isn't constrained that way. |
Here a very brief review. Who said static analysis is valuable?This proposal recommends a super-large change, deprecating op-codes and introducing an entire subroutine mechanism into the machine for the benefit "tightening the security". The justification is static analysis of (prospective) contracts will be easier and that formal methods are "essential [tools] against loss". The proposal will be much stronger if examples (a single example?) are included where static analysis has previously been successful in detecting any problem. I don't know of any. My understanding is just the opposite -- every bug with $1M+ loss on Ethereum has been due to ****-poor coding and failure to pay bug bounties higher than the cost of a nice dinner. This is already possible WITHOUT implementing the EIPCertain statements are made such as:
Here is a 5-minute analysis of an actual contract.
BAM! The practical cost of static analysis is already reduced by an order of magnitude from the theoretical statement above. (Because they are already using literal jumps.) Stronger support and more analysis of existing contracts should be presented in this EIP before recommending such a large change. PS
|
I don't know if this is a super-large proposal or not. A fair comparison might be to eWasm, which provides essentially the same functionality. There is no current eWasm EIP and implementation for direct comparison, but the Wasm spec itself is 150 pages, and only floating point and a few other non-deterministic operations aren't relevant. eWasm also requires an Environment Interface between the VM and the client, and a Transcompiler of EVM to eWasm bytecode for backwards compatibility. The ewasm/design repo, which is currently mostly empty, gives a feel for the scope of the effort remaining. This spec is about 15 pages, and it took about 300 lines of C++ to add it to the aleth interpreter. It is backwards-compatible by default. @bmann @expede Boris, Brooke and I have spoken with VM and program specification and proof experts, program auditors, providers of tools like disassemblers, decompilers, lints, and others. They use static analysis, find dynamic jumps a serious obstacle, and support this proposal. Perhaps an appendix of links to their companies and repos would help. This spec also supports fast, near-linear-time compilation to machine code, LLVM IR, or Wasm, with no quadratic "compiler bombs." The current EVM makes that difficult-to-impossible. Thanks for noticing typos. I've removed the "edited-date' field, as Github now maintains a history of edits. |
I mean... lots of people. Entire companies and divisions have sprung up around smart contract analysis (generally assisted by a checker). For example, here's a bunch of automated smart contract security and analysis. And you're right: it's hard to do right now, largely because of the difficult to trace control flow. This proposal makes this much more tractable in a way that is in-line with modern VMs such as Wasm and the JVM. We've spoken with a bunch of folks from SecurETH who feel that adding structured control flow will substantially improve the depth and reliability of their analyses.
Yes, and you still need these bounties for previously undiscovered cases. However, humans are notoriously bad at doing thorough analyses, and automated tools to check that properties hold is faster, cheaper, and works every time. Why not eliminate entire classes of bug before deployment, regardless of coding skill or size of cheque book?
** deep sigh ** Oh sweet summer child. Were it only that simple. EDIT expanded first section for clarity |
I can get behind @expede 's comment and say that this EIP would be beneficial to researchers working on formal verification at ConsenSys and at Diligence, specifically. |
Happy to see this EIP getting traction again! I can assist in getting this formalized in KEVM a.k.a. the Jello paper. |
Which class of bug is solved by the proposed kind of static analysis? Has one bug ever been found? Evidence here would be much more persuasive in making arguments than citing other people that agree static analysis is valuable. Yes, it literally is that simple. Just replace |
FormalizationFirst off, thank you for your work on the KEVM! Formalization of EIP-615 in the JP would be amazing 🙌 As an aside, you may have already seen this EM thread, but just in case: Jello Paper as Canonical EVM Spec PerformanceFor performance we don't have any hard numbers as of yet. We'll need to look at actual real-world cases to give you a better idea (some funding in this direction would really help such a study, if anyone has a line to grants or contracts). As part of a larger strategy, we have lots of ideas for improving the speed of the EVM (for mainnet, this is most likely via incremental JIT optimization), most of which are not touched on here. AOT compilation is more viable for private EVM-based networks, which isn't the primary concern here, but also good to be aware of. For EIP-615, below are a couple back-of-the-napkin calculations for optimizations off the top of my head (to at least give you a flavour). They are made easier when you can trace control flow more easily. Today we can already infer some cases (ex. direct Client Gas Aggregationie: (structural) cost dynamics We could do a lot of this already without EIP-615 (but don't for some reason?). However, this proposal can help with anything involving loops (especially tight loops), can be improved anywhere from 3-50% (based on ops/instruction, and for cases that can be statically found to converge, but that is in theory most/all of what we do on the EVM). Loop UnrollingStatic unrolling (removing jumps and inlining code) can see a large improvement for common cases (40-80% from lower bookkeeping overhead). This can also be done dynamically in more complex cases, but obviously this improvement is generally smaller (roughly 20-50%). Most improvement if compiled to native by the client where hardware optimization kicks in (predictive branch execution, &c). More study needed to know if this is practical at the relative small size of most Ethereum transactions (we're not doing heavy numeric analysis or machine learning). Parallel Executionvia loop dependency analysis Not viable in all implementations (only parallel and concurrent languages), but some multiple improvement based on how many isolated branches or BTW, I would love feedback on the above (both additions and subtractions!). The more of these we have up our sleeve, the better we can improve both the EVM (and the Ethereum-specific portions of eWasm.) |
Yes, these are isomorphic. However, the cases where we don't do exactly As a total aside for fun, it's worth noting that we're having essentially the same discussion here as folks had about structured programming in the 1970s. IIRC, this is the classic Dijkstra paper about
Since we're talking about all possible paths through a program, there are many types of bug that this can catch. Tracing if code can be jumped to from a specific point can trigger subtle bugs. Off the top of my head, we can statically check for the following:
|
@MrChico @expede The paper is too vague on performance, and Brooke's examples would help. The main point is that these optimizations can be done in at worst N log N time, whereas dynamic jumps can force the optimizer into N**2 time. The relative performance of native C, Wasm, and EVM compilers vs. EVM interpreters can be seen in this graph for a few tests. We can't get to native C, but we can do OK. |
Want to link EIP-1712 to here: #1712 |
I think Greg closed this because a lot of the discussion moved to https://ethereum-magicians.org/t/eip-615-subroutines-and-static-jumps-for-the-evm/2728. |
eip: 615
title: Subroutines and Static Jumps for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin greg@colvin.org, Brooklyn Zelenka (@expede), Paweł Bylica (@chfast), Christian Reitwiessner (@chriseth)
discussions-to: https://ethereum-magicians.org/t/eip-615-subroutines-and-static-jumps-for-the-evm-last-call/3472
created: 2016-12-10
Simple Summary
In the 21st century, on a blockchain circulating billions of ETH, formal specification and verification are an essential tool against loss. Yet the design of the EVM makes this unnecessarily difficult. Further, the design of the EVM makes near-linear-time compilation to machine code difficult. We propose to move forward with proposals to resolve these problems by tightening EVM security guarantees and reducing barriers to performance.
Abstract
EVM code is currently difficult to statically analyze, hobbling critical tools for preventing the many expensive bugs our blockchain has experienced. Further, none of the current implementations of the Ethereum Virtual Machine—including the compilers—are sufficiently performant to reduce the need for precompiles and otherwise meet the network's long-term demands. This proposal identifies dynamic jumps as a major reason for these issues, and proposes changes to the EVM specification to address the problem, making further efforts towards a safer and more performant the EVM possible.
We also propose to validate—in near-linear time—that EVM contracts correctly use subroutines, avoid misuse of the stack, and meet other safety conditions before placing them on the blockchain. Validated code precludes most runtime exceptions and the need to test for them. And well-behaved control flow and use of the stack makes life easier for interpreters, compilers, formal analysis, and other tools.
Motivation
Currently the EVM supports only dynamic jumps, where the address to jump to is an argument on the stack. Worse, the EVM fails to provide ordinary, alternative control flow facilities like subroutines and switches provided by Wasm and most CPUs. So dynamic jumps cannot be avoided, yet they obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since many jumps can potentially be to any jump destination in the code, the number of possible paths through the code can go up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.
However, given Ethereum's security requirements, near-linear
n log n
time complexity is essential. Otherwise, Contracts can be crafted or discovered with quadratic complexity to use as denial of service attack vectors against validations and optimizations.But absent dynamic jumps code can be statically analyzed in linear time. That allows for linear time validation. It also allows for code generation and such optimizations as can be done in
log n
time to comprise ann log n
time compiler.And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including
The result is that all of the following validations and optimizations can be done at deployment time with near-linear
(n log n)
time complexity.Specification
Dependencies
Proposal
We propose to deprecate two existing instructions—
JUMP
andJUMPI
—and propose new instructions to support their legitimate uses. In particular, it must remain possible to compile Solidity and Vyper code to EVM bytecode, with no significant loss of performance or increase in gas price.Especially important is efficient translation to and from eWasm and to machine code. To that end we maintain a close correspondence between Wasm, x86, ARM and proposed EVM instructions.
Preliminaries
These forms
name an
INSTRUCTION
with no, one and two arguments, respectively. An instruction is represented in the bytecode as a single-byte opcode. Any arguments are laid out as immediate data bytes following the opcode inline, interpreted as fixed length, MSB-first, two's-complement, two-byte positive integers. (Negative values are reserved for extensions.)Branches and Subroutines
The two most important uses of
JUMP
andJUMPI
are static jumps and return jumps. Conditional and unconditional static jumps are the mainstay of control flow. Return jumps are implemented as a dynamic jump to a return address pushed on the stack. With the combination of a static jump and a dynamic return jump you can—and Solidity does—implement subroutines. The problem is that static analysis cannot tell the one place the return jump is going, so it must analyze every possibility (a heavy analysis).Static jumps are provided by
To support subroutines,
BEGINSUB
,JUMPSUB
, andRETURNSUB
are provided. Brief descriptions follow, and full semantics are given below.Switches, Callbacks, and Virtual Functions
Dynamic jumps are also used for
O(1)
indirection: an address to jump to is selected to push on the stack and be jumped to. So we also propose two more instructions to provide for constrained indirection. We support these with vectors ofJUMPDEST
orBEGINSUB
offsets stored inline, which can be selected with an index on the stack. That constrains validation to a specified subset of all possible destinations. The danger of quadratic blow up is avoided because it takes as much space to store the jump vectors as it does to code the worst case exploit.Dynamic jumps to a
JUMPDEST
are used to implementO(1)
jumptables, which are useful for dense switch statements. Wasm and most CPUs provide similar instructions.Dynamic jumps to a
BEGINSUB
are used to implementO(1)
virtual functions and callbacks, which take at most two pointer dereferences on most CPUs. Wasm provides a similar instruction.Variable Access
These operations provide convenient access to subroutine parameters and local variables at fixed stack offsets within a subroutine. Otherwise only sixteen variables can be directly addressed.
Local variable
n
is the nth stack item below the frame pointer,FP[-n]
, as defined below.Data
There needs to be a way to place unreachable data into the bytecode that will be skipped over and not validated. Indirect jump vectors will not be valid code. Initialization code must create runtime code from data that might not be valid code. And unreachable data might prove useful to programs for other purposes.
Structure
Valid EIP-615 EVM bytecode begins with a valid header. This is the magic number ‘\0evm’ followed by the semantic versioning number '\1\5\0'. (For Wasm the header is '\0asm\1').
Following the header is the BEGINSUB opcode for the main routine. It takes no arguments and returns no values. Other subroutines may follow the main routine, and an optional BEGINDATA opcode may mark the start of a data section.
Semantics
Jumps to and returns from subroutines are described here in terms of
JUMPSUB
andJUMPSUBV
offsets.We will adopt the following conventions to describe the machine state:
PC
is (as usual) the byte offset of the currently executing instruction.SP
corresponds to the Yellow Paper's substates
of the machine state.SP[0]
is where a new item is can be pushed on the stack.SP[1]
is the first item on the stack, which can be popped off the stack.FP
is set toSP + n_args
at entry to the currently executing subroutine.FP - SP
, is the frame size.The frame pointer and return stacks are internal to the subroutine mechanism, and not directly accessible to the program. This is necessary to prevent the program from modifying its own state in ways that could be invalid.
Execution of EVM bytecode begins with the main routine with no arguments,
SP
andFP
set to 0, and with one value on the return stack—code_size - 1
. (Executing the virtual byte of 0 after this offset causes an EVM to stop. Thus executing aRETURNSUB
with no priorJUMPSUB
orJUMBSUBV
—that is, in the main routine—executes aSTOP
.)Execution of a subroutine begins with
JUMPSUB
orJUMPSUBV
, whichPC
on the return stack,FP
on the frame stackFP
toSP + n_args
, andPC
to the specifiedBEGINSUB
addressExecution of a subroutine is suspended during and resumed after execution of nested subroutines, and ends upon encountering a
RETURNSUB
, whichFP
to the top of the virtual frame stack and pops the stack,SP
toFP + n_results
,PC
to top of the return stack and pops the stack, andPC
to the next instructionthus resuming execution of the enclosing subroutine or main routine. A
STOP
orRETURN
also ends the execution of a subroutine.For example, starting from this stack,
and after pushing two arguments and branching with
JUMPSUB
to aBEGINSUB 2, 3
and initializing three local variables
the stack looks like this
After some amount of computation the stack could look like this
and after
RETURNSUB
would look like thisValidity
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state, but we must validate code in linear time. So our validation does not consider the code’s data and computations, only its control flow and stack use. This means we will reject programs with invalid code paths, even if those paths are not reachable. Most conditions can be validated, and will not need to be checked at runtime; the exceptions are sufficient gas and sufficient stack. As such, static analysis may yield false negatives belonging to well-understood classes of code requiring runtime checks. Aside from those cases, we can validate large classes at validation time and with linear complexity.
Execution is as defined in the Yellow Paper—a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines five such states.
We propose to expand and extend the Yellow Paper conditions to handle the new instructions we propose.
To handle the return stack we expand the conditions on stack size:
Given our more detailed description of the data stack we restate condition 3—stack underflow—as
Since the various
DUP
andSWAP
operations—as well asPUTLOCAL
andGETLOCAL
—are defined as taking items off the stack and putting them back on, this prevents them from accessing data below the frame pointer, since taking too many items off of the stack would mean thatSP
is less thanFP
.To handle the new jump instructions and subroutine boundaries, we expand the conditions on jumps and jump destinations.
We have two new conditions on execution to ensure consistent use of the stack by subroutines:
Finally, we have one condition that prevents pathological uses of the stack:
In practice, we must test at runtime for conditions 1 and 2—sufficient gas and sufficient stack. We don’t know how much gas there will be, we don’t know how deep a recursion may go, and analysis of stack depth even for non-recursive programs is nontrivial.
All of the remaining conditions we validate statically.
Costs & Codes
All of the instructions are
O(1)
with a small constant, requiring just a few machine operations each, whereas aJUMP
orJUMPI
typically does anO(log n)
binary search of an array ofJUMPDEST
offsets before every jump. With the cost ofJUMPI
being high and the cost ofJUMP
being mid, we suggest the cost ofJUMPV
andJUMPSUBV
should be mid,JUMPSUB
andJUMPIF
should be low, andJUMPTO
and the rest should be verylow. Measurement will tell.We suggest the following opcodes:
Backwards Compatibility
These changes would need to be implemented in phases at decent intervals:
If desired, the period of deprecation can be extended indefinitely by continuing to accept code not versioned as new—but without validation. That is, by delaying or canceling phase 2.
Regardless, we will need a versioning scheme like EIP-1702 to allow current code and EIP-615 code to coexist on the same blockchain.
Rationale
This design was highly constrained by the existing EVM semantics, the requirement for eWasm compatibility, and the security demands of the Ethereum environment. It was also informed by the lead author's previous work implementing Java and Scheme interpreters. As such there was very little room for alternative designs.
As described above, the approach was simply to deprecate the problematic dynamic jumps, then ask what opcodes were necessary to provide for the features they supported. These needed to include those provided by eWasm, which themselves were modeled after typical hardware. The only real innovation was to move the frame pointer and the return pointer to their own stacks, so as to prevent any possibility of overwriting them. (Although Forth also uses a return stack.) This allowed for treating subroutine arguments as local variables, and facilitated the return of multiple values.
Implementation
Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. The bulk of the effort is the validator, which in most languages can almost be transcribed from the pseudocode above.
A lightly tested C++ reference implementation is available in Greg Colvin's Aleth fork. This version required circa 110 lines of new interpreter code and a well-commented, 178-line validator.
Appendix A
Validation
Validation comprises two tasks:
We sketch out these two validation functions in pseudo-C below. To simplify the presentation only the five primitives are handled (
JUMPV
andJUMPSUBV
would just add more complexity to loop over their vectors), we assume helper functions for extracting instruction arguments from immediate data and managing the stack pointer and program counter, and some optimizations are forgone.Validating Jumps
Validating that jumps are to valid addresses takes two sequential passes over the bytecode—one to build sets of jump destinations and subroutine entry points, another to check that addresses jumped to are in the appropriate sets.
Note that code like this is already run by EVMs to check dynamic jumps, including building the jump destination set every time a contract is run, and doing a lookup in the jump destination set before every jump.
Subroutine Validation
This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertices and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is
O(|E|+|V|)
: The sum of the number of edges and number of vertices in the graph.The basic approach is to call
validate_subroutine(i, 0, 0)
, fori
equal to the first instruction in the EVM code through eachBEGINDATA
offset.validate_subroutine()
traverses instructions sequentially, recursing whenJUMP
andJUMPI
instructions are encountered. When a destination is reached that has been visited before it returns, thus breaking cycles. It returns true if the subroutine is valid, false otherwise.Appendix B
EVM Analysis
There is a large and growing ecosystem of researchers, authors, teachers, auditors, and analytic tools--providing software and services focused on the correctness and security of EVM code. A small sample is given here.
Some Tools
Some Papers
*A Lem formalization of EVM 1.5
Copyright
Copyright and related rights waived via CC0.
The text was updated successfully, but these errors were encountered: