eip | title | description | author | discussions-to | status | type | category | created | requires |
---|---|---|---|---|---|---|---|---|---|
3540 |
EOF - EVM Object Format v1 |
EOF is an extensible and versioned container format for EVM bytecode with a once-off validation at deploy time. |
Alex Beregszaszi (@axic), Paweł Bylica (@chfast), Andrei Maiboroda (@gumb0), Matt Garnett (@lightclient) |
Review |
Standards Track |
Core |
2021-03-16 |
3541, 3860 |
We introduce an extensible and versioned container format for the EVM with a once-off validation at deploy time. The version described here brings the tangible benefit of code and data separation, and allows for easy introduction of a variety of changes in the future. This change relies on the reserved byte introduced by EIP-3541.
To summarise, EOF bytecode has the following layout:
magic, version, (section_kind, section_size_or_sizes)+, 0, <section contents>
On-chain deployed EVM bytecode contains no pre-defined structure today. Code is typically validated in clients to the extent of JUMPDEST
analysis at runtime, every single time prior to execution. This poses not only an overhead, but also a challenge for introducing new or deprecating existing features.
Validating code during the contract creation process allows code versioning without an additional version field in the account. Versioning is a useful tool for introducing or deprecating features, especially for larger changes (such as significant changes to control flow, or features like account abstraction).
The format described in this EIP introduces a simple and extensible container with a minimal set of changes required to both clients and languages, and introduces validation.
The first tangible feature it provides is separation of code and data. This separation is especially beneficial for on-chain code validators (like those utilised by layer-2 scaling tools, such as Optimism), because they can distinguish code and data (this includes deployment code and constructor arguments too). Currently, they a) require changes prior to contract deployment; b) implement a fragile method; or c) implement an expensive and restrictive jump analysis. Code and data separation can result in ease of use and significant gas savings for such use cases. Additionally, various (static) analysis tools can also benefit, though off-chain tools can already deal with existing code, so the impact is smaller.
A non-exhaustive list of proposed changes which could benefit from this format:
- Including a
JUMPDEST
-table (to avoid analysis at execution time) and/or removingJUMPDEST
s entirely. - Introducing static jumps (with relative addresses) and jump tables, and disallowing dynamic jumps at the same time.
- Multibyte opcodes without any workarounds.
- Representing functions as individual code sections instead of subroutines.
- Introducing special sections for different use cases, notably Account Abstraction.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 and RFC 8174.
In order to guarantee that every EOF-formatted contract in the state is valid, we need to prevent already deployed (and not validated) contracts from being recognized as such format. This is achieved by choosing a byte sequence for the magic that doesn't exist in any of the already deployed contracts.
If code starts with the MAGIC
, it is considered to be EOF formatted, otherwise it is considered to be legacy code. For clarity, the MAGIC
together with a version number n is denoted as the EOFn prefix, e.g. EOF1 prefix.
EOF-formatted contracts are created using new instructions which are introduced in a separate EIP.
The opcode 0xEF
is currently an undefined instruction, therefore: It pops no stack items and pushes no stack items, and it causes an exceptional abort when executed. This means legacy initcode or already deployed legacy code starting with this instruction will continue to abort execution.
Unless otherwised specified, all integers are encoded in big-endian byte order.
We introduce code validation for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.
Legacy code is not affected by EOF code validation.
Code validation is performed during contract creation, and is elaborated on in separate EIPs. The EOF format itself and its formal validation are described in the following sections.
EOF container is a binary format with the capability of providing the EOF version number and a list of EOF sections.
The container starts with the EOF prefix:
description | length | value | |
---|---|---|---|
magic | 2-bytes | 0xEF00 | |
version | 1-byte | 0x01–0xFF | EOF version number |
The EOF prefix is followed by at least one section header. Each section header contains two fields, section_kind
and either section_size
or section_size_list
, depending on the kind. section_size_list
is a list of size values when multiple sections of this kind are allowed, encoded as a count of items followed by the items.
description | length | value | |
---|---|---|---|
section_kind | 1-byte | 0x01–0xFF | uint8 |
section_size | 2-bytes | 0x0000–0xFFFF | uint16 |
section_size_list | dynamic | n/a | uint16, uint16+ |
The list of section headers is terminated with the section headers terminator byte 0x00
. The body content follows immediately after.
version
MUST NOT be0
.section_kind
MUST NOT be0
. The value0
is reserved for section headers terminator byte.- There MUST be at least one section (and therefore section header).
- Stray bytes outside of sections MUST NOT be present. This includes trailing bytes after the last section.
EOF version 1 is made up of several EIPs, including this one. Some values in this specification are only discussed briefly. To understand the full scope of EOF, it is necessary to review each EIP in-depth.
The EOF version 1 container consists of a header
and body
.
container := header, body
header :=
magic, version,
kind_type, type_size,
kind_code, num_code_sections, code_size+,
[kind_container, num_container_sections, container_size+,]
kind_data, data_size,
terminator
body := types_section, code_section+, container_section*, data_section
types_section := (inputs, outputs, max_stack_height)+
note: ,
is a concatenation operator, +
should be interpreted as "one or more" of the preceding item, *
should be interpreted as "zero or more" of the preceding item, and [item]
should be interpreted as an optional item.
name | length | value | description |
---|---|---|---|
magic | 2 bytes | 0xEF00 | |
version | 1 byte | 0x01 | EOF version |
kind_type | 1 byte | 0x01 | kind marker for type section |
type_size | 2 bytes | 0x0004-0x1000 | 16-bit unsigned big-endian integer denoting the length of the type section content, 4 bytes per code section |
kind_code | 1 byte | 0x02 | kind marker for code size section |
num_code_sections | 2 bytes | 0x0001-0x0400 | 16-bit unsigned big-endian integer denoting the number of the code sections |
code_size | 2 bytes | 0x0001-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the code section content |
kind_container | 1 byte | 0x03 | kind marker for container size section |
num_container_sections | 2 bytes | 0x0001-0x0100 | 16-bit unsigned big-endian integer denoting the number of the container sections |
container_size | 2 bytes | 0x0001-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the container section content |
kind_data | 1 byte | 0x04 | kind marker for data size section |
data_size | 2 bytes | 0x0000-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the data section content (*) |
terminator | 1 byte | 0x00 | marks the end of the header |
(*) For not yet deployed containers this can be greater than the actual content length.
name | length | value | description |
---|---|---|---|
types_section | variable | n/a | stores code section metadata |
inputs | 1 byte | 0x00-0x7F | number of stack elements the code section consumes |
outputs | 1 byte | 0x00-0x7F | number of stack elements the code section returns |
max_stack_height | 2 bytes | 0x0000-0x03FF | maximum number of elements ever placed onto the operand stack by the code section |
code_section | variable | n/a | arbitrary bytecode |
container_section | variable | n/a | arbitrary EOF-formatted container |
data_section | variable | n/a | arbitrary sequence of bytes |
See EIP-4750 for more information on the type section content.
NOTE: A special value of outputs
being 0x80
is designated to denote non-returning functions as defined in a separate EIP.
The following validity constraints are placed on the container format:
types_size
is divisible by4
- the number of code sections must be equal to
types_size / 4
- data body length may be shorter than
data_size
for a not yet deployed container - the total size of a container must not exceed
MAX_INITCODE_SIZE
(as defined in EIP-3860)
For an EOF contract:
- Execution starts at the first byte of code section 0
CODESIZE
,CODECOPY
,EXTCODESIZE
,EXTCODECOPY
,EXTCODEHASH
,GAS
are rejected by validation in EOF contracts, with no replacementsCALL
,DELEGATECALL
,STATICCALL
are rejected by validation in EOF contracts, replacement instructions to be introduced in a separate EIP.EXTDELEGATECALL
(DELEGATECALL
replacement) from an EOF contract to a non-EOF contract (legacy contract, EOA, empty account) is disallowed, and it should fail in the same mode as if the call depth check failed. We allow legacy to EOF path for existing proxy contracts to be able to use EOF upgrades.
For a legacy contract:
- If the target account of
EXTCODECOPY
is an EOF contract, then it will copy up to 2 bytes fromEF00
, as if that would be the code. - If the target account of
EXTCODEHASH
is an EOF contract, then it will return0x9dbf3648db8210552e9c4f75c6a1c3057c0ca432043bd648be15fe7be05646f5
(the hash ofEF00
, as if that would be the code). - If the target account of
EXTCODESIZE
is an EOF contract, then it will return 2.
NOTE Like for legacy targets, the aforementioned behavior of EXTCODECOPY
, EXTCODEHASH
and EXTCODESIZE
does not apply to EOF contract targets mid-creation, i.e. those report same as accounts without code.
EVM and/or account versioning has been discussed numerous times over the past years. This proposal aims to learn from them. See "Ethereum account versioning" on the Fellowship of Ethereum Magicians Forum for a good starting point.
This specification introduces creation time validation, which means:
- All created contracts with EOFn prefix are valid according to version n rules. This is very strong and useful property. The client can trust that the deployed code is well-formed.
- In the future, this allows to serialize
JUMPDEST
map in the EOF container and eliminate the need of implicitJUMPDEST
analysis required before execution. - Or to completely remove the need for
JUMPDEST
instructions. - This helps with deprecating EVM instructions and/or features.
- The biggest disadvantage is that deploy-time validation of EOF code must be enabled in two hard-forks. However, the first step (EIP-3541) is already deployed in London.
The alternative is to have execution time validation for EOF. This is performed every single time a contract is executed, however clients may be able to cache validation results. This alternative approach has the following properties:
- Because the validation is consensus-level execution step, it means the execution always requires the entire code. This makes code merkleization impractical.
- Can be enabled via a single hard-fork.
- Better backwards compatibility: data contracts starting with the
0xEF
byte or the EOF prefix can be deployed. This is a dubious benefit, however.
-
The first byte
0xEF
was chosen because it is reserved for this purpose by EIP-3541. -
The second byte
0x00
was chosen to avoid clashes with three contracts which were deployed on Mainnet:0xca7bf67ab492b49806e24b6e2e4ec105183caa01
:EFF09f918bf09f9fa9
0x897da0f23ccc5e939ec7a53032c5e80fd1a947ec
:EF
0x6e51d4d9be52b623a3d3a2fa8d3c5e3e01175cd0
:EF
-
No contracts starting with
0xEF
bytes exist on public testnets: Goerli, Ropsten, Rinkeby, Kovan and Sepolia at their London fork block.
NOTE: This EIP MUST NOT be enabled on chains which contain bytecodes starting with MAGIC
and not being valid EOF.
The version number 0 will never be used in EOF, so we can call legacy code EOF0. Also, implementations may use APIs where 0 version number denotes legacy code.
We have considered different questions for the sections:
- Streaming headers (i.e.
section_header, section_data, section_header, section_data, ...
) are used in some other formats (such as WebAssembly). They are handy for formats which are subject to editing (adding/removing sections). That is not a useful feature for EVM. One minor benefit applicable to our case is that they do not require a specific "header terminator". On the other hand they seem to play worse with code chunking / merkleization, as it is better to have all section headers in a single chunk. - Whether to have a header terminator or to encode
number_of_sections
ortotal_size_of_headers
. Both raise the question of how large of a value these fields should be able to hold. A terminator byte seems to avoid the problem of choosing a size which is too small without any perceptible downside, so it is the path taken. - (EOF1) Whether to encode section sizes as fixed 16-bit values or some kind of variable length field (e.g. LEB128). We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see EIP-170 and EIP-3860). Should this be limiting in the future, a new EOF version could change the format. Besides simplifying client implementations, not using LEB128 also greatly simplifies on-chain parsing.
- Whether or not to have more structure to the container header for all EOF versions to follow. In order to allow future formats optimized for chunking and merkleization (verkleization) it was decided to keep it generic and specify the structure only for a specific EOF version.
See section Lack of EXTDATACOPY
in EIP-7480.
Currently contracts can selfdestruct in three different ways (directly through SELFDESTRUCT
, indirectly through CALLCODE
and indirectly through DELEGATECALL
). EIP-3670 disables the first two possibilities, however the third possibility remains. Allowing EOF1 contracts to only DELEGATECALL
other EOF1 contracts allows the following strong statement: EOF1 contract can never be destructed. Attacks based on SELFDESTRUCT
completely disappear for EOF1 contracts. These include destructed library contracts (e.g. Parity Multisig).
Imposing an EOF-validation time limit for the size of EOF containers provides a reference limit of how large the containers should EVM implementations be able to handle when validating and processing containers. MAX_INITCODE_SIZE
was chosen for EOF1, as it is what contract creation currently allows for.
Given one of the main reasons for the limit is to avoid attack vectors on JUMPDEST
-analysis, and EOF removes the need for JUMPDEST
-analysis and introduces a cost structure for deploy-time analysis, in the future this limit could be increased or even lifted for EOF.
This is a breaking change given that any code starting with 0xEF
was not deployable before (and resulted in exceptional abort if executed), but now some subset of such codes can be deployed and executed successfully.
The choice of MAGIC
guarantees that none of the contracts existing on the chain are affected by the new rules.
With the anticipated EOF extensions, the validation is expected to have linear computational and space complexity. We think that the validation cost is sufficiently covered by:
- EIP-3860 for initcode,
- high per-byte cost of deploying code.
Copyright and related rights waived via CC0.