EVM <> FVM mapping #39

raulk · 2021-10-26T14:31:09Z

This document elaborates on the EVM <> FVM mapping by analyzing various aspects such as the addressing scheme, memory model, storage model, and more.


raulk · 2021-10-26T19:30:21Z

04-evm-mapping.md

+
+On an incoming call, the EVM <> FVM shim would unpack the call and pass only (1) as input parameters to the smart contract. It would use (2) to resolve the address whenever the smart contract called a relevant opcode. When returning, the EVM <> FVM shim would perform the inverse operation.
+
+However, address-returning opcodes are still unsolved (e.g. CREATE, CREATE2, COINBASE, SENDER). The contract may want to persist these addresses, so making them return address handles is not an option, as they aren't safe to persist.


Actually, this applies more generally. If I pass a handle 0x0000..01 to a smart contract as an input parameter, and it decides to persist it immediately, I would have no opportunity to resolve that handle into the actual address. This solution doesn't work at all.

Yeah, this solution seems brittle.

Stebalien · 2021-10-26T22:26:38Z

04-evm-mapping.md

-## Address semantics
+## Addressing scheme
+
+Ethereum uses 160-bit (20-byte) addresses. Addresses are the keccak-256 hash of the public key of an account, truncated to preserve the 20 rightmost bytes. Solidity and the [Contract ABI spec](https://docs.soliditylang.org/en/v0.5.3/abi-spec.html) represent addresses with the `address` type, equivalent to `uint160`.


This covers account addresses. But see https://ethereum.stackexchange.com/questions/760/how-is-the-address-of-an-ethereum-contract-computed for contract addresses.

Yep, for contract addresses, it's the hash of the RLP encoding of sender + nonce. I'll add that for reference.

Stebalien · 2021-10-27T06:06:11Z

04-evm-mapping.md

+
+1. EVM smart contracts can't send to inexisting, stable account addresses, and rely on account actor auto-creation, as those addressess can't be used with EVM opcodes (see problem 1). Potential solution: have the caller create the account on chain prior to invoking the EVM smart contract.
+2. ID addresses are vulnerable to reorg within the current finality window, so submitting EVM transactions involving actors created recently (900 epochs; 7.5 hours) would be unsafe. Potential solution: have the runtime detect and fail calls involving recently-created actors.
+


So, it would be possible to assign every actor a stable address (including account actors) and use that everywhere. That would mean addresses would be 20 bytes and unambiguous.

However, we'd have to make a few changes:

- Account actors currently don't have stable addresses (just key-based addresses).
- There's currently no reverse map from ID addresses to stable addresses. We'd likely need to add a "stable address" field to every actor.

But this should be doable.

But there's a whole other can of worms...

1. In the EVM, it's possible to send funds to any address.
2. If that address turns out to be the hash of a public key, it's possible to then use that address to send messages (an account).
3. There's a CREATE2 instruction that allows an actor to create another actor in some "owned" address space (effectively using an actor-specific KDF with an actor-controlled salt).
4. If that address turns out to be part of the address space "owned" by another actor, code can later be deployed to this address (and it gets to keep the existing funds).

Basically:

1. We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.
2. We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.
3. We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

All of this is leading me to believe that we're going to need a bit of an indirection layer. Possibly a registry mapping "EVM" addresses to the rest of the FVM address space.

This is all a bit confusing so I'll try to explain more in the standup. Unfortunately, documentation is scattered and almost universally of the "here's how to take your first steps in Ethereum" form not the "this is how this thing actually works" form.

> Account actors currently don't have stable addresses (just key-based addresses).

Aren't pubkey addresses stable addresses? What's the nuance here?

Related: account actors are also bound to an ID at creation, so every actor is guaranteed to have an ID address, which is volatile during the current finality window.

Notes:

CREATE calculates the address by hashing the RLP encoding of a struct containing the address of the sender (an Externally Owned Account, EOA) and their nonce. Such addresses can be trivially computed ahead of time.

CREATE2 hashes the concatenation of 0xff, sender_addr, user_provided_salt, and the hash of the init bytecode (per EIP-1014, no RLP encoding is involved). These can be precomputed by knowing the inputs ahead of time, and it's the basis for "counterfactual deployments" -- use cases in which we interact with the contract ahead of time.

Sending ETH to an address doesn't turn it into an EOA; it can still be the target of code deployment. This property is also the basis for some counterfactual interactions.

It is not possible to conduct an appropriation attack by exploiting the knowledge of a future contract address ahead of time.

First, you'd need to defeat hash collision resistance to find a private key whose Eth address matches the target contract address (computational expense is estimated to be 2**80 hashes, as per various sources).

Second, as of EIP-684, the protocol aborts CREATE or CREATE2 instructions that target an address with a non-zero nonce or non-empty code.

In conclusion, even if you found a colliding key, you can do one of two things: (a) not use it, in which case when the contract account is created, you'd be locked out of that address because non-EOA addresses can't perform transactions (I think); or (b) use it, in which case it would be marked as an EOA, but its nonce would be non-zero, so CREATE/CREATE2 would abort.
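To make the EIP-684 argument concrete, here is a minimal sketch of the deployment guard. The `Account` record and `can_deploy_at` helper are hypothetical names used only for illustration; they are not part of any client or of the FVM.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical, simplified account record for illustration only."""
    balance: int = 0
    nonce: int = 0
    code: bytes = b""


def can_deploy_at(state: dict, address: bytes) -> bool:
    """EIP-684-style guard: CREATE/CREATE2 abort if the target address
    already has a non-zero nonce or non-empty code."""
    existing = state.get(address)
    if existing is None:
        return True
    return existing.nonce == 0 and existing.code == b""


target = bytes(20)

# An address that was only seeded with funds can still receive code.
seeded = {target: Account(balance=10)}

# Case (b): the colliding key was used, so the nonce is non-zero.
used = {target: Account(balance=10, nonce=1)}
```

Under these assumptions, `can_deploy_at(seeded, target)` holds while `can_deploy_at(used, target)` does not, which is the asymmetry the two cases above rely on.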

Relevant references (in addition to the yellow paper):

- EIP-1014: Skinny CREATE2. Clarifications section elaborates on collisions.
- EIP-684: Prevent overwriting contracts
- EIP-161: State trie clearing (invariant-preserving alternative)
- About counterfactual contract deployments.

> We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.

If the pubkey address exists ahead of time, the contract can use a reorg-stable ID address (I'll post a proposal shortly).

If the address doesn't exist ahead of time, this becomes harder because the CALL opcode consumes a single word for the recipient address (and probably truncates it to 160 bits), yet our pubkey addresses can span up to 2 Ethereum words.
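The width mismatch can be checked with a little arithmetic. This is a sketch under the assumption that the recipient operand is reduced to its low 160 bits; the helper names are made up for illustration, and the 21- and 49-byte widths are those of prefixed secp256k1 and BLS addresses.

```python
WORD_BYTES = 32                   # one EVM word
ADDRESS_MASK = (1 << 160) - 1     # low 160 bits of a word


def words_needed(address_len: int) -> int:
    """Number of 32-byte EVM words needed to hold an address of this length."""
    return -(-address_len // WORD_BYTES)  # ceiling division


def truncate_recipient(word: int) -> int:
    """Assumed behaviour: only the low 160 bits of the operand survive."""
    return word & ADDRESS_MASK


SECP_ADDRESS_LEN = 21   # 1-byte prefix + 20-byte payload
BLS_ADDRESS_LEN = 49    # 1-byte prefix + 48-byte payload
```

A prefixed secp256k1 address fits in one word but not in 160 bits, and a BLS address needs two words, so neither survives being passed as a single recipient operand.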

> We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.

Yes, 100% agreed.

> We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

Yes, but this should be straightforward IMO; we'd generate an f2 address using the user-provided inputs to assemble the preimage passed to address.NewActorAddress(preimage). The output can be a reorg-stable ID address (which I'm defining in a subsequent PR).

> Aren't pubkey addresses stable addresses? What's the nuance here?

Sorry, f2 address. You're right, "stable" just means "not f0".

Stebalien · 2021-10-27T06:09:01Z

04-evm-mapping.md

+1. The worst case scenario is larger than the width of the Ethereum address type. Even if BLS addresses were prohibited in combination with EVM actors, class 1 and class 2 still miss the limit by 1 byte (due to the prefix).
+2. It exceeds the EVM's 256 bit architecture.
+
+Problem 1 renders Solidity smart contracts instantly incompatible with the Filecoin addressing scheme, as well as EVM opcodes that take or return addresses for arguments, e.g. CALLER, CALL, CALLCODE, DELEGATECALL, COINBASE, etc. This problem is hard to work around, and would require a fork of the EVM to modify existing opcodes for semantic awareness of addresses (although this is really hard to get right), or to introduce a Filecoin-specific opcode family to deal Filecoin addresses (e.g. FCALL, FCALLCODE, etc.) The latter would break as-is deployability of existing smart contracts.


Note: CALLER and COINBASE (and likely others) won't have this issue. All runtime APIs (in the current VM) return ID addresses, but accept (and resolve) other address types.

Stebalien · 2021-10-27T06:09:15Z

04-evm-mapping.md

+
+On an incoming call, the EVM <> FVM shim would unpack the call and pass only (1) as input parameters to the smart contract. It would use (2) to resolve the address whenever the smart contract called a relevant opcode. When returning, the EVM <> FVM shim would perform the inverse operation.
+
+However, address-returning opcodes are still unsolved (e.g. CREATE, CREATE2, COINBASE, SENDER). The contract may want to persist these addresses, so making them return address handles is not an option, as they aren't safe to persist.


Yeah, this solution seems brittle.

raulk

Notes from our sync discussion.

We dissected various solutions, including "EVM <> FVM mapping: f4 address class to solve reorg-instability of ID addresses" (#40) and segregating the Ethereum address space entirely via a new address class.

- The solution we converged on a priori is to turn the f2 class into a unified stable address space for all actors, including account actors, which are also identified by their f1/f3 pubkey addresses.
- That is, account actors also get an f2 address.
- Length: f2 addresses have fixed 20-byte payloads, so they could be used as-is inside the EVM by dropping the prefix.
- Calculation of f2 addresses:
  - For f1 account actors (secp256k1): identical payload (already a blake2b-160 hash of pubkey).
  - For f3 account actors (bls): blake2b-160 hash of pubkey.
  - For FVM native actors: current algorithm is preserved (hash of sender, nonce, number of actors created in the message).
  - For EVM foreign actors: preserves Ethereum semantics for CREATE and CREATE2.
- Collision resistance: 2^80 hashes, same as the Ethereum network. (Link: informal proposal to increase the space to 256 bits.)
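The pubkey-based derivations above can be sketched as follows. This is a minimal illustration using Python's `hashlib.blake2b`; the function names are hypothetical, and it only covers the account-actor cases (f1 and f3), not native or EVM foreign actors.

```python
import hashlib

F2_PAYLOAD_LEN = 20  # bytes; same width as an EVM address


def f2_payload_from_pubkey(pubkey: bytes) -> bytes:
    """blake2b-160 digest of a public key (secp256k1 or BLS)."""
    return hashlib.blake2b(pubkey, digest_size=F2_PAYLOAD_LEN).digest()


def evm_word_from_f2(payload: bytes) -> int:
    """Prefixless f2 payload interpreted as the 160-bit value the EVM would see."""
    if len(payload) != F2_PAYLOAD_LEN:
        raise ValueError("f2 payload must be 20 bytes")
    return int.from_bytes(payload, "big")


# Placeholder key material, not real public keys.
bls_pubkey = bytes(range(48))
payload = f2_payload_from_pubkey(bls_pubkey)
word = evm_word_from_f2(payload)
```

Because the payload is exactly 20 bytes, it fits the EVM `address` type without the class prefix, which is the property the notes rely on.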

To support "prospective actor interactions" (or what Ethereum calls "counterfactual deployments"), we need:

- The ability to send value to f2 actors that don't exist yet.
- The concept of "undistinguished actors", which graduate out of that status into an actual typed actor through one of two events:
  - A transaction being sent from an f1 or f3 address that hashes to an existing f2 undistinguished actor => actor typed as an account actor, and pubkey address linked.
  - An existing actor creating a typed actor, with an explicit on-chain message to the InitActor, or within code.

The code of the "undistinguished actor" could handle the conversion.

raulk · 2021-10-28T12:39:09Z

The InitActor will need to be extended with f2 indices.

raulk · 2021-10-28T12:55:25Z

Proposed address taxonomy

| Class | Desc | Actor type | Payload width | Total width | Payload value | Usage | Stable? |
|---|---|---|---|---|---|---|---|
| 0 | ID address | All | 1-9 bytes | 2-10 bytes | uvarint64 counter | Internal, compact representation in state tree; unsafe to use externally until final | N |
| 1 | secp256k1 pubkey (account actor) | Account | 20 bytes | 21 bytes | blake2b-160 hash of secp256k1 pubkey | Externally, to refer to an account actor with its pubkey | Y |
| 2 | universal actor address | All | 20 bytes | 21 bytes | protocol-derived from relevant cryptographic material | Externally and internally to refer to any actor | Y |
| 3 | bls pubkey (account actor) | Account | 48 bytes | 49 bytes | inlined bls public key | Externally, to refer to an account actor with its pubkey | Y |
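The widths in the taxonomy follow from a one-byte class prefix plus the payload. As a rough sketch (helper names are illustrative, and the byte layout shown is an assumption based on the table), the variable-width class 0 case can be reproduced with an unsigned varint encoder:

```python
def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint: 7 payload bits per byte, high bit = 'more'."""
    if n < 0:
        raise ValueError("uvarint is unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def prefixed(address_class: int, payload: bytes) -> bytes:
    """Class byte followed by the payload (total width = payload width + 1)."""
    return bytes([address_class]) + payload


def id_address(actor_id: int) -> bytes:
    """Class 0 address for a given actor ID."""
    return prefixed(0, uvarint(actor_id))
```

With this encoding, actor ID 0 yields the 2-byte minimum and an ID of 2^63 - 1 yields the 10-byte maximum listed for class 0, while fixed 20-byte payloads (classes 1 and 2) always total 21 bytes.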

raulk · 2021-10-28T16:55:17Z

Given that we're doing an address taxonomy revision, @Stebalien and I agreed to expand the universal actor address space to 32 bytes. With a high probability, the hash function will remain BLAKE2, with a 256-bit digest size.

The proposal consists of adding a class 4 that serves as the 256-bit canonical universal actor address. Class 2 will serve as an alias/symlink to the corresponding canonical f4 address.

Considerations:

- The EVM runtime would use prefixless f2 addresses. The FVM will translate those to f4 addresses for free, under the hood.
- The state tree would track f2 and f4 addresses for all actors, maintaining the relevant indices to ID addresses (which is how actors are ultimately keyed in the state tree).
- Migration:
  - For BLS account actors, the input material to derive f4 addresses is inlined (the BLS public key), so backfilling f4 addresses is trivial.
  - The input material is not available in the state tree for secp256k1 account actors (pubkey) and non-account actors (creator, nonce, number of interim actors created), so we need to resort to coercion by left or right padding the f2 address to form a synthetic f4 address.
    - This is an acceptable compromise, because with a 2^256 address space, collision resistance should be quantum-safe.
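The coercion step for actors whose input material is unavailable could look roughly like this. The proposal leaves open whether padding goes on the left or the right, so this sketch takes it as a parameter; the zero-byte padding value and the function names are assumptions.

```python
F2_LEN = 20
F4_LEN = 32


def synthetic_f4(f2_payload: bytes, pad_left: bool = True) -> bytes:
    """Widen a 20-byte f2 payload to 32 bytes with zero padding (assumed)."""
    if len(f2_payload) != F2_LEN:
        raise ValueError("f2 payload must be 20 bytes")
    padding = bytes(F4_LEN - F2_LEN)
    return padding + f2_payload if pad_left else f2_payload + padding


def f2_from_synthetic_f4(f4_payload: bytes, pad_left: bool = True) -> bytes:
    """Inverse of synthetic_f4: strip the padding to recover the f2 payload."""
    if len(f4_payload) != F4_LEN:
        raise ValueError("f4 payload must be 32 bytes")
    return f4_payload[-F2_LEN:] if pad_left else f4_payload[:F2_LEN]
```

Either padding side round-trips, so the choice mainly affects how synthetic addresses sort and how easily they can be told apart from hashed ones.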


Stebalien

Main things we need to discuss are:

- Code CIDs (want to make sure we're on the same page).
- Address computation.
- Gas.

But I'm happy to "discuss" by submitting a followup PR.

Stebalien · 2021-11-12T15:07:20Z

04-evm-mapping.md

+}
+```
+
+Notice that EVM foreign actors are typed in the state tree with a CodeCID that does not correspond to their EVM bytecode. Instead, the CodeCID points to the WASM bytecode of the **EVM foreign runtime**.


This should just be the CID of the EVM itself. We currently use strings, but we only do that because the code doesn't live on-chain.

Yep, that's what I meant to say but it was a long-winded way of doing so. I'll adjust the text.

Stebalien · 2021-11-12T15:09:08Z

04-evm-mapping.md

+
+### Mechanics and interfaces
+
+EVM smart contracts (also known as EVM foreign actors in Filecoin terminology), are represented this way in the Filecoin state tree:


Let's explicitly state that it's just an actor.

I.e., that the following object is a normal actor object (or just leave off the explicit actor struct).

Stebalien · 2021-11-12T15:31:14Z

04-evm-mapping.md

+3. For FVM native actors, the preimage is `sender || nonce || # of actors created during message execution`.
+4. For EVM foreign actors, the preimage is inherited from CREATE and CREATE2.


Unfortunately, this would make the EVM "special". I'd prefer not to have "special" address construction methods for every foreign runtime.

I'd rather use the same algorithm everywhere (EVM and FVM): sender || target code cid || salt || init params. This isn't identical to how the EVM currently works, but provides the same guarantees.

We'd need to do a bit of research on how predictable addresses are used, but I believe they're mostly used by external tooling for some payment channel use-cases. If that's the case, it should be pretty trivial to drop in a replacement function, as long as that replacement function has the same inputs.
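A runtime-agnostic derivation along the lines of `sender || target code cid || salt || init params` might be sketched as below. The length-prefixed field encoding and the function names are assumptions added here to keep field boundaries unambiguous; the comment itself does not specify an encoding.

```python
import hashlib


def _field(data: bytes) -> bytes:
    """Length-prefix a field so boundaries are unambiguous (assumed encoding)."""
    return len(data).to_bytes(4, "big") + data


def actor_address_payload(sender: bytes, code_cid: bytes, salt: bytes,
                          init_params: bytes, digest_size: int = 20) -> bytes:
    """Hash sender || code CID || salt || init params with BLAKE2b."""
    preimage = b"".join(_field(p) for p in (sender, code_cid, salt, init_params))
    return hashlib.blake2b(preimage, digest_size=digest_size).digest()
```

Anyone who knows the four inputs can compute the address before deployment, which is the predictability property that tooling built around CREATE2 depends on.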

Stebalien · 2021-11-12T15:35:01Z

04-evm-mapping.md

+
+Class 2 and 4 addresses are protocol-derived, by hashing the relevant cryptographic or input material.
+
+With a high probability, the hash function of class 4 will remain BLAKE2, with a 256-bit digest size. Class 2 will continue relying on blake2b-160.


So... we could save some space if we define class 2 as a truncation of class 4. Instead of mapping both class-2 and class-4 addresses to ID addresses, we'd map class 2 addresses to (ID, class4-suffix) tuples.

We'd want to determine how much space this map is really taking up on chain before making any decisions (this optimization may not be worth it).
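One way to picture the truncation idea is a single index keyed by the class 2 payload, with each entry storing the ID and the remaining class 4 bytes. This sketch assumes class 2 is the first 20 bytes of the class 4 payload; all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

F2_LEN = 20

# class 2 payload -> (actor ID, remaining class 4 bytes)
Index = Dict[bytes, Tuple[int, bytes]]


def register(index: Index, actor_id: int, f4: bytes) -> None:
    """Record an actor under its class 2 truncation, keeping the f4 suffix."""
    f2, suffix = f4[:F2_LEN], f4[F2_LEN:]
    if f2 in index:
        raise ValueError("class 2 collision")
    index[f2] = (actor_id, suffix)


def resolve_f2(index: Index, f2: bytes) -> Optional[int]:
    """Look up an actor ID by class 2 payload."""
    entry = index.get(f2)
    return entry[0] if entry else None


def resolve_f4(index: Index, f4: bytes) -> Optional[int]:
    """Look up by class 4 payload: the stored suffix must match exactly."""
    entry = index.get(f4[:F2_LEN])
    if entry is None or entry[1] != f4[F2_LEN:]:
        return None
    return entry[0]
```

The saving comes from storing 12 suffix bytes per actor instead of a second full map keyed by 32-byte addresses; whether that matters depends on the on-chain size measurement mentioned above.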

Stebalien · 2021-11-12T15:40:20Z

04-evm-mapping.md

+
+## Gas accounting and execution halt semantics
+
+The execution halt is determined by Filecoin gas and not by EVM gas. Therefore, EVM runtime is made to run with unlimited gas. The FVM is responsible for metering execution and halting it when gas limits are exceeded. Refer to the [Gas accounting](01-architecture.md#gas-accounting) section of the Architecture doc for more details.


This isn't quite what we discussed. Top-level messages will be run with infinite (well, max int) EVM gas. But an EVM contract may call other contracts with some limited amount of gas and that gas limit should be respected.

Basically:

- We obey both gas models at the same time.
- The EVM gas model starts with infinite gas.

I believe this may be important for contracts that need to be able to invoke untrusted contracts with some limited amount of gas.
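As a rough illustration of obeying both models at once, a meter could track two budgets and halt when either runs out. Everything here is a hypothetical sketch: the class, the "max int" value, and the choice to omit settling a nested call's usage back into its parent.

```python
INFINITE_EVM_GAS = 2**63 - 1  # stand-in for "max int"; the exact value is an assumption


class OutOfGas(Exception):
    """Raised when either gas budget is exhausted."""


class DualGasMeter:
    def __init__(self, fil_gas: int, evm_gas: int = INFINITE_EVM_GAS):
        self.fil_gas = fil_gas
        self.evm_gas = evm_gas

    def charge(self, fil_cost: int, evm_cost: int) -> None:
        """Charge both budgets; fail if either one cannot cover its cost."""
        if fil_cost > self.fil_gas:
            raise OutOfGas("filecoin")
        if evm_cost > self.evm_gas:
            raise OutOfGas("evm")
        self.fil_gas -= fil_cost
        self.evm_gas -= evm_cost

    def subcall(self, evm_gas_limit: int) -> "DualGasMeter":
        """Meter for a nested call: capped EVM gas, same Filecoin budget."""
        return DualGasMeter(self.fil_gas, min(evm_gas_limit, self.evm_gas))
```

A top-level message starts with effectively unlimited EVM gas, but a contract that forwards only 50 units to an untrusted callee still sees that callee halt at 50.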

Aside: I'm wondering if we can optimize EVM execution by being a bit looser about gas usage. If infinite gas is available, we could go down a codepath that doesn't bother tracking gas.

Alternatively, we could let the message sender specify the amount of EVM gas. Basically, when doing gas estimation, the caller would try the message with infinite EVM gas. Then, the caller would replace the "infinite" EVM gas with the used EVM gas (times some reasonable overestimation multiplier).

Given a high enough overestimation, this reduces the chances of spuriously running out of EVM gas. However, having a somewhat realistic gas value lets the contract make decisions based on the amount of gas left.

As long as we allow the message sender to specify the EVM gas limit, they can pick their model based on their use-case. If, e.g., they have a contract that really cares about EVM gas, they can pass in a gas estimation. Otherwise, they can pass in "infinite" gas.
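The estimation flow described above can be sketched as follows. `run_message` is a hypothetical callable that executes the message with a given EVM gas limit and returns the EVM gas used; the 1.25 multiplier and the "max int" value are illustrative assumptions.

```python
import math
from typing import Callable

INFINITE_EVM_GAS = 2**63 - 1  # stand-in for "max int"; the exact value is an assumption


def estimate_evm_gas_limit(run_message: Callable[[int], int],
                           multiplier: float = 1.25) -> int:
    """Dry-run with 'infinite' EVM gas, then pad the observed usage."""
    if multiplier < 1.0:
        raise ValueError("multiplier must not shrink the estimate")
    used = run_message(INFINITE_EVM_GAS)
    return min(INFINITE_EVM_GAS, math.ceil(used * multiplier))
```

A sender whose contract does not inspect remaining gas could skip the estimate and pass `INFINITE_EVM_GAS` directly.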

Stebalien · 2021-11-12T15:41:01Z

04-evm-mapping.md

+
+## Blockchain timing
+
+Ethereum target block times are ~10 seconds, whereas Filecoin's is ~30 seconds. A priori, this difference has no impact on the protocol or this spec, but it may impact the behaviour of smart contracts ported over from Ethereum that expect 10-second block timing.


Note: we could consider translating epochs, but we probably shouldn't.

Stebalien · 2021-11-12T15:41:51Z

04-evm-mapping.md

+* `GASLIMIT`: returns the gas limit as per Filecoin gas system.
+* `CHAINID`: returns a fixed value `0`.
+* `GAS`: returns the gas remaining as per Filecoin gas system. One divergence from Ethereum is the return value does not include the full cost of this operation (because the cost of stack copy and program advance is not known when the value is captured).
+* `COINBASE`: returns the Filecoin class 4 address of the block producer including this message.


Class 2? Class 4 won't work until we get 32 byte addresses.

Stebalien · 2021-11-12T15:43:15Z

04-evm-mapping.md

+* `GASLIMIT`: returns the gas limit as per Filecoin gas system.
+* `CHAINID`: returns a fixed value `0`.
+* `GAS`: returns the gas remaining as per Filecoin gas system. One divergence from Ethereum is the return value does not include the full cost of this operation (because the cost of stack copy and program advance is not known when the value is captured).


Using the Filecoin gas here diverges from what NEAR does and may cause more problems than it solves, because the gas models won't match up.

raulk · 2022-01-07T14:21:07Z

I'm going to merge this and will iterate on master.

raulk added 4 commits October 26, 2021 15:30

- 459fe92 EVM <> FVM mapping: memory model.
- a81c284 EVM <> FVM mapping: add version of the Ethereum yellow paper being referenced.
- 1ad5a06 EVM <> FVM mapping: add a ToC.
- 2d59250 EVM <> FVM mapping: add addressing scheme analysis.

raulk force-pushed the raulk/evm-mapping branch from 11a1962 to 2d59250 Compare October 26, 2021 19:23

raulk commented Oct 26, 2021


Stebalien reviewed Oct 27, 2021


raulk commented Oct 28, 2021


raulk added 2 commits October 30, 2021 14:08

- e23b1dd EVM <> FVM mapping: introduce f4 canonical universal actor address. (#44)
- 55c2c76 finish EVM <> FVM spec sans logs/events and crypto.

raulk mentioned this pull request Nov 2, 2021

Add a VM to Filecoin (EVM, WASM, SES, LLVM, etc) filecoin-project/FIPs#113

Closed

raulk marked this pull request as ready for review November 11, 2021 12:39

raulk requested a review from Stebalien November 11, 2021 12:39

raulk mentioned this pull request Nov 11, 2021

analyse SputnikVM in detail as the basis of the EVM foreign runtime filecoin-project/ref-fvm#773

Closed

Stebalien approved these changes Nov 12, 2021


raulk mentioned this pull request Dec 2, 2021

Filecoin Core Devs Meeting 32 Agenda filecoin-project/core-devs#77

Closed

raulk merged commit c0a9f0b into main Jan 7, 2022

raulk deleted the raulk/evm-mapping branch January 7, 2022 14:21
