This repository has been archived by the owner on Aug 18, 2022. It is now read-only.

EVM <> FVM mapping #39

Merged
merged 6 commits into from
Jan 7, 2022

Member

@raulk raulk commented Oct 26, 2021

This document elaborates on the EVM <> FVM mapping by analyzing various aspects such as the addressing scheme, memory model, storage model, and more.


On an incoming call, the EVM <> FVM shim would unpack the call and pass only (1) as input parameters to the smart contract. It would use (2) to resolve the address whenever the smart contract called a relevant opcode. When returning, the EVM <> FVM shim would perform the inverse operation.

However, address-returning opcodes are still unsolved (e.g. CREATE, CREATE2, COINBASE, CALLER). The contract may want to persist these addresses, so making them return address handles is not an option, as handles aren't safe to persist.
Member Author

Actually, this applies more generally. If I pass a handle 0x0000..01 to a smart contract as an input parameter, and it decides to persist it immediately, I would have no opportunity to resolve that handle into the actual address. This solution doesn't work at all.

Member

Yeah, this solution seems brittle.

## Address semantics
## Addressing scheme

Ethereum uses 160-bit (20-byte) addresses. Addresses are the keccak-256 hash of the public key of an account, truncated to preserve the 20 rightmost bytes. Solidity and the [Contract ABI spec](https://docs.soliditylang.org/en/v0.5.3/abi-spec.html) represent addresses with the `address` type, equivalent to `uint160`.
Member

Member Author

Yep, for contract addresses, it's the keccak-256 hash of the RLP-encoding of sender + nonce (rightmost 20 bytes). I'll add that for reference.


1. EVM smart contracts can't send to nonexistent, stable account addresses and rely on account actor auto-creation, as those addresses can't be used with EVM opcodes (see problem 1). Potential solution: have the caller create the account on chain prior to invoking the EVM smart contract.
2. ID addresses are vulnerable to reorg within the current finality window, so submitting EVM transactions involving actors created recently (900 epochs; 7.5 hours) would be unsafe. Potential solution: have the runtime detect and fail calls involving recently-created actors.
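
The guard proposed in item 2 can be sketched as follows. This is a minimal illustration, assuming the 900-epoch finality window quoted above and 30-second epochs; every name in it is hypothetical and not part of any Filecoin API.

```python
FINALITY_EPOCHS = 900   # finality window quoted above
EPOCH_SECONDS = 30      # Filecoin epoch duration (assumed constant here)


def finality_window_hours() -> float:
    """Length of the finality window in hours."""
    return FINALITY_EPOCHS * EPOCH_SECONDS / 3600


def id_address_is_final(created_epoch: int, current_epoch: int) -> bool:
    """True once the actor is at least FINALITY_EPOCHS old."""
    return current_epoch - created_epoch >= FINALITY_EPOCHS


def check_call_targets(created_epochs: dict, current_epoch: int) -> None:
    """Reject a call that involves any actor created inside the window.

    `created_epochs` maps an actor ID to the epoch at which it was created.
    """
    for actor_id, created in created_epochs.items():
        if not id_address_is_final(created, current_epoch):
            raise RuntimeError(f"actor f0{actor_id} was created too recently")
```

With these constants the window works out to 900 × 30 s = 7.5 hours, matching the figure in the text.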

Member

So, it would be possible to assign every actor a stable address (including account actors) and use that everywhere. That would mean addresses would be 20 bytes and unambiguous.

However, we'd have to make a few changes:

  1. Account actors currently don't have stable addresses (just key-based addresses).
  2. There's currently no reverse map from ID addresses to stable addresses. We'd likely need to add a "stable address" field to every actor.

But this should be doable.

But there's a whole other can of worms...

  1. In the EVM, it's possible to send funds to any address.
  2. If that address turns out to be the hash of a public key, it's possible to then use that address to send messages (an account).
  3. There's a CREATE2 instruction that allows an actor to create another actor in some "owned" address space (effectively using an actor specific KDF with an actor controlled salt).
  4. If that address turns out to be part of the address space "owned" by another actor, code can later be deployed to this address (and it gets to keep the existing funds).

Basically:

  1. We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.
  2. We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.
  3. We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

All of this is leading me to believe that we're going to need a bit of an indirection layer. Possibly a registry mapping "EVM" addresses to the rest of the FVM address space.

Member

This is all a bit confusing so I'll try to explain more in the standup. Unfortunately, documentation is scattered and almost universally of the "here's how to take your first steps in Ethereum" form not the "this is how this thing actually works" form.

Member Author

@raulk raulk Oct 27, 2021

  1. Account actors currently don't have stable addresses (just key-based addresses).

Aren't pubkey addresses stable addresses? What's the nuance here?

Related: account actors are also bound to an ID at creation, so every actor is guaranteed to have an ID address, which is volatile during the current finality window.

Member Author

@raulk raulk Oct 27, 2021

Notes:

  • CREATE calculates the address by hashing (keccak-256) the RLP encoding of the sender's address and nonce, and keeping the rightmost 20 bytes. The sender is an Externally Owned Account (EOA) for a creation transaction, or the creating contract when the opcode is used. Such addresses can be trivially computed ahead of time.
  • CREATE2 hashes (keccak-256) the concatenation 0xff || sender_addr || user_provided_salt || keccak256(init_code), without RLP encoding. These addresses can be precomputed by knowing the inputs ahead of time, and this is the basis for "counterfactual deployments" -- use cases in which we interact with the contract ahead of time.
  • Sending ETH to an address doesn't turn it into an EOA; it can still be the target of code deployment. This property is also the basis for some counterfactual interactions.
  • It is not possible to conduct an appropriation attack by exploiting the knowledge of a future contract address ahead of time.
    • First, you'd need to defeat hash collision resistance to find a private key whose Eth address matches the target contract address (computational expense is estimated at 2**80 hashes, as per various sources).
    • Second, as of EIP-684, the protocol aborts CREATE or CREATE2 instructions whose target address already has a non-zero nonce or non-empty code.
    • In conclusion, even if you found a colliding key, you can do one of two things: (a) not use it, in which case when the contract account is created, you'd be locked out of that address because non-EOA addresses can't perform transactions (I think); or (b) use it, in which case it would be marked as an EOA, but its nonce would be non-zero, so CREATE/CREATE2 would abort.
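
The EIP-684 rule in the notes above can be illustrated with a toy state model. This is only a sketch: `Account` and `can_deploy_to` are hypothetical names, and the state is a plain dictionary rather than a real client's state trie.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int = 0
    nonce: int = 0
    code: bytes = b""


def can_deploy_to(state: dict, target: bytes) -> bool:
    """EIP-684 style check: creation fails if the target address already
    has a non-zero nonce or non-empty code. A balance alone is fine."""
    account = state.get(target)
    if account is None:
        return True
    return account.nonce == 0 and account.code == b""
```

A pre-funded address with nonce 0 stays deployable (the counterfactual case), while an address that has already sent a transaction does not.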

Relevant references (in addition to the yellow paper).

Member Author

@raulk raulk Oct 27, 2021

  1. We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.

If the pubkey address exists ahead of time, the contract can use a reorg-stable ID address (I'll post a proposal shortly).

If the address doesn't exist ahead of time, this becomes harder because the CALL opcode consumes a single word for the recipient address (and probably truncates it to 160 bits), yet our pubkey addresses can span up to 2 Ethereum words.

  2. We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.

Yes, 100% agreed.

  3. We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

Yes, but this should be straightforward IMO; we'd generate an f2 address using the user-provided inputs to assemble the preimage passed to address.NewActorAddress(preimage). The output can be a reorg-stable ID address (which I'm defining in a subsequent PR).

Member

Aren't pubkey addresses stable addresses? What's the nuance here?

Sorry, f2 address. You're right, "stable" just means "not f0".

1. The worst case scenario is larger than the width of the Ethereum address type. Even if BLS addresses were prohibited in combination with EVM actors, class 1 and class 2 still miss the limit by 1 byte (due to the prefix).
2. It exceeds the EVM's 256 bit architecture.

Problem 1 renders Solidity smart contracts instantly incompatible with the Filecoin addressing scheme, as well as EVM opcodes that take or return addresses, e.g. CALLER, CALL, CALLCODE, DELEGATECALL, COINBASE, etc. This problem is hard to work around. It would require either a fork of the EVM that modifies existing opcodes for semantic awareness of addresses (which is really hard to get right), or a Filecoin-specific opcode family to deal with Filecoin addresses (e.g. FCALL, FCALLCODE, etc.). The latter would break as-is deployability of existing smart contracts.
Member

Note: CALLER and COINBASE (and likely others) won't have this issue. All runtime APIs (in the current VM) return ID addresses, but accept (and resolve) other address types.
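
The width mismatch behind problems 1 and 2 can be checked with simple arithmetic. The sketch below assumes a 1-byte class prefix and the payload widths discussed in this thread (20 bytes for classes 1 and 2, 48 bytes for BLS); the helper names are hypothetical.

```python
EVM_ADDRESS_BYTES = 20   # width of Solidity's `address` type
EVM_WORD_BYTES = 32      # 256-bit EVM word
PREFIX_BYTES = 1         # class prefix of a Filecoin address (assumed)

# Payload widths per address class, as discussed in this thread.
PAYLOAD_BYTES = {1: 20, 2: 20, 3: 48}


def total_width(address_class: int) -> int:
    """Prefix plus payload, in bytes."""
    return PREFIX_BYTES + PAYLOAD_BYTES[address_class]


def fits_evm_address(address_class: int) -> bool:
    """Does the prefixed address fit in a 20-byte EVM address?"""
    return total_width(address_class) <= EVM_ADDRESS_BYTES


def evm_words_needed(address_class: int) -> int:
    """Number of 32-byte words needed to hold the prefixed address."""
    return -(-total_width(address_class) // EVM_WORD_BYTES)
```

Classes 1 and 2 come out at 21 bytes, one byte over the limit, and a BLS address needs two EVM words.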


Member Author

@raulk raulk left a comment

Notes from our sync discussion.

  • We dissected various solutions, including "EVM <> FVM mapping: f4 address class to solve reorg-instability of ID addresses" (#40) and segregating the Ethereum address space entirely via a new address class.
  • The solution we converged on a priori is to turn the f2 class into a unified stable address space for all actors, including account actors, which are also identified by their f1/f3 pubkey addresses.
  • That is, account actors also get an f2 address.
  • Length: f2 addresses have fixed 20-byte payloads, so they could be used as-is inside the EVM by dropping the prefix.
  • Calculation of f2 addresses:
    • For f1 account actors (secp256k1): identical payload (already a blake2b-160 hash of pubkey).
    • For f3 account actors (bls): blake2b-160 hash of pubkey.
    • For FVM native actors: current algorithm is preserved (hash of sender, nonce, number of actors created in the message)
    • For EVM foreign actors: preserves Ethereum semantics for CREATE and CREATE2.
  • Collision resistance: about 2^80 hash evaluations (the birthday bound for a 160-bit digest), same as the Ethereum network. Link: informal proposal to increase space to 256 bits
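
The f2 derivation for account actors listed above can be sketched with Python's standard `hashlib`. The function names are hypothetical, and the exact public-key serialization that gets hashed is an assumption here; only the use of blake2b-160 comes from the notes.

```python
import hashlib


def blake2b_160(data: bytes) -> bytes:
    """BLAKE2b with a 20-byte (160-bit) digest."""
    return hashlib.blake2b(data, digest_size=20).digest()


def f1_payload(secp256k1_pubkey: bytes) -> bytes:
    """Existing f1 payload: blake2b-160 of the secp256k1 public key."""
    return blake2b_160(secp256k1_pubkey)


def f2_payload_from_f1(f1_payload_bytes: bytes) -> bytes:
    """For f1 account actors the f2 payload is the same 20 bytes."""
    if len(f1_payload_bytes) != 20:
        raise ValueError("f1 payload must be 20 bytes")
    return f1_payload_bytes


def f2_payload_from_bls(bls_pubkey: bytes) -> bytes:
    """For f3 account actors: blake2b-160 of the 48-byte BLS public key."""
    if len(bls_pubkey) != 48:
        raise ValueError("BLS public key must be 48 bytes")
    return blake2b_160(bls_pubkey)


def collision_work_bits(digest_bits: int = 160) -> int:
    """Birthday bound: roughly 2^(n/2) hashes to find a collision."""
    return digest_bits // 2
```

Because the payload is exactly 20 bytes, it can be handed to the EVM unchanged once the class prefix is dropped.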

To support "prospective actor interactions" (or what Ethereum calls "counterfactual deployments"), we need:

  • The ability to send value to f2 actors that don't exist yet.
  • The concept of "undistinguished actors", that graduate out of that status into an actual typed actor by one of two events:
    • A transaction being sent from an f1 or f3 address that hashes to an existing f2 undistinguished actor => actor typed as an account actor, and pubkey address linked.
    • An existing actor creating a typed actor, with an explicit on-chain message to the InitActor, or within code.
  • The code of the "undistinguished actor" could handle the conversion.
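
The "undistinguished actor" lifecycle described above can be sketched as a small state machine. This is a toy model under stated assumptions: the class and method names are invented for illustration, the state tree is a dictionary keyed by f2 payload, and the f2 payload of a pubkey is taken to be its blake2b-160 hash.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    UNDISTINGUISHED = "undistinguished"
    ACCOUNT = "account"
    TYPED = "typed"


@dataclass
class Actor:
    kind: Kind = Kind.UNDISTINGUISHED
    balance: int = 0
    pubkey: Optional[bytes] = None
    code_cid: Optional[str] = None


def f2_of(pubkey: bytes) -> bytes:
    return hashlib.blake2b(pubkey, digest_size=20).digest()


class StateTree:
    def __init__(self) -> None:
        self.actors = {}

    def send(self, f2: bytes, value: int) -> None:
        """Sending to an unknown f2 address creates an undistinguished actor."""
        actor = self.actors.setdefault(f2, Actor())
        actor.balance += value

    def on_message_from_pubkey(self, pubkey: bytes) -> Actor:
        """A message from a key whose hash matches graduates the actor
        to an account actor and links the pubkey."""
        actor = self.actors.setdefault(f2_of(pubkey), Actor())
        if actor.kind is Kind.UNDISTINGUISHED:
            actor.kind = Kind.ACCOUNT
            actor.pubkey = pubkey
        elif actor.kind is not Kind.ACCOUNT:
            raise ValueError("address already holds a non-account actor")
        return actor

    def create_typed(self, f2: bytes, code_cid: str) -> Actor:
        """Explicit creation graduates the actor to a typed actor."""
        actor = self.actors.setdefault(f2, Actor())
        if actor.kind is not Kind.UNDISTINGUISHED:
            raise ValueError("address already holds a typed actor")
        actor.kind = Kind.TYPED
        actor.code_cid = code_cid
        return actor
```

In both graduation paths the balance seeded earlier is kept, which is the property the counterfactual use cases rely on.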

Member Author

raulk commented Oct 28, 2021

The InitActor will need to be extended with f2 indices.

Member Author

raulk commented Oct 28, 2021

Proposed address taxonomy

| Class | Desc | Actor type | Payload width | Total width | Payload value | Usage | Stable? |
|---|---|---|---|---|---|---|---|
| 0 | ID address | All | 1-9 bytes | 2-10 bytes | uvarint64 counter | Internal, compact representation in state tree; unsafe to use externally until final | N |
| 1 | secp256k1 pubkey (account actor) | Account | 20 bytes | 21 bytes | blake2b-160 hash of secp256k1 pubkey | Externally, to refer to an account actor with its pubkey | Y |
| 2 | universal actor address | All | 20 bytes | 21 bytes | protocol-derived from relevant cryptographic material | Externally and internally to refer to any actor | Y |
| 3 | bls pubkey (account actor) | Account | 48 bytes | 49 bytes | inlined bls public key | Externally, to refer to an account actor with its pubkey | Y |
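
The widths in the taxonomy can be reproduced with a short encoder. This sketch assumes the byte layout is a 1-byte class followed by the payload, and that ID payloads are unsigned LEB128 varints; function names are hypothetical.

```python
def uvarint(n: int) -> bytes:
    """Unsigned LEB128 encoding (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Fixed payload widths for the hashed / inlined classes.
PAYLOAD_WIDTH = {1: 20, 2: 20, 3: 48}


def encode_address(address_class: int, payload: bytes) -> bytes:
    """Class byte followed by the payload."""
    expected = PAYLOAD_WIDTH.get(address_class)
    if expected is not None and len(payload) != expected:
        raise ValueError(f"class {address_class} payload must be {expected} bytes")
    return bytes([address_class]) + payload


def id_address(actor_id: int) -> bytes:
    """Class 0 address: the actor ID as a uvarint."""
    return encode_address(0, uvarint(actor_id))
```

An ID of 0 encodes to 2 bytes and a 63-bit ID to 10 bytes, which matches the 2-10 byte range in the table.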

Member Author

raulk commented Oct 28, 2021

Given that we're doing an address taxonomy revision, @Stebalien and I agreed to expand the universal actor address space to 32 bytes. With a high probability, the hash function will remain BLAKE2, with a 256-bit digest size.

The proposal consists of adding a class 4 that serves as the 256-bit canonical universal actor address. Class 2 will serve as an alias/symlink to the corresponding canonical f4 address.

Considerations:

  • The EVM would use prefixless f2 addresses. The FVM will translate those to f4 addresses for free, under the hood.
  • The state tree would track f2 and f4 addresses for all actors, maintaining the relevant indices to ID addresses (which is how actors are ultimately keyed in the state tree).
  • Migration:
    • For BLS account actors, the input material to derive f4 addresses is inlined (the BLS public key), so backfilling f4 addresses is trivial.
    • The input material is not available in the state tree for secp256k1 account actors (pubkey) and non-account actors (creator, nonce, number of interim actors created), so we need to resort to coercion by left or right padding the f2 address to form a synthetic f4 address.
      • This is an acceptable compromise, because with a 2^256 address space, collision resistance should be quantum-safe.
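
The two migration paths above can be sketched as follows. The padding side is left open in the proposal ("left or right"), so the sketch takes it as a parameter; zero bytes as the padding value and all function names are assumptions made for illustration.

```python
import hashlib

F2_LEN = 20
F4_LEN = 32


def f4_from_bls_pubkey(bls_pubkey: bytes) -> bytes:
    """BLS account actors: the key is inlined, so f4 can be derived
    directly (BLAKE2b with a 256-bit digest, as proposed)."""
    return hashlib.blake2b(bls_pubkey, digest_size=F4_LEN).digest()


def synthetic_f4_from_f2(f2_payload: bytes, pad_left: bool = True) -> bytes:
    """Other actors: coerce the existing f2 payload into 32 bytes by
    padding with zeros (assumed padding value)."""
    if len(f2_payload) != F2_LEN:
        raise ValueError("f2 payload must be 20 bytes")
    padding = bytes(F4_LEN - F2_LEN)
    return padding + f2_payload if pad_left else f2_payload + padding
```

Either padding choice keeps the mapping reversible, since the original 20 bytes can be sliced back out of the synthetic address.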

Member

@Stebalien Stebalien left a comment

Main things we need to discuss are:

  1. Code CIDs (want to make sure we're on the same page).
  2. Address computation.
  3. Gas.

But I'm happy to "discuss" by submitting a followup PR.


Notice that EVM foreign actors are typed in the state tree with a CodeCID that does not correspond to their EVM bytecode. Instead, the CodeCID points to the WASM bytecode of the **EVM foreign runtime**.
Member

This should just be the CID of the EVM itself. We currently use strings, but we only do that because the code doesn't live on-chain.

Member Author

Yep, that's what I meant to say but it was a long-winded way of doing so. I'll adjust the text.


### Mechanics and interfaces

EVM smart contracts (also known as EVM foreign actors in Filecoin terminology) are represented this way in the Filecoin state tree:
Member

Let's explicitly state that it's just an actor.

Member

I.e., that the following object is a normal actor object (or just leave off the explicit actor struct).

Comment on lines +152 to +153
3. For FVM native actors, the preimage is `sender || nonce || # of actors created during message execution`.
4. For EVM foreign actors, the preimage is inherited from CREATE and CREATE2.
Member

Unfortunately, this would make the EVM "special". I'd prefer not to have "special" address construction methods for every foreign runtime.

I'd rather use the same algorithm everywhere (EVM and FVM): sender || target code cid || salt || init params. This isn't identical to how the EVM currently works, but provides the same guarantees.

We'd need to do a bit of research on how predictable addresses are used, but I believe they're mostly used by external tooling for some payment channel use-cases. If that's the case, it should be pretty trivial to drop-in a replacement function as long as that replacement function has the same inputs.
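
A uniform derivation along the lines suggested in this comment could look like the sketch below. The input order (sender, target code CID, salt, init params) comes from the comment; the length-prefixed framing, the choice of blake2b-160 and the function names are assumptions added so that the example is unambiguous and runnable.

```python
import hashlib


def _frame(field: bytes) -> bytes:
    """Length-prefix a field so that field boundaries are unambiguous
    (an assumption; the comment only specifies plain concatenation)."""
    return len(field).to_bytes(4, "big") + field


def derive_actor_address(sender: bytes, code_cid: bytes,
                         salt: bytes, init_params: bytes) -> bytes:
    """Same algorithm for EVM and FVM actors:
    hash(sender || target code cid || salt || init params)."""
    preimage = b"".join(_frame(f) for f in (sender, code_cid, salt, init_params))
    return hashlib.blake2b(preimage, digest_size=20).digest()
```

Since every input is known to the creator before deployment, external tooling can precompute the address the same way it does for CREATE2.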


Class 2 and 4 addresses are protocol-derived, by hashing the relevant cryptographic or input material.

With a high probability, the hash function of class 4 will remain BLAKE2, with a 256-bit digest size. Class 2 will continue relying on blake2b-160.
Member

So... we could save some space if we define class 2 as a truncation of class 4. Instead of mapping both class-2 and class-4 addresses to ID addresses, we'd map class 2 addresses to (ID, class4-suffix) tuples.

We'd want to determine how much space this map is really taking up on chain before making any decisions (this optimization may not be worth it).
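
The space-saving idea above can be sketched as a single index keyed by class 2 address. The sketch assumes that class 2 is the leading 20 bytes of the 32-byte class 4 address, so the stored suffix is the remaining 12 bytes; the class name and methods are hypothetical.

```python
F2_LEN = 20
F4_LEN = 32


class AddressIndex:
    """Maps class 2 addresses to (ID, class4-suffix) tuples."""

    def __init__(self) -> None:
        self._by_f2 = {}

    def register(self, f4: bytes, actor_id: int) -> bytes:
        """Index an actor by its f4 address; returns the f2 alias."""
        if len(f4) != F4_LEN:
            raise ValueError("f4 address must be 32 bytes")
        f2, suffix = f4[:F2_LEN], f4[F2_LEN:]
        if f2 in self._by_f2:
            raise ValueError("class 2 alias already taken")
        self._by_f2[f2] = (actor_id, suffix)
        return f2

    def resolve_f2(self, f2: bytes) -> int:
        return self._by_f2[f2][0]

    def canonical_f4(self, f2: bytes) -> bytes:
        """Rebuild the full f4 address from the alias and stored suffix."""
        _, suffix = self._by_f2[f2]
        return f2 + suffix

    def resolve_f4(self, f4: bytes):
        """Return the actor ID, or None if the suffix does not match."""
        entry = self._by_f2.get(f4[:F2_LEN])
        if entry is None or entry[1] != f4[F2_LEN:]:
            return None
        return entry[0]
```

One map entry of 20 + 12 bytes plus the ID replaces two separate entries of 20 and 32 bytes, which is the saving the comment is weighing.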


## Gas accounting and execution halt semantics

The execution halt is determined by Filecoin gas and not by EVM gas. Therefore, the EVM runtime is made to run with unlimited gas. The FVM is responsible for metering execution and halting it when gas limits are exceeded. Refer to the [Gas accounting](01-architecture.md#gas-accounting) section of the Architecture doc for more details.
Member

This isn't quite what we discussed. Top-level messages will be run with infinite (well, max int) EVM gas. But an EVM contract may call other contracts with some limited amount of gas and that gas limit should be respected.

Basically:

  1. We obey both gas models at the same time.
  2. The EVM gas model starts with infinite gas.

I believe this may be important for contracts that need to be able to invoke untrusted contracts with some limited amount of gas.


Aside: I'm wondering if we can optimize EVM execution by being a bit looser about gas usage. If infinite gas is available, we could go down a codepath that doesn't bother tracking gas.
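
Obeying both gas models at once can be sketched with two meters. Everything here is illustrative: the class names, the use of 2**63 - 1 as "max int", and the rule that a subcall's EVM limit is capped by the caller's remaining EVM gas are assumptions, not a description of any existing runtime.

```python
INFINITE_EVM_GAS = 2**63 - 1  # stand-in for "max int"


class OutOfGas(Exception):
    pass


class GasMeter:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, amount: int) -> None:
        if amount > self.remaining:
            raise OutOfGas(f"needed {amount}, only {self.remaining} left")
        self.used += amount


class DualGas:
    """Charges Filecoin gas and EVM gas side by side."""

    def __init__(self, fil: GasMeter, evm_limit: int = INFINITE_EVM_GAS) -> None:
        self.fil = fil                    # shared across the whole call stack
        self.evm = GasMeter(evm_limit)    # per-call EVM allowance

    def charge(self, fil_cost: int, evm_cost: int) -> None:
        self.fil.charge(fil_cost)
        self.evm.charge(evm_cost)

    def subcall(self, evm_gas: int) -> "DualGas":
        """Child call with a limited EVM allowance, same Filecoin meter."""
        return DualGas(self.fil, min(evm_gas, self.evm.remaining))

    def finish_subcall(self, child: "DualGas") -> None:
        """Account the child's EVM usage against the caller."""
        self.evm.charge(child.evm.used)
```

A top-level message starts with the "infinite" EVM allowance, while a contract that calls an untrusted contract with a small EVM gas limit still sees that limit enforced.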

Member

Alternatively, we could let the message sender specify the amount of EVM gas. Basically, when doing gas estimation, the caller would try the message with infinite EVM gas. Then, the caller would replace the "infinite" EVM gas with the used EVM gas (times some reasonable overestimation multiplier).

Given a high enough overestimation, this reduces the chances of spuriously running out of EVM gas. However, having a somewhat realistic gas value lets the contract make decisions based on the amount of gas left.

As long as we allow the message sender to specify the EVM gas limit, they can pick their model based on their use-case. If, e.g., they have a contract that really cares about EVM gas, they can pass in a gas estimation. Otherwise, they can pass in "infinite" gas.
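
The estimation flow described in this comment can be sketched as below. `dry_run` is a hypothetical callback that executes the message with a given EVM gas limit and returns the EVM gas used; the 125% default is an arbitrary placeholder for "some reasonable overestimation multiplier".

```python
INFINITE_EVM_GAS = 2**63 - 1  # stand-in for "infinite" EVM gas


def estimate_evm_gas_limit(dry_run, overestimation_percent: int = 125) -> int:
    """Run once with infinite EVM gas, then pad the observed usage.

    Integer arithmetic with rounding up avoids float surprises.
    """
    used = dry_run(INFINITE_EVM_GAS)
    return -((-used * overestimation_percent) // 100)
```

A sender whose contract does not care about EVM gas can skip the estimate and keep passing `INFINITE_EVM_GAS`.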


## Blockchain timing

Ethereum target block times are ~10 seconds, whereas Filecoin's is ~30 seconds. A priori, this difference has no impact on the protocol or this spec, but it may impact the behaviour of smart contracts ported over from Ethereum that expect 10-second block timing.
Member

Note: we could consider translating epochs, but we probably shouldn't.

* `GASLIMIT`: returns the gas limit as per Filecoin gas system.
* `CHAINID`: returns a fixed value `0`.
* `GAS`: returns the gas remaining as per Filecoin gas system. One divergence from Ethereum is the return value does not include the full cost of this operation (because the cost of stack copy and program advance is not known when the value is captured).
* `COINBASE`: returns the Filecoin class 4 address of the block producer including this message.
Member

Class 2? Class 4 won't work until we get 32 byte addresses.

Comment on lines +198 to +200
* `GASLIMIT`: returns the gas limit as per Filecoin gas system.
* `CHAINID`: returns a fixed value `0`.
* `GAS`: returns the gas remaining as per Filecoin gas system. One divergence from Ethereum is the return value does not include the full cost of this operation (because the cost of stack copy and program advance is not known when the value is captured).
Member

Using the Filecoin gas here diverges from what NEAR does and may cause more problems than it solves because the gas models won't match up.

Member Author

raulk commented Jan 7, 2022

I'm going to merge this and will iterate on master.

@raulk raulk merged commit c0a9f0b into main Jan 7, 2022
@raulk raulk deleted the raulk/evm-mapping branch January 7, 2022 14:21