11a1962
to
2d59250
Compare
|
||
On an incoming call, the EVM <> FVM shim would unpack the call and pass only (1) as input parameters to the smart contract. It would use (2) to resolve the address whenever the smart contract called a relevant opcode. When returning, the EVM <> FVM shim would perform the inverse operation. | ||
|
||
However, address-returning opcodes are still unsolved (e.g. CREATE, CREATE2, COINBASE, CALLER). The contract may want to persist these addresses, so making them return address handles is not an option, as they aren't safe to persist. |
Actually, this applies more generally. If I pass a handle `0x0000..01` to a smart contract as an input parameter, and it decides to persist it immediately, I would have no opportunity to resolve that handle into the actual address. This solution doesn't work at all.
Yeah, this solution seems brittle.
## Address semantics | ||
## Addressing scheme | ||
|
||
Ethereum uses 160-bit (20-byte) addresses. Addresses are the keccak-256 hash of the public key of an account, truncated to preserve the 20 rightmost bytes. Solidity and the [Contract ABI spec](https://docs.soliditylang.org/en/v0.5.3/abi-spec.html) represent addresses with the `address` type, equivalent to `uint160`. |
This covers account addresses. But see https://ethereum.stackexchange.com/questions/760/how-is-the-address-of-an-ethereum-contract-computed for contract addresses.
Yep, for contract addresses, it's the keccak-256 hash of the RLP encoding of sender + nonce, again truncated to 20 bytes. I'll add that for reference.
|
||
1. EVM smart contracts can't send to nonexistent, stable account addresses and rely on account actor auto-creation, as those addresses can't be used with EVM opcodes (see problem 1). Potential solution: have the caller create the account on chain prior to invoking the EVM smart contract. | ||
2. ID addresses are vulnerable to reorg within the current finality window, so submitting EVM transactions involving actors created recently (900 epochs; 7.5 hours) would be unsafe. Potential solution: have the runtime detect and fail calls involving recently-created actors. | ||
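The finality-window figures in item 2 can be checked with a few lines of arithmetic. The sketch below is only an illustration: `is_id_address_reorg_safe` is a hypothetical helper name, not an existing runtime API, and it assumes the runtime knows the epoch at which an actor was created.

```python
EPOCH_SECONDS = 30       # Filecoin target block time (~30 s, per this document)
FINALITY_EPOCHS = 900    # finality window quoted in the text above


def finality_window_hours(epochs: int = FINALITY_EPOCHS,
                          epoch_seconds: int = EPOCH_SECONDS) -> float:
    """Convert a number of epochs into wall-clock hours."""
    return epochs * epoch_seconds / 3600


def is_id_address_reorg_safe(created_epoch: int, current_epoch: int,
                             finality_epochs: int = FINALITY_EPOCHS) -> bool:
    """Hypothetical check: True once the actor's creation has left the finality window."""
    return current_epoch - created_epoch >= finality_epochs
```

With the defaults, `finality_window_hours()` evaluates to 7.5, matching the figure in the text. A runtime following the proposed mitigation would fail calls for which the second function returns `False`.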
|
So, it would be possible to assign every actor a stable address (including account actors) and use that everywhere. That would mean addresses would be 20 bytes and unambiguous.
However, we'd have to make a few changes:
- Account actors currently don't have stable addresses (just key-based addresses).
- There's currently no reverse map from ID addresses to stable addresses. We'd likely need to add a "stable address" field to every actor.
But this should be doable.
But there's a whole other can of worms...
1. In the EVM, it's possible to send funds to any address.
2. If that address turns out to be the hash of a public key, it's possible to then use that address to send messages (an account).
3. There's a CREATE2 instruction that allows an actor to create another actor in some "owned" address space (effectively using an actor-specific KDF with an actor-controlled salt).
4. If that address turns out to be part of the address space "owned" by another actor, code can later be deployed to this address (and it gets to keep the existing funds).
Basically:
- We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.
- We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.
- We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).
All of this is leading me to believe that we're going to need a bit of an indirection layer. Possibly a registry mapping "EVM" addresses to the rest of the FVM address space.
This is all a bit confusing, so I'll try to explain more in the standup. Unfortunately, documentation is scattered and almost universally of the "here's how to take your first steps in Ethereum" form, not the "this is how this thing actually works" form.
> - Account actors currently don't have stable addresses (just key-based addresses).

Aren't pubkey addresses stable addresses? What's the nuance here?
Related: account actors are also bound to an ID at creation, so every actor is guaranteed to have an ID address, which is volatile during the current finality window.
Notes:
- `CREATE` calculates the address by hashing the RLP encoding of a struct containing the address of the sender (an Externally Owned Account, EOA, or a contract) and their nonce. Such addresses can be trivially computed ahead of time.
- `CREATE2` hashes the concatenation `0xff ++ sender_addr ++ user_provided_salt ++ keccak256(init_code)` (per EIP-1014; no RLP encoding is involved). These addresses can be precomputed by knowing the inputs ahead of time, and this is the basis for "counterfactual deployments" -- use cases in which we interact with the contract ahead of time.
- Sending ETH to an address doesn't turn it into an EOA; it can still be the target of code deployment. This property is also the basis for some counterfactual interactions.
- It is not possible to conduct an appropriation attack by exploiting the knowledge of a future contract address ahead of time.
  - First, you'd need to defeat hash collision resistance to find a private key whose Eth address matches the target contract address (computational expense is estimated to be 2**80 hashes, as per various sources, including this).
  - Second, as of EIP-684, the protocol aborts CREATE or CREATE2 instructions that generate an address with a non-zero nonce.
  - In conclusion, even if you found a colliding key, you can do one of two things: (a) not use it, in which case when the contract account is created, you'd be locked out of that address because non-EOA addresses can't perform transactions (I think); or (b) use it, in which case it would be marked as an EOA, but its nonce would be non-zero, so CREATE/CREATE2 would abort.
Relevant references (in addition to the yellow paper).
- EIP-1014: Skinny CREATE2. Clarifications section elaborates on collisions.
- EIP-684: Prevent overwriting contracts
- EIP-161: State trie clearing (invariant-preserving alternative)
- About counterfactual contract deployments.
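To make the precomputation point concrete, here is a minimal sketch of how the two preimages are assembled, based on my reading of the yellow paper and EIP-1014. The function names are illustrative only. The Keccak-256 function is passed in as a parameter because Python's standard library does not provide it (`hashlib.sha3_256` is the finalized SHA-3, which uses different padding and produces different digests).

```python
from typing import Callable

Hash = Callable[[bytes], bytes]  # caller supplies a real Keccak-256 implementation


def rlp_encode_bytes(b: bytes) -> bytes:
    """RLP-encode a byte string of at most 55 bytes (enough for this sketch)."""
    if len(b) == 1 and b[0] < 0x80:
        return b
    if len(b) > 55:
        raise ValueError("long strings are not needed for this sketch")
    return bytes([0x80 + len(b)]) + b


def rlp_encode_uint(n: int) -> bytes:
    """RLP-encode a non-negative integer as its minimal big-endian byte string."""
    if n < 0:
        raise ValueError("nonce must be non-negative")
    return rlp_encode_bytes(n.to_bytes((n.bit_length() + 7) // 8, "big"))


def create_preimage(sender: bytes, nonce: int) -> bytes:
    """RLP list [sender, nonce]; CREATE hashes this and keeps the last 20 bytes."""
    if len(sender) != 20:
        raise ValueError("sender must be a 20-byte address")
    payload = rlp_encode_bytes(sender) + rlp_encode_uint(nonce)
    return bytes([0xC0 + len(payload)]) + payload


def create_address(sender: bytes, nonce: int, keccak256: Hash) -> bytes:
    return keccak256(create_preimage(sender, nonce))[-20:]


def create2_preimage(sender: bytes, salt: bytes, init_code: bytes,
                     keccak256: Hash) -> bytes:
    """0xff ++ sender ++ salt ++ keccak256(init_code): plain concatenation, no RLP."""
    if len(sender) != 20 or len(salt) != 32:
        raise ValueError("sender must be 20 bytes and salt 32 bytes")
    return b"\xff" + sender + salt + keccak256(init_code)


def create2_address(sender: bytes, salt: bytes, init_code: bytes,
                    keccak256: Hash) -> bytes:
    return keccak256(create2_preimage(sender, salt, init_code, keccak256))[-20:]
```

Both functions depend only on values the deployer knows in advance, which is why the resulting addresses can be funded or referenced before any code exists there.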
> - We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.

If the pubkey address exists ahead of time, the contract can use a reorg-stable ID address (I'll post a proposal shortly).
If the address doesn't exist ahead of time, this becomes harder because the CALL opcode consumes a single word for the recipient address (and probably truncates it to 160 bits), yet our pubkey addresses can span up to 2 Ethereum words.

> - We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.

Yes, 100% agreed.

> - We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

Yes, but this should be straightforward IMO; we'd generate an f2 address using the user-provided inputs to assemble the preimage passed to `address.NewActorAddress(preimage)`. The output can be a reorg-stable ID address (which I'm defining in a subsequent PR).
> Aren't pubkey addresses stable addresses? What's the nuance here?

Sorry, f2 address. You're right, "stable" just means "not f0".
1. The worst case scenario is larger than the width of the Ethereum address type. Even if BLS addresses were prohibited in combination with EVM actors, class 1 and class 2 still miss the limit by 1 byte (due to the prefix). | ||
2. It exceeds the EVM's 256 bit architecture. | ||
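The size mismatch described in these two problems can be tabulated quickly. The payload sizes below are my understanding of the current Filecoin address encoding (one protocol byte followed by a 20-byte hash for classes 1 and 2, or a 48-byte BLS public key for class 3); treat them as assumptions of this sketch.

```python
EVM_ADDRESS_BYTES = 20      # width of Solidity's `address` type
EVM_WORD_BYTES = 32         # 256-bit EVM word
PROTOCOL_PREFIX_BYTES = 1   # leading byte identifying the address class

# Payload sizes per address class (assumed, see the note above).
PAYLOAD_BYTES = {1: 20, 2: 20, 3: 48}


def encoded_len(address_class: int) -> int:
    """Length in bytes of the prefix plus payload."""
    return PROTOCOL_PREFIX_BYTES + PAYLOAD_BYTES[address_class]


def fits_evm_address(address_class: int) -> bool:
    """True if the full encoding fits in the 20-byte EVM address type."""
    return encoded_len(address_class) <= EVM_ADDRESS_BYTES


def evm_words_needed(address_class: int) -> int:
    """Number of 32-byte EVM words required to hold the full encoding."""
    return -(-encoded_len(address_class) // EVM_WORD_BYTES)  # ceiling division
```

Classes 1 and 2 come out at 21 bytes, one byte over the `address` type, while class 3 needs 49 bytes and therefore two EVM words.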
|
||
Problem 1 renders Solidity smart contracts instantly incompatible with the Filecoin addressing scheme, as well as EVM opcodes that take or return addresses, e.g. CALLER, CALL, CALLCODE, DELEGATECALL, COINBASE, etc. This problem is hard to work around: it would require either forking the EVM to make existing opcodes semantically aware of Filecoin addresses (which is really hard to get right), or introducing a Filecoin-specific opcode family to deal with Filecoin addresses (e.g. FCALL, FCALLCODE, etc.). The latter would break as-is deployability of existing smart contracts. |
Note: CALLER and COINBASE (and likely others) won't have this issue. All runtime APIs (in the current VM) return ID addresses, but accept (and resolve) other address types.
Notes from our sync discussion.
- We dissected various solutions, including "EVM <> FVM mapping: f4 address class to solve reorg-instability of ID addresses" (#40) and segregating the Ethereum address space entirely via a new address class.
- The solution we converged on a priori is to turn the `f2` class into a unified stable address space for all actors, including account actors, which are also identified by their f1/f3 pubkey addresses.
- That is, account actors also get an `f2` address.
- Length: `f2` addresses have a fixed 20-byte payload, so they could be used as-is inside the EVM by dropping the prefix.
- Calculation of `f2` addresses:
  - For f1 account actors (secp256k1): identical payload (already a blake2b-160 hash of the pubkey).
  - For f3 account actors (bls): blake2b-160 hash of the pubkey.
  - For FVM native actors: the current algorithm is preserved (hash of sender, nonce, number of actors created in the message).
  - For EVM foreign actors: preserves Ethereum semantics for CREATE and CREATE2.
- Collision resistance: 2^80, same as the Ethereum network. Link: informal proposal to increase space to 256 bits
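The account-actor cases of the calculation above could be sketched as follows. This is an illustration, not the go-address implementation: the function names are made up, and the raw public key bytes are assumed to be the hash input in both the f1 and f3 cases.

```python
import hashlib


def blake2b_160(data: bytes) -> bytes:
    """BLAKE2b with a 20-byte (160-bit) digest."""
    return hashlib.blake2b(data, digest_size=20).digest()


def f2_payload_for_account(pubkey: bytes) -> bytes:
    """Proposed f2 payload for an account actor: blake2b-160 of its public key.

    For secp256k1 keys this equals the existing f1 payload; for BLS keys it is
    a new 20-byte value derived from the 48-byte public key.
    """
    return blake2b_160(pubkey)


def evm_address_from_f2(payload: bytes) -> bytes:
    """Use the 20-byte f2 payload as-is inside the EVM (prefix dropped)."""
    if len(payload) != 20:
        raise ValueError("f2 payloads are expected to be exactly 20 bytes")
    return payload
```

Because every payload is exactly 20 bytes, it fits the EVM `address` type with no truncation or handle indirection.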
To support "prospective actor interactions" (or what Ethereum calls "counterfactual deployments"), we need:
- The ability to send value to f2 actors that don't exist yet.
- The concept of "undistinguished actors", which graduate out of that status into an actual typed actor through one of two events:
  - A transaction being sent from an f1 or f3 address that hashes to an existing f2 undistinguished actor => actor typed as an account actor, and pubkey address linked.
  - An existing actor creating a typed actor, with an explicit on-chain message to the InitActor, or within code.
- The code of the "undistinguished actor" could handle the conversion.
The InitActor will need to be extended with f2 indices. |
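One way to picture the "undistinguished actor" lifecycle is as a small state machine. The sketch below is purely illustrative: `StateTree`, `Actor`, and the method names are hypothetical, the real state tree and InitActor look nothing like a Python dict, and the f2 derivation is assumed to be blake2b-160 of the public key as in the notes above.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ActorKind(Enum):
    UNDISTINGUISHED = "undistinguished"
    ACCOUNT = "account"
    TYPED = "typed"


@dataclass
class Actor:
    balance: int = 0
    kind: ActorKind = ActorKind.UNDISTINGUISHED
    pubkey: Optional[bytes] = None
    code_cid: Optional[str] = None


class StateTree:
    def __init__(self) -> None:
        self.actors: Dict[bytes, Actor] = {}

    def send_value(self, f2: bytes, amount: int) -> None:
        """Funds sent to an unknown f2 address create an undistinguished actor."""
        self.actors.setdefault(f2, Actor()).balance += amount

    def on_message_from_pubkey(self, pubkey: bytes) -> Actor:
        """Event 1: a key that hashes to the f2 address sends a message."""
        f2 = hashlib.blake2b(pubkey, digest_size=20).digest()
        actor = self.actors.setdefault(f2, Actor())
        if actor.kind is ActorKind.UNDISTINGUISHED:
            actor.kind, actor.pubkey = ActorKind.ACCOUNT, pubkey
        elif actor.kind is not ActorKind.ACCOUNT:
            raise ValueError("address is occupied by a non-account actor")
        return actor

    def create_typed(self, f2: bytes, code_cid: str) -> Actor:
        """Event 2: an existing actor deploys code at the address."""
        actor = self.actors.setdefault(f2, Actor())
        if actor.kind is not ActorKind.UNDISTINGUISHED:
            raise ValueError("address already occupied")
        actor.kind, actor.code_cid = ActorKind.TYPED, code_cid
        return actor
```

In both transitions the balance accumulated while the actor was undistinguished carries over, which is the property counterfactual deployments rely on.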
Proposed address taxonomy
|
Given that we're doing an address taxonomy revision, @Stebalien and I agreed to expand the universal actor address space to 32 bytes. With a high probability, the hash function will remain BLAKE2, with a 256-bit digest size. The proposal consists of adding a class Considerations:
|
Main things we need to discuss are:
- Code CIDs (want to make sure we're on the same page).
- Address computation.
- Gas.
But I'm happy to "discuss" by submitting a followup PR.
} | ||
``` | ||
|
||
Notice that EVM foreign actors are typed in the state tree with a CodeCID that does not correspond to their EVM bytecode. Instead, the CodeCID points to the WASM bytecode of the **EVM foreign runtime**. |
This should just be the CID of the EVM itself. We currently use strings, but we only do that because the code doesn't live on-chain.
Yep, that's what I meant to say but it was a long-winded way of doing so. I'll adjust the text.
|
||
### Mechanics and interfaces | ||
|
||
EVM smart contracts (also known as EVM foreign actors in Filecoin terminology) are represented this way in the Filecoin state tree: |
Let's explicitly state that it's just an actor.
I.e., that the following object is a normal actor object (or just leave off the explicit actor struct).
3. For FVM native actors, the preimage is `sender || nonce || # of actors created during message execution`. | ||
4. For EVM foreign actors, the preimage is inherited from CREATE and CREATE2. |
Unfortunately, this would make the EVM "special". I'd prefer not to have "special" address construction methods for every foreign runtime.
I'd rather use the same algorithm everywhere (EVM and FVM): `sender || target code cid || salt || init params`. This isn't identical to how the EVM currently works, but provides the same guarantees.
We'd need to do a bit of research on how predictable addresses are used, but I believe they're mostly used by external tooling for some payment channel use-cases. If that's the case, it should be pretty trivial to drop-in a replacement function as long as that replacement function has the same inputs.
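A drop-in replacement function with those four inputs might look like the sketch below. Two things here are my own assumptions rather than part of the proposal: each field is length-prefixed so that the concatenation is unambiguous for variable-length inputs, and the digest is blake2b-160 to match the class 2 payload size.

```python
import hashlib


def _field(data: bytes) -> bytes:
    """Length-prefix a field so adjacent variable-length fields cannot blur together."""
    return len(data).to_bytes(4, "big") + data


def derive_actor_address(sender: bytes, code_cid: bytes, salt: bytes,
                         init_params: bytes, digest_size: int = 20) -> bytes:
    """Hash sender || target code cid || salt || init params into an address payload."""
    preimage = _field(sender) + _field(code_cid) + _field(salt) + _field(init_params)
    return hashlib.blake2b(preimage, digest_size=digest_size).digest()
```

External tooling that currently precomputes CREATE2 addresses would call a function like this instead, supplying the same kind of inputs (deployer, code identity, salt, constructor parameters).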
|
||
Class 2 and 4 addresses are protocol-derived, by hashing the relevant cryptographic or input material. | ||
|
||
With a high probability, the hash function of class 4 will remain BLAKE2, with a 256-bit digest size. Class 2 will continue relying on blake2b-160. |
So... we could save some space if we define class 2 as a truncation of class 4. Instead of mapping both class-2 and class-4 addresses to ID addresses, we'd map class 2 addresses to (ID, class4-suffix) tuples.
We'd want to determine how much space this map is really taking up on chain before making any decisions (this optimization may not be worth it).
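To illustrate the space-saving idea, here is a rough sketch of an index keyed by class 2 addresses that stores `(ID, class4-suffix)` tuples. `AddressIndex` and its methods are hypothetical names. One caveat worth noting: BLAKE2b mixes the digest length into its parameter block, so today's blake2b-160 output is not a prefix of the blake2b-256 output; the optimization therefore implies redefining class 2 as the first 20 bytes of the class 4 digest.

```python
import hashlib
from typing import Dict, Optional, Tuple


def class4_address(preimage: bytes) -> bytes:
    """32-byte class 4 payload (BLAKE2b with a 256-bit digest)."""
    return hashlib.blake2b(preimage, digest_size=32).digest()


def split_class4(addr4: bytes) -> Tuple[bytes, bytes]:
    """Split into (class 2 prefix of 20 bytes, remaining 12-byte suffix)."""
    if len(addr4) != 32:
        raise ValueError("class 4 payloads are expected to be 32 bytes")
    return addr4[:20], addr4[20:]


class AddressIndex:
    """Single map: class 2 address -> (actor ID, class 4 suffix)."""

    def __init__(self) -> None:
        self._by_class2: Dict[bytes, Tuple[int, bytes]] = {}

    def register(self, actor_id: int, addr4: bytes) -> None:
        class2, suffix = split_class4(addr4)
        if class2 in self._by_class2:
            raise ValueError("class 2 address already registered")
        self._by_class2[class2] = (actor_id, suffix)

    def resolve_class2(self, class2: bytes) -> Optional[int]:
        entry = self._by_class2.get(class2)
        return entry[0] if entry else None

    def resolve_class4(self, addr4: bytes) -> Optional[int]:
        class2, suffix = split_class4(addr4)
        entry = self._by_class2.get(class2)
        return entry[0] if entry and entry[1] == suffix else None
```

Each actor then costs one map entry of roughly 20 + 12 bytes plus the ID, instead of separate 20-byte and 32-byte keys.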
|
||
## Gas accounting and execution halt semantics | ||
|
||
The execution halt is determined by Filecoin gas and not by EVM gas. Therefore, the EVM runtime is made to run with unlimited gas. The FVM is responsible for metering execution and halting it when gas limits are exceeded. Refer to the [Gas accounting](01-architecture.md#gas-accounting) section of the Architecture doc for more details. |
This isn't quite what we discussed. Top-level messages will be run with infinite (well, max int) EVM gas. But an EVM contract may call other contracts with some limited amount of gas and that gas limit should be respected.
Basically:
- We obey both gas models at the same time.
- The EVM gas model starts with infinite gas.
I believe this may be important for contracts that need to be able to invoke untrusted contracts with some limited amount of gas.
Aside: I'm wondering if we can optimize EVM execution by being a bit looser about gas usage. If infinite gas is available, we could go down a codepath that doesn't bother tracking gas.
Alternatively, we could let the message sender specify the amount of EVM gas. Basically, when doing gas estimation, the caller would try the message with infinite EVM gas. Then, the caller would replace the "infinite" EVM gas with the used EVM gas (times some reasonable overestimation multiplier).
Given a high enough overestimation, this reduces the chances of spuriously running out of EVM gas. However, having a somewhat realistic gas value lets the contract make decisions based on the amount of gas left.
As long as we allow the message sender to specify the EVM gas limit, they can pick their model based on their use-case. If, e.g., they have a contract that really cares about EVM gas, they can pass in a gas estimation. Otherwise, they can pass in "infinite" gas.
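The "obey both gas models at once" idea, together with the estimation flow, can be sketched as follows. All names are hypothetical, `2**63 - 1` is only a stand-in for "infinite (max int)" EVM gas, and the 1.25 multiplier is an arbitrary example of an overestimation factor.

```python
import math
from dataclasses import dataclass

EVM_GAS_UNLIMITED = 2**63 - 1  # assumed stand-in for "infinite" EVM gas


class OutOfGas(Exception):
    pass


@dataclass
class DualGasMeter:
    fil_gas: int                       # Filecoin gas: the limit that halts execution
    evm_gas: int = EVM_GAS_UNLIMITED   # EVM gas: starts "infinite" for top-level messages

    def charge(self, fil_cost: int, evm_cost: int) -> None:
        """Charge both meters; fail without deducting if either would be exhausted."""
        if fil_cost > self.fil_gas:
            raise OutOfGas("filecoin")
        if evm_cost > self.evm_gas:
            raise OutOfGas("evm")
        self.fil_gas -= fil_cost
        self.evm_gas -= evm_cost

    def evm_gas_for_call(self, requested: int) -> int:
        """A nested CALL gets the gas it asked for, capped by what remains."""
        return min(requested, self.evm_gas)


def estimate_evm_gas_limit(used_evm_gas: int, multiplier: float = 1.25) -> int:
    """Turn EVM gas used in a dry run into a limit with some headroom."""
    return math.ceil(used_evm_gas * multiplier)
```

A sender who does not care about EVM gas would leave `evm_gas` at its default; one whose contract inspects remaining gas would run the message once, then resubmit with `estimate_evm_gas_limit(used)`.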
|
||
## Blockchain timing | ||
|
||
Ethereum target block times are ~10 seconds, whereas Filecoin's is ~30 seconds. A priori, this difference has no impact on the protocol or this spec, but it may impact the behaviour of smart contracts ported over from Ethereum that expect 10-second block timing. |
Note: we could consider translating epochs, but we probably shouldn't.
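To see what the timing difference means for a ported contract, and what "translating epochs" would involve, consider this small arithmetic sketch. It uses the ~10 s and ~30 s figures from the text above; the function names are illustrative.

```python
ETH_BLOCK_SECONDS = 10   # Ethereum figure used in the text above
FIL_EPOCH_SECONDS = 30   # Filecoin epoch duration


def wall_clock_seconds(blocks: int, block_seconds: int) -> int:
    """Wall-clock time covered by a number of blocks at a given block time."""
    return blocks * block_seconds


def translated_epochs(eth_blocks: int) -> int:
    """Filecoin epochs covering the same wall-clock time as `eth_blocks` (rounded up)."""
    eth_seconds = wall_clock_seconds(eth_blocks, ETH_BLOCK_SECONDS)
    return (eth_seconds + FIL_EPOCH_SECONDS - 1) // FIL_EPOCH_SECONDS
```

A contract that waits 100 blocks expects about 1000 seconds on these figures but would wait about 3000 seconds if block numbers map one-to-one onto epochs. Translation would shrink that to 34 epochs, at the cost of block numbers no longer matching chain epochs.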
* `GASLIMIT`: returns the gas limit as per Filecoin gas system. | ||
* `CHAINID`: returns a fixed value `0`. | ||
* `GAS`: returns the gas remaining as per Filecoin gas system. One divergence from Ethereum is the return value does not include the full cost of this operation (because the cost of stack copy and program advance is not known when the value is captured). | ||
* `COINBASE`: returns the Filecoin class 4 address of the block producer including this message. |
Class 2? Class 4 won't work until we get 32 byte addresses.
* `GASLIMIT`: returns the gas limit as per Filecoin gas system. | ||
* `CHAINID`: returns a fixed value `0`. | ||
* `GAS`: returns the gas remaining as per Filecoin gas system. One divergence from Ethereum is the return value does not include the full cost of this operation (because the cost of stack copy and program advance is not known when the value is captured). |
Using the Filecoin gas here diverges from what NEAR does and may cause more problems than it solves because the gas models won't match up.
I'm going to merge this and will iterate on master. |
This document elaborates on the EVM <> FVM mapping by analyzing various aspects such as the addressing scheme, memory model, storage model, and more.