diff --git a/crates/utils/src/constants.cairo b/crates/utils/src/constants.cairo
index fab2e648f..6b4a02ce1 100644
--- a/crates/utils/src/constants.cairo
+++ b/crates/utils/src/constants.cairo
@@ -1,4 +1,4 @@
-// FELT PRIME
+// FELT PRIME
 // 2^251 + 17 * 2^192 + 1
 const FELT252_PRIME: u256 = 0x800000000000011000000000000000000000000000000000000000000000001;
diff --git a/docs/general/account_state.png b/docs/general/account_state.png
new file mode 100644
index 000000000..815461f45
Binary files /dev/null and b/docs/general/account_state.png differ
diff --git a/docs/general/contract_bytecode.md b/docs/general/contract_bytecode.md
new file mode 100644
index 000000000..bf073113e
--- /dev/null
+++ b/docs/general/contract_bytecode.md
@@ -0,0 +1,158 @@
+# Storing Contract Bytecode
+
+The bytecode is the compiled version of your contract, and it is what the
+Kakarot EVM will execute when you call the contract. As Kakarot is developed on
+top of Starknet, you cannot really "deploy" an EVM contract on Kakarot: what
+actually happens is that the EVM bytecode of your contract is stored on the
+blockchain, and the Kakarot EVM will be able to load it when you want to execute
+it.
+
+There are several different ways to store the bytecode of a contract, and this
+document provides a quick overview of the different options, in order to choose
+the one best suited to our use case. The three main ways of handling contract
+bytecode are:
+
+- Storing the bytecode inside a storage variable, using Ethereum as an L1 data
+  availability layer.
+- Storing the bytecode inside a storage variable, using another data
+  availability layer.
+- Storing the bytecode directly in the contract code, so that it is not part of
+  the contract's storage.
+
+These three solutions all have their pros and cons, and we will go over them in
+the following sections.
+
+## Foreword: Data availability
+
+In Validity Rollups, verifying the validity proof on L1 is sufficient to
+guarantee the validity of a transaction execution on L2, with no need to have
+the detailed transaction information sent to Ethereum.
+
+However, in order to allow the independent verification of the L2 chain's state
+and prevent malicious operators from censoring or freezing the chain, some
+amount of data is still required to be posted on a Data Availability (DA) layer
+to make the Starknet state available, even in the case where the operator
+suddenly ceases operations. Data availability refers to the fact that a user can
+always reconstruct the state of the rollup by deriving its current state from
+the data posted by the rollup operator.
+
+Without this, users would not be able to query an L2 contract's state in case
+the operator becomes unavailable. It provides users the security of knowing that
+if the Starknet sequencer ever stops functioning, they can prove custody of
+their funds using the data posted on the DA Layer. If that DA Layer is Ethereum
+itself, then they inherit Ethereum's security guarantees.
+
+## Using Ethereum as a DA Layer
+
+Starknet currently uses Ethereum as its DA Layer. Each state update verified
+on-chain is accompanied by the state diff between the previous and new state,
+sent as calldata to Ethereum, allowing anyone that observes Ethereum to
+reconstruct the current state of Starknet. This security comes at a
+significant price, as the publication of state diffs on Ethereum accounted for
+[over 93% of the transaction fees paid on Starknet](https://community.starknet.io/t/volition-hybrid-data-availability-solution/97387).
+
+The first choice when it comes to storing contract bytecode is to store it as a
+regular storage variable, whose state diff is posted on Ethereum acting as the
+DA Layer.
+Following the design choices made in
+[Contract Storage](./contract_storage.md), deploying a new contract on Kakarot
+would not result in the deployment of a contract on Starknet, but rather in the
+storage of the contract bytecode in a storage variable of the KakarotCore
+contract.
+
+In this situation the following data would reach L1:
+
+- The KakarotCore contract address
+- The number of updated keys in that contract
+- The keys to update
+- The new values for these keys
+
+On Starknet, the associated storage update fee for a transaction updating $n$
+unique contracts and $m$ unique keys is:
+
+$$ gas\ price \cdot c_w \cdot (2n + 2m) $$
+
+where $c_w$ is the calldata cost (in gas) per 32-byte word.
+
+In our case, we would update one single contract (KakarotCore), so $n = 1$, and
+update $m$ keys, where $m = B / 16$ (rounded up) with $B$ the size in bytes of
+the bytecode to store.
+
+
+
+Considering a gas price of 34 gwei (average gas price in 2023, according to
+[Etherscan](https://etherscan.io/chart/gasprice)), a calldata cost of 16 gas per
+byte (so $c_w = 16 \cdot 32 = 512$ gas per 32-byte word), and a typical ERC20
+contract size of 2174 bytes, we would have $m = \lceil 2174 / 16 \rceil = 136$.
+The associated storage update fee would be:
+
+$$ fee = 34 \cdot (16 \cdot 32) \cdot (2 + 272) = 4,769,792 \text{ gwei}$$
+
+that is, roughly 0.0048 ETH.
+
+## Using Starknet's volition mechanism
+
+Volition is a hybrid data availability solution, providing the ability to choose
+the data availability layer used for contract data. It allows users to choose
+between using Ethereum as a DA Layer, or using Starknet itself as a DA Layer.
+The security of state transitions, verified by STARK proofs on L1, remains the
+same in both L2 and L1 data availability modes - the difference lies in the data
+availability guarantees. When a state transition is verified on L1, we are
+assured that the state update is correct - however, we don't know on L1 what the
+actual state of the L2 is.
+By posting state diffs on L1, we can reconstruct the
+current state of Starknet from the ground up, but this comes at a significant
+cost.
+
+![Volition](volition.png)
+
+Volition will allow developers to choose whether data will be stored in the L1DA
+or L2DA mode, making it possible to store data on L2, which is a lot less
+expensive than storing it on L1. Depending on the data stored, this is
+worthwhile when the cost of storing the data on L1 is higher than the
+intrinsic value of the data itself. For example, a Volition-ERC20 token
+standard would have two different balances stored, one on L1DA for maximal
+security (e.g. you would keep most of your assets in this balance), and one on
+L2DA for lower security, which would be used to reduce the fees associated with
+small transactions.
+
+In our case, we would store the contract bytecode in a storage variable that is
+settled on the L2DA instead of the L1DA. This would make contract deployment
+extremely cheap on Kakarot, as we would save the cost of posting on Ethereum the
+state diff associated with the update of our stored bytecode.
+
+### Associated Risks
+
+There are some risks that must be considered when using Volition. Consider an
+attack in which a colluding majority of malicious sequencers decides not to
+share a change in the L2DA with other sequencers and full nodes. Once the
+attack is finished, the honest sequencers won't have the data needed to
+reconstruct and compute the new root of the L2DA. In such a situation, not only
+is the L2DA no longer accessible, but any execution relying on L2DA can no
+longer be executed or proven, as sequencers won't have access to the L2DA
+state.
+
+Even though this event is unlikely to happen, it remains a possibility that must
+be taken into account as L2DA is less secure than L1DA. If an event like this
+were ever to happen, then the stored bytecode would be lost, and the deployed
+contract would not be executable anymore.
+
+> Note: While we could potentially use Volition to store the bytecode on L2DA in
+> the future, this is not possible at the moment, as Volition is not yet
+> implemented on Starknet.
+
+## Storing the EVM bytecode in the Cairo contract code
+
+The last option is to store the EVM bytecode directly in the Cairo contract
+code. This has the advantage of also being cheap, as this data is not posted on
+L1.
+
+On Starknet, there is a distinction between classes, which are the definitions
+of contracts and contain the Cairo bytecode, and contracts, which are instances
+of classes. When you declare a contract on Starknet, its information is added to
+the
+[Classes Tree](https://docs.starknet.io/documentation/architecture_and_concepts/Network_Architecture/starknet-state/#classes_tree),
+which encodes information about the existing classes in the state of Starknet
+by mapping class hashes to their compiled class hashes. This class tree is
+itself a part of the Starknet State Commitment, which is verified on Ethereum
+during state updates.
+
+Implementing this solution would require us to declare a new class every time a
+contract is deployed using Kakarot. This new class would contain the EVM
+bytecode of the contract, exposed inside a view function that would return the
+entire bytecode when queried. To achieve that, we would need to have the RPC
+craft a custom Starknet contract that would contain this EVM bytecode, and
+declare it on Starknet - which is not ideal from a security perspective.
diff --git a/docs/general/contract_storage.md b/docs/general/contract_storage.md
new file mode 100644
index 000000000..748a544c5
--- /dev/null
+++ b/docs/general/contract_storage.md
@@ -0,0 +1,243 @@
+# Kakarot Storage
+
+## Storage in Ethereum
+
+The top-level data structure that holds information about the state of the
+Ethereum blockchain is called the _world state_, and is a mapping of Ethereum
+addresses (160-bit values) to accounts.
+Each Ethereum address represents an
+account composed of a _nonce_, an _ether balance_, a _storage_, and a _code_. We
+make the distinction between EOAs (Externally Owned Accounts), which have no
+code and an empty storage, and contracts, which can have code and storage.
+
+![Account state](account_state.png)
+
+_Account state associated to an Ethereum address. Source:
+[EVM Illustrated](https://takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf)_
+
+In traditional EVM clients, like Geth, the _world state_ is stored as a _trie_,
+and information about accounts is stored in the world state trie and can be
+retrieved through queries. Each account in the world state trie is associated
+with an account storage trie, which stores all of the information related to the
+account. When Geth updates the storage of a contract by executing the SSTORE
+opcode, it does the following:
+
+- It updates the `value` associated to a `key` of the storage of a contract
+  deployed at a specific `address`. However, the update is applied to
+  `dirtyStorage`, which holds the storage entries that have been modified in
+  the current transaction execution.
+- It tracks the storage modifications in a `journal` so that they can be
+  reverted in case of a revert opcode or an exception in the transaction
+  execution.
+- At the end of the execution of a transaction, all dirty storage slots are
+  copied across to `pendingStorage`, which in turn is copied across to
+  `originStorage` when the trie is finally updated. This effectively updates the
+  storage root of the account state.
+
+The behavior of the SLOAD opcode is complementary to that of the SSTORE opcode.
+When Geth executes the SLOAD opcode, it does the following:
+
+- It starts by doing a check on `dirtyStorage` to see if it contains a value for
+  the queried key, and returns it if so.
+- Otherwise, it retrieves the value from the committed account storage trie.
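
The SSTORE / SLOAD steps listed above can be sketched as follows. This is a minimal illustration in Rust under stated assumptions, not Geth's actual Go implementation: the `AccountStorage` type, its field and method names, and the use of `u64` keys and values (instead of 256-bit words) are hypothetical, and `pendingStorage` is folded into the commit step for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of one account's storage (not Geth's real types).
struct AccountStorage {
    /// Values already committed to the account storage trie.
    committed: HashMap<u64, u64>,
    /// Entries modified during the current transaction execution.
    dirty: HashMap<u64, u64>,
    /// Journal of (key, previous dirty value) pairs, used to undo writes.
    journal: Vec<(u64, Option<u64>)>,
}

impl AccountStorage {
    fn new() -> Self {
        Self {
            committed: HashMap::new(),
            dirty: HashMap::new(),
            journal: Vec::new(),
        }
    }

    /// SSTORE: write to dirty storage and journal the previous dirty value.
    fn sstore(&mut self, key: u64, value: u64) {
        let previous = self.dirty.insert(key, value);
        self.journal.push((key, previous));
    }

    /// SLOAD: prefer dirty storage, then fall back to committed storage (default 0).
    fn sload(&self, key: u64) -> u64 {
        self.dirty
            .get(&key)
            .or_else(|| self.committed.get(&key))
            .copied()
            .unwrap_or(0)
    }

    /// Revert: undo the journaled writes in reverse order.
    fn revert(&mut self) {
        while let Some((key, previous)) = self.journal.pop() {
            match previous {
                Some(value) => {
                    self.dirty.insert(key, value);
                }
                None => {
                    self.dirty.remove(&key);
                }
            }
        }
    }

    /// Commit: copy every dirty entry into committed storage and reset the journal.
    fn commit(&mut self) {
        for (key, value) in self.dirty.drain() {
            self.committed.insert(key, value);
        }
        self.journal.clear();
    }
}
```

In this sketch, calling `sstore(1, 42)` followed by `commit()`, then `sstore(1, 7)` followed by `revert()`, leaves `sload(1)` returning the committed value, because the reverted write never leaves dirty storage.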
+
+Since one transaction can access a storage slot multiple times, we must ensure
+that the result returned is the most recent value. This is why Geth first checks
+`dirtyStorage`, which is the most up-to-date state of the storage.
+
+```mermaid
+flowchart TD;
+    A[Start: Run Bytecode] -->|SSTORE| B[Update value in dirtyStorage]
+    B --> C[Track modifications in journal]
+    C --> D[End of current execution]
+    D -->|Execution reverted| M[Clear dirtyStorage from entries in journal]
+    D -->|Execution successful| E[ ]
+    A -->|SLOAD| H[Check dirtyStorage for queried key]
+    H -->|Key found| I[Return value from dirtyStorage]
+    H -->|Key not found| J[Retrieve value from committed account storage trie]
+    J --> K[Return retrieved value]
+    style A fill:#DB5729,stroke:#333,stroke-width:2px;
+    style B fill:#296FDB,stroke:#333,stroke-width:2px;
+    style C fill:#296FDB,stroke:#333,stroke-width:2px;
+    style D fill:#296FDB,stroke:#333,stroke-width:2px;
+    style E fill:#296FDB,stroke:#333,stroke-width:2px;
+    style H fill:#136727,stroke:#333,stroke-width:2px;
+    style I fill:#136727,stroke:#333,stroke-width:2px;
+    style J fill:#136727,stroke:#333,stroke-width:2px;
+    style K fill:#136727,stroke:#333,stroke-width:2px;
+    style M fill:#DB2929,stroke:#333,stroke-width:2px;
+```
+
+_Simplified process representation of SSTORE and SLOAD Opcodes in the Geth EVM
+Client_
+
+## Storage in Kakarot
+
+As Kakarot is a contract that is deployed on Starknet and is not a client that
+can directly manipulate a storage database, our approach differs from that of a
+traditional client. We do not have a world state trie, and we do not have a
+storage trie. Instead, we have access to Kakarot's contract storage on the
+Starknet blockchain, which we can access using syscalls to read and update the
+value of a storage slot.
+
+There are two different ways of handling Storage in Kakarot.
+
+### One storage space per Kakarot Contract
+
+The first approach is to have one storage space per Kakarot contract.
+This means
+that for every contract that is deployed on Kakarot, we will deploy an
+underlying Starknet contract, which has its own state that can only be queried
+by the contract itself.
+
+The current contract storage design in Kakarot Zero is organized as such:
+
+- The two different kinds of EVM accounts - Externally Owned Accounts (EOA) and
+  Contract Accounts (CA) - are both represented by Starknet smart contracts.
+  Each account is mapped to a unique Starknet contract. Each contract has its
+  own storage.
+- Each contract is deployed by Kakarot, and contains its own bytecode in storage
+  in the case of a smart contract (no bytecode for an EOA).
+- Each contract account has external functions that can be called by Kakarot to
+  read the bytecode it stores and to read / write to its storage. This makes
+  Kakarot an effective "admin" to all contracts with rights to modify their
+  storage.
+- SLOAD/SSTORE opcodes are used to read/write to storage and perform a
+  `call_contract_syscall` to access the storage of the remote contract.
+
+However, this design has some limitations:
+
+- We perform a `call_contract_syscall` for each SLOAD/SSTORE, which is
+  expensive. Given that only KakarotCore can modify the storage of a Kakarot
+  contract, we could directly store the whole world state in the main Kakarot
+  contract storage.
+- It adds external entrypoints with admin rights to read and write from storage
+  in each Kakarot contract. This is not ideal from a security perspective.
+- It moves away from the traditional EVM design, in which execution clients
+  store account states in a common database backend.
+
+Therefore, we will not use this design in SSJ. We will instead use the second
+design presented below.
+
+### A shared storage space for all Kakarot Contracts
+
+The second approach is to have a unified storage space for all contract
+accounts in the main Kakarot smart contract.
+While Kakarot is not a traditional Ethereum Client, we can still use a design
+that is similar. Traditional clients hold a state database in which the account
+states are stored. We can do the same, but instead of storing the account states
+in a database, we store them in the KakarotCore contract storage. Therefore, we
+do not need to deploy a Starknet contract for each Kakarot account contract,
+which saves users costs related to deploying contracts.
+
+A contract’s storage on Starknet is a persistent storage space where you can
+read, write, modify, and persist data. The storage is a map with $2^{251}$
+slots, where each slot is a felt which is initialized to 0.
+
+This new model doesn't expose read and write methods on Kakarot contracts.
+Instead of having $n$ contracts with `write_storage` and `read_storage`
+entrypoints, the only way to update the storage of a Kakarot contract is now
+through executing SLOAD / SSTORE internally to KakarotCore.
+
+```mermaid
+sequenceDiagram
+    participant C as Caller
+    participant K as KakarotCore
+    participant M as Machine
+    participant J as Journal
+    participant S as ContractState
+
+    C->>K: Executes Kakarot contract
+    K->>M: Executes Opcode (Either SSTORE or SLOAD)
+
+    Note over K,M: If it's an SSTORE operation, it writes to Storage.
+    Note over K,M: If it's an SLOAD operation, it reads from Storage.
+
+    alt SSTORE
+        M-->>M: key = hash(evm_address, storage_slot)
+        M->>J: journal.insert(key,value)
+    else SLOAD
+        M-->>M: key = hash(evm_address, storage_slot)
+        M->>J: journal.get(key)
+        J -->> M: Nullable
+        alt Journal returns value
+
+        else Journal returns nothing
+            M->>S: storage_read(key)
+            S-->>M: value
+        end
+    end
+    Note over K,M: Committing journal entries to storage
+    K->>M: Commit
+    M->>J: Get all journal entries
+    J -->>M: entries
+    loop for each journal entry
+        M->>S: storage_write(key,value)
+    end
+
+    Note over S: Storage is now updated with the final state of all changes made during the transaction.
+```
+
+### Potential security risks
+
+According to
+[an engineer from ElectricCapital](https://twitter.com/n4motto/status/1554853912074522624?s=20),
+44M contracts have been deployed on Ethereum so far. If we assume that Kakarot
+could reach the same number of contracts, that would leave us with a total of
+$2^{251} / 44\cdot10^6 \approx 2^{225}$ slots per contract. Even with a
+hypothetical number of 100 billion contracts, we would still have around
+$2^{214}$ storage slots available per contract.
+
+Considering the birthday paradox, a collision among slots drawn at random from
+a space of $2^{214}$ slots is only expected after roughly $2^{107}$ draws,
+which corresponds to about 107 bits of security. This is considered secure by
+today's standards. We can
+therefore consider that the collision risk is negligible and that this storage
+layout doesn't introduce any security risk to Kakarot. For reference, Ethereum
+has 80 bits of security on its account addresses, which are 160 bits long.
+
+### Tracking and reverting storage changes
+
+This design allows reverting storage changes by using a concept similar to
+Geth's journal. Each storage change will be stored in a `Journal` implemented
+using a `Felt252Dict` data structure, which will associate each modified storage
+address to its new value. This allows us to do three things:
+
+- When executing a transaction, instead of using one `storage_write_syscall` per
+  SSTORE opcode, we can simply store the storage changes in this journal. At the
+  end of the transaction, we can finalize all the storage writes together and
+  perform only one `storage_write_syscall` per modified storage address.
+- When reading from storage, we can first read from the journal to see if the
+  storage slot has been modified. If it's the case, we can read the new value
+  from the journal instead of performing a `storage_read_syscall`.
+- If the transaction reverts, we won't need to revert the storage changes
+  manually.
+  Instead, we can simply not finalize the storage changes present in
+  the journal, which can save a lot of gas.
+
+### Implementation
+
+The SSTORE and SLOAD opcodes are implemented to first write to and read from
+the `Journal` instead of directly accessing the KakarotCore contract's storage.
+
+Using the `storage_read_syscall` and `storage_write_syscall` syscalls, we can
+arbitrarily read and write to a contract's storage. Therefore, we will be able
+to simply implement the SSTORE and SLOAD opcodes as follows:
+
+```rust
+    // SSTORE: record the new value in the journal only; storage is untouched
+    let storage_address = poseidon_hash(evm_address, storage_slot);
+    self.journal.insert(storage_address, NullableTrait::new(value));
+```
+
+```rust
+    // SLOAD: prefer the journaled value, fall back to contract storage
+    let storage_address = poseidon_hash(evm_address, storage_slot);
+    let value = match match_nullable(self.journal.get(storage_address)) {
+        FromNullableResult::Null => storage_read_syscall(storage_address),
+        FromNullableResult::NotNull(value) => value.unbox(),
+    };
+```
+
+```rust
+    // Finalizing storage updates: one write per modified storage address
+    for key in journal_keys {
+        storage_write_syscall(key, journal.get(key));
+    }
+```
+
+> Note: these code snippets are in pseudocode, not valid Cairo code.
diff --git a/docs/general/volition.png b/docs/general/volition.png
new file mode 100644
index 000000000..6b530141a
Binary files /dev/null and b/docs/general/volition.png differ
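
The journal-based flow described in the Implementation section of `contract_storage.md` (derive a key from the EVM address and storage slot, buffer writes in a journal, then commit once per modified key) can be tried out with the sketch below. It is written in plain Rust rather than Cairo, and several things are assumptions made for illustration only: `SharedStorage` and `storage_key` are hypothetical names, `u64` stands in for felts and 256-bit words, a `HashMap` stands in for the contract storage reached through syscalls, and the standard library's `DefaultHasher` stands in for the Poseidon hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for hashing (evm_address, storage_slot) into a storage address.
/// ASSUMPTION: DefaultHasher replaces Poseidon to keep the sketch dependency-free.
fn storage_key(evm_address: u64, storage_slot: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (evm_address, storage_slot).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical model of the shared storage plus a per-transaction journal.
#[derive(Default)]
struct SharedStorage {
    /// Stand-in for the single contract storage shared by all accounts.
    storage: HashMap<u64, u64>,
    /// Pending writes of the current transaction, keyed by storage address.
    journal: HashMap<u64, u64>,
    /// Number of simulated storage writes, to make the batching visible.
    writes: usize,
}

impl SharedStorage {
    /// SSTORE: only the journal is updated.
    fn sstore(&mut self, evm_address: u64, slot: u64, value: u64) {
        self.journal.insert(storage_key(evm_address, slot), value);
    }

    /// SLOAD: read the journal first, then fall back to storage (default 0).
    fn sload(&self, evm_address: u64, slot: u64) -> u64 {
        let key = storage_key(evm_address, slot);
        match self.journal.get(&key) {
            Some(value) => *value,
            None => self.storage.get(&key).copied().unwrap_or(0),
        }
    }

    /// Commit: exactly one simulated storage write per modified key.
    fn commit(&mut self) {
        for (key, value) in self.journal.drain() {
            self.storage.insert(key, value);
            self.writes += 1;
        }
    }

    /// Revert: dropping the journal is enough, since storage was never touched.
    fn revert(&mut self) {
        self.journal.clear();
    }
}
```

Writing the same slot twice before `commit()` results in a single simulated storage write, and calling `revert()` instead of `commit()` leaves the stored values unchanged, which mirrors the two cost savings the document attributes to the journal.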