Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EIP draft for paged memory #3336

Merged
merged 6 commits into from
Mar 6, 2021
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions EIPS/eip-3336.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
---
eip: 3336
title: Paged memory allocation for the EVM
author: Nick Johnson (@arachnid)
discussions-to: https://ethereum-magicians.org/t/eips-3336-and-3337-improving-the-evms-memory-model/5482
status: Draft
type: Standards Track
category: Core
created: 2021-03-06
---

## Simple Summary
Changes the memory model for the EVM to use pagination, which permits more flexible use of memory by compilers without unduly complicating implementations.
Arachnid marked this conversation as resolved.
Show resolved Hide resolved

## Abstract
Presently, the EVM charges for memory as a linear array starting at address 0 and extending to the highest address that has been read from or written to. This suffices for simple uses, but means that compilers have to generate programs that use memory compactly, which leads to wasted gas with reallocation of memory elements, and makes some memory models such as separate heap and stack areas impractical. This EIP proposes changing to a page-based billing model, which adds minimal complexity to implementations, while providing for much more versatility in EVM programs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstract should be a terse human readable version of the specification. Currently, this discusses history which is more appropriate for the Motivation section.

(normally I would try to provide an alternative abstract, but I don't think I grok the change well enough to do so... sorry!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Presently, the EVM charges for memory as a linear array starting at address 0 and extending to the highest address that has been read from or written to. This suffices for simple uses, but means that compilers have to generate programs that use memory compactly, which leads to wasted gas with reallocation of memory elements, and makes some memory models such as separate heap and stack areas impractical. This EIP proposes changing to a page-based billing model, which adds minimal complexity to implementations, while providing for much more versatility in EVM programs.
This EIP proposes changing EVM memory to a page-based allocation and gas billing model by dividing the memory address space into 1 kilobyte (32 word) pages, and charging gas on the basis of the number of pages allocated, rather than the address of the highest word referenced in memory. This facilitates more flexible use of memory than is currently possible with a linear model.

How about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EIPs don't propose, they assert. Also recommend removing the Rationale at the end and keeping focused on the specification.

Suggested change
Presently, the EVM charges for memory as a linear array starting at address 0 and extending to the highest address that has been read from or written to. This suffices for simple uses, but means that compilers have to generate programs that use memory compactly, which leads to wasted gas with reallocation of memory elements, and makes some memory models such as separate heap and stack areas impractical. This EIP proposes changing to a page-based billing model, which adds minimal complexity to implementations, while providing for much more versatility in EVM programs.
Makes EVM memory a page-based allocation system and gas billing model by dividing the memory address space into 1 kilobyte (32 word) pages, and charging gas on the basis of the number of pages allocated, rather than the address of the highest word referenced in memory.


## Motivation
Most modern computers implement "virtual memory" for userspace programs, where programs have access to a large address space, with pages of RAM that are allocated as needed by the OS. This allows them to distribute data throughout memory in ways that minimises the amount of reallocation and copying that needs to go on, and permits flexible use of memory for data with different lifetimes. Implementing a simple version of paged memory inside the EVM will provide the same flexibility to compilers targeting the EVM.

## Specification
### Parameters

| Constant | Value |
| - | - |
| `FORK_BLOCK` | TBD |
| `PAGE_BITS` | 10 |
| `PAGE_BASE_COST` | 96 |

For blocks where `block.number >= FORK_BLOCK`, the following changes apply.

### Changes to memory alocation in EVM implementations
Arachnid marked this conversation as resolved.
Show resolved Hide resolved
Memory is now allocated in pages of `2**PAGE_BITS` bytes each. The most significant `256 - PAGE_BITS` bits of each memory address denote the page number, while the least significant `PAGE_BITS` bits of the memory address denote the location in the page. Pages are initialized to contain all zero bytes and allocated when the first byte from a page is read or written.

EVM implementations are encouraged to store the pagetable as an associative array (eg, hashtable or dict) mapping from page number to an array of bytes for the page.

### Changes to memory expansion gas cost
Presently, the total cost to extend the memory to `a` words long is `Cmem(a) = 3 * a + floor(a ** 2 / 512)`. If the memory is already `b` words long, the incremental cost is `Cmem(a) - Cmem(b)`. `a` is the number of words required to cover the range from memory address 0 to the last word that has been read or written by the EVM.

Under this EIP, we define a new memory cost function, based on the number of allocated pages. This function is `Cmem'(p) = max(PAGE_BASE_COST * (p - 1) + floor(2 * (p - 1) ** 2), 0)`. As above, if the memory already contains `q` pages, the incremental cost is `Cmem'(p) - Cmem'(q)`.

Comment on lines +37 to +41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should mostly move to the Rationale section. The specification should just say what the gas cost is, not history or why it is what it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important for anyone trying to understand the new cost function to know what the old one is. Removing this would make the standard harder to understand, IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is very valuable information that should be present in the EIP. I just disagree that it is appropriate for the Specification section. The specification should focus purely on what one needs to implement in order to be compliant with this standard. Explaining what a previous iteration defined doesn't help them be compliant with this standard, it helps them be compliant with a previous standard (not the reader's goal).

### Changes to `MLOAD` and `MSTORE`
Loading a word from memory or storing a word to memory requires instantiating any pages that it touches that do not already exist, with the resulting gas cost as described above. If the word being loaded or stored resides in a single page, the gas cost remains unchanged at 3 gas. If the word being loaded spans two pages, the cost is 6 gas.

### Changes to other memory-touching opcodes
`CALLDATACOPY`, `CODECOPY`, `EXTCODECOPY`, `CALL`, `CALLCODE`, `DELEGATECALL`, `STATICCALL`, `CREATE`, `MSTORE8` and any other opcodes that read or write memory are modified as follows:
- Any page they touch for reading or writing is instantiated if it is not already.
- Memory expansion gas is charged as described above.

## Rationale
### Memory expansion gas cost
The new gas cost follows the same curve as the previous one, while ensuring that the new gas cost is always less than or equal to the previous cost. This prevents existing programs that make assumptions about memory allocation gas costs from resulting in errors, without unduly discounting memory below what it costs today. Intuitively, a program that uses up to a page boundary pays for one page less than they would under the old model, while a program that uses one word more than a page boundary pays for one word less than they would under the old model.

We believe that this incremental reduction will not have a significant impact on the effective gas limit, as it gets proportionally smaller as programs use more RAM.

### Additional cost for MLOADs and MSTOREs spanning two pages
Loading or storing data spanning two memory pages requires more work from the EVM implementation, which must split the word at the page boundary and update the two (possibly disjoint) pages. Since we cannot guarantee loads and stores in existing EVM programs are page-aligned, we cannot prohibit this behaviour for efficiency. Instead, we propose treating each as two loads or stores for gas accounting purposes. This discourages the use of this functionality, and accounts for the additional execution cost, without prohibiting it outright.

This will result in additional gas costs for any programs that perform these operations. We believe this to be minimal, and hope to do future analysis to confirm this.

## Backwards Compatibility
The new function for memory expansion gas cost is designed specifically to avoid backwards compatibility issues by always charging less than or equal to the amount the current EVM would charge. Under some circumstances existing programs will be charged more for MLOADs and MSTOREs that span page boundaries as described above; we believe these changes will affect a minimum of programs and have only a small impact on their gas consumption.

## Test Cases
TBD

## Security Considerations
Potential CPU DoS issues arising from additional work done under the new model are alleviated by charging more for non-page-aligned reads and writes. Charges for memory expansion asymptotically approach those currently in force, so this change will not permit programs to allocate substantially more memory than they can today.

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).