The goal of this document is to describe the process of determining gas costs for ewasm instructions.
Each Wasm opcode is assigned an appropriate Intel IA-32 (x86) opcode (or a
series of opcodes). These opcodes have a fixed cycle count (called latency
by Intel). We are selecting one specific CPU model from the Haswell architecture
(family: 06
, model 3C
). This equals to CPUs produced in 2014.
Assumption 1: This specific Haswell CPU represents a good average of the Ethereum nodes. We assume that a 2.2 Ghz model is the average.
According to Intel, the 2.2 Ghz clock rate roughly equals to 2 200 000 000
cycles per second.
Assumption 2: 1 second of CPU execution equals to 10 million gas (i.e. 1 gas equals to 0.1 µs).
This equals to 0.0045
gas per cycle. (10 000 000 / 2 200 000 000
)
To put this in perspective, the average block gas limit as of August 2016 is around 4.7 million. With this assumption we allow contract execution to take up to 0.5 second of the 15 seconds block time, which also has to include other processing, including PoW and network roundtrip times.
Assumption 3: The gas costs are adjusted on a regular basis, at least every 3 years.
We assume that CPUs are continuously improving and the hardware for Ethereum nodes are upgraded every 3 years (as that matches usual depreciation rates).
The upgrade procedure is not part of this document, but it is plausible to expect a dedicated gas cost oracle contract will exist in the future, which can be used by the metering injection contract.
The current gas handling fields do not offer the precision needed assuming that running Wasm opcodes takes significantly less processing power compared to EVM1 opcodes.
Rationale: EVM1 opcodes operate on 256-bits of data, while Wasm opcodes are limited to at most 64-bits, which results in executing four instructions in the best case to match EVM1. Most arithmetic instructions in EVM1 cost 3 gas, which would amount to 0.75 gas for most 64-bit Wasm instructions.
Internally, ewasm gas measurements should be recorded in a 64 bit variable with 4 decimal digits precision. We call this particles. It is a minor implementation detail actually using integers and converting the below gas costs appropriately.
When converting the particles count to Ethereum gas, it has to be divided by
10000
and has to be rounded up. If the result is less than 0
, then it should equal to 1
.
The formula for determining the gas cost is: <cycle count> * <gas per cycle>
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
get_local | 3 | MOV | 0.0135 |
set_local | 3 | MOV | 0.0135 |
tee_local | 3 | MOV | 0.0135 |
get_global | 3 | MOV | 0.0135 |
set_global | 3 | MOV | 0.0135 |
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
i32.load8_s | 3 | MOV | 0.0135 |
i32.load8_u | 3 | MOV | 0.0135 |
i32.load16_s | 3 | MOV | 0.0135 |
i32.load16_u | 3 | MOV | 0.0135 |
i32.load | 3 | MOV | 0.0135 |
i64.load8_s | 3 | MOV | 0.0135 |
i64.load8_u | 3 | MOV | 0.0135 |
i64.load16_s | 3 | MOV | 0.0135 |
i64.load16_u | 3 | MOV | 0.0135 |
i64.load32_s | 3 | MOV | 0.0135 |
i64.load32_u | 3 | MOV | 0.0135 |
i64.load | 3 | MOV | 0.0135 |
grow_memory | ? | (breaking out from the VM) | |
current_memory | ? | (breaking out from the VM) |
Opcode | Cycle | IA-32 eqv. |
---|---|---|
nop | ? | |
block | 0 | (this is only a grouping) |
loop | 0 | (same as block) |
if | 0 | (only a grouping, then/else counts) |
then | 2 | JMP (near) |
else | 2 | JMP (near) |
br | 2 | JMP (near) |
br_if | 3 | CMP, JMP (near) |
br_table | 2 | JMP (near) |
return | 2 | RET or JMP |
Opcode | Cycle | IA-32 eqv. |
---|---|---|
call | 2 | CALL (near) |
call_indirect | ? | CALL (far) (breaking out from the VM) |
call_import | ? | CALL (far) (breaking out from the VM) |
Opcode | Cycle | IA-32 eqv. |
---|---|---|
i32.const | 0 | |
i64.const | 0 |
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
i32.add | 1 | ADD | 0.0045 |
i32.sub | 1 | SUB | 0.0045 |
i32.mul | 3 | MUL | 0.0135 |
i32.div_s | 80 | DIV | 0.36 |
i32.div_u | 80 | DIV | 0.36 |
i32.rem_s | 80 | DIV | 0.36 |
i32.rem_u | 80 | DIV | 0.36 |
i32.and | 1 | AND | 0.0045 |
i32.or | 1 | OR | 0.0045 |
i32.xor | 1 | XOR | 0.0045 |
i32.shl | 1.5 | SHL | 0.0067 |
i32.shr_u | 1.5 | SHR | 0.0067 |
i32.shr_s | 1.5 | SHR | 0.0067 |
i32.rotl | 2 | ROL | 0.0090 |
i32.rotr | 2 | ROR | 0.0090 |
i32.eq | 1 | CMP | 0.0045 |
i32.eqz | 1 | CMP | 0.0045 |
i32.ne | 1 | CMP | 0.0045 |
i32.lt_s | 1 | CMP | 0.0045 |
i32.lt_u | 1 | CMP | 0.0045 |
i32.le_s | 1 | CMP | 0.0045 |
i32.le_u | 1 | CMP | 0.0045 |
i32.gt_s | 1 | CMP | 0.0045 |
i32.gt_u | 1 | CMP | 0.0045 |
i32.ge_s | 1 | CMP | 0.0045 |
i32.ge_u | 1 | CMP | 0.0045 |
i32.clz | 105 | (clz) | 0.4725 |
i32.ctz | 105 | (ctz) | 0.4725 |
i32.popcnt | ? |
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
i64.add | 1 | ADD | 0.0045 |
i64.sub | 1 | SUB | 0.0045 |
i64.mul | 3 | MUL | 0.0135 |
i64.div_s | 80 | DIV | 0.36 |
i64.div_u | 80 | DIV | 0.36 |
i64.rem_s | 80 | DIV | 0.36 |
i64.rem_u | 80 | DIV | 0.36 |
i64.and | 1 | AND | 0.0045 |
i64.or | 1 | OR | 0.0045 |
i64.xor | 1 | XOR | 0.0045 |
i64.shl | 1.5 | SHL | 0.0067 |
i64.shr_u | 1.5 | SHR | 0.0067 |
i64.shr_s | 1.5 | SHR | 0.0067 |
i64.rotl | 2 | ROL | 0.0090 |
i64.rotr | 2 | ROR | 0.0090 |
i64.eq | 1 | CMP | 0.0045 |
i64.eqz | 1 | CMP | 0.0045 |
i64.ne | 1 | CMP | 0.0045 |
i64.lt_s | 1 | CMP | 0.0045 |
i64.lt_u | 1 | CMP | 0.0045 |
i64.le_s | 1 | CMP | 0.0045 |
i64.le_u | 1 | CMP | 0.0045 |
i64.gt_s | 1 | CMP | 0.0045 |
i64.gt_u | 1 | CMP | 0.0045 |
i64.ge_s | 1 | CMP | 0.0045 |
i64.ge_u | 1 | CMP | 0.0045 |
i64.clz | ? | ||
i64.ctz | ? | ||
i64.popcnt | ? |
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
i32.wrap/i64 | 3 | MOV | 0.0135 |
i64.extend_s/i32 | ? | (signextend) | |
i64.extend_u/i32 | 3 | MOV | 0.0135 |
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
drop | 3 | MOV | 0.0135 |
select | 3 | CMP, JMP | 0.0135 |
Opcode | Cycle | IA-32 eqv. | Gas |
---|---|---|---|
unreachable | 0 | (breaking out from the VM - or INT3) |
Some of the above instructions are complex and cannot be mapped 1-by-1 to a machine instruction. For them we assume the below algorithms and calculate gas based on the component costs.
Number of leading 0-bits within 32 bits.
(Based on Warren, Section 5-3: Counting Leading 0's:)
function clz(x)
if x = 0 return 32
n ← 0
if (x & 0xFFFF0000) = 0: n ← n + 16, x ← x << 16
if (x & 0xFF000000) = 0: n ← n + 8, x ← x << 8
if (x & 0xF0000000) = 0: n ← n + 4, x ← x << 4
if (x & 0xC0000000) = 0: n ← n + 2, x ← x << 2
if (x & 0x80000000) = 0: n ← n + 1
return n
For an implementation in WebAssembly, see Appendix A.
Counting the individual instructions results in:
- 6
if
- 6
then
- 2
return
- 5
and
- 5
eqz
, - 11
get_local
, - 11
set_local
, - 5
add
, - 5
shl
These amount to a total of 104.5 cycles in the worst case.
Note: clz
exists natively on most common CPUs, including Intel and ARM CPUs.
We could consider it native for the purposes of Ethereum and assign a lower gas cost.
Number of trailing 0-bits within 32 bits.
(Based on Warren, Section 5-4: Counting Trailing 0's.:)
function ctz (x)
if x = 0 return 32
n ← 0
if (x & 0x0000FFFF) = 0: n ← n + 16, x ← x >> 16
if (x & 0x000000FF) = 0: n ← n + 8, x ← x >> 8
if (x & 0x0000000F) = 0: n ← n + 4, x ← x >> 4
if (x & 0x00000003) = 0: n ← n + 2, x ← x >> 2
if (x & 0x00000001) = 0: n ← n + 1
return n
This algorithm has the very same steps as i32.clz
and therefore their cost equals.
Note: ctz
does not exists natively on ARM CPUs. ARM should be considered an
important platform for Ethereum light clients and therefore it is sensible to assign
a gas cost based on a complex implementation.
Number of 1-bits within 32 bits.
TBD
Note: popcnt
is natively supported by recent CPUs, mostly through SIMD extensions only.
Number of leading 0-bits within 64 bits.
TBD
Number of trailing 0-bits within 64 bits.
TBD
Number of 1-bits within 64 bits.
TBD
Sign extend i32 to i64.
Given WebAssembly supports both arithmetic (shr_s
) and logical right shifts
(shr_u
) it is fairly simple to implement:
function signextend (x)
return (x << 32) >>> 32
This cost equals the sum of i64.shl
and i64.shr_s
, ? cycles.
TBD
Ternary operator.
TBD
Any instruction pausing the VM and transferring data between the ewasm contract and the host is breaking out of the VM.
These instructions include:
current_memory
grow_memory
call_indirect
call_import
TBD
(func $clz
(param $x i32)
(result i32)
(local $n i32)
(if (i32.eqz (get_local $x))
(then
(return (i32.const 32))
)
)
(set_local $n (i32.const 0))
(if (i32.eqz (i32.and (get_local $x) (i32.const 0xffff0000))
(then
(set_local $n (i32.add (get_local $n) (i32.const 16)))
(set_local $x (i32.shl (get_local $x) (i32.const 16)))
)
)
(if (i32.eqz (i32.and (get_local $x) (i32.const 0xff000000))
(then
(set_local $n (i32.add (get_local $n) (i32.const 8)))
(set_local $x (i32.shl (get_local $x) (i32.const 8)))
)
)
(if (i32.eqz (i32.and (get_local $x) (i32.const 0xf0000000))
(then
(set_local $n (i32.add (get_local $n) (i32.const 4)))
(set_local $x (i32.shl (get_local $x) (i32.const 4)))
)
)
(if (i32.eqz (i32.and (get_local $x) (i32.const 0xc0000000))
(then
(set_local $n (i32.add (get_local $n) (i32.const 2)))
(set_local $x (i32.shl (get_local $x) (i32.const 2))
)
)
(if (i32.eqz (i32.and (get_local $x) (i32.const 0x80000000))
(then
(set_local $n (i32.add (get_local $n) (i32.const 1)))
)
)
(return (get_local $n))
)
- Appendix C of Architecture Optimization Manual (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html)
- Haswell CPU family (http://ark.intel.com/products/codename/42174/Haswell)
- Instruction tables by Agner Fog: http://www.agner.org/optimize/instruction_tables.pdf
- EVM 1.0 gas costs (https://docs.google.com/spreadsheets/d/1m89CVujrQe5LAFJ8-YAUCcNK950dUzMQPMJBxRtGCqs/edit#gid=0)
- Warren, Henry S. (2003). Hacker's Delight (1st ed.). Boston, Mass.: Addison-Wesley. ISBN 0-201-91465-4