Progpow review #2893

Merged
merged 13 commits into from
Aug 22, 2020
209 changes: 140 additions & 69 deletions EIPS/eip-1057.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ Ethereum's approach is to incentivize a geographically-distributed community of
> ... While ASICs exist for a proof-of-work function, both goals are
placed in jeopardy.

It is from these premises that Ethash was designed as an ASIC-resistant proof-of-work.
It is from these premises that Ethash was designed as an ASIC-resistant proof-of-work:

> Two directions exist for ASIC resistance; firstly make it sequential memory-hard, i.e. engineer the function such that the determination of the nonce requires a lot of memory and bandwidth such that the memory cannot be used in parallel to discover multiple nonces simultaneously. The second is to make the type of computation it would need to do general-purpose; the meaning of “specialised hardware” for a general-purpose task set is, naturally, general purpose hardware and as such commodity desktop computers are likely to be pretty close to “specialised hardware” for the task. For Ethereum 1.0 we have chosen the first path.

Expand All @@ -52,10 +52,6 @@ It is from these premises that Ethash was designed as an ASIC-resistant proof-of

ProgPow restores Ethash' ASIC-resistance by extending Ethash with a GPU-specific approach to the second path — making the “specialised hardware” for the PoW task commodity hardware.

*Note that DAG-size growth will defeat the Antminer E3 circa late October, 2020 [at block 11,400,000](https://blog.bitmain.com/en/bitmains-antminer-e3-firmware-update/).*

*(T.B.D. what about Innosilicon A10?)*

### ProgPoW Overview
The design goal of ProgPoW is to have the algorithm’s requirements match what is available on commodity GPUs: If the algorithm were to be implemented on a custom ASIC there should be little opportunity for efficiency gains compared to a commodity GPU.

Expand All @@ -65,7 +61,6 @@ The main elements of the algorithm are:
* Adds a random sequence of math in the main loop.
* Adds reads from a small, low-latency cache that supports random addresses.
* Increases the DRAM read from 128 bytes to 256 bytes.
* *(T.B.D. cite adjustments from 0.9.1 -> 0.9.2 -> 0.9.3 -> 0.9.4)*

The random sequence changes every `PROGPOW_PERIOD` (about 2 to 12 minutes depending on the configured value). When mining, source code is generated for the random sequence and compiled on the host CPU. The GPU then executes the compiled code, in which the math to perform and the mix state to use are already resolved.

Expand Down Expand Up @@ -138,36 +133,60 @@ Ethash requires external memory due to the large size of the DAG. However that

## Specification

The DAG is generated exactly as in Ethash. All the parameters (ephoch length, DAG size, etc) are unchanged. See the original [Ethash](https://github.com/ethereum/wiki/wiki/Ethash) spec for details on generating the DAG.
Up to release 0.9.3 the DAG is generated exactly as in Ethash. All the parameters (epoch length, DAG size, etc) are unchanged. See the original [Ethash](https://github.com/ethereum/wiki/wiki/Ethash) spec for details on generating the DAG.

Release 0.9.3 has been software and hardware audited:
* [Least Authority — ProgPoW Software Audit PDF](https://leastauthority.com/static/publications/Least%20Authority%20-%20ProgPow%20Algorithm%20Final%20Audit%20Report.pdf)
* [Bob Rao - ProgPoW Hardware Audit PDF](https://github.com/ethereum-cat-herders/progpow-audit/raw/master/Bob%20Rao%20-%20ProgPOW%20Hardware%20Audit%20Report%20Final.pdf)

Following the suggestion in Least Authority's findings, the newly proposed release 0.9.4 tweaks DAG generation to mitigate the possibility of a "Light Evaluation" attack.
This change modifies `ETHASH_DATASET_PARENTS` from 256 to 512. As a result, the DAG memory file used by ProgPoW is no longer compatible with the one used by Ethash (the epoch length and size increase ratio remain the same, though).

After the audits were completed, [Kik](https://github.com/kik/) disclosed an exploitable condition that can [bypass ProgPoW memory hardness](https://github.com/kik/progpow-exploit). The condition is present in Ethash but is near-impossible to exploit, and it requires a customized node able to accept block headers modified by the miner. To prevent this exploit, this release changes the input state of the last keccak pass from
* header (256 bits) +
* seed for mix initiator (64 bits) +
* mix from main loop (256 bits)
* no padding

to
* digest from initial keccak (256 bits) +
* mix from main loop (256 bits) +
* padding

thus widening the constraint that [brute force keccak linear searches](https://github.com/kik/progpow-exploit) must target from 64 to 256 bits.

ProgPoW can be tuned using the following parameters. The proposed settings have been tuned for a range of existing, commodity GPUs:

* `PROGPOW_PERIOD`: Number of blocks before changing the random program
* `PROGPOW_LANES`: The number of parallel lanes that coordinate to calculate a single hash instance
* `PROGPOW_REGS`: The register file usage size
* `PROGPOW_DAG_LOADS`: Number of uint32 loads from the DAG per lane
* `PROGPOW_CACHE_BYTES`: The size of the cache
* `PROGPOW_CNT_DAG`: The number of DAG accesses, defined as the outer loop of the algorithm (64 is the same as ethash)
* `PROGPOW_CNT_DAG`: The number of DAG accesses, defined as the outer loop of the algorithm (64 is the same as Ethash)
* `PROGPOW_CNT_CACHE`: The number of cache accesses per loop
* `PROGPOW_CNT_MATH`: The number of math operations per loop

The value of these parameters has been tweaked between version 0.9.2 (live on the gangnum testnet) and 0.9.3 (proposed for Ethereum adoption). See [this medium post](https://medium.com/@ifdefelse/progpow-progress-da5bb31a651b) for details.
The values of these parameters have been tweaked between version 0.9.2 (live on the Gangnam testnet) and 0.9.3 (proposed for [Ethereum adoption](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1057.md)). See [this medium post](https://medium.com/@ifdefelse/progpow-progress-da5bb31a651b) for details.
Release 0.9.4 keeps the same tunables as 0.9.3 and includes the tweak for DAG generation.

| Parameter | 0.9.2 | 0.9.3 |
|-----------------------|-----------|-----------|
| `PROGPOW_PERIOD` | `50` | `10` |
| `PROGPOW_LANES` | `16` | `16` |
| `PROGPOW_REGS` | `32` | `32` |
| `PROGPOW_DAG_LOADS` | `4` | `4` |
| `PROGPOW_CACHE_BYTES` | `16x1024` | `16x1024` |
| `PROGPOW_CNT_DAG` | `64` | `64` |
| `PROGPOW_CNT_CACHE` | `12` | `11` |
| `PROGPOW_CNT_MATH` | `20` | `18` |
| Parameter | 0.9.2 | 0.9.3 | 0.9.4 |
|-----------------------|-------|-------|-------|
| `PROGPOW_PERIOD` | `50` | `10` | `10` |
| `PROGPOW_LANES` | `16` | `16` | `16` |
| `PROGPOW_REGS` | `32` | `32` | `32` |
| `PROGPOW_DAG_LOADS` | `4` | `4` | `4` |
| `PROGPOW_CACHE_BYTES` | `16x1024` | `16x1024` | `16x1024` |
| `PROGPOW_CNT_DAG` | `64` | `64` | `64` |
| `PROGPOW_CNT_CACHE` | `12` | `11` | `11` |
| `PROGPOW_CNT_MATH` | `20` | `18` | `18` |

| DAG Parameter | 0.9.2 | 0.9.3 | 0.9.4 |
|--------------------------|-------|-------|-------|
| `ETHASH_DATASET_PARENTS` | `256` | `256` | `512` |
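
As a rough cross-check of the tunables above, the sketch below derives two figures from them: the number of DAG bytes read per loop iteration and the number of 32-bit words held in the cache. It assumes that every lane issues `PROGPOW_DAG_LOADS` 32-bit loads per DAG access. The derived constant names (`DAG_BYTES_PER_LOOP`, `CACHE_WORDS`) are illustrative only and are not part of the specification.

```cpp
#include <cassert>
#include <cstdint>

// Tunables as listed in the table for 0.9.3 / 0.9.4
constexpr uint32_t PROGPOW_LANES       = 16;
constexpr uint32_t PROGPOW_DAG_LOADS   = 4;
constexpr uint32_t PROGPOW_CACHE_BYTES = 16 * 1024;

// Hypothetical derived values, assuming each lane loads
// PROGPOW_DAG_LOADS uint32 words per DAG access
constexpr uint32_t DAG_BYTES_PER_LOOP =
    PROGPOW_LANES * PROGPOW_DAG_LOADS * sizeof(uint32_t);
constexpr uint32_t CACHE_WORDS = PROGPOW_CACHE_BYTES / sizeof(uint32_t);

// 16 lanes * 4 loads * 4 bytes matches the 256-byte DRAM read in the overview
static_assert(DAG_BYTES_PER_LOOP == 256, "unexpected DAG read size");
static_assert(CACHE_WORDS == 4096, "unexpected cache word count");
```

Under that assumption the 256-byte figure agrees with the "Increases the DRAM read from 128 bytes to 256 bytes" item in the overview.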

The random program changes every `PROGPOW_PERIOD` blocks to ensure the hardware executing the algorithm is fully programmable. If the program only changed every DAG epoch (roughly 5 days) certain miners could have time to develop hand-optimized versions of the random sequence, giving them an undue advantage.

Sample code is written in C++, this should be kept in mind when evaluating the code in the specification.
The random program changes every `PROGPOW_PERIOD` blocks (default `10`, roughly 2 minutes) to ensure the hardware executing the algorithm is fully programmable. If the program only changed every DAG epoch (roughly 5 days) certain miners could have time to develop hand-optimized versions of the random sequence, giving them an undue advantage.
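
The relation between block number, program seed and period length can be sketched as follows. The seed formula follows the `block_number/PROGPOW_PERIOD` comment in `progPowHash`; the helper names and the 13-second average block time are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PROGPOW_PERIOD = 10;

// Assumption: average block time used only to estimate wall-clock duration
constexpr uint64_t ASSUMED_BLOCK_TIME_SECONDS = 13;

// Hypothetical helper: the random program is selected by integer division,
// so all blocks within one period share the same prog_seed
constexpr uint64_t prog_seed_for_block(uint64_t block_number)
{
    return block_number / PROGPOW_PERIOD;
}

// Hypothetical helper: approximate lifetime of one random program in seconds
constexpr uint64_t period_seconds(uint64_t period)
{
    return period * ASSUMED_BLOCK_TIME_SECONDS;
}

static_assert(prog_seed_for_block(30000) == 3000, "block 30,000 maps to seed 3,000");
```

With these assumptions a period of `10` lasts about 130 seconds and a period of `50` about 650 seconds, in line with the "2 to 12 minutes" range quoted in the overview.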

Sample code is written in C++, this should be kept in mind when evaluating the code in the specification.
All numerics are computed using unsigned 32 bit integers. Any overflows are trimmed off before proceeding to the next computation. Languages that use numerics not fixed to bit lengths (such as Python and JavaScript) or that only use signed integers (such as Java) will need to keep their languages' quirks in mind. The extensive use of 32 bit data values aligns with modern GPUs' internal data architectures.

ProgPoW uses a 32-bit variant of **FNV1a** for merging data. The existing Ethash uses a similar variant of FNV1 for merging, but FNV1a provides better distribution properties.
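
A minimal sketch of the two merge orderings is shown below, assuming the standard 32-bit FNV prime and offset basis. The by-value `fnv1a` signature is chosen here for clarity and is an assumption; `fnv1` is included only to contrast the ordering and is not a function used by ProgPoW.

```cpp
#include <cassert>
#include <cstdint>

// Standard 32-bit FNV constants
constexpr uint32_t FNV_PRIME        = 0x01000193;
constexpr uint32_t FNV_OFFSET_BASIS = 0x811c9dc5;

// FNV1a ordering: XOR the data in first, then multiply.
// uint32_t arithmetic wraps modulo 2^32, which trims any overflow.
constexpr uint32_t fnv1a(uint32_t h, uint32_t d)
{
    return (h ^ d) * FNV_PRIME;
}

// FNV1 ordering, for contrast only: multiply first, then XOR
constexpr uint32_t fnv1(uint32_t h, uint32_t d)
{
    return (h * FNV_PRIME) ^ d;
}
```

In languages without fixed-width unsigned integers the same step needs an explicit mask to 32 bits after the multiplication.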
Expand Down Expand Up @@ -215,46 +234,31 @@ Test vectors can be found [in the test vectors file](../assets/eip-1057/test-vec

```cpp
void fill_mix(
uint64_t hash_seed,
uint64_t seed,
uint32_t lane_id,
uint32_t mix[PROGPOW_REGS]
)
{
// Use FNV to expand the per-warp seed to per-lane
// Use KISS to expand the per-lane seed to fill mix
uint32_t fnv_hash = FNV_OFFSET_BASIS;
kiss99_t st;
st.z = fnv1a(FNV_OFFSET_BASIS, seed);
st.w = fnv1a(st.z, seed >> 32);
st.jsr = fnv1a(st.w, lane_id);
st.jcong = fnv1a(st.jsr, lane_id);
st.z = fnv1a(fnv_hash, seed);
st.w = fnv1a(fnv_hash, seed >> 32);
st.jsr = fnv1a(fnv_hash, lane_id);
st.jcong = fnv1a(fnv_hash, lane_id);
for (int i = 0; i < PROGPOW_REGS; i++)
mix[i] = kiss99(st);
mix[i] = kiss99(st);
}
```
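
`fill_mix` relies on a `kiss99` generator whose definition is not visible in this part of the diff. The sketch below follows Marsaglia's published KISS99 recurrence, with a state struct whose field names match the `st.z`, `st.w`, `st.jsr` and `st.jcong` accesses above. Treat it as an illustration of the generator rather than as the normative text of the specification.

```cpp
#include <cassert>
#include <cstdint>

// State layout assumed from the field accesses in fill_mix
struct kiss99_t
{
    uint32_t z;
    uint32_t w;
    uint32_t jsr;
    uint32_t jcong;
};

// KISS99: two multiply-with-carry generators, a 3-shift xorshift register
// and a linear congruential generator, combined into one 32-bit output
uint32_t kiss99(kiss99_t &st)
{
    st.z = 36969 * (st.z & 65535) + (st.z >> 16);
    st.w = 18000 * (st.w & 65535) + (st.w >> 16);
    uint32_t mwc = (st.z << 16) + st.w;
    st.jsr ^= (st.jsr << 17);
    st.jsr ^= (st.jsr >> 13);
    st.jsr ^= (st.jsr << 5);
    st.jcong = 69069 * st.jcong + 1234567;
    return (mwc ^ st.jcong) + st.jsr;
}
```

All arithmetic is on `uint32_t`, so every intermediate value wraps modulo 2^32 as the specification requires.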

Like Ethash, Keccak is used to seed the sequence per-nonce and to produce the final result. The keccak-f800 variant is used as the 32-bit word size matches the native word size of modern GPUs. The implementation is a variant of SHAKE with width=800, bitrate=576, capacity=224, output=256, and no padding. The result of keccak is treated as a 256-bit big-endian number - that is result byte 0 is the MSB of the value.
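
Because byte 0 is the most significant byte, comparing a result against a target reduces to a bytewise comparison from the first byte onward. The sketch below assumes the hash is available as 32 raw bytes and that a result passes when it is less than or equal to the target; the type alias and the `meets_target` helper are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical raw-byte view of a 256-bit hash, byte 0 first
using hash_bytes = std::array<uint8_t, 32>;

// Assumption: a result is valid when result <= target, with both values
// read as 256-bit big-endian numbers. memcmp compares unsigned bytes in
// order, which is exactly a big-endian numeric comparison.
bool meets_target(const hash_bytes &result, const hash_bytes &target)
{
    return std::memcmp(result.data(), target.data(), result.size()) <= 0;
}
```

A difference in an earlier byte always outweighs any difference in later bytes, which is what makes byte 0 the MSB.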

As with Ethash the input and output of the keccak function are fixed and relatively small. This means only a single "absorb" and "squeeze" phase are required. For a pseudo-code imenentation of the `keccak_f800_round` function see the `Round[b](A,RC)` function in the "Pseudo-code description of the permutations" section of the [official Keccak specs](https://keccak.team/keccak_specs_summary.html).

Test vectors can be found [in the test vectors file](../assets/eip-1057/test-vectors.md#keccak_f800_progpow).
As with Ethash the input and output of the keccak function are fixed and relatively small. This means only a single "absorb" and "squeeze" phase are required. For a pseudo-code implementation of the `keccak_f800_round` function see the `Round[b](A,RC)` function in the "Pseudo-code description of the permutations" section of the [official Keccak specs](https://keccak.team/keccak_specs_summary.html).

```cpp
hash32_t keccak_f800_progpow(hash32_t header, uint64_t seed, hash32_t digest)
hash32_t keccak_f800_progpow(uint32_t* state)
{
uint32_t st[25];

// Initialization
for (int i = 0; i < 25; i++)
st[i] = 0;

// Absorb phase for fixed 18 words of input
for (int i = 0; i < 8; i++)
st[i] = header.uint32s[i];
st[8] = seed;
st[9] = seed >> 32;
for (int i = 0; i < 8; i++)
st[10+i] = digest.uint32s[i];

// keccak_f800 call for the single absorb pass
for (int r = 0; r < 22; r++)
keccak_f800_round(st, r);
Expand All @@ -270,7 +274,7 @@ hash32_t keccak_f800_progpow(hash32_t header, uint64_t seed, hash32_t digest)

The inner loop uses FNV and KISS99 to generate a random sequence from the `prog_seed`. This random sequence determines which mix state is accessed and what random math is performed.

Since the `prog_seed` changes only once per `PROGPOW_PERIOD` it is expected that while mining `progPowLoop` will be evaluated on the CPU to generate source code for that period's sequence. The source code will be compiled on the CPU before running on the GPU.
Since the `prog_seed` changes only once per `PROGPOW_PERIOD` (10 blocks or about 2 minutes) it is expected that while mining `progPowLoop` will be evaluated on the CPU to generate source code for that period's sequence. The source code will be compiled on the CPU before running on the GPU. You can see an example sequence and generated source code in [kernel.cu](https://github.com/ifdefelse/ProgPOW/blob/824cd791634204c4cc7e31f84bb76c0c84895bd3/test/kernel.cu).

Test vectors can be found [in the test vectors file](../assets/eip-1057/test-vectors.md#progpowinit).

Expand Down Expand Up @@ -443,30 +447,52 @@ void progPowLoop(
```

The flow of the overall algorithm is:
* A keccak hash of the header + nonce to create a seed
* Use the seed to generate initial mix data
* A keccak hash of the header + nonce to create a 256-bit digest from keccak_f800 (padding is consistent with the custom padding in Ethash)
* Use first two words of digest as seed to generate initial mix data
* Loop multiple times, each time hashing random loads and random math into the mix data
* Hash all the mix data into a single 256-bit value
* A final keccak hash is computed
* A final keccak hash of the digest carried over from the initial keccak pass + the final 256-bit mix value (padding is consistent with the custom padding in Ethash)
* When mining this final value is compared against a `hash32_t` target

```cpp
hash32_t progPowHash(
const uint64_t prog_seed, // value is (block_number/PROGPOW_PERIOD)
const uint64_t prog_seed, // value is (block_number/PROGPOW_PERIOD)
const uint64_t nonce,
const hash32_t header,
const uint32_t *dag // gigabyte DAG located in framebuffer - the first portion gets cached
const uint32_t *dag // gigabyte DAG located in framebuffer - the first portion gets cached
)
{
hash32_t hash_init;
hash32_t hash_final;

uint32_t mix[PROGPOW_LANES][PROGPOW_REGS];
hash32_t digest;
for (int i = 0; i < 8; i++)
digest.uint32s[i] = 0;

// keccak(header..nonce)
hash32_t seed_256 = keccak_f800_progpow(header, nonce, digest);
// endian swap so byte 0 of the hash is the MSB of the value
uint64_t seed = bswap(seed_256[0]) << 32 | bswap(seed_256[1]);
/*
========================================
Absorb phase for initial keccak pass
========================================
*/

{
uint32_t state[25] = {0x0};
// 1st fill with header data (8 words)
for (int i = 0; i < 8; i++)
state[i] = header.uint32s[i];

// 2nd fill with nonce (2 words)
state[8] = nonce;
state[9] = nonce >> 32;

// 3rd apply padding
state[10] = 0x00000001;
state[18] = 0x80008081;

// keccak(header..nonce)
hash_init = keccak_f800_progpow(state);

// get the seed to initialize mix
seed = ((uint64_t)hash_init.uint32s[1] << 32) | hash_init.uint32s[0];
}

// initialize mix for all lanes
for (int l = 0; l < PROGPOW_LANES; l++)
Expand All @@ -480,24 +506,48 @@ hash32_t progPowHash(
uint32_t digest_lane[PROGPOW_LANES];
for (int l = 0; l < PROGPOW_LANES; l++)
{
digest_lane[l] = FNV_OFFSET_BASIS
digest_lane[l] = FNV_OFFSET_BASIS;
for (int i = 0; i < PROGPOW_REGS; i++)
digest_lane[l] = fnv1a(digest_lane[l], mix[l][i]);
}
// Reduce all lanes to a single 256-bit digest
for (int i = 0; i < 8; i++)
digest.uint32s[i] = FNV_OFFSET_BASIS;
for (int l = 0; l < PROGPOW_LANES; l++)
digest.uint32s[l%8] = fnv1a(digest.uint32s[l%8], digest_lane[l])
digest.uint32s[l%8] = fnv1a(digest.uint32s[l%8], digest_lane[l]);

/*
========================================
Absorb phase for final keccak pass
========================================
*/

{
uint32_t state[25] = {0x0};

// 1st fill with hash_init (8 words)
for (int i = 0; i < 8; i++)
state[i] = hash_init.uint32s[i];

// 2nd fill with digest from main loop
for (int i = 8; i < 16; i++)
state[i] = digest.uint32s[i - 8];

// 3rd apply padding
state[17] = 0x00000001;
state[24] = 0x80008081;

// keccak(hash_init..digest)
hash_final = keccak_f800_progpow(state);
}

// Compare hash final to target
[...]

// keccak(header .. keccak(header..nonce) .. digest);
keccak_f800_progpow(header, seed, digest);
}
```

## Rationale

*(T.B.D. Review audits)*
## Example / Testcase

ProgPoW utilizes almost all parts of a commodity GPU, excluding:

Expand Down Expand Up @@ -525,12 +575,30 @@ result: 5b7ccd472dbefdd95b895cac8ece67ff0deb5a6bd2ecc6e162383d00c3728ece
```

Additional test vectors can be found [in the test vectors file](../assets/eip-1057/test-vectors.md#progpowhash).

### progpow 0.9.3
[Machine-readable test vectors](https://github.com/ethereum/EIPs/blob/ad4e73f239d53d72a21cfd8fdc89dc81eb9d2688/assets/eip-1057/test-vectors-0.9.3.json)

Additional test vectors can be found [in the test vectors file](../assets/eip-1057/test-vectors.md#progpowhash).

### progpow 0.9.4
[Machine-readable test vectors](https://github.com/ethereum/EIPs/blob/ad4e73f239d53d72a21cfd8fdc89dc81eb9d2688/assets/eip-1057/test-vectors-0.9.4.json) *(T.B.D)*

The random sequence generated for block 30,000 (prog_seed 3,000) can be seen in [kernel.cu](https://github.com/ifdefelse/ProgPOW/blob/824cd791634204c4cc7e31f84bb76c0c84895bd3/test/kernel.cu).

The algorithm run on block 30,000 produces the following digest and result:
```
Header : 0xffeeddccbbaa9988776655443322110000112233445566778899aabbccddeeff
Nonce : 0x123456789abcdef0
Hash init : 0xee304846ddd0a47b98179e96b60ec5ceeae2727834367e593de780e3e6d1892f
Mix seed : 0x7ba4d0dd464830ee
Mix hash : 0x493c13e9807440571511b561132834bbd558dddaa3b70c09515080a6a1aff6d0
Hash final : 0x46b72b75f238bea3fcfd227e0027dc173dceaa1fb71744bd3d5e030ed2fed053
```
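
The relation between `Hash init` and `Mix seed` in the vector above can be reproduced with the sketch below. It assumes that the words of `hash32_t` are little-endian loads of the hash bytes, which is consistent with the values shown; `load_le32` and `mix_seed_from_hash_init` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: word `index` of the hash is a little-endian load of 4 bytes
uint32_t load_le32(const std::array<uint8_t, 32> &bytes, std::size_t index)
{
    const std::size_t o = index * 4;
    return static_cast<uint32_t>(bytes[o]) |
           (static_cast<uint32_t>(bytes[o + 1]) << 8) |
           (static_cast<uint32_t>(bytes[o + 2]) << 16) |
           (static_cast<uint32_t>(bytes[o + 3]) << 24);
}

// Mirrors the seed expression in progPowHash:
// word 1 forms the upper 32 bits, word 0 the lower 32 bits
uint64_t mix_seed_from_hash_init(const std::array<uint8_t, 32> &hash_init)
{
    const uint32_t w0 = load_le32(hash_init, 0);
    const uint32_t w1 = load_le32(hash_init, 1);
    return (static_cast<uint64_t>(w1) << 32) | w0;
}
```

Feeding the first eight bytes of `Hash init` (`ee 30 48 46 dd d0 a4 7b`) gives `0x7ba4d0dd464830ee`, the `Mix seed` listed above.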

Additional test vectors can be found [in the test vectors file](../assets/eip-1057/test-vectors.md#progpowhash).

Machine-readable test vectors (T.B.D)


## Implementation

Expand All @@ -539,7 +607,9 @@ We **Do Not** recommend that this Proposal be deployed at this time. Rather it
* This Proposal should be fully implemented and tested across major clients.
* Clients implementing this Proposal should be deployed and maintained on a testnet.

This leaves open the possibility and threat of future deployment. Some of the authors are engaged in work to [track what devices are mining our network](https://www.overleaf.com/project/5e222c2cac8911000178b239). These and other efforts can provide information relevant to possible deployment.
This leaves open the possibility and threat of future deployment.

Note that DAG-size growth will defeat the Antminer E3 (and some Innosilicon ASICs) in October or November of 2020 [at about block 11,400,000](https://blog.bitmain.com/en/bitmains-antminer-e3-firmware-update/).

### Clients

Expand All @@ -559,7 +629,7 @@ Trinity | Python | Ready | Developing

### Exchanges

| | Support | 0.9.4 | 0.9.4
| | Support | 0.9.3 | 0.9.4
--- | --- | --- | ---
Biki | Yes | Ready |
Bilaxi | Yes | Ready |
Expand All @@ -580,7 +650,7 @@ Nobi | Yes | Ready |

### Pools

| | Support | 0.9.4 | 0.9.4
| | Support | 0.9.3 | 0.9.4
--- | --- | --- | ---
2Miners | Yes | Ready |
antpool | Yes | Ready |
Expand All @@ -600,6 +670,7 @@ Sparkpool | Yes | Ready |
Spiderpool | Yes | Ready |
xnpopol | Yes | Ready |


## License and Copyright

The ProgPoW algorithm and this specification are a new work. Copyright and related rights are waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).