Commit

Remove the correct stuff
Pandapip1 authored Feb 22, 2023
1 parent 29bd68c commit 51ff866
Showing 1 changed file with 0 additions and 35 deletions.
35 changes: 0 additions & 35 deletions EIPS/eip-712.md
@@ -41,41 +41,6 @@ Here we outline a scheme to encode data along with its structure which allows it

## Specification

### Signatures and Hashing overview

A signature scheme consists of a hashing algorithm and a signing algorithm. The signing algorithm of choice in Ethereum is `secp256k1`. The hashing algorithm of choice is `keccak256`, a function from bytestrings, 𝔹⁸ⁿ, to 256-bit strings, 𝔹²⁵⁶.

A good hashing algorithm should satisfy security properties such as determinism, second pre-image resistance and collision resistance. The `keccak256` function satisfies the above criteria _when applied to bytestrings_. If we want to apply it to other sets we first need to map this set to bytestrings. It is critically important that this encoding function is deterministic and injective. If it is not deterministic then the hash might differ from the moment of signing to the moment of verifying, causing the signature to incorrectly be rejected. If it is not injective then there are two different elements in our input set that hash to the same value, causing a signature to be valid for a different unrelated message.
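To make the injectivity requirement concrete, here is a minimal sketch of an encoding that is deterministic but not injective, next to one that is. Both helper functions are hypothetical illustrations and are not part of this standard.

```python
def naive_encode(fields: list[bytes]) -> bytes:
    """Concatenate fields directly. Deterministic, but NOT injective:
    the boundaries between fields are lost."""
    return b"".join(fields)


def length_prefixed_encode(fields: list[bytes]) -> bytes:
    """Prefix each field with its 4-byte big-endian length.
    Injective for fields shorter than 2**32 bytes."""
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)


message_a = [b"ab", b"c"]
message_b = [b"a", b"bc"]

# Two different messages share one naive encoding, so they would share a hash
# and a signature over one would also verify for the other.
naive_collides = naive_encode(message_a) == naive_encode(message_b)

# With length prefixes the encodings differ.
prefixed_collides = (
    length_prefixed_encode(message_a) == length_prefixed_encode(message_b)
)
```

Here `naive_collides` is true and `prefixed_collides` is false, which is exactly the difference between a non-injective and an injective encoding.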

### Transactions and bytestrings

An illustrative example of the above breakage can be found in Ethereum. Ethereum has two kinds of messages, transactions `𝕋` and bytestrings `𝔹⁸ⁿ`. These are signed using `eth_sendTransaction` and `eth_sign` respectively. Originally the encoding function `encode : 𝕋 ∪ 𝔹⁸ⁿ → 𝔹⁸ⁿ` was defined as follows:

* `encode(t : 𝕋) = RLP_encode(t)`
* `encode(b : 𝔹⁸ⁿ) = b`

While individually they satisfy the required properties, together they do not. If we take `b = RLP_encode(t)` we have a collision. This is mitigated in ethereum/go-ethereum#2940 by modifying the second leg of the encoding function:

* `encode(b : 𝔹⁸ⁿ) = "\x19Ethereum Signed Message:\n" ‖ len(b) ‖ b` where `len(b)` is the ascii-decimal encoding of the number of bytes in `b`.

This solves the collision between the legs since `RLP_encode(t : 𝕋)` never starts with `\x19`. There is still the risk of the new encoding function not being deterministic or injective. It is instructive to consider those in detail.

As is, the definition above is not deterministic. For a 4-byte string `b` both encodings with `len(b) = "4"` and `len(b) = "004"` are valid. This can be solved by further requiring that the decimal encoding of the length has no leading zeros and `len("") = "0"`.
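The two definitions of the bytestring leg can be compared in a short sketch. The function names are hypothetical, and `stand_in_rlp` is a hand-written RLP list of three single-byte items used as a stand-in for `RLP_encode(t)`; it is not a real transaction. The sketch relies on Python's `str()` to produce a decimal length without leading zeros, with `"0"` for the empty string.

```python
PREFIX = b"\x19Ethereum Signed Message:\n"


def original_encode_bytes(b: bytes) -> bytes:
    """Original second leg: the bytestring is used as-is."""
    return b


def prefixed_encode_bytes(b: bytes) -> bytes:
    """Modified second leg: prefix, canonical ASCII-decimal length, then b."""
    canonical_length = str(len(b)).encode("ascii")  # no leading zeros
    return PREFIX + canonical_length + b


# Stand-in for RLP_encode(t): 0xc3 marks an RLP list with a 3-byte payload.
stand_in_rlp = bytes.fromhex("c3010203")

# Original definition: signing these bytes as a message yields the same
# encoding as the (stand-in) transaction itself.
collides = original_encode_bytes(stand_in_rlp) == stand_in_rlp

# Prefixed definition: the first byte is 0x19, while the stand-in starts
# with 0xc3, so the two legs can no longer coincide here.
encoded = prefixed_encode_bytes(stand_in_rlp)
first_bytes = (encoded[0], stand_in_rlp[0])
```

Because the length is always derived from `b` in one canonical way, the same bytestring cannot be encoded as both `"4"` and `"004"`.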

The above definition is not obviously collision free. Does a bytestring starting with `"\x19Ethereum Signed Message:\n42a…"` mean a 42-byte string starting with `a` or a 4-byte string starting with `2a`? This was pointed out in ethereum/go-ethereum#14794 and motivated Trezor to not implement the standard as-is (see trezor/trezor-mcu#163). Fortunately this does not lead to actual collisions as the total length of the encoded bytestring provides sufficient information to disambiguate the cases.
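The disambiguation by total length can be sketched as a decoder that tries every possible split between the length digits and the message body. `decode_candidates` is a hypothetical helper written for this illustration, and it assumes the no-leading-zeros rule described above.

```python
PREFIX = b"\x19Ethereum Signed Message:\n"


def decode_candidates(encoded: bytes) -> list[bytes]:
    """Return every message body consistent with the encoded bytestring."""
    if not encoded.startswith(PREFIX):
        return []
    rest = encoded[len(PREFIX):]
    candidates = []
    for split in range(1, len(rest) + 1):
        digits, body = rest[:split], rest[split:]
        if not digits.isdigit():
            break  # a longer digit run would also contain this non-digit
        if digits != b"0" and digits.startswith(b"0"):
            continue  # reject leading zeros
        if int(digits.decode("ascii")) == len(body):
            candidates.append(body)
    return candidates


# A 42-byte message starting with "a" and a 4-byte message starting with "2a".
long_message = b"a" + b"x" * 41
short_message = b"2axx"

long_encoded = PREFIX + b"42" + long_message
short_encoded = PREFIX + b"4" + short_message

long_decoded = decode_candidates(long_encoded)
short_decoded = decode_candidates(short_encoded)
```

Both encodings begin with `42a` after the prefix, yet each one decodes to a single candidate, because only one split makes the declared length match the number of bytes that remain.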

Both determinism and injectiveness would be trivially true if `len(b)` was left out entirely. The point is, it is difficult to map arbitrary sets to bytestrings without introducing security issues in the encoding function. Yet the current design of `eth_sign` still takes a bytestring as input and expects implementors to come up with an encoding.

### Arbitrary messages

The `eth_sign` call assumes messages to be bytestrings. In practice we are not hashing bytestrings but the collection of all semantically different messages of all different DApps `𝕄`. Unfortunately, this set is impossible to formalize. Instead we approximate it with the set of typed named structures `𝕊`. This standard formalizes the set `𝕊` and provides a deterministic injective encoding function for it.

Just encoding structs is not enough. It is likely that two different DApps use identical structs. When this happens, a signed message intended for one DApp would also be valid for the other. The signatures are compatible. This can be intended behaviour, in which case everything is fine as long as the DApps took replay attacks into consideration. If it is not intended, there is a security problem.

The way to solve this is by introducing a domain separator, a 256-bit number. This is a value unique to each domain that is 'mixed in' the signature. It makes signatures from different domains incompatible. The domain separator is designed to include bits of DApp unique information such as the name of the DApp, the intended validator contract address, the expected DApp domain name, etc. The user and user-agent can use this information to mitigate phishing attacks, where a malicious DApp tries to trick the user into signing a message for another DApp.
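The effect of mixing in a domain separator can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the function names, the field layout, and the example DApp values are hypothetical, and `hashlib.sha3_256` is used only as a 256-bit stand-in because `keccak256` is not in the Python standard library (the two functions give different outputs). It is not the encoding defined by this standard.

```python
import hashlib


def stand_in_hash(data: bytes) -> bytes:
    """256-bit placeholder hash. NOT keccak256."""
    return hashlib.sha3_256(data).digest()


def field(value: str) -> bytes:
    """Length-prefix a text field so field boundaries stay unambiguous."""
    raw = value.encode("utf-8")
    return len(raw).to_bytes(4, "big") + raw


def domain_separator(name: str, contract: str, web_domain: str) -> bytes:
    """Hypothetical separator built from DApp-unique information."""
    return stand_in_hash(field(name) + field(contract) + field(web_domain))


def message_digest(separator: bytes, struct_encoding: bytes) -> bytes:
    """Mix the separator into the value that would be signed."""
    return stand_in_hash(separator + stand_in_hash(struct_encoding))


# Two hypothetical DApps that happen to use an identical struct.
swap_domain = domain_separator("ExampleSwap", "0x" + "11" * 20, "swap.example")
lend_domain = domain_separator("ExampleLend", "0x" + "22" * 20, "lend.example")
identical_struct = b"identical struct bytes"

swap_digest = message_digest(swap_domain, identical_struct)
lend_digest = message_digest(lend_domain, identical_struct)
```

Since `swap_digest` and `lend_digest` differ, a signature produced for one DApp would not verify against the digest expected by the other, even though the struct itself is identical.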

## Specification

The set of signable messages is extended from transactions and bytestrings `𝕋 ∪ 𝔹⁸ⁿ` to also include structured data `𝕊`. The new set of signable messages is thus `𝕋 ∪ 𝔹⁸ⁿ ∪ 𝕊`. They are encoded to bytestrings suitable for hashing and signing as follows:

* `encode(transaction : 𝕋) = RLP_encode(transaction)`