
Precompiled BLAKE2b #129

Closed
tjade273 opened this issue Jul 6, 2016 · 35 comments

@tjade273

tjade273 commented Jul 6, 2016

Title

  Title: Add precompiled BLAKE2b contract
  Author: Tjaden Hess <tah83@cornell.edu>
  Status: Draft
  Type: Standard Track
  Layer: Consensus (hard-fork)
  Created: 2016-06-30

Abstract

This EIP introduces a new precompiled contract which implements the BLAKE2b cryptographic hashing algorithm, for the purpose of allowing interoperability between the Zcash blockchain and the EVM.

Parameters

  • METROPOLIS_FORK_BLKNUM: TBD
  • GBLAKEBASE: 30
  • GBLAKEWORD: 6

Specification

Adds a precompile at address 0x0000....0c which accepts a variable-length input, interpreted as

[OUTSIZE, SALT[0...15], PERSONALIZATION[0...15], D_1, D_2, ..., D_INSIZE]

where INSIZE is the length in bytes of the input. Throws if OUTSIZE is greater than 64. Returns the OUTSIZE-byte BLAKE2b digest, as defined in RFC 7693.
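As a sketch of how a client might interpret this layout, here is an illustrative Python version using the standard library's `hashlib.blake2b`. The exact field widths (e.g. a 1-byte OUTSIZE) are assumptions for illustration; the text above does not pin them down:

```python
import hashlib

def parse_and_hash(data: bytes) -> bytes:
    # Assumed layout: OUTSIZE (1 byte) | SALT (16 bytes) |
    # PERSONALIZATION (16 bytes) | message bytes D_1..D_INSIZE.
    outsize = data[0]
    if outsize == 0 or outsize > 64:
        raise ValueError("OUTSIZE must be between 1 and 64")
    salt, person, msg = data[1:17], data[17:33], data[33:]
    return hashlib.blake2b(msg, digest_size=outsize,
                           salt=salt, person=person).digest()

# Example call: request a 32-byte digest of b"hello" with zero
# salt and personalization.
payload = bytes([32]) + b"\x00" * 32 + b"hello"
digest = parse_and_hash(payload)
```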

Gas costs would be equal to GBLAKEBASE + GBLAKEWORD * floor(INSIZE / 32)
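A direct transcription of the proposed cost formula:

```python
GBLAKEBASE = 30
GBLAKEWORD = 6

def blake2b_gas(insize: int) -> int:
    # GBLAKEBASE + GBLAKEWORD * floor(INSIZE / 32); a partial trailing
    # 32-byte word is rounded down, as the formula above specifies.
    return GBLAKEBASE + GBLAKEWORD * (insize // 32)
```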

In order to maintain backwards compatibility, the precompile will return 0 if CURRENT_BLOCKNUM < METROPOLIS_FORK_BLKNUM

Motivation

Besides being a useful cryptographic hash function and SHA3 finalist, BLAKE2b allows for efficient verification of the Equihash PoW used in Zcash, making a BTC Relay-style SPV client possible on Ethereum. One BLAKE2 digest in Solidity (see https://github.com/tjade273/eth-blake2/blob/optimise_everything/contracts/blake2.sol) currently requires ~480,000 + ~90*INSIZE gas, and a single Equihash PoW verification requires 2^5 to 2^7 iterations of the hash function, making verification of Zcash block headers prohibitively expensive.

The BLAKE2b algorithm is highly optimized for 64-bit CPUs, and is faster than MD5 on modern processors.

Interoperability with Zcash would enable trustless atomic swaps between the chains, which could provide a much needed aspect of privacy to the very public Ethereum blockchain.

Rationale

The most frequent concern with EIPs of this type is that the addition of specific functions at the protocol level is an infringement on Ethereum's "featureless" design. It is true that a more elegant solution to the issue is to simply improve the scalability characteristics of the network so as to make calculating functions requiring millions of gas practical for everyday use. In the meantime, however, I believe that certain operations are worth subsidising via precompiled contracts, and there is significant precedent for this, most notably the inclusion of the SHA256 precompiled contract, which was included largely to allow interoperation with the Bitcoin blockchain.

Additionally, BLAKE2b is an excellent candidate for precompilation because of the extremely asymmetric efficiency it exhibits. BLAKE2b is heavily optimized for modern 64-bit CPUs, specifically utilizing 24- and 63-bit rotations to allow parallelism through SIMD instructions, and uses little-endian arithmetic. These characteristics provide exceptional speed on native CPUs: 3.08 cycles per byte, or about 1 gibibyte per second, on an Intel i5.

In contrast, the big-endian, 32-byte word semantics of the EVM are not conducive to an efficient implementation of BLAKE2, and thus the gas cost of computing the hash on the EVM is disproportionate to the true cost of computing the function natively.

Note that the function can produce a variable-length digest, up to 64 bytes, which is a feature currently missing from the hash functions included in the EVM and is an important component in the Equihash PoW algorithm.

There is very little risk of breaking backwards compatibility with this EIP; the sole issue would be if someone had built a contract relying on the address at 0x000....0000c being empty. The likelihood of this is low, and should specific instances arise, the address could be chosen to be any arbitrary value, with negligible risk of collision.

@tjade273
Author

tjade273 commented Jul 7, 2016

PRs implementing the changes are in the works

@kumavis
Member

kumavis commented Jul 7, 2016

@tjade273 excellent EIP - clear and easy to follow. can you explain how this enables Zcash atomic swaps? BTC relay does not enable atomic BTC swaps, only tx confirmation.

@tjade273
Author

tjade273 commented Jul 7, 2016

Even with BTC relay, one can construct a sort of swap contract. If Alice wants to buy 1 BTC from Bob,

Alice can create a contract which pays 100 ETH to whomever can provide a merkle proof of a transaction to Alice's BTC address, or sends the ETH back to Alice if no one provides a valid transaction within some time limit.

The Zrelay would enable the same thing

@kumavis
Member

kumavis commented Jul 7, 2016

👍 right ok, requires a market+escrow contract 👍

@daira

daira commented Jul 23, 2016

Note that the variation of BLAKE2b used by the Zcash PoW is set to have an output length other than the default 512 bits, and a custom personalisation string (both of these depend on the Equihash parameters, which have not been nailed down yet): https://github.com/zcash/zcash/blob/zc.v0.11.2.latest/src/crypto/equihash.cpp#L30

I recommend allowing arbitrary personalisation strings and output lengths (and maybe salts, which are not used by Zcash but may be used by other protocols) to be specified.

@tjade273
Author

Yes, I originally drafted the EIP with arbitrary-length output, but in the interests of getting the idea out there, I made it a little simpler.

I recently added personalisation to the Solidity implementation as well, so that shouldn't be a problem.


@zookozcash

Way to go!

Questions:

  • Is it likely to go into Metropolis? Is there anything I can do to help get it in?

  • Why does the Solidity version take ~480K gas to set up the state, but then only ~12K gas per 128-byte input block? It shouldn't be that expensive to set up the state! What's going on there?

  • How much gas does a Solidity implementation of BLAKE2s cost? We could switch from BLAKE2b to BLAKE2s and it would be a modest performance benefit in C/C++ — reducing the cost of hashing to about 75% of previous levels. I wonder if it would be an even bigger win in Solidity.

  • Suppose we changed it from:

    X_i = H(block header up to end of nonce | index_i)

    to

    U = H(block header up to end of nonce)
    X_i = H(U | index_i)

    Then an Equihash implementation would need to generate a lot (like 2^5 or 2^7) X_i's, but the smaller input (U | index_i) would fit into a single BLAKE2b or BLAKE2s input block, so it should take about half as much gas per X_i as hashing H(block header up to end of nonce | index_i) for each X_i, like we currently do.
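The proposed restructuring can be sketched with Python's hashlib. The header placeholder and the 4-byte little-endian index encoding are illustrative assumptions, not Zcash's actual encoding:

```python
import hashlib

# Stand-in for "block header up to end of nonce" (140 bytes is just
# an illustrative size; real Equihash headers differ).
header = b"\x00" * 140

# Hash the long input once...
U = hashlib.blake2b(header).digest()  # 64-byte intermediate value

def X(i: int) -> bytes:
    # ...then derive each X_i from (U | index_i). U (64 bytes) plus a
    # 4-byte index fits in a single 128-byte BLAKE2b input block, so
    # each X_i costs one compression instead of several.
    return hashlib.blake2b(U + i.to_bytes(4, "little")).digest()

xs = [X(i) for i in range(2**5)]
```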

@zookozcash

Oh, is the high cost of gas to initialize the state due to the fact that memory isn't free? In that case, I would expect BLAKE2s to cost about half as much gas, for both the initialization and the per-byte processing, as BLAKE2b. So maybe ~240K + 45 * INSIZE.

@tjade273
Author

@zookozcash Yes, the gas cost is almost entirely from the high memory costs. BLAKE2s may well be much more practical; I can modify the implementation and run some tests pretty quickly.

@daira

daira commented Sep 24, 2016

Right, but note that BLAKE2b with an arbitrary personalisation string and output length is what Equihash needs.

@tjade273
Author

tjade273 commented Sep 24, 2016

Yes, the current proposal allows for arbitrary output length. The
personalization and salt are padded to 16 bytes, as per the BLAKE2b specification.
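With Python's hashlib as a reference point, the zero-padding behaviour looks like this. The personalisation string here is purely illustrative, not Zcash's actual one:

```python
import hashlib

# Personalisation and salt are zero-padded to the 16 bytes that the
# BLAKE2b parameter block reserves for each of them.
person = b"Example_P".ljust(16, b"\x00")
salt = b"\x00" * 16

h = hashlib.blake2b(b"input", digest_size=50, person=person, salt=salt)
digest = h.digest()

# Passing the short, unpadded string gives the same result, since the
# zero-padding is part of the spec (hashlib applies it for you).
assert digest == hashlib.blake2b(b"input", digest_size=50,
                                 person=b"Example_P").digest()
```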

@zookozcash

zookozcash commented Sep 27, 2016

GBLAKEBASE: 30
GBLAKEWORD: 6

(See also someone asking what is the cheapest hash function available in solidity.)

According to the yellow paper, SHA3 has base gas cost 30 and per-word gas cost 6.

According to bench.cr.yp.to, BLAKE2b is roughly 3 times cheaper than SHA3 ("keccakc512") in base cost and 2.7 times cheaper per byte on a Knight's Landing CPU.

I'd suggest reflecting this efficiency in the gas costs. Dividing by those ratios and rounding up that would be:

GBLAKEBASE: 10
GBLAKEWORD: 3

By the way, here's an answer I just posted on crypto.stackexchange.com about why NIST chose Keccak instead of BLAKE (not BLAKE2) for SHA-3.

@zookozcash

zookozcash commented Sep 27, 2016

Sometimes protocols need to incrementally process inputs, i.e. to produce a series of chunks, and after producing each one, add it into the hash computation, and then after many of these chunks, generate the hash result.

If the Ethereum precompile for BLAKE2b has a way to initialize a hasher object, update it with some more bytes, and then finalize it, then Ethereum apps can do this: updating the hasher object with each successive chunk until all the chunks have been processed, and then producing the hash result by finalizing the hasher object.

With the API in the current proposal, above, Ethereum apps would instead have to accumulate a buffer of all the chunks, wait until they've produced or received the last chunk, and then pass the entire buffer into the hash precompile at once. Depending on the protocol, this could impose a much higher memory cost — the Ethereum app would need to use enough memory to store all of the chunks put together, vs. enough to store the biggest chunk.
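For illustration, this is the familiar init/update/finalize pattern as exposed by, for example, Python's hashlib:

```python
import hashlib

# Incremental (streaming) use: feed chunks as they arrive, keeping
# only one chunk in memory at a time, then finalize.
h = hashlib.blake2b(digest_size=32)
for chunk in (b"first ", b"second ", b"third"):
    h.update(chunk)
streamed = h.digest()

# One-shot use over the concatenated buffer gives the same digest,
# but requires holding all the chunks in memory at once.
one_shot = hashlib.blake2b(b"first second third", digest_size=32).digest()
assert streamed == one_shot
```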

Some other protocols (i.e. not Ethereum apps, but pre-existing or external protocols) may assume this kind of incremental hashing ability, and the absence of it could hinder the ability of an Ethereum app to interoperate with that other protocol.

The discussion about the "block header" in #129 (comment) is an example of someone wanting to write an Ethereum app that is compatible with a pre-existing protocol — Zcash — and the absence of the incremental hasher causing the Ethereum implementation to be unnecessarily expensive.

@zmanian

zmanian commented Sep 27, 2016

I think @zookozcash's proposal to offer a reduced gas cost to Blake2 is potentially ill-considered.

Zooko is obviously correct about the performance benefits of Blake2 on relevant general purpose hardware and Blake2's cryptographic strength relative to sha3.

Ethereum discounts selected cryptographic functions by executing them outside of the Ethereum Virtual Machine. The reason Ethereum does this is that, without offering these discounts, there would be incentives to use weak cryptography in contracts to keep the gas prices of contract calls reasonable.

Gas costs on these discounted functions must balance protection against CPU-exhaustion attacks, encouragement of user safety in smart contracts, and the cost of validation for full nodes.

The gas cost of sha3 is an approximation of the optimal value that appears to have been arrived at intuitively. It seems to be a good-enough approximation because other operations have been better low-hanging fruit for resource attacks on the network.

Unless Ethereum wants to signal to users that Blake2 is preferred to Sha3 on all three criteria above, I recommend that the discounted gas costs for digest functions be identical when their performance is within the same order of magnitude.

@AFDudley
Contributor

@zmanian Good points. But I suspect external reasons besides price, like interoperability, will decide which hash is used. If this turns out to not be the case, then they should raise the price.

@tjade273
Author

@zmanian I'm not sure that SHA3 is actually discounted all that much. The spreadsheet used to determine the original gas costs has the cost of SHA3 set to the average compute time in microseconds, in the same manner applied to the other opcodes. The spreadsheet may not be entirely accurate, and @vbuterin may be able to shed more light on that process.

In the end, I think the best way to determine the gas cost for the new opcodes is to just do a simulation in each of the clients, and compare the compute time and memory relative to the other opcodes. I do think that we'll find that the actual cost should be lower than the one I proposed.

@zmanian

zmanian commented Sep 27, 2016

Thank you for that spreadsheet. It's a great source of data.

The discounting is relative to implementing sha3 in EVM opcodes rather than as an externally callable function.

The approach for approximation documented in that spreadsheet is to try to estimate an optimal gas cost by treating sha3 as equivalent to another EVM opcode. I don't consider this estimation methodology robust, but it is perhaps good enough.

I don't think the spreadsheet refutes my argument, but my argument is merely offered for consideration.

@tjade273
Author

@zookozcash I think that incremental hashing could be useful; implementing that would really make BLAKE2 a precompiled contract, as opposed to the current precompiles which offer very restricted functionality, and are essentially designed to emulate opcodes.

Implementation would be slightly more involved, but there would be the added benefit that the precompile could follow the ABI spec, making it easier for users to interact with the contract directly, instead of through Solidity reserved keywords.

In the end, I think it's a question of what the community sees as the purpose of precompiled contracts: to essentially add new opcodes or to subsidize actual contracts that add important functionality to the ecosystem.

@zookozcash

I don't precisely understand the difference between a contract and an
opcode. Could you teach me?


@zookozcash

I don't understand your objection, Zaki. My naïve assumption was that gas
costs are supposed to approximate CPU time and/or energy costs to the
executor. Is that not the case?


@tjade273
Author

An opcode is like a regular CPU instruction. It pops arguments off the stack and puts a return value back on top, and is represented by an instruction in the EVM (i.e. ADD, MUL, PUSH, MSTORE, etc.)

Precompiled contracts look and act like contracts, but under the hood they are executed natively, instead of within the EVM. These contracts have an address, like other accounts, and are called using the CALL opcode or a transaction. They are assigned an address, which could be anything under 20 bytes but is usually 1, 2, 3, etc. for convenience (we assume that the chance of collision is negligible).

Interestingly, SHA3 is implemented as an opcode (0x20), while SHA256 is implemented as a precompiled contract at address 2.

Precompiles have more flexibility in the size of their input and output, but they are more expensive to use, since they require a CALL, and they must be allocated gas, instead of simply using the available gas.

@zookozcash

zookozcash commented Sep 28, 2016

Okay, thanks for the explanation of opcode vs contract.

Precompiles have more flexibility in the size of their input and output, but they are more expensive to use, since they require a CALL, and they must be allocated gas, instead of simply using the available gas.

What do you mean "more flexibility" in the size of the input? Does it mean that the input size of an opcode cannot be dynamically determined? Or does it mean that opcodes have a max input size that they can accept? (And if so what is the max?)

So there are three options:

  1. an opcode that does the full BLAKE2 hash
    pros:

    • it's an opcode so it is efficient to invoke
    • it's simple to use

    cons:

    • can't do incremental (streaming) processing
    • it's more to design and implement inside consensus rules
  2. a precompiled contract that offers the initialize, update, finalize API
    pros:

    • can do incremental processing
    • can also offer a simple one-shot API, so it can be simple to use

    cons:

    • it requires a CALL and must be allocated gas
    • it's more to design and implement inside consensus rules
  3. an opcode that does the BLAKE2 Compression Function F
    pros:

    • can do incremental processing
    • it's an opcode so it is efficient to invoke
    • it's simple to implement
    • it's less to design and implement inside consensus rules
    • can be wrapped in some Solidity code that provides the full BLAKE2 API, so that wrapper code can be simple to use
    • the wrapper can provide both one-shot and incremental processing, so both of those can be simple to use

    cons:

    • isn't simple to use unless you have that wrapper code
    • it'll be a little less efficient than option 1 when your inputs are few and already-here (But note "incremental processing" in the "pros" above — "incremental processing" means that it'll be a lot more efficient than option 1 when your inputs are numerous or not-yet-here.)

I'd support any of these three, but I especially like option 3 because it moves some of the implementation and the design trade-offs out of consensus code and into Solidity wrapper code. People can optimize that stuff or settle on new idioms without any further changes to consensus.

Note that option 1, which precludes streaming processing, would be precluding an idiom that the rest of the crypto-engineering world has already settled on over the last decade or so.

@zmanian

zmanian commented Sep 30, 2016

Relevant to metering estimation

@zookozcash

@tjade273: would you be willing to implement an opcode for BLAKE2b Compression Function F per option 3 in this comment? Jay Graber from the Zcash Company might be interested in doing that, or to pair program with you in order to learn how.

@axic
Member

axic commented Sep 30, 2016

Please don't implement a new opcode.

@tjade273
Author

I'm not sure a new opcode is a great idea. I personally think that the EVM should have a relatively limited instruction set, and only implement relatively trivial and basic functions.

I like the idea of only implementing the compression function at the protocol level, though. It would make a BLAKE2 contract much more efficient, while not overly complicating the protocol, and would allow a lot of flexibility.

@zookozcash

zookozcash commented Sep 30, 2016

@tjade273:

So are you suggesting an option 4 (compared to the other three from this comment):

4. a precompiled contract that does the BLAKE2 Compression Function F

  • pros:
    • can do incremental processing
    • it's simple to implement
    • it's less to design and implement inside consensus rules
    • can be wrapped in some Solidity code that provides the full BLAKE2 API, so that wrapper code can be simple to use
    • the wrapper can provide both one-shot and incremental processing, so both of those can be simple to use
  • cons:
    • isn't simple to use unless you have that wrapper code
    • it requires a CALL (thus making it less efficient) and must be allocated gas
    • it'll be less efficient than option 1 when your inputs are few and already-here (But note "incremental processing" in the "pros" above — "incremental processing" means that it'll be a lot more efficient than option 1 when your inputs are numerous or not-yet-here.)

I'm still a fan of option 3. In my ever so humble opinion a BLAKE2b Compression Function F is a great basic opcode, and I think real hardware microprocessors ought to add such an instruction too. :-)

However, option 4 seems okay too. The exact performance results aren't clear to me, but that probably doesn't matter too much — just having a precompile for BLAKE2b Compression Function F is already going to be enough to solve all of our (current) performance problems. Option 4 keeps more things outside of consensus rules than Option 2.

@zookozcash

Okay, so the next step on Option 4 is to define the API to BLAKE2b Compression Function F. It should be stateless, and the inputs are:

Compression function F takes as an argument the state vector "h",
message block vector "m" (last block is padded with zeros to full
block size, if required), 2w-bit offset counter "t", and final block
indicator flag "f".  Local vector v[0..15] is used in processing.  F
returns a new state vector.

(Quoted from the RFC.)

It might also be worth passing in the number of rounds r as an input, instead of hard-coding 12, because there are multiple newer cryptographic projects that have been experimenting with or at least talking about reduced-round BLAKE2b.

The implementation of BLAKE2b Compression Function F is conveniently shown in pseudocode in this section of the RFC.
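The compression function is small enough to transcribe directly from the RFC 7693 pseudocode. Below is an illustrative Python version with the round count as a parameter, as suggested above; the one-block wrapper at the end is only there to cross-check F against Python's built-in BLAKE2b, and is not part of the proposed API:

```python
import hashlib
import struct

# RFC 7693 constants for BLAKE2b.
IV = [
    0x6A09E667F3BCC908, 0xBB67AE8584CAA73B,
    0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
    0x510E527FADE682D1, 0x9B05688C2B3E6C1F,
    0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179,
]
SIGMA = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
]
MASK = 2**64 - 1

def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK

def G(v, a, b, c, d, x, y):
    # The BLAKE2b mixing function (RFC 7693, section 3.1).
    v[a] = (v[a] + v[b] + x) & MASK
    v[d] = rotr(v[d] ^ v[a], 32)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], 24)
    v[a] = (v[a] + v[b] + y) & MASK
    v[d] = rotr(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], 63)

def F(h, m, t, f, rounds=12):
    # Compression function F: h is the 8-word state vector, m the
    # 16-word message block, t the 128-bit offset counter, f the
    # final-block flag. Returns a new state vector.
    v = h[:] + IV[:]
    v[12] ^= t & MASK
    v[13] ^= (t >> 64) & MASK
    if f:
        v[14] ^= MASK
    for r in range(rounds):
        s = SIGMA[r % 10]
        G(v, 0, 4, 8, 12, m[s[0]], m[s[1]])
        G(v, 1, 5, 9, 13, m[s[2]], m[s[3]])
        G(v, 2, 6, 10, 14, m[s[4]], m[s[5]])
        G(v, 3, 7, 11, 15, m[s[6]], m[s[7]])
        G(v, 0, 5, 10, 15, m[s[8]], m[s[9]])
        G(v, 1, 6, 11, 12, m[s[10]], m[s[11]])
        G(v, 2, 7, 8, 13, m[s[12]], m[s[13]])
        G(v, 3, 4, 9, 14, m[s[14]], m[s[15]])
    return [h[i] ^ v[i] ^ v[i + 8] for i in range(8)]

def blake2b_one_block(msg, digest_size=64):
    # Hash a message of at most 128 bytes using a single call to F,
    # purely to validate F against hashlib.
    assert len(msg) <= 128
    h = IV[:]
    h[0] ^= 0x01010000 ^ digest_size  # parameter block: no key, no tree
    m = list(struct.unpack("<16Q", msg.ljust(128, b"\x00")))
    h = F(h, m, t=len(msg), f=True)
    return struct.pack("<8Q", *h)[:digest_size]

# Cross-check against the standard library's BLAKE2b.
assert blake2b_one_block(b"abc") == hashlib.blake2b(b"abc").digest()
```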

@axic
Member

axic commented Sep 30, 2016

@zookozcash:

a precompiled contract that offers the initialize, update, finalize API
pros:

  • can do incremental processing

Precompiled contracts are stateless, so in order to do those three calls, you need to pass the state back and forth.

@tjade273
Author

tjade273 commented Sep 30, 2016

Yes, implementing F should be pretty straightforward, and it's inherently stateless. The API can easily just be the ABI encoding of the arguments, i.e.

F(bytes32[2] h, bytes32[4] b, uint t, bool f) returns (bytes32[2] h1)

Where h is the state, b is the input, t is the byte counter, and f is the finalization flag

@tjade273
Author

@axic, that's why a contract that just defines the compression function is useful. It implements the computationally difficult and 64-bit optimized parts natively, while allowing other contracts to wrap it in nice APIs that can do things like incremental hashing, tree hashing, etc.

@zookozcash

Looks great, Tjaden! I still think r -- the number of rounds -- should be a parameter in the API instead of being hardcoded.


@tjade273
Author

tjade273 commented Oct 1, 2016

That should be pretty straightforward


@tjade273
Author

tjade273 commented Oct 4, 2016

I've opened a new issue for this new proposal #152

@zookozcash

So let's close this one, showing that we're not going to do it.
