Change order of hash arguments to P || m || R #62

real-or-random · 2019-08-28T13:14:17Z

This is still WIP because

* I haven't updated the algorithms and test vectors
* I'll probably need to rebase on 32 byte pubkey nits #59 anyway at some point

ajtowns · 2019-08-30T00:24:07Z

For what it's worth, I think the argument that you know P,m earlier and can therefore get started on calculating the hash before you've worked out R is pretty weak -- the time to hash P,m is just going to be noise as far as I can see, and not worth trying to do in parallel. Limiting the rationale to just "this seems simpler for plausible zk proofs because ..." seems more logical to me?

real-or-random · 2019-08-30T07:19:16Z

Yes, it's maybe a little artificial. I can remove this.

Does it make sense to unify the hashes k' = hash_BIPSchnorr(P || m || x) and e = hash_BIPSchnorr(P || m || R) such that the midstate can be reused? The performance gain is again pretty weak but at least reasonable to implement as opposed to the parallelization.
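A minimal sketch of the midstate reuse being asked about, assuming a tagged hash whose prefix is SHA256(tag) || SHA256(tag) and 32-byte encodings of P, m, x and R. The tag string `BIPSchnorr`, the helper name `tagged_prefix`, and all byte values are placeholders for illustration, not taken from the spec.

```python
import hashlib

def tagged_prefix(tag: bytes):
    """Return a SHA-256 object that has already absorbed the 64-byte tag prefix."""
    tag_hash = hashlib.sha256(tag).digest()
    return hashlib.sha256(tag_hash + tag_hash)

# Placeholder 32-byte values (hypothetical, for illustration only).
P = bytes(range(32))                              # public key encoding
m = hashlib.sha256(b"example message").digest()   # 32-byte message
x = b"\x01" * 32                                  # secret key encoding
R = b"\x02" * 32                                  # nonce point encoding

# Absorb the shared prefix tag || P || m exactly once.
shared = tagged_prefix(b"BIPSchnorr")
shared.update(P + m)

# Fork the shared state for each of the two hashes.
h_nonce = shared.copy()
h_nonce.update(x)
k_prime = h_nonce.digest()      # k' = hash(P || m || x)

h_challenge = shared.copy()
h_challenge.update(R)
e = h_challenge.digest()        # e = hash(P || m || R)
```

`hashlib` objects expose `copy()`, so the state after `tag || P || m` is computed once and then extended separately with `x` and with `R`.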

Note to self: I should also mention the tagging when I write "first block" and "second block".

ajtowns · 2019-08-30T07:47:18Z

> Does it make sense to unify the hashes k' = hash_BIPSchnorr(P || m || x) and e = hash_BIPSchnorr(P || m || R) such that the midstate can be reused?

Oh! While I don't think the performance matters there either, I can see k' = hash(P||m||x) being better for a zk proof of deterministic nonce -- you wouldn't need to do any zk calculations on m, you'd just pass in a midstate based on the tag, P and m, which are all public.

(Public: P, R and midstate a; private: x, k', k. Prove P=x*G, k' = sha256_fin(a,x,len=160), k = k' or k = -k', R = k*G. Means you only need one round of sha in the circuit rather than two -- one for m,x and a final one for padding and length)
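The block counts in that parenthetical can be checked with a little arithmetic. This is a sketch that assumes a 64-byte tag prefix (SHA256(tag) || SHA256(tag)) and 32-byte P, m and x. The helper name `sha256_compressions` is made up for the example.

```python
def sha256_compressions(msg_len: int) -> int:
    """Number of compression-function calls SHA-256 makes for msg_len bytes."""
    # Padding appends one 0x80 byte and an 8-byte length, rounded up to 64 bytes.
    return (msg_len + 1 + 8 + 63) // 64

TAG_PREFIX = 64                 # assumed: SHA256(tag) || SHA256(tag)
P_LEN = M_LEN = X_LEN = 32

public_len = TAG_PREFIX + P_LEN + M_LEN   # 128 bytes: exactly two full blocks
total_len = public_len + X_LEN            # 160 bytes, matching len=160

public_blocks = public_len // 64          # absorbed outside the proof (midstate a)
in_circuit = sha256_compressions(total_len) - public_blocks

# For comparison: hashing m || x directly, with no precomputed midstate.
direct_m_x = sha256_compressions(M_LEN + X_LEN)

print(in_circuit, direct_m_x)   # 1 2
```

Under these assumptions, the final block holds x together with the padding and length, so only that one compression needs to happen inside the circuit.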

real-or-random · 2019-08-30T08:01:36Z

Indeed -- our draft paper on zk proofs of deterministic nonces does this conceptually: hashing m and other public stuff is "outside" the zk proof and then we feed the resulting hash/mid-state in a pseudorandom function optimized for efficiency in the zk proof.

But I think all of this shouldn't bother us too much for the BIP. If you're doing zk proofs over k, you're free to violate the spec anyway.

jonasnick · 2019-08-30T15:39:21Z

Fully on board with making bip-schnorr easier to use in zero knowledge proofs in general, but I don't get the current rationale:

* Is there an application where you want to prove that you know some P, and m for a public signature? That the midstate after P and m are hashed needs to be revealed (in order for the argument to make sense) is additionally limiting.
* Blind signing seems like the way more interesting application. But in general you need to prove properties about P and m in zero knowledge as well, namely whether they match what the client committed to in the beginning. Otherwise the client has control over the hash and can execute Wagner's attack. Fwiw, most interesting blind signing protocols will want to include proofs about properties of m, but in the case where m is a Bitcoin transaction, a single additional compression doesn't seem to be that significant anyway.

> Does it make sense to unify the hashes [...] ?

I don't think so, we use tagging for a reason and a small performance gain would not be worth it.
Also, right now we only hash 64 bytes in the nonce derivation, so what you're proposing would require an additional compression.

> I can see k' = hash(P||m||x) being better for a zk proof of deterministic nonce

@ajtowns To add to what @real-or-random said: as far as I remember, the current idea for deterministic nonces looks very different from our nonce derivation hash here.

real-or-random · 2019-08-31T19:23:48Z

> Fully on board with making bip-schnorr easier to use in zero knowledge proofs in general, but I don't get the current rationale:
> * Is there an application where you want to prove that you know some P, and m for a public signature? That the midstate after P and m are hashed needs to be revealed (in order for the argument to make sense) is additionally limiting.

I don't have a full specific application in mind here. My rationale was these statements are at least meaningful. For example, you could prove that a signature is valid for a message that is known only in committed or encrypted form. Where is that useful? I don't know yet but maybe as a building block in another protocol. You can indeed say that all of this is not very useful if this is a BIP which will be mostly used for signing Bitcoin transactions.

Midstate: That's true (as mentioned in the commit). But it could very well be that there's enough entropy in m or P to make sure that revealing the midstate is not a problem. If this is not the case, then you can always add entropy to m.

If we think that this part of the rationale is too much guesswork, we may as well drop it.

> * Blind signing seems like the way more interesting application. But in general you need to prove properties about P and m in zero knowledge as well, namely whether they match what the client committed to in the beginning. Otherwise the client has control over the hash and can execute Wagner's attack. Fwiw, most interesting blind signing protocols will want to include proofs about properties of `m`, but in the case where `m` is a Bitcoin transaction, a single additional compression doesn't seem to be that significant anyway.

Without going into all the details here, the idea is that it suffices that the client commits in the beginning to the midstate (which he computed locally) and his contribution to R. Then the client does not have control over the hash. (And the commitment to the midstate is in fact a commitment to P and m but the prover anyway does not care about specific properties of P and m.)

> > Does it make sense to unify the hashes [...] ?
>
> I don't think so, we use tagging for a reason and a small performance gain would not be worth it.
> Also, right now we only hash 64 bytes in the nonce derivation, so what you're proposing would require an additional compression.

Oh right. So let's not do this. And if someone really wants this, it's of course still possible.

> @ajtowns To add to what @real-or-random said: as far as I remember, the current idea for deterministic nonces looks very different from our nonce derivation hash here.

Indeed, it's a very weird elliptic-curve based pseudorandom function and it needs non-blackbox reasoning to get a security proof etc. But in general the observation that P and m can be hashed outside is the right one.

jonasnick

Oh I see now that in the blind signature idea a commitment to the midstate would be a commitment to P and m. That makes sense.

> But it could very well be that there's enough entropy in m or P to make sure that revealing the midstate is not a problem.

Nit: That doesn't apply if the message is revealed (because it ends up being a transaction in the blockchain).

Approach ACK. Seems to help with the blind signature ideas and that's a good reason for doing it. It doesn't help if the server wants to have a proof about some property of m, but that'll be way more complicated than a single compression anyway.

real-or-random · 2019-09-02T12:00:52Z

Okay, I think I'll update this and drop the vague example (and just keep the blind sig example).

real-or-random · 2019-09-06T08:41:41Z

Rebased and addressed the comments.

sipa · 2019-09-12T23:40:41Z

The original reason for picking R||P||m may have been because that's what Ed25519 does.

There actually is a reason for that ordering. It turns out that having R first remains secure when H is a Merkle-Damgard function that has a non-collision-resistant compression function. The reasoning is that in MD functions a collision can be length-extended to be a collision with chosen suffix, but this is not possible when the first block has data that is not under the attacker's control (and P and m can be assumed to be under attacker control but R is not).

Now, this argument requires that m is the full message, and not a (possibly non collision resistant itself) hash of the message. In our use case, m is fixed length and this is arguably assumed to be a hash already (we don't specify whether it should or shouldn't be). Also for other reasons, we can't realistically claim security under collision attacks (most generally, a collision between a valid and invalid block header would break progress on the network; collisions between two valid transactions would lead to double spends).

Still, it feels it is easier to justify reusing ed25519's ordering than a hard-to-explain blind signatures argument.

What do others think?

jonasnick · 2019-09-14T18:40:56Z

Just to summarize, the blind signatures argument is that there is an optimization (of relatively unknown proportion) when using blind Schnorr signatures which is made secure with a zkp (instead of "random abortion" aka modified ROS) which does not exist yet and it's only an optimization if nothing about the blinded message is being proven.

That's hard to explain indeed. On the other hand, the argument that R||P||m prevents some collision attacks is not convincing because m is the output of a hash anyway, as sipa mentioned. Either of the two options is fine with me.

real-or-random · 2019-09-18T15:38:53Z

> Just to summarize, the blind signatures argument is that there is an optimization (of relatively unknown proportion) when using blind Schnorr signatures which is made secure with a zkp (instead of "random abortion" aka modified ROS) which does not exist yet and it's only an optimization if nothing about the blinded message is being proven.

Just want to add that in the typical blind signatures scenarios (such as ecash), nothing about the blinded messages needs to be proven.

To be honest, I don't think that the argument with the collision attacks has changed the situation.

In the end, every order of arguments gives us a good signature scheme; this is really a detail. But yes, we care about details, and we should have a rationale.

The argument for R||P||m does not hold in practice as long as we specify (or de facto use) this scheme for 32-byte messages. My feeling is that this additional protection against chosen-message attacks can be a very valid argument, and is in fact stronger than the possible blind signature optimization. But to make this argument hold true, we need to change m to be the real message instead of the hash of the message. And I think we don't want to do this because it's inconvenient in other places? (I'm not deeply enough into taproot to be able to judge this here.) Without using arbitrary-length messages, it's hard to write down a rationale for R||P||m in the BIP, I think.

The argument for P||m||R is indeed weak. To be honest I don't know who will use blind sigs, but one goal of Schnorr sigs is to allow for a broad range of applications. The security argument is likely to hold but is not fully worked out. But even if we turn out to be wrong about the security of blind sigs with zkps, or wrong about the prediction that anyone wants to use blind signatures, I don't see that we lose anything by committing to P||m||R.

So in the end, I think that R||P||m has 0 advantage in practice (with 32-byte messages) and P||m||R has >= 0 advantage. And if we can switch to arbitrary-length messages without any drawbacks, then we should consider this.

sipa · 2019-09-18T18:43:32Z

> And if we can switch to arbitrary-length messages without any drawbacks, then we should consider this.

Well the argument doesn't just require allowing the "message" to be arbitrary length, but also requires the message to not contain any form of hashes of other data inside of it. That's unreasonable, because for example the txid in transaction inputs goes into the message, but is itself a hash of data that is not known to the verifier. Going a step further, Bitcoin itself already relies on collision resistance for properties that are more fundamental (e.g. it's necessary for chain convergence).

Though to the extent that we intend bip-schnorr to be useful for more than just Bitcoin transaction signatures, it may be reasonable to permit m to be variable length? But with a note in bip-taproot that it's using bip-schnorr with hashed data in the message so the security of the signatures inherently relies on SHA256 collision resistance.

elichai · 2019-09-19T13:39:49Z

@sipa could you please elaborate why R in the start is what removes the requirement for collision resistance?
What I remember from the ed25519 paper, they write that the existence of R in the hash function means that collision resistance isn't needed, not a specific location.

I would even argue that putting the arbitrary-length variable at the end can expose more length extension attack problems, because it's easily controlled by an adversary
(although I'm still unsure how length extension attacks can hurt Schnorr's security).

Another thing is that length extension attacks are almost always irrelevant for hashing of public variables (like here); they are mostly a problem for secret prefixing (like the deterministic nonce hashing).

sipa · 2019-09-19T21:08:45Z

@elichai If H were to behave like a random oracle, you would be right: as long as R is an input to the hash, the attacker can't construct collisions.

However, due to SHA256 being constructed using Merkle-Damgard, that's not actually true. If the input to the hash is (P||m||R), the (P||m) part is hashed first, producing a midstate. That midstate together with R is then fed through the compression function again to obtain the final hash. If the attacker can construct two pairs of P and m such that the midstate for (P||m) collides, it does not matter what R is appended afterwards.
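A toy illustration of that point, not an attack on SHA256: the sketch below builds a small Merkle-Damgard hash whose compression function is deliberately weakened to a 16-bit state, so that a colliding first block can be found by brute force. All names (`compress`, `toy_md`, `find_prefix_collision`) and parameters are invented for the example. The first block plays the role of (P||m) and the suffix plays the role of R.

```python
import hashlib

BLOCK = 8          # toy block size in bytes
IV = b"\x00\x00"   # toy 16-bit initial state

def compress(state: bytes, block: bytes) -> bytes:
    """Deliberately weak compression function: only 16 bits of state survive."""
    return hashlib.sha256(state + block).digest()[:2]

def toy_md(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with length padding over the toy compression."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 2) % BLOCK)
    padded += len(msg).to_bytes(2, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_prefix_collision():
    """Find two distinct one-block prefixes that reach the same midstate."""
    seen = {}
    i = 0
    while True:
        prefix = i.to_bytes(BLOCK, "big")
        midstate = compress(IV, prefix)
        if midstate in seen:
            return seen[midstate], prefix
        seen[midstate] = prefix
        i += 1

p1, p2 = find_prefix_collision()   # two different "(P||m)" blocks
R = b"\x02" * BLOCK                # any suffix; the attacker does not choose it

# Once the midstates collide, the full hashes agree for every appended R.
assert p1 != p2
assert toy_md(p1 + R) == toy_md(p2 + R)
```

With the order reversed, `toy_md(R + p1)` and `toy_md(R + p2)`, the attacker-controlled blocks are compressed starting from a state that depends on R, so a collision precomputed from the IV no longer carries over.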

Apparently the correct name for this is not length extension (indeed, that only applies to secret data) but a chosen-prefix collision attack.

elichai · 2019-09-19T21:28:17Z

Thanks.
Hmm yes.
I think the attack that you described will also work on a sponge-like construction (e.g. SHA-3) and isn't related to Merkle-Damgård specifically, because the fact that it has no "finalization" step isn't relevant here: H(m||p1) and H(m||p2) will have the same "midstate" up until it gets to the p's.

sipa · 2019-09-19T21:51:35Z

@elichai True, but for SHA-3 the midstate is 1600 bits, and you'd need to collide the entire thing.

Good point that this is not unique to MD constructions, though.

jonasnick · 2019-09-23T15:30:20Z

> Just want to add that in the typical blind signatures scenarios (such as ecash), nothing about the blinded messages needs to be proven.

Just want to add that you don't use Schnorr blind signatures unless you really have to - i.e. you only use them if you want to blind sign a Bitcoin transaction. Afaik the only known schemes where the message is irrelevant are (tumblebit-like) blind swaps and statechains.

real-or-random · 2019-09-24T09:06:25Z

> Well the argument doesn't just require allowing the "message" to be arbitrary length, but also requires the message to not contain any form of hashes of other data inside of it. That's unreasonable, because for example the txid in transaction inputs goes into the message, but is itself a hash of data that is not known to the verifier. Going a step further, Bitcoin itself already relies on collision resistance for properties that are more fundamental (e.g. it's necessary for chain convergence).

Oh indeed, the txid is a good point. It's a hash anyway. (Even I should have noticed that. ;))

real-or-random · 2019-09-24T09:13:22Z

> Though to the extent that we intend bip-schnorr to be useful for more than just Bitcoin transaction signatures, it may be reasonable to permit m to be variable length? But with a note in bip-taproot that it's using bip-schnorr with hashed data in the message so the security of the signatures inherently relies on SHA256 collision resistance.

Hm, I think things are actually more complicated for taproot. Sure collision-resistance is necessary for the signatures, but we rely on so many more properties of the hash function, even in taproot itself. If you can open a taproot to two different scripts, you don't necessarily have a normal collision in the hash function for example but just some other collision-thing for which x + H(xG, m) = x' + H(x'G, m').
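A short derivation of where that relation comes from, assuming the usual taproot tweak in which the output key is Q = P + H(P, m)G with internal key P = xG. The symbols Q and n (the group order) are introduced here for illustration and do not appear in the comment above.

```latex
\begin{align*}
Q &= P + H(P, m)\,G = \bigl(x + H(xG, m)\bigr)\,G \\
Q &= P' + H(P', m')\,G = \bigl(x' + H(x'G, m')\bigr)\,G \\
\Longrightarrow\quad x + H(xG, m) &\equiv x' + H(x'G, m') \pmod{n}
\end{align*}
```

Two openings of the same Q therefore need only this sum to match modulo n. The hash outputs H(xG, m) and H(x'G, m') themselves may differ, which is why this is not an ordinary hash collision.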

real-or-random · 2019-10-03T09:02:44Z

This is a paper that formalizes the intuition of collisions with random prefix (R), in the context of Schnorr signatures: http://www.neven.org/papers/schnorr.pdf

I was not aware of this work. In fact I found it because it's behind a link that should point to the original paper by Schnorr in the JoC: https://en.wikipedia.org/wiki/Schnorr_signature#References

sipa · 2019-11-05T00:07:29Z

All things considered, I think it's better to stick to the more conventional R||P||m order.

elichai · 2020-03-13T09:52:07Z

FWIW I think our Schnorr implementation does assume collision resistance and not chosen-prefix resistance, because we pre-hash the message. That means any collision on the message itself results in a collision in the signature; the collision doesn't need to have any chosen prefix.

jonasnick mentioned this pull request Aug 29, 2019

Add schnorrsig module which implements BIP-340 compliant signatures bitcoin-core/secp256k1#558

Merged

jonasnick reviewed Sep 2, 2019

real-or-random added 5 commits September 6, 2019 10:26

Change order of hash arguments to P || m || R

d87e6c9

Feedback for text

f35e099

Update pseudocode

d536b9b

Update reference code

5ed084e

Update test vectors

aeb69e3

real-or-random force-pushed the patch-5 branch from d0a3ba9 to aeb69e3 Compare September 6, 2019 08:40

real-or-random marked this pull request as ready for review September 6, 2019 08:41

real-or-random changed the title ~~WIP: Change order of hash arguments to P || m || R~~ Change order of hash arguments to P || m || R Sep 6, 2019

sipa closed this Nov 5, 2019

real-or-random mentioned this pull request Dec 13, 2019

Mention that we don't change the hash function #179

Merged

real-or-random mentioned this pull request Aug 27, 2020

BIP340: clarify impact of pre-hashed messages, or support variable-length messages #207

Open
