Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cryptographic primitives for hashing and HMAC #1712

Merged
merged 21 commits into from
Oct 8, 2020
Merged

Conversation

pchiusano
Copy link
Member

@pchiusano pchiusano commented Oct 7, 2020

This PR adds support for hashing and HMAC. See the transcript for docs and tests. Note that the newly created unison-src/new-runtime-transcripts directory has transcripts that will only be run using the new runtime and there's a tweak to the test harness to support this.

Closes #1694, but see loose ends tickets below.

Implementation notes

Followed the instructions in adding-builtins.md. Nothing too exciting. I hit a couple bugs that @dolio was kind enough to fix. I have some ideas on how to improve the handling of foreign data types in the runtime but will work on that separately.

I revamped the Bytes implementation to use a ByteArray interface provided by the memory package, to get various encoding/decoding functions (like converting to and from base 16 and base 32) and to switch to using unpinned memory. That package is already a dependency of cryptonite which we are using for crypto. See the diff in Bytes.hs for details.

See self-review.

Interesting/controversial decisions

I had wanted an incremental hashing (and HMAC) implementation but put that aside for now: you have to feed it the full Bytes or other value you are hashing. An incremental interface would look like:

Hash.new : HashAlgorithm -> Hash
Hash.add : Bytes -> Hash -> Hash
Hash.finish : Hash -> Bytes

This will let you do things like hash a large file without having to load the whole file contents in memory at once. The cryptonite library we are using supports incremental hashing, but the data type it uses to support it isn't safely serializable - it's literally just an opaque byte array internally, and the hashing algorithm (which is implemented in C) then makes assumptions based on these bytes. I was concerned that if you just deserialize any old byte array and do hashing with it, you could get buffer overruns and so on.

We could support an incremental interface later.

Test coverage

The transcript has some basic tests. The various Bytes encoding functions could use more tests.

Loose ends

ex : Bytes
ex =
  h = Sha3_512.new
  h2 = h `appendBytes` Bytes.fromList [1,2,3,4]
  Sha3_512.finish h2

> ex

unison: pack bytes: non-natural closure: DataU1 589824 1
CallStack (from HasCallStack):
  error, called at src/Unison/Runtime/Machine.hs:1027:14 in unison-parser-typechecker-0.1-DahyA8UvBK0B5OVYJToEFT:Unison.Runtime.Machine
…e2b{512,256}, Blake2s{256}. Still pending work from @dolio for the universal hashing functions.
…ializing / decompiling the hash / hmac state is a can of worms
…ts directory for transcripts that are only run with the new runtime
Copy link
Member Author

@pchiusano pchiusano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self review.

Comment on lines +1111 to +1153
-- Pure ForeignOp taking two boxed values
pfopbb :: ForeignOp
pfopbb instr
= ([BX,BX],)
. TAbss [b1,b2]
$ TFOp instr [b1,b2]
where
[b1,b2] = freshes 2

pfopbbb :: ForeignOp
pfopbbb instr
= ([BX,BX,BX],)
. TAbss [b1,b2,b3]
$ TFOp instr [b1,b2,b3]
where
[b1,b2,b3] = freshes 3

-- Pure ForeignOp taking no values
pfop0 :: ForeignOp
pfop0 instr = ([],) $ TFOp instr []

-- Pure ForeignOp taking 1 boxed value and returning 1 boxed value
pfopb :: ForeignOp
pfopb instr
= ([BX],)
. TAbss [b]
$ TFOp instr [b]
where
[b] = freshes 1

-- Pure ForeignOp taking 1 boxed value and returning 1 Either, both sides boxed
pfopb_ebb :: ForeignOp
pfopb_ebb instr
= ([BX],)
. TAbss [b]
. TLet e UN (AFOp instr [b])
. TMatch e . MatchSum
$ mapFromList
[ (0, ([BX], TAbs ev $ TCon eitherReference 0 [ev]))
, (1, ([BX], TAbs ev $ TCon eitherReference 1 [ev]))
]
where
[e,b,ev] = freshes 3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these common cases and am recycling them in a few places for various builtins.

-- declareForeign ("crypto.hash") pfopbb . mkForeign $ \(HashAlgorithm _ref _alg, _a :: Closure) ->
-- pure $ Bytes.empty -- todo : implement me

declareForeign "crypto.hashBytes" pfopbb . mkForeign $
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice am using the pfopbb : ForeignOp convention here - it's a pure foreign operation that takes 2 boxed values. Probably a lot of the foreign operations in this file could be cleaned up to use a small number of ForeignOp types.

Comment on lines +299 to +301
instance {-# overlappable #-} BuiltinForeign b => ForeignConvention b where
readForeign = readForeignBuiltin
writeForeign = writeForeignBuiltin
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This saves you from needing to provide a ForeignConvention for a type that's already a BuiltinForeign.

@pchiusano pchiusano requested review from dolio and stew October 7, 2020 17:01
@pchiusano pchiusano marked this pull request as ready for review October 7, 2020 17:02
}

instance B.ByteArrayAccess bytes => Eq (View bytes) where
v1 == v2 = viewSize v1 == viewSize v2 && unsafeDupablePerformIO (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, this (unsafeDupablePerformIO) seems like the wrong way to be doing this now. GHC has primops for comparing byte arrays that don't require dangerous stuff like this.

Copy link
Contributor

@dolio dolio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things overall look all right to me.

I think it'd be better to get those unsafeDupablePerformIO calls eliminated, but it's not really different than depending on a library that does it (but shouldn't).

@pchiusano
Copy link
Member Author

I wonder if we could just build some utilities around Vector Word8 rather than using the ByteArray interface. I'm not really sold on that interface. But I'd also like to start moving away from ByteString as I think the use of pinned memory isn't what we want.

Going to merge this but we can revisit the Bytes internals anytime.

@pchiusano pchiusano merged commit 8186cb1 into trunk Oct 8, 2020
@pchiusano pchiusano deleted the topic/basic-hashing branch October 8, 2020 20:03
@pchiusano pchiusano mentioned this pull request May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to unpinned byte array type in Bytes implementation Hashing and HMAC builtins proposal
3 participants