Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduces incremental hash. #188

Merged
merged 17 commits into from
Aug 10, 2023
Merged

Conversation

yilinwei
Copy link
Collaborator

@yilinwei yilinwei commented Aug 7, 2023

Fixes #181. Each API is subtly different, but in general we have a new/update/finalize/reset.

The WebCrypto API for browsers doesn't support incremental hashes yet; we could either shim it (i.e. build up the ByteVector in memory) or just throw. I feel that it is better to throw so that the user does not have an expectation of memory usage, but it might be a PITA for anyone cross-compling.

Fixes typelevel#181. Each API is subtly different, but in general we have a
new/update/finalize/reset.

The WebCrypto API for browsers doesn't support incremental hashes
yet; we could either shim it (i.e. build up the `ByteVector` in
memory) or just throw. I feel that it is better to throw so that the
user does not have an expectation of memory usage, but it might be a
PITA for anyone cross-compling.
Comment on lines 22 to 25
/**
* Used for incremental hashing.
*/
sealed trait Digest[F[_]] {
Copy link
Member

@armanbilge armanbilge Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against this, but I do want to point out some trade-offs ...

An incremental hashing API in terms of update/get/reset is "lower-level" vs an API expressed as a Stream.

This is relevant because there is a proposal for browsers to support incremental hashing, but this will likely be in the form of streams. I think it might still be possible to implement the API you've declared here with a stream, but we should be thoughtful about that.

https://github.com/wintercg/proposal-webcrypto-streams

Copy link
Collaborator Author

@yilinwei yilinwei Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; I wasn't sure which way we wanted the dependency graph (as in bobcats <- fs2 or fs2 -> bobcats where -> is dependent on). If the former then more flexibility is always nicer.

EDIT. I'll take a look at the webcrypto-streams proposal and see if I can stub an implementation with this API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bobcats <- fs2

Definitely not this. There is a high bar for FS2 taking on new dependencies and although it currently has crypto code I believe moving it out to a library like this makes much more sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Then I'll switch over to a Stream based API - we can always split into multiple modules if we don't want consumers to pull in fs2.

Comment on lines 28 to 66
implicit def forAsync[F[_]](implicit F: Async[F]): Hash[F] =
implicit def forAsync[F[_]](implicit F: Async[F]): Hash[F] = forSync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw: I now believe offering implicits in exchange for Sync/Async is an anti-pattern. See:

@yilinwei yilinwei changed the title Introduces incremental hash. WIP: Introduces incremental hash. Aug 8, 2023
implicit F: Applicative[F])
extends UnsealedHash1[F] {

override def digest(data: ByteVector): F[ByteVector] = F.pure {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think by convention this shouldn't throw if the MessageDigest is well behaved but we can err on the side of caution and just catchNonFatal.

extends UnsealedHash1[F] {

override def digest(data: ByteVector): F[ByteVector] = F.pure {
val h = hash.clone().asInstanceOf[MessageDigest]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like cloning is always supported.

Implementations are free to implement the Cloneable interface. Client applications can test cloneability by attempting cloning and catching the CloneNotSupportedException:

https://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh! This is very annoying.

I think I'll double check bouncy castle (since that's probably going to be the biggest implementation) and if they support cloning then I'll do a .clone when creating the Hash1 and switch out to a different implementation if it can't.

* Use this class if you have a specific `HashAlgorithm` known in advance or you're using a
* customized algorithm not covered by the `HashAlgorithm` class.
*/
sealed trait Hash1[F[_]] {
Copy link
Collaborator Author

@yilinwei yilinwei Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that this should wrap the MessageDigest/EVP_MD/Hash equivalent on each platform so folks can construct them when the HashAlgorithm isn't complete.

It's also means that you can fail-fast if you have an application which is dependent on certain algorithm's, which imo is better than waiting until later.

If I feel like this mechanism is good, we can reuse the same pattern for most of the other algorithm selection. Hash is the simplest so doing it here first is good.

@armanbilge
Copy link
Member

Another high-level idea: have you seen this issue?

You could set up some sort of keypool of MessageDigest objects and re-use instances from that.

@yilinwei
Copy link
Collaborator Author

yilinwei commented Aug 8, 2023

Yes I have; I think that apart from the performance gains, it doesn't help in this case (apologies if I've misinterpreted your comment).

I think the underlying issue that MessageDigest#getInstance is side-effecting is the more annoying problem (hence trying to side-step it via .clone()).

Consider you set up a pool and there aren't any more MessageDigest's and getInstance fails when trying to construct a new digest. You either fail or need to wait until an instance is free; you've just changed your function from something that definitely will fail to something that might fail.

Maybe this is overthinking it though, since 99% of users won't be doing addSecurityProvider and even less doing removeSecurityProvider.

@armanbilge
Copy link
Member

armanbilge commented Aug 8, 2023

I think the underlying issue that MessageDigest#getInstance is side-effecting is the more annoying problem (hence trying to side-step it via .clone()).

Ahh, I'm sorry, I just understood what you are trying to do with that. Hmm.

Ok, how about this: if you have a known list of Providers, then there should be no side-effect right? e.g. you can use catchNonFatal. So you can use the getInstance method that explicitly takes a Provider.

https://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html#getInstance-java.lang.String-java.security.Provider-

Then to construct the Hash you will need that list of Providers. The user can either provide this or it can be obtained side-effectually via Security.getProviders.

https://docs.oracle.com/javase/8/docs/api/java/security/Security.html#getProviders--

Edit: hmmmmmmm, maybe not 😭 even Providers look to be mutable.

Edit 2: perhaps if you clone the providers beforehand?

@yilinwei yilinwei changed the title WIP: Introduces incremental hash. Introduces incremental hash. Aug 10, 2023
@yilinwei
Copy link
Collaborator Author

A lot of code for very little functionality; but it opens up it up for #189.

I'm generally happy with the pattern as well, exposing a variant of the capabilities for #58. I think the names are a little meh; the reason I'm not using forPlatform is that NodeJS for example offers both SubtleCrypto and Crypto; I'm not actually sure if the names are guaranteed to be consistent between the both of them...

Some love needs to be put into ignoring the test cases and fixtures (maybe a switch to weaver would be good).

@yilinwei yilinwei merged commit a9130e3 into typelevel:main Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sketch of incremental Hash
2 participants