Replace fs2.hash with fs2.hashing.Hashing[F] #3454

Merged: 19 commits merged on Aug 5, 2024

@mpilquist (Member) commented Jul 4, 2024

The fs2.hash object provides the ability to compute the hash of a stream of bytes (i.e. Stream[F, Byte]). The hash computation is modeled as a Pipe[F, Byte, Byte], where pulling on the output stream results in all the source bytes getting hashed and then a final chunk being emitted that contains the hash of the seen bytes.
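To make the shape of the old API concrete, here is a minimal plain-Scala model of it. It assumes an `Iterator[Array[Byte]]` of chunks as a stand-in for `Stream[F, Byte]` and uses the JDK's `java.security.MessageDigest` as the hash function; `OldHashPipeModel` and `sha256Pipe` are hypothetical names, and this is not the fs2 implementation.

```scala
import java.security.MessageDigest

// Hypothetical model of the old fs2.hash pipe shape: Iterator[Array[Byte]] stands in for Stream[F, Byte].
object OldHashPipeModel {
  // Pulling the single output element drains every source chunk into the digest,
  // then emits one chunk containing the hash. Nothing is computed until the output is pulled.
  def sha256Pipe(source: Iterator[Array[Byte]]): Iterator[Array[Byte]] =
    Iterator.single(()).map { _ =>
      val md = MessageDigest.getInstance("SHA-256")
      source.foreach(chunk => md.update(chunk))
      md.digest()
    }
}
```

The output contains only the hash; the source bytes themselves are consumed inside the pipe, so nothing downstream can also write them somewhere.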

This API is not expressive enough for common use cases, though. Consider the case where a Stream[F, Byte] needs to be written to a file (or uploaded to an S3 bucket, or sent to a socket, etc.) and its hash also needs to be stored. The current hashing API makes this awkward, either requiring heavy-duty machinery like broadcastThrough or requiring the stream to be processed twice.

This PR deprecates the fs2.hash object and replaces it with the new fs2.hashing package. The entry point to the new package is the Hashing[F] capability trait, allowing the creation of Hash[F] objects as well as providing various convenience methods.

The hashing package contains:

  • HashAlgorithm enumeration -- e.g., HashAlgorithm.SHA256 and HashAlgorithm.Named("MD2")
  • Hash[F] mutable object -- allows incremental computation of hashes with a specific algorithm. Hash defines the following operations:
    • Low level hashing operations: addChunk(c: Chunk[Byte]): F[Unit] and computeAndReset: F[Chunk[Byte]]
    • update: Pipe[F, Byte, Byte] - updates the hash with the chunks pulled through the pipe
    • observe(source: Stream[F, Byte], sink: Pipe[F, Byte, Nothing]): Stream[F, Byte] - returns a stream that outputs the hash of the source bytes after they've been consumed by the supplied sink
    • hash: Pipe[F, Byte, Byte] - outputs the hash of the source bytes
    • verify(expected: Chunk[Byte])(implicit F: RaiseThrowable[F]): Pipe[F, Byte, Byte] - pipe that outputs the source bytes but raises a HashVerificationException when the source terminates if the hash of seen bytes doesn't match the expected hash
  • Hashing[F] capability trait -- allows creation of Hash[F] objects
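The operations listed above can be illustrated with a simplified, effect-free model. This is a sketch under stated assumptions, not the fs2 code: `HashModel` is a hypothetical name, `Iterator[Array[Byte]]` stands in for `Stream[F, Byte]`, `Array[Byte]` stands in for `Chunk[Byte]`, a sink is modeled as a function that consumes an iterator, and an `IllegalStateException` stands in for `HashVerificationException`. Only the operation names come from the PR.

```scala
import java.security.MessageDigest

// Hypothetical, effect-free model of the mutable Hash[F] described in the PR.
final class HashModel(algorithm: String) {
  private val md = MessageDigest.getInstance(algorithm)

  // Low level: fold one chunk into the running digest.
  def addChunk(c: Array[Byte]): Unit = md.update(c)

  // Low level: return the digest of everything added so far; MessageDigest.digest() also resets it.
  def computeAndReset(): Array[Byte] = md.digest()

  // Like `update`: chunks pass through unchanged and each one is hashed as it is pulled.
  def update(source: Iterator[Array[Byte]]): Iterator[Array[Byte]] =
    source.map { c => addChunk(c); c }

  // Like `observe`: the sink consumes the source, then the hash of the consumed bytes is returned.
  def observe(source: Iterator[Array[Byte]], sink: Iterator[Array[Byte]] => Unit): Array[Byte] = {
    sink(update(source))
    computeAndReset()
  }

  // Like `hash`: consume the source and return only its hash.
  def hash(source: Iterator[Array[Byte]]): Array[Byte] = {
    source.foreach(addChunk)
    computeAndReset()
  }

  // Like `verify`: pass chunks through, and once the source ends, fail if the digest differs.
  def verify(expected: Array[Byte])(source: Iterator[Array[Byte]]): Iterator[Array[Byte]] = {
    val check = Iterator.single(()).flatMap { _ =>
      if (!MessageDigest.isEqual(computeAndReset(), expected))
        throw new IllegalStateException("hash mismatch") // stands in for HashVerificationException
      Iterator.empty[Array[Byte]]
    }
    update(source) ++ check
  }
}
```

Because `update` hashes chunks as they flow through, `observe` can hand the very same bytes to a sink and still report their hash afterwards, which is the single-pass property the old pipe lacked.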

The Hashing[F] trait returns Hash[F] objects as resources (i.e. Resource[F, Hash[F]]) because (on some platforms) they have to be released after computations are complete.
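The reason for the `Resource` wrapper can be sketched with the standard library's `scala.util.Using` in place of cats-effect `Resource` (Scala 2.13+ or 3). `DigestHandle` and `ResourceModel` are hypothetical: the handle pretends to be a platform digest context that must be freed (on the JVM a `MessageDigest` needs no release, so `close()` here only records that release happened).

```scala
import java.security.MessageDigest
import scala.util.Using

// Hypothetical stand-in for a platform digest context that must be freed explicitly.
// close() only flips a flag so the release can be observed; a native handle would free memory here.
final class DigestHandle(algorithm: String) extends AutoCloseable {
  private val md = MessageDigest.getInstance(algorithm)
  private var released = false

  def isReleased: Boolean = released

  def update(bytes: Array[Byte]): Unit = {
    require(!released, "handle already released")
    md.update(bytes)
  }

  def digest(): Array[Byte] = {
    require(!released, "handle already released")
    md.digest()
  }

  override def close(): Unit = { released = true }
}

object ResourceModel {
  // Acquire, use, and always release the handle, analogous to using a Resource[F, Hash[F]].
  def hashWith(acquire: => DigestHandle)(chunks: Seq[Array[Byte]]): Array[Byte] =
    Using.resource(acquire) { h =>
      chunks.foreach(h.update)
      h.digest()
    }
}
```

Tying release to the scope of use means callers cannot forget to free the handle, even if hashing fails partway through.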

With this new API, writing the contents of a source to a file and then writing its hash to a separate file, while processing the source only once, can be accomplished like so:

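```scala
import cats.effect.MonadCancelThrow
import fs2.{Pipe, Stream}
import fs2.hashing.Hashing
import fs2.io.file.{Files, Path}
// Writes the source to `path` and its SHA-256 hash to `<path>.sha256`, pulling the source only once
def writeWithHash[F[_]: Files: Hashing: MonadCancelThrow](path: Path): Pipe[F, Byte, Nothing] =
  source =>
    Stream.resource(Hashing[F].sha256).flatMap { h =>
      h.observe(source, Files[F].writeAll(path)).through(Files[F].writeAll(Path(s"$path.sha256")))
    }
```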
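For comparison, the same single-pass idea can be expressed with only the JDK, swapping in `java.security.DigestOutputStream` (which updates a `MessageDigest` with every byte it forwards) in place of `Hash.observe`. This is a hypothetical analog, not how fs2 implements `writeWithHash`; `WriteWithHashModel` is an illustrative name and an `Iterator[Array[Byte]]` stands in for the source stream. Like the fs2 version, it writes the raw hash bytes to `<path>.sha256`.

```scala
import java.nio.file.{Files, Path}
import java.security.{DigestOutputStream, MessageDigest}

// Hypothetical JDK-only analog of writeWithHash: each chunk is hashed as it is written,
// so the source is traversed exactly once.
object WriteWithHashModel {
  def writeWithHash(path: Path, source: Iterator[Array[Byte]]): Array[Byte] = {
    val md = MessageDigest.getInstance("SHA-256")
    // DigestOutputStream updates `md` with every byte it forwards to the file stream.
    val out = new DigestOutputStream(Files.newOutputStream(path), md)
    try source.foreach(chunk => out.write(chunk))
    finally out.close()
    val digest = md.digest()
    // Store the raw hash bytes next to the data file.
    Files.write(path.resolveSibling(s"${path.getFileName}.sha256"), digest)
    digest
  }
}
```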

The Hashing object also contains utility functions for hashing a pure stream and a chunk.

```scala
import fs2.{Chunk, Stream}
import fs2.hashing.{HashAlgorithm, Hashing}
val h1 = Hashing.hashChunk(HashAlgorithm.SHA256, Chunk.array("The quick brown fox".getBytes))
val h2 = Hashing.hashPureStream(HashAlgorithm.SHA256, Stream.chunk(Chunk.array("The quick brown fox".getBytes)))
```
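Conceptually, hashing a pure stream is folding each chunk into one digest, so the result should not depend on how the bytes are split into chunks. The sketch below models both utilities with `java.security.MessageDigest`; `PureHashModel`, `hashChunk` and `hashChunks` are hypothetical stand-ins, not the fs2 functions.

```scala
import java.security.MessageDigest

// Hypothetical models of the two pure utilities, using the JDK MessageDigest.
object PureHashModel {
  // Analog of Hashing.hashChunk: hash a single in-memory chunk.
  def hashChunk(algorithm: String, chunk: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance(algorithm).digest(chunk)

  // Analog of Hashing.hashPureStream: fold every chunk, in order, into one digest.
  def hashChunks(algorithm: String, chunks: Iterable[Array[Byte]]): Array[Byte] = {
    val md = MessageDigest.getInstance(algorithm)
    chunks.foreach(c => md.update(c))
    md.digest()
  }
}
```

Splitting "The quick brown fox" into several chunks and hashing them with `hashChunks` yields the same bytes as `hashChunk` on the whole string, which is why `h1` and `h2` above are expected to be equal.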

```scala
.make(acquire)(ctx => F.delay(EVP_MD_CTX_free(ctx)))
.evalTap { ctx =>
  F.delay {
    val `type` = EVP_get_digestbyname(toCString(algorithm)(zone))
```
@mpilquist (Member Author) commented:

Initially I thought to free this string right after the call to EVP_get_digestbyname, but the pointer appears to be stored in the resulting struct, so the string has to outlive that call.

@mpilquist marked this pull request as ready for review July 6, 2024 12:45
@mpilquist changed the title from "Initial draft of new hashing package" to "Replace fs2.hash with fs2.hashing.Hashing[F]" Jul 6, 2024
@mpilquist (Member Author) commented:

@armanbilge Thanks, addressed comments

@mpilquist merged commit 41db6b0 into main Aug 5, 2024 (31 checks passed)
@mpilquist deleted the topic/hashing branch August 5, 2024 11:11