Consider adding Blake3 to registered types #819

Open
sargun opened this issue Jan 25, 2021 · 11 comments

Comments

sargun commented Jan 25, 2021

I think we should consider adding blake3 to the registered types (https://github.com/opencontainers/image-spec/blob/79b036d80240ae530a8de15e1d21c7ab9292c693/descriptor.md#registered-algorithms). I propose the prefix b3-256.
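
For context, the linked descriptor spec defines a digest as `algorithm:encoded` and gives each registered algorithm a rule for the encoded part (64 lowercase hex characters for sha256, 128 for sha512). A minimal Python sketch of that check follows, assuming a hypothetical `b3-256` entry (the prefix proposed here) that reuses the 64-hex-character rule of a 256-bit digest. `validate_digest` is an illustrative name, not part of go-digest or any OCI tool.

```python
import re

# Digest grammar from the OCI descriptor spec: algorithm ":" encoded,
# where algorithm is lowercase components joined by one of "+._-".
DIGEST_RE = re.compile(
    r"(?P<algorithm>[a-z0-9]+(?:[+._-][a-z0-9]+)*):(?P<encoded>[a-zA-Z0-9=_-]+)"
)

# Per-algorithm rules for the encoded part. sha256 and sha512 are registered;
# the "b3-256" entry is HYPOTHETICAL (the prefix proposed in this issue),
# assuming a 256-bit output hex-encoded as 64 lowercase characters.
REGISTERED = {
    "sha256": re.compile(r"[a-f0-9]{64}"),
    "sha512": re.compile(r"[a-f0-9]{128}"),
    "b3-256": re.compile(r"[a-f0-9]{64}"),
}


def validate_digest(digest: str) -> str:
    """Return the algorithm if `digest` is well-formed and registered; raise ValueError otherwise."""
    m = DIGEST_RE.fullmatch(digest)
    if m is None:
        raise ValueError(f"malformed digest: {digest!r}")
    algorithm, encoded = m.group("algorithm"), m.group("encoded")
    rule = REGISTERED.get(algorithm)
    if rule is None:
        raise ValueError(f"unregistered algorithm: {algorithm}")
    if rule.fullmatch(encoded) is None:
        raise ValueError(f"encoded part does not match the {algorithm} rules")
    return algorithm
```

Because `-` is an allowed separator, `b3-256` is itself a valid algorithm name under the grammar, as is the `blake3` prefix used by the oras PoC later in this thread; the open question is only which name gets registered.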

cyphar (Member) commented Jan 25, 2021

We can add support for it (though b3 has many configurable hash lengths -- how do we pick the best one?), but I do think we also need to work on making sure that tooling which generates images supports mixed-digest images (and maybe that should be done first). It's not just a matter of being able to handle different hashing algorithms, it's that you need to have a way of either regenerating all hashes in an image tree with different digest algorithms or otherwise determining what is the best hash algorithm to use given an existing image that you're creating a modified version of.

For instance, given a descriptor tree full of SHA-256 digests you probably want to use SHA-256, but if there's a SHA-512 digest in there, which should you use for new digests when creating a new image? Maybe you want to opt for the newest algorithm in use by an image digest tree? Either way, I think most tools already don't handle SHA-512 ideally, so we need to work on that.
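
The "newest algorithm in use" heuristic above can be sketched as a walk over the descriptor tree. This is a minimal Python sketch under stated assumptions: descriptors are simplified to dicts with a `digest` and a hypothetical `children` list (real OCI descriptors point to blobs that hold `manifests`, `config`, and `layers`), and the oldest-to-newest ordering in `ALGORITHM_ORDER` is an illustrative choice, not something the spec defines.

```python
# HYPOTHETICAL ordering from oldest to newest; the OCI spec defines no such ranking.
ALGORITHM_ORDER = ["sha256", "sha512", "b3-256"]


def collect_digests(descriptor: dict):
    """Yield every digest in a simplified descriptor tree (hypothetical `children` field)."""
    yield descriptor["digest"]
    for child in descriptor.get("children", []):
        yield from collect_digests(child)


def pick_algorithm(digests, default: str = "sha256") -> str:
    """Return the newest known algorithm used by any digest, or `default` if none is known."""
    used = {d.split(":", 1)[0] for d in digests}
    known = [a for a in ALGORITHM_ORDER if a in used]
    return known[-1] if known else default


image = {
    "digest": "sha256:" + "a" * 64,              # image index
    "children": [{
        "digest": "sha256:" + "b" * 64,          # manifest
        "children": [
            {"digest": "sha256:" + "c" * 64},    # config
            {"digest": "sha512:" + "d" * 128},   # layer written by a newer tool
        ],
    }],
}
```

Here `pick_algorithm(collect_digests(image))` selects sha512. Unknown algorithms are ignored rather than chosen, so a tool following this rule never starts emitting digests it cannot itself verify.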

And finally it would be nice if we had a way of not having to duplicate the same underlying data if the hashing algorithm referring to it is different (layers being the most important example of this). But then again, tools can also be sufficiently clever about this (store an out-of-spec lookup table which knows the hash of each object under multiple hashing algorithms).
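
The out-of-spec lookup table mentioned above can be sketched as a blob store that keeps one copy of each blob under a canonical digest and maps every other algorithm's digest to it. A minimal Python sketch, assuming sha256 as the canonical key; only sha256 and sha512 are used because BLAKE3 is not in Python's standard library, and `BlobStore` is a hypothetical name, not part of any OCI tool.

```python
import hashlib

# Algorithms this sketch can compute with the standard library only.
# BLAKE3 is not in hashlib; a real tool would plug in a third-party implementation.
HASHERS = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


class BlobStore:
    """Stores each blob once; an alias table maps every algorithm's digest to it."""

    def __init__(self, canonical: str = "sha256"):
        self.canonical = canonical
        self.blobs = {}    # canonical digest -> blob bytes (stored once)
        self.aliases = {}  # digest under any algorithm -> canonical digest

    def put(self, data: bytes) -> dict:
        """Store `data` once and return its digest under every known algorithm."""
        digests = {name: f"{name}:{h(data).hexdigest()}" for name, h in HASHERS.items()}
        key = digests[self.canonical]
        self.blobs.setdefault(key, data)
        for d in digests.values():
            self.aliases[d] = key
        return digests

    def get(self, digest: str) -> bytes:
        """Resolve a digest from any known algorithm to the single stored copy."""
        return self.blobs[self.aliases[digest]]
```

A manifest referencing a layer by its sha512 digest and another referencing the same layer by sha256 both resolve to one stored copy, which is the deduplication described above.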

cyphar (Member) commented Jan 25, 2021

Also, this will probably need a https://github.com/opencontainers/go-digest PR first?

sargun (Author) commented Jan 25, 2021

There seems to be a circular dependency between go-digest and image-spec. Mostly I'm starting the issue so I can get blake3 added to go-digest, but the process for getting new hashes added to go-digest is to first get them registered here.

I also wanted to start the conversation about what the prefix should be. I suggested b3-256 in my initial issue description because b3 = blake3 and 256 = 256 bits, the default b3sum output length.

sargun (Author) commented Feb 15, 2021

I added a proposal to add blake3, with a default hash length of 256 bits as suggested in the spec.

rchincha commented

Previously, the motivation to add blake3 was not clear. blake3 is very fast even on generic hardware, and that should definitely help with producing and verifying large artifacts.

rchincha commented Jul 8, 2024

Quick test hashing a 100GiB file (test.img) on 20 vCPUs (Intel Xeon Gold 5218):

blake3 (parallel, so it uses all available cores)

$ time b3sum test.img
253da5f3c5802f7a2c30b16a29ae4aa3830be8d5be57e31778087ea060f7def9 test.img
real 0m48.493s

sha256sum (single-threaded, so it can only use one core)

$ time sha256sum test.img
f0b14a8da7f1c48a0846647a078b97956edd8df451a62fc4b466879aa24d4fd7 test.img
real 10m49.152s
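
Converting the two `real` times above into throughput and a speedup ratio, taking the file size as 100GiB and 10m49.152s as 649.152 s:

```latex
\begin{aligned}
\text{b3sum:}\quad & \frac{100\ \text{GiB}}{48.493\ \text{s}} \approx 2.06\ \text{GiB/s} \\
\text{sha256sum:}\quad & \frac{100\ \text{GiB}}{649.152\ \text{s}} \approx 0.154\ \text{GiB/s} \\
\text{speedup:}\quad & \frac{649.152\ \text{s}}{48.493\ \text{s}} \approx 13.4
\end{aligned}
```

Because b3sum ran on all 20 vCPUs while sha256sum used one core, this ratio combines the per-core algorithmic difference with the gain from parallelism.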

shizhMSFT (Contributor) commented

Switching to blake3 is indeed faster. I built a PoC of the oras tool that uses blake3 by default (the PoC release v1.2.0-blake3 can be downloaded here).

Here's a test run of oras pushing a 10G file to a local folder in OCI image layout.

$ truncate -s 10G 10G.bin
$ time oras push --oci-layout test-sha256:10G 10G.bin
✓ Uploaded  10G.bin                                                                              10/10 GB 100.00%    36s
  └─ sha256:732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d
✓ Uploaded  application/vnd.oci.empty.v1+json                                                      2/2  B 100.00%    2ms
  └─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded  application/vnd.oci.image.manifest.v1+json                                         595/595  B 100.00%    4ms
  └─ sha256:58f03d65ab562e6905c10e26cc9c48b8c95ac8d6db3b3ceb3d860fc2321f5848
Pushed [oci-layout] test-sha256:10G
ArtifactType: application/vnd.unknown.artifact.v1
Digest: sha256:58f03d65ab562e6905c10e26cc9c48b8c95ac8d6db3b3ceb3d860fc2321f5848

real    0m59.275s
user    0m44.369s
sys     0m20.767s
$ time oras_blake3 push --oci-layout test-blake3:10G 10G.bin
✓ Uploaded  10G.bin                                                                              10/10 GB 100.00%     9s
  └─ blake3:28960eef7d587ab6d1627b7efe30c7a07ce2dce4871d339fdfb607cb0776e064
✓ Uploaded  application/vnd.oci.empty.v1+json                                                      2/2  B 100.00%    2ms
  └─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded  application/vnd.oci.image.manifest.v1+json                                         595/595  B 100.00%     0s
  └─ blake3:cbee086b764e6912688269c2fdf2db8a454e0e07dd39c5601a7db1a79bd247a4
Pushed [oci-layout] test-blake3:10G
ArtifactType: application/vnd.unknown.artifact.v1
Digest: blake3:cbee086b764e6912688269c2fdf2db8a454e0e07dd39c5601a7db1a79bd247a4

real    0m12.392s
user    0m5.798s
sys     0m8.748s

As you can see, blake3 is roughly 4x to 5x faster than sha256 here.
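
The 4x to 5x figure can be checked against the timings above, using the total `real` times and the per-file upload times that oras reports for 10G.bin:

```latex
\frac{59.275\ \text{s}}{12.392\ \text{s}} \approx 4.78 \quad (\text{total real time}),
\qquad
\frac{36\ \text{s}}{9\ \text{s}} = 4 \quad (\text{10G.bin upload step})
```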

shizhMSFT (Contributor) commented

Although blake3 is faster, it is not a NIST-approved algorithm. Therefore, I have concerns about using blake3 in FIPS scenarios as well as in signing scenarios.
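
A tool that shares this concern could gate digest algorithms by policy. A minimal Python sketch, assuming a hypothetical `fips_mode` flag and a hypothetical `check_algorithm` helper: SHA-256 and SHA-512 are specified in NIST FIPS 180-4, while BLAKE3 is not NIST-approved, so it is rejected whenever the flag is set.

```python
# Algorithms specified in NIST FIPS 180-4 (Secure Hash Standard).
FIPS_APPROVED = {"sha256", "sha512"}


def check_algorithm(digest: str, fips_mode: bool) -> None:
    """Raise if `digest` uses an algorithm that the (hypothetical) FIPS policy forbids."""
    algorithm = digest.split(":", 1)[0]
    if fips_mode and algorithm not in FIPS_APPROVED:
        raise PermissionError(f"{algorithm} is not FIPS-approved; refusing {digest}")
```

With the flag off, blake3 digests pass through unchanged, so the speedup remains available outside regulated environments.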

shizhMSFT (Contributor) commented

Note that the above test with the modified oras tool uses the blake3 implementation referenced in opencontainers/go-digest, namely zeebo/blake3.

However, unlike the upstream Rust implementation, zeebo/blake3 does not support multi-threading.
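
Multi-threading is possible at all because of structure: BLAKE3 hashes fixed-size 1 KiB chunks independently and combines them in a binary tree, whereas SHA-256 processes its input strictly sequentially. The toy Python sketch below illustrates only that idea, using SHA-256 over 1 MiB leaves with a thread pool; `toy_tree_hash` is a hypothetical helper, it is not BLAKE3, and its output is not interoperable with any registered digest.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB leaves (BLAKE3 itself uses 1 KiB chunks)


def _leaf(chunk: bytes) -> bytes:
    # Prefix byte separates leaf hashes from parent hashes.
    return hashlib.sha256(b"\x00" + chunk).digest()


def _parent(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def toy_tree_hash(data: bytes, workers: int = 4) -> str:
    """Toy Merkle-tree hash: leaves are hashed in parallel, then reduced pairwise."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    # CPython's hashlib releases the GIL for large inputs, so leaves can use several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(_leaf, chunks))
    while len(level) > 1:
        nxt = [_parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is promoted unchanged
        level = nxt
    return level[0].hex()
```

The result is independent of the worker count, so a single-threaded and a multi-threaded implementation of the same tree agree on the digest while differing in speed.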

rchincha commented

@shizhMSFT thanks for the experiment. This is another great data point.
