This adds AVX-512 support on Linux.
Also adds a JMH benchmark pitting BLAKE3 against SHA2-256.
Results for `hashBytesOneShot` only, run with `-f 1` (a single JVM fork). Scores are throughput in ops/s, so higher is better:
<details>
<summary>Intel Core i5-8520U, Linux: BLAKE3 has ~8.7x the throughput of SHA2-256 on 1 MiB inputs</summary>
<pre>
Benchmark                                      (size)    (type)   Mode  Cnt         Score         Error  Units
BazelHashFunctionsBenchmark.hashBytesOneShot        1    BLAKE3  thrpt    5   3897193.109  ± 104089.759  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot        1  SHA2_256  thrpt    5   9773250.840  ± 919565.969  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot       16    BLAKE3  thrpt    5   4058401.127  ±  69345.382  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot       16  SHA2_256  thrpt    5   9338184.696  ± 575903.627  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      128    BLAKE3  thrpt    5   3883335.405  ± 197131.021  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      128  SHA2_256  thrpt    5   3931746.804  ± 111963.068  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      512    BLAKE3  thrpt    5   3165886.130  ± 105001.405  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      512  SHA2_256  thrpt    5   1689377.092  ±  67006.025  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     1024    BLAKE3  thrpt    5   2137151.012  ±  71425.961  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     1024  SHA2_256  thrpt    5    971335.403  ±  43622.796  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     4096    BLAKE3  thrpt    5   1266551.855  ±  77312.865  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     4096  SHA2_256  thrpt    5    271217.035  ±  15770.310  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot    16384    BLAKE3  thrpt    5    562124.458  ±  47243.736  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot    16384  SHA2_256  thrpt    5     72281.652  ±  10734.186  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot  1048576    BLAKE3  thrpt    5      9800.524  ±    230.269  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot  1048576  SHA2_256  thrpt    5      1124.542  ±     40.938  ops/s
</pre>
</details>
<details>
<summary>MacBook Pro with M3 Max, macOS: BLAKE3 has ~0.77x the throughput of SHA2-256 on 1 MiB inputs</summary>
<pre>
Benchmark                                      (size)    (type)   Mode  Cnt         Score         Error  Units
BazelHashFunctionsBenchmark.hashBytesOneShot        1    BLAKE3  thrpt    5   9262824.819  ±  12194.067  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot        1  SHA2_256  thrpt    5  76557346.275  ± 548738.127  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot       16    BLAKE3  thrpt    5   9254500.192  ±  22138.081  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot       16  SHA2_256  thrpt    5  81029076.629  ± 748425.519  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      128    BLAKE3  thrpt    5   8304084.839  ±  20398.724  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      128  SHA2_256  thrpt    5   4146027.256  ± 106648.234  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     1024    BLAKE3  thrpt    5   3092086.580  ±   1301.806  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     1024  SHA2_256  thrpt    5   9355426.285  ±   7352.032  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     4096    BLAKE3  thrpt    5   1670833.346  ±   1809.726  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     4096  SHA2_256  thrpt    5   2562509.914  ±  29303.110  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot    16384    BLAKE3  thrpt    5    484960.116  ±    146.961  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot    16384  SHA2_256  thrpt    5    658392.748  ±   3364.324  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot  1048576    BLAKE3  thrpt    5      7987.472  ±     19.194  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot  1048576  SHA2_256  thrpt    5     10380.444  ±      8.804  ops/s
</pre>
</details>
<details>
<summary>AMD Ryzen 7 PRO 5850U, Windows: BLAKE3 has ~1.5x the throughput of SHA2-256 on 1 MiB inputs</summary>
<pre>
Benchmark                                      (size)    (type)   Mode  Cnt         Score         Error  Units
BazelHashFunctionsBenchmark.hashBytesOneShot        1    BLAKE3  thrpt    5   5569003.683  ± 125621.794  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot        1  SHA2_256  thrpt    5  21202138.257  ± 458127.205  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot       16    BLAKE3  thrpt    5   5539298.273  ±  77378.097  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot       16  SHA2_256  thrpt    5  21618815.496  ± 208338.556  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      128    BLAKE3  thrpt    5   5047579.827  ± 118690.537  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      128  SHA2_256  thrpt    5  15806244.512  ± 258848.826  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      512    BLAKE3  thrpt    5   3300538.392  ±  53754.778  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot      512  SHA2_256  thrpt    5   8353887.852  ±  47076.094  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     1024    BLAKE3  thrpt    5   2062144.084  ±  14557.116  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     1024  SHA2_256  thrpt    5   5120693.705  ±  30640.599  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     4096    BLAKE3  thrpt    5   1437595.889  ±  34088.637  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot     4096  SHA2_256  thrpt    5   1552307.356  ±  25584.819  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot    16384    BLAKE3  thrpt    5    558955.757  ±   8647.716  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot    16384  SHA2_256  thrpt    5    411619.868  ±   1179.203  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot  1048576    BLAKE3  thrpt    5      9576.940  ±    460.875  ops/s
BazelHashFunctionsBenchmark.hashBytesOneShot  1048576  SHA2_256  thrpt    5      6470.682  ±     41.223  ops/s
</pre>
</details>
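The per-machine multipliers in the summaries appear to be ratios of the 1 MiB scores. As a hedged illustration, the hypothetical `ThroughputRatios` class below (not part of this change) recomputes those ratios from the table values. It also finds the smallest input size from which BLAKE3 stays ahead of SHA2-256, but it only considers the sizes that were actually benchmarked. The M3 Max run has no 512-byte row, so the sketch encodes that row as `NaN` and skips it.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of this change): recomputes BLAKE3 / SHA2-256 throughput
 * ratios from the hashBytesOneShot scores reported above.
 */
public class ThroughputRatios {
  // Benchmarked input sizes in bytes, in table order.
  static final int[] SIZES = {1, 16, 128, 512, 1024, 4096, 16384, 1048576};

  // Scores in ops/s copied from the tables; NaN marks the 512-byte row missing from the M3 Max run.
  static final double[] INTEL_BLAKE3 = {
    3897193.109, 4058401.127, 3883335.405, 3165886.130, 2137151.012, 1266551.855, 562124.458, 9800.524};
  static final double[] INTEL_SHA256 = {
    9773250.840, 9338184.696, 3931746.804, 1689377.092, 971335.403, 271217.035, 72281.652, 1124.542};
  static final double[] M3_BLAKE3 = {
    9262824.819, 9254500.192, 8304084.839, Double.NaN, 3092086.580, 1670833.346, 484960.116, 7987.472};
  static final double[] M3_SHA256 = {
    76557346.275, 81029076.629, 4146027.256, Double.NaN, 9355426.285, 2562509.914, 658392.748, 10380.444};
  static final double[] AMD_BLAKE3 = {
    5569003.683, 5539298.273, 5047579.827, 3300538.392, 2062144.084, 1437595.889, 558955.757, 9576.940};
  static final double[] AMD_SHA256 = {
    21202138.257, 21618815.496, 15806244.512, 8353887.852, 5120693.705, 1552307.356, 411619.868, 6470.682};

  /** BLAKE3 throughput divided by SHA2-256 throughput; above 1.0 means BLAKE3 is faster. */
  static double ratio(double blake3, double sha256) {
    return blake3 / sha256;
  }

  /**
   * Smallest benchmarked size from which BLAKE3 is at least as fast as SHA2-256 at every larger
   * benchmarked size, or -1 if SHA2-256 wins at the largest size. Missing (NaN) rows are skipped.
   */
  static int crossover(double[] blake3, double[] sha256) {
    int result = -1;
    // Walk from the largest size down and stop at the first size where SHA2-256 wins.
    for (int i = SIZES.length - 1; i >= 0; i--) {
      if (Double.isNaN(blake3[i]) || Double.isNaN(sha256[i])) {
        continue; // no data for this size on this machine
      }
      if (blake3[i] < sha256[i]) {
        break; // SHA2-256 wins here, so BLAKE3's lead ends
      }
      result = SIZES[i];
    }
    return result;
  }

  /** Prints the 1 MiB ratio and the crossover size for one machine. */
  static void report(String machine, double[] blake3, double[] sha256) {
    int last = SIZES.length - 1; // the 1 MiB (1048576-byte) row
    int from = crossover(blake3, sha256);
    System.out.printf(Locale.ROOT, "%s: %.2fx at 1 MiB; %s%n",
        machine,
        ratio(blake3[last], sha256[last]),
        from < 0 ? "SHA2-256 still faster at 1 MiB" : "BLAKE3 faster from " + from + " bytes up");
  }

  public static void main(String[] args) {
    report("Intel Core i5-8520U", INTEL_BLAKE3, INTEL_SHA256);
    report("M3 Max", M3_BLAKE3, M3_SHA256);
    report("AMD Ryzen 7 PRO 5850U", AMD_BLAKE3, AMD_SHA256);
  }
}
```

Running it should report 8.72x, 0.77x and 1.48x at 1 MiB. On the Intel machine BLAKE3 stays ahead from 512 bytes up, and on the AMD machine from 16384 bytes up. Because only a few sizes were measured, the real crossover lies somewhere between the reported size and the next smaller benchmarked one. The M3 Max run also shows BLAKE3 ahead at 128 bytes only, which the "stays ahead" definition deliberately ignores.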
Closes bazelbuild#22017.
PiperOrigin-RevId: 628330908
Change-Id: Ic635027d020d60b79d2e498fcebb0cc42fae712b