Option 14: unholy alloy of gold and silver #4

corsix · 2022-08-01T15:15:10Z

With reference to https://www.corsix.org/content/fast-crc32c-4k, what I call crc32_4k is your option 12 ("8-byte Hardware-accelerated"), and what I call crc32_4k_three_way is your option 13 ("Golden"). The theoretical upper bound on option 13 is 64 bits/cycle, which your implementation gets close to, at 62 bits/cycle. What I realised is that:

There's an inferior option, which I call crc32_4k_pclmulqdq but you might call "Silver".
Gold and silver use separate execution ports, so they can be alloyed together, for a theoretical upper bound of 120.89 bits/cycle (this is 64+72 bytes every 9 cycles). I'm measuring 93 bits/cycle for this alloy, and I imagine a well-tuned implementation could get closer to 120.89.
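The 120.89 figure is the combined byte count converted to bits and divided by the cycle count. A quick check in Python follows. It models the 64 bits/cycle gold bound as one 8-byte crc32 instruction per cycle, which is an assumption about where that number comes from, and `bits_per_cycle` is a hypothetical helper.

```python
def bits_per_cycle(bytes_processed: float, cycles: float) -> float:
    """Throughput in bits per cycle."""
    return bytes_processed * 8 / cycles


# Gold alone, assuming one 8-byte crc32 instruction per cycle.
gold_bound = bits_per_cycle(8, 1)

# Alloy: 64 + 72 bytes every 9 cycles, per the estimate in this post.
alloy_bound = bits_per_cycle(64 + 72, 9)  # 1088 bits / 9 cycles

# Measured throughput reported for the alloy.
measured = 93.0

print(f"gold bound  {gold_bound:.2f} bits/cycle")
print(f"alloy bound {alloy_bound:.2f} bits/cycle")
print(f"measured alloy reaches {measured / alloy_bound:.0%} of its bound")
```

By this arithmetic the 93 bits/cycle measurement sits at roughly three quarters of the alloy's theoretical bound.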
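Gold and silver arrive at the same CRC-32C through different instructions. The crc32 instruction consumes data directly, while a PCLMULQDQ approach treats the data as a polynomial over GF(2) and carries a partial result forward by carry-less multiplication with a precomputed constant. What follows is a minimal Python model of that equivalence, assuming the standard CRC-32C parameters (reflected polynomial 0x82F63B78, init and final XOR 0xFFFFFFFF). The function names are hypothetical, this is not the code from the linked article, and it says nothing about speed.

```python
POLY_REFLECTED = 0x82F63B78  # CRC-32C polynomial, bit-reversed form
POLY = 0x11EDC6F41           # same polynomial, normal form with the x^32 term


def crc32c_bitwise(data: bytes) -> int:
    """Reference CRC-32C, one bit at a time (the value the crc32 instruction computes)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) multiplication, the operation PCLMULQDQ performs on 64-bit halves."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pmod(a: int, p: int = POLY) -> int:
    """Remainder of polynomial a modulo p over GF(2)."""
    deg = p.bit_length() - 1
    while a.bit_length() - 1 >= deg:
        a ^= p << (a.bit_length() - 1 - deg)
    return a


def reflect(value: int, width: int) -> int:
    """Reverse the low `width` bits of value."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def message_poly(data: bytes) -> int:
    """Data as a normal-form polynomial: each byte bit-reversed, first byte highest."""
    m = 0
    for byte in data:
        m = (m << 8) | reflect(byte, 8)
    return m


def crc32c_clmul(prefix: bytes, suffix: bytes) -> int:
    """CRC-32C of prefix + suffix via carry-less multiplication and reduction."""
    # Register state after the prefix: (init * x^(8n) + M(x) * x^32) mod P.
    state = pmod((0xFFFFFFFF << (8 * len(prefix))) ^ (message_poly(prefix) << 32))
    # Carry the state across the suffix by multiplying with x^(8k) mod P,
    # a constant that hardware code would precompute.
    shift_const = pmod(1 << (8 * len(suffix)))
    state = pmod(clmul(state, shift_const) ^ (message_poly(suffix) << 32))
    return reflect(state, 32) ^ 0xFFFFFFFF


data = b"123456789"
assert crc32c_clmul(data[:5], data[5:]) == crc32c_bitwise(data)
```

Both paths return the standard CRC-32C check value 0xE3069283 for `b"123456789"`. Hardware versions typically fold fixed-size blocks, so each shift constant is computed once for a fixed distance rather than per call as in this sketch.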


komrad36 · 2022-08-03T10:17:53Z

That's awesome. Alloyed, ha!
Given that there are bottlenecks other than execution ports, such as decode or the total number of uops scheduled/retired, I'm surprised it's possible to do anything with the remaining bandwidth in the processor. But I'll have to check this out!
