Could we support `Zvbc` (Carryless Multiplication) for SEW32? #309

JerryShih · 2023-03-30T07:50:12Z

If we could use `Zvbc` with SEW=32, we could get better performance for the CRC32 algorithm.
We can still use the current SEW=64 instructions, but that requires additional widening from SEW=32 to 64 and masking of the unused MSB parts.
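As a rough illustration of that workaround, here is a minimal Python model (not RVV code). The names `clmul_full`, `vclmul64`, and `clmul32_via_sew64` are hypothetical, and the sketch assumes the SEW=64 `vclmul` returns the low 64 bits of the carry-less product:

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def clmul_full(a, b):
    """Carry-less (GF(2)) product of two non-negative integers, unbounded width."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def vclmul64(a, b):
    """Model of one SEW=64 vclmul element: low 64 bits of the 128-bit product."""
    return clmul_full(a & MASK64, b & MASK64) & MASK64


def clmul32_via_sew64(a32, b32):
    """32x32 carry-less multiply done with the SEW=64 model.

    Both operands are zero-extended to 64 bits (the widening step) with the
    upper 32 bits masked to zero. The product of two 32-bit polynomials has
    at most 63 bits, so the low-half multiply alone returns all of it.
    """
    a64 = a32 & MASK32
    b64 = b32 & MASK32
    return vclmul64(a64, b64)
```

The cost in this model is the widen-and-mask step before every multiply, which is what a native SEW=32 variant would avoid.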

nibrunieAtSi5 · 2023-04-04T18:33:07Z

I think you don't need the widening if you work on 64-bit wide chunks and use R = X^128 [P] (that is, X^128 mod P), where P is the irreducible 33-bit polynomial of the CRC-32 (I am ignoring bit/byte endianness issues).

R is a 32-bit value (static, only depends on the CRC) that is zero extended to a 64-bit element once and for all.

You need to compute a 128-bit accumulator: A_hi . X^64 + A_lo as follows:

A_hi = 0 // 64-bit
A_lo = 0 // 64-bit
for each 64-bit block Mi of the message
   A_hi = A_hi ^ Mi
   M_hi = vclmulh(A_hi, R)
   M_lo = vclmul(A_hi, R)
   A_hi = A_lo ^ M_hi
   A_lo = M_lo
end for

Eventually (at the end), the accumulator and the final 128 bits of the message need to be properly padded and reduced.
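The folding loop above can be checked with a small Python model. This is only a sketch under stated assumptions: `POLY` is assumed to be the CRC-32 (IEEE 802.3) generator polynomial, blocks are taken most-significant-first with no bit reflection, and the helper names (`clmul_full`, `poly_mod`, `fold`, `message_poly`) are hypothetical. `vclmul` and `vclmulh` are modeled as the low and high 64 bits of the 128-bit carry-less product.

```python
MASK64 = (1 << 64) - 1
POLY = 0x104C11DB7  # assumption: CRC-32 (IEEE 802.3) generator, 33 bits


def clmul_full(a, b):
    """Carry-less (GF(2)) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a, p):
    """Remainder of polynomial a modulo polynomial p over GF(2)."""
    plen = p.bit_length()
    while a.bit_length() >= plen:
        a ^= p << (a.bit_length() - plen)
    return a


def vclmul(a, b):
    """Low 64 bits of the carry-less product (SEW=64 model)."""
    return clmul_full(a, b) & MASK64


def vclmulh(a, b):
    """High 64 bits of the carry-less product (SEW=64 model)."""
    return (clmul_full(a, b) >> 64) & MASK64


# Static constant: a 32-bit value, zero-extended to a 64-bit element.
R = poly_mod(1 << 128, POLY)


def fold(blocks):
    """Run the accumulator loop from the comment over 64-bit blocks."""
    a_hi = a_lo = 0
    for m in blocks:
        a_hi ^= m
        m_hi = vclmulh(a_hi, R)
        m_lo = vclmul(a_hi, R)
        a_hi = a_lo ^ m_hi
        a_lo = m_lo
    return a_hi, a_lo


def message_poly(blocks):
    """Concatenate 64-bit blocks into one big polynomial, first block highest."""
    msg = 0
    for m in blocks:
        msg = (msg << 64) | m
    return msg
```

In this model the 128-bit accumulator stays congruent to M(X) * X^128 modulo P, where M(X) is the message consumed so far. The final padding and reduction step is not modeled here.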

nibrunieAtSi5 · 2023-04-04T18:41:06Z

Here, the most useful instruction would be a vwclmul, but I think the pair of vclmul / vclmulh should work fine (and this can even be vectorized by computing a vector of R_i = X^(128+VLEN*LMUL/64 - i + 1) and using them in the carry-less multiplications).

A smaller SEW may be less wasteful power-wise (the R * A_hi product only uses part of the carry-less multiplier array; hopefully this can be handled by the uarch).

nibrunieAtSi5 · 2023-04-06T18:19:23Z

This was discussed during the April 6th task group meeting, and it appears that this proposal could be quite useful, in particular to allow 32-bit implementations (Zve32) to support vector carry-less multiply. So the TG is considering adding it to Zvbc.

(This is not an official message from the TG, just my transcript of my understanding of the meeting discussions)

kdockser · 2023-04-24T22:31:27Z

The final decision has been to leave the SEW=32 variant out of the current vector crypto specification. Should there be a need for these instructions in Zve32* implementations, we are leaving the option open to add them in a subsequent extension, perhaps as a FastTrack.

kdockser closed this as completed Apr 24, 2023

ebiggers mentioned this issue Feb 3, 2024

[extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs #362

Closed
