Could we support Zvbc (Carryless Multiplication) for SEW32? #309

Closed
JerryShih opened this issue Mar 30, 2023 · 4 comments

@JerryShih

If we could use Zvbc with SEW=32, we could get better performance for the CRC32 algorithm.
We could still use the current SEW=64 instructions, but that requires additional widening from SEW=32 to 64 and masking of the unused MSB parts.

@nibrunieAtSi5
Contributor

I think you don't need the widening if you work on 64-bit-wide chunks and use R = X^128 mod P, where P is the irreducible 33-bit polynomial of the CRC-32 (I ignore bit/byte endianness issues).

R is a 32-bit value (static, only depends on the CRC) that is zero extended to a 64-bit element once and for all.

You need to compute a 128-bit accumulator: A_hi . X^64 + A_lo as follows:

A_hi = 0 // 64-bit
A_lo = 0 // 64-bit
for each 64-bit block Mi of the message
   A_hi = A_hi ^ Mi
   M_hi = vclmulh(A_hi, R)
   M_lo = vclmul(A_hi, R)
   A_hi = A_lo ^ M_hi
   A_lo = M_lo
end for
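
As a rough check of the loop above, here is a small Python model of it. This is only a sketch under stated assumptions: `vclmul` and `vclmulh` are modelled in software as the low and high 64 bits of a carry-less product, bit reflection and the CRC initial/final XOR values are ignored (as in the comment), and the finalization in `crc_by_folding` (stop two blocks early, XOR the last 128 message bits into the accumulator, multiply by X^32, reduce) is one possible reading of the "pad and reduce" step, not something stated in the thread. The helper names are hypothetical.

```python
P = 0x104C11DB7            # 33-bit CRC-32 generator polynomial
MASK64 = (1 << 64) - 1

def clmul_full(a, b):
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def vclmul(a, b):
    """Software model: low 64 bits of the carry-less product."""
    return clmul_full(a, b) & MASK64

def vclmulh(a, b):
    """Software model: high 64 bits of the 128-bit carry-less product."""
    return (clmul_full(a, b) >> 64) & MASK64

def polymod(a, p=P):
    """Remainder of polynomial a modulo p, by bitwise long division."""
    plen = p.bit_length()
    while a.bit_length() >= plen:
        a ^= p << (a.bit_length() - plen)
    return a

R = polymod(1 << 128)      # X^128 mod P, fits in 32 bits

def fold(blocks):
    """The loop from the comment: returns (A_hi, A_lo)."""
    a_hi = a_lo = 0
    for m in blocks:
        t = a_hi ^ m
        m_hi = vclmulh(t, R)
        m_lo = vclmul(t, R)
        a_hi = a_lo ^ m_hi
        a_lo = m_lo
    return a_hi, a_lo

def crc_by_folding(blocks):
    """Assumed finalization: fold all but the last 128 bits, then reduce."""
    assert len(blocks) >= 2
    a_hi, a_lo = fold(blocks[:-2])
    acc = ((a_hi ^ blocks[-2]) << 64) | (a_lo ^ blocks[-1])
    return polymod(acc << 32)

def crc_reference(blocks):
    """Direct computation: (message * X^32) mod P on one big integer."""
    msg = 0
    for m in blocks:
        msg = (msg << 64) | m
    return polymod(msg << 32)
```

In this model the accumulator after the loop is congruent to the processed message times X^128 modulo P, which is why the last 128 bits of the message can be XORed directly into it before the final reduction.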

[Image: CRC vclmul]

At the end, the accumulator and the final 128 bits of the message need to be properly padded and reduced.

@nibrunieAtSi5
Contributor

Here, the most useful instruction would be a vwclmul but I think the pair of vclmul / vclmulh should work fine (and this can even be vectorized by computing a vector of R_i = X^(128+VLEN*LMUL/64 - i + 1) and using them in the carry-less multiplications).
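
To illustrate the per-element constant idea, here is a simplified Python sketch. It is an assumption-laden model, not the scheme from the comment: each "lane" holds one 64-bit block, lane 0 is taken as the most significant block, and the constants are chosen as X^(64*(k-1-i)) mod P so that the XOR of the element-wise carry-less products is congruent to the whole group modulo P. The exponents therefore differ from the R_i formula above, and the function names are hypothetical.

```python
P = 0x104C11DB7            # 33-bit CRC-32 generator polynomial

def clmul(a, b):
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def polymod(a, p=P):
    """Remainder of polynomial a modulo p, by bitwise long division."""
    plen = p.bit_length()
    while a.bit_length() >= plen:
        a ^= p << (a.bit_length() - plen)
    return a

def lane_constants(k):
    """One constant per lane: X^(64*(k-1-i)) mod P for lane i."""
    return [polymod(1 << (64 * (k - 1 - i))) for i in range(k)]

def group_residue(blocks):
    """Residue mod P of a group of 64-bit blocks using per-lane constants."""
    consts = lane_constants(len(blocks))
    acc = 0
    for m, c in zip(blocks, consts):
        acc ^= clmul(m, c)   # element-wise multiply, then XOR across lanes
    return polymod(acc)
```

Each product is at most 95 bits wide (a 64-bit block times a 32-bit constant), so in a real vector implementation it would span a `vclmul` / `vclmulh` pair per element.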

A smaller SEW may be less wasteful powerwise (the R * A_hi only uses part of the carry-less multiplier array, hopefully this can be handled by the uarch).

@nibrunieAtSi5
Contributor

This was discussed during the April 6th task group meeting, and it appears that this proposal could be quite useful, in particular to allow 32-bit implementations (Zve32*) to support vector carry-less multiply. So the TG is considering adding it to Zvbc.

(This is not an official message from the TG, just my transcript of my understanding of the meeting discussions)

@kdockser
Collaborator

The final decision has been to leave the SEW=32 variant out of the current vector crypto specification. Should there be a need for these instructions for Zve32* implementations, we are leaving the option open to add them in a subsequent extension, perhaps as a FastTrack.
