[API Proposal]: GFNI Intrinsics #96170

Closed
MineCake147E opened this issue Dec 19, 2023 · 8 comments · Fixed by #109537
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics in-pr There is an active PR which will close this issue when it is merged

@MineCake147E
Contributor

MineCake147E commented Dec 19, 2023

Background and motivation

GFNI is supported by Intel on Ice Lake and newer architectures, and by AMD on Zen 4.
These instructions are known to be useful for cryptography and bit manipulation.
For example, an efficient bit reversal can be implemented with them.

API Proposal

namespace System.Runtime.Intrinsics.X86;

public abstract class Gfni : Sse41
{
    public static bool IsSupported { get; }

    public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);

    public abstract class X64 : Sse41.X64
    {
        public static bool IsSupported { get; }
    }

    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<byte> GaloisFieldAffineTransformInverse(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldAffineTransform(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldMultiply(Vector512<byte> left, Vector512<byte> right);
    }
}

API Usage

// https://wunkolo.github.io/post/2020/11/gf2p8affineqb-bit-reversal/
public static Vector128<byte> ReverseBits128(Vector128<byte> value)
{
    var xmm0 = Gfni.GaloisFieldAffineTransform(value, Vector128.Create(0b10000000_01000000_00100000_00010000_00001000_00000100_00000010_00000001ul).AsByte(), 0);
    return Ssse3.Shuffle(xmm0, Vector128.Create(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, byte.MinValue));
}
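
For comparison, here is a minimal scalar sketch of the same 128-bit reversal that could serve as a fallback when Gfni.IsSupported is false. The class and method names (ReverseBitsScalarSketch, ReverseByte) are hypothetical and not part of the proposal; it works on a 16-byte array rather than a Vector128<byte> so it runs without any intrinsics.

```csharp
using System;

public static class ReverseBitsScalarSketch
{
    // Reverses the bit order inside a single byte using three swap steps.
    public static byte ReverseByte(byte value)
    {
        int v = value;
        v = ((v & 0xF0) >> 4) | ((v & 0x0F) << 4); // swap nibbles
        v = ((v & 0xCC) >> 2) | ((v & 0x33) << 2); // swap bit pairs
        v = ((v & 0xAA) >> 1) | ((v & 0x55) << 1); // swap adjacent bits
        return (byte)v;
    }

    // Reverses all 128 bits of a 16-byte block:
    // reverse the bits of each byte, then reverse the byte order.
    public static byte[] ReverseBits128(byte[] block)
    {
        if (block.Length != 16)
        {
            throw new ArgumentException("Expected exactly 16 bytes.", nameof(block));
        }

        var result = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            result[15 - i] = ReverseByte(block[i]);
        }
        return result;
    }
}
```

The per-byte step corresponds to the affine transform in the vector version, and the index flip (15 - i) corresponds to the Ssse3.Shuffle call.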

Alternative Designs

No response

Risks

No response

@MineCake147E MineCake147E added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 19, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 19, 2023
@ghost

ghost commented Dec 19, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

API Proposal

namespace System.Runtime.Intrinsics.X86
{
    public abstract class Avx512Gfni : Avx512F
    {
        public static bool IsSupported { get; }
        public static Vector512<byte> GaloisFieldAffineTransformInverse(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldAffineTransform(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldMultiply(Vector512<byte> left, Vector512<byte> right);
        public abstract class VL : Avx512F.VL
        {
            public static new bool IsSupported { get; }
            public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
            public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
            public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
            public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
            public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
            public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);
        }
    }
    public abstract class AvxGfni : Avx
    {
        public static bool IsSupported { get; }
        public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
    }
    public abstract class Gfni : Sse41
    {
        public static bool IsSupported { get; }
        public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
        public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
        public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);
    }
}


Author: MineCake147E
Assignees: -
Labels:

api-suggestion, area-System.Runtime.Intrinsics

Milestone: -

@PaulusParssinen
Contributor

Here are more unexpected uses for the Galois Field Affine Transformation instruction, collected by animetosho 👍

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Jan 3, 2024
@tannergooding tannergooding added this to the Future milestone Jan 3, 2024
@saucecontrol
Member

saucecontrol commented Feb 1, 2024

Should these be named Gfni128, Gfni256, and Gfni512 to be consistent with Pclmulqdq256 and Pclmulqdq512? The ISA support flags work the same way with GFNI.

Same thing with Avx512F.VL overloads mirroring AvxGfni/Gfni256. They probably don't need to be there, as they were skipped with VPCLMULQDQ.

@terrajobst
Contributor

terrajobst commented Feb 29, 2024

Video

  • Looks good as proposed
namespace System.Runtime.Intrinsics.X86;

public abstract class Avx512Gfni : Avx512F
{
    public static bool IsSupported { get; }

    public static Vector512<byte> GaloisFieldAffineTransformInverse(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
    public static Vector512<byte> GaloisFieldAffineTransform(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
    public static Vector512<byte> GaloisFieldMultiply(Vector512<byte> left, Vector512<byte> right);

    public abstract class VL : Avx512F.VL
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
        public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);
    }
}

public abstract class AvxGfni : Avx
{
    public static bool IsSupported { get; }

    public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
    public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
    public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
}

public abstract class Gfni : Sse41
{
    public static bool IsSupported { get; }

    public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 29, 2024
@saucecontrol
Member

saucecontrol commented Oct 19, 2024

For consistency with the AVX10 surface (and #86952), this should probably be revised to

namespace System.Runtime.Intrinsics.X86;

public abstract class Gfni : Sse41
{
    public static bool IsSupported { get; }

    public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);

    public abstract class X64 : Sse41.X64
    {
        public static bool IsSupported { get; }
    }

    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<byte> GaloisFieldAffineTransformInverse(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldAffineTransform(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldMultiply(Vector512<byte> left, Vector512<byte> right);
    }
}

Also, the affine transform ops treat the second operand as an 8x8-bit matrix, and the C intrinsics are named to indicate that one operand is a vector of 64-bit values (e.g. _mm_gf2p8affine_epi64_epi8). It might make more sense to define those as Vector128<ulong>, etc., for consistency. With the names x and a matching the C defs (although the matrix operand is capital A there), it can be difficult to remember which is which, but having one int64 operand and one int8 operand makes it clearer. And should there be signed overloads?

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels Oct 21, 2024
@bartonjs
Member

bartonjs commented Oct 22, 2024

Video

  • [ConstantExpected] byte b should be [ConstantExpected] byte control
  • Otherwise, looks good as proposed
namespace System.Runtime.Intrinsics.X86;

public abstract class Gfni : Sse41
{
    public static bool IsSupported { get; }

    public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte control);
    public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<byte> a, [ConstantExpected] byte control);
    public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);

    public abstract class X64 : Sse41.X64
    {
        public static bool IsSupported { get; }
    }

    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte control);
        public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<byte> a, [ConstantExpected] byte control);
        public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<byte> GaloisFieldAffineTransformInverse(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte control);
        public static Vector512<byte> GaloisFieldAffineTransform(Vector512<byte> x, Vector512<byte> a, [ConstantExpected] byte control);
        public static Vector512<byte> GaloisFieldMultiply(Vector512<byte> left, Vector512<byte> right);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 22, 2024
@saucecontrol
Member

I'll implement this one

@saucecontrol
Member

saucecontrol commented Oct 26, 2024

I just got a chance to watch the API review video. It sounds like there was some confusion around the immediate operand for the affine instructions. The documentation defines the affine transform as producing each output byte from the formula A * x + b, where

  • A is a vector of 8x8 bit matrices, one per 64-bit element
  • x is a byte vector
  • b is defined as a constant vector, as if the immediate byte were broadcast to all positions

This doesn't fit the pattern of what we typically call a 'control' byte, which might select a lane for processing or give a permute order. Since it's an actual operand used in the mathematical definition in this case, it would be more clear if the name matched the documentation. It should be noted that this discussion was part of the API review for the original shape, when it was decided to keep the name b.
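
To make the role of b concrete, here is a rough scalar sketch of how one output byte could be computed from A * x + b. The names (Gf2p8AffineSketch, AffineByte) are hypothetical, and the row ordering (the row for output bit i taken from byte 7 - i of the 64-bit matrix) is my reading of the instruction's documented pseudocode, so treat it as an assumption rather than a reference implementation.

```csharp
using System;
using System.Numerics;

public static class Gf2p8AffineSketch
{
    // Scalar model of one output byte of the affine transform A * x + b over GF(2).
    // matrix: one 8x8 bit matrix packed into 64 bits (one 64-bit element of operand a).
    // x:      the input byte.
    // b:      the immediate byte, XORed into the result bit by bit.
    public static byte AffineByte(ulong matrix, byte x, byte b)
    {
        int result = 0;
        for (int i = 0; i < 8; i++)
        {
            // Assumption: the row that produces output bit i is byte (7 - i) of the matrix.
            byte row = (byte)(matrix >> (8 * (7 - i)));

            // A dot product over GF(2) is the parity of the bitwise AND.
            int parity = BitOperations.PopCount((uint)(row & x)) & 1;

            // Adding b over GF(2) is an XOR with the matching bit of the immediate.
            int bit = parity ^ ((b >> i) & 1);
            result |= bit << i;
        }
        return (byte)result;
    }
}
```

Under this model, b takes part in the arithmetic for every output bit, which is why it reads as an operand rather than as a control byte.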

I also didn't hear any mention of the 8x8 matrix operand's type in the discussion. Typical use, as in the sample given in the top post of this issue, would have the same matrix for each 64-bit lane. Example repeated here:

// https://wunkolo.github.io/post/2020/11/gf2p8affineqb-bit-reversal/
public static Vector128<byte> ReverseBits128(Vector128<byte> value)
{
    var xmm0 = Gfni.GaloisFieldAffineTransform(value, Vector128.Create(0b10000000_01000000_00100000_00010000_00001000_00000100_00000010_00000001ul).AsByte(), 0);
    return Ssse3.Shuffle(xmm0, Vector128.Create(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, byte.MinValue));
}

Note that the sample creates the matrix vector by broadcast of a ulong and then calls AsByte(), where the cast ends up being noise. Likewise, the EVEX instruction encoding supports a 64-bit memory broadcast for the matrix operand. Between matching the documentation more closely and more closely matching the typical use of the instruction, I think it makes more sense to define that operand as VectorXXX<ulong> rather than VectorXXX<byte>.
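
As a small illustration of why the cast is noise, the sketch below broadcasts a ulong matrix and reads it back as bytes, showing that every 64-bit lane holds the same eight matrix rows. The class and method names (MatrixBroadcastSketch, BroadcastAsBytes) are hypothetical; only Vector128.Create, AsByte and GetElement are existing APIs.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MatrixBroadcastSketch
{
    // The bit-reversal matrix from the sample above, written in hex.
    public const ulong BitReversalMatrix = 0x8040201008040201UL;

    // Returns the 16 bytes an API taking the matrix as Vector128<byte> would see
    // after broadcasting one 64-bit matrix to both lanes.
    public static byte[] BroadcastAsBytes(ulong matrix)
    {
        // AsByte() only reinterprets the bits; no data is moved or converted.
        Vector128<byte> bytes = Vector128.Create(matrix).AsByte();

        var result = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            result[i] = bytes.GetElement(i);
        }
        return result;
    }
}
```

Bytes 0 through 7 and bytes 8 through 15 come out identical, so a Vector128<ulong> parameter would describe the operand without the extra reinterpretation step.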

Proposed shape would be:

namespace System.Runtime.Intrinsics.X86;

public abstract class Gfni : Sse2
{
    public static bool IsSupported { get; }

    public static Vector128<byte> GaloisFieldAffineTransformInverse(Vector128<byte> x, Vector128<ulong> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldAffineTransform(Vector128<byte> x, Vector128<ulong> a, [ConstantExpected] byte b);
    public static Vector128<byte> GaloisFieldMultiply(Vector128<byte> left, Vector128<byte> right);

    public abstract class X64 : Sse2.X64
    {
        public static bool IsSupported { get; }
    }

    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> GaloisFieldAffineTransformInverse(Vector256<byte> x, Vector256<ulong> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldAffineTransform(Vector256<byte> x, Vector256<ulong> a, [ConstantExpected] byte b);
        public static Vector256<byte> GaloisFieldMultiply(Vector256<byte> left, Vector256<byte> right);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<byte> GaloisFieldAffineTransformInverse(Vector512<byte> x, Vector512<ulong> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldAffineTransform(Vector512<byte> x, Vector512<ulong> a, [ConstantExpected] byte b);
        public static Vector512<byte> GaloisFieldMultiply(Vector512<byte> left, Vector512<byte> right);
    }
}

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Nov 5, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2024