[API Proposal]: Arm64: FEAT_SVE: firstfaulting #94004

Open
a74nh opened this issue Oct 26, 2023 · 17 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics arm-sve Work related to arm64 SVE/SVE2 support
Comments


a74nh commented Oct 26, 2023

namespace System.Runtime.Intrinsics.Arm;

/// VectorT Summary
public abstract partial class Sve : AdvSimd /// Feature: FEAT_SVE  Category: firstfaulting
{

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1B

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1B

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* base, Vector<T> offsets); // LDFF1B

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* base, Vector<T2> offsets); // LDFF1B

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1B

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1B

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T2> offsets); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T> offsets); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T2> indices); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T> indices); // LDFF1W or LDFF1D

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1W or LDFF1D

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> bases, long index); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> bases, long index); // LDFF1W or LDFF1D

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1SH

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T> offsets); // LDFF1SH

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T2> offsets); // LDFF1SH

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T> indices); // LDFF1SH

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T2> indices); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long index); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long index); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1H

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T> offsets); // LDFF1H

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T2> offsets); // LDFF1H

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T> indices); // LDFF1H

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T2> indices); // LDFF1H

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1H

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long index); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long index); // LDFF1H

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases); // LDFF1SW

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T> offsets); // LDFF1SW

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T2> offsets); // LDFF1SW

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T> indices); // LDFF1SW

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T2> indices); // LDFF1SW

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset); // LDFF1SW

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index); // LDFF1SW

  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases); // LDFF1W

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T> offsets); // LDFF1W

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T2> offsets); // LDFF1W

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T> indices); // LDFF1W

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T2> indices); // LDFF1W

  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset); // LDFF1W

  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index); // LDFF1W

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1SB

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1SB

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* base, Vector<T> offsets); // LDFF1SB

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* base, Vector<T2> offsets); // LDFF1SB

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1SB

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1SB

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> GetFFR(); // RDFFR // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorByteZeroExtendFirstFaulting(byte* address); // LDFF1B // predicated

  /// T: float, double, sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorFirstFaulting(T* address); // LDFF1W or LDFF1D or LDFF1B or LDFF1H // predicated

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorInt16SignExtendFirstFaulting(short* address); // LDFF1SH // predicated

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address); // LDFF1H // predicated

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorInt32SignExtendFirstFaulting(int* address); // LDFF1SW // predicated

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorInt32ZeroExtendFirstFaulting(uint* address); // LDFF1W // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorSByteSignExtendFirstFaulting(sbyte* address); // LDFF1SB // predicated

  public static unsafe void SetFFR(); // SETFFR

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe void WriteFFR(Vector<T> value); // WRFFR

  /// total method signatures: 72

}

First Faulting Loads

Each first-faulting LoadVector method matches a non-faulting version in #94006.

A first-faulting load reads the vector element by element. If loading the first active element would fault, the fault is taken as normal. If loading any later element would fault, no fault is raised; instead the load stops and the first-fault register (FFR) is updated so that only the elements that loaded successfully remain set. If no element faults, the load completes as per the non-faulting version.

After calling a first-faulting load, the FFR should always be read to determine whether the load completed.

GetFFR(), SetFFR() and WriteFFR() read, reset and write the FFR mask left by the last first-faulting load.
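
As a rough illustration of these semantics, the behaviour of a first-faulting load can be sketched as a scalar model (hypothetical Python, not the proposed API; memory is modelled as a dict of address to byte, with missing keys standing in for unmapped pages):

```python
# Hypothetical scalar model of a first-faulting (LDFF1-style) load.

def load_first_faulting(memory, address, count):
    """Return (data, ffr): the loaded elements and the first-fault mask.

    The first element is loaded unconditionally (a real LDFF1 takes the
    fault there). For later elements, a would-be fault stops the load
    and leaves the FFR clear from that element onwards.
    """
    data = [0] * count
    ffr = [False] * count
    for i in range(count):
        addr = address + i
        if addr not in memory:
            if i == 0:
                # A real first-faulting load faults here as normal.
                raise MemoryError("fault on first element")
            break  # suppress the fault; FFR stays False from here on
        data[i] = memory[addr]
        ffr[i] = True
    return data, ffr
```

In this model, loading eight elements from a buffer with only five mapped bytes returns five loaded elements and an FFR with only the first five bits set, which is the pattern the worked example below tests for.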

// Look for the 0 at the end of the buffer
long strlen(byte* buffer)
{
  long i = 0;
  Sve.SetFFR();

  while (true)
  {
    // Load data from the buffer. maskFfr contains the elements that were loaded.
    Vector<byte> data = Sve.LoadVectorFirstFaulting(buffer + i);
    Vector<byte> maskFfr = Sve.GetFFR();

    // Look for zeros in the loaded data.
    Vector<byte> zeroMask = Sve.ConditionalSelect(maskFfr, Sve.CompareEquals(data, Vector<byte>.Zero), Vector<byte>.Zero);

    if (Sve.MaskTestAnyTrue(zeroMask)) {
      // There was a zero in the loaded data. Increment up to the zero and exit the loop.
      zeroMask = Sve.BreakBefore(zeroMask);
      i += Sve.GetActiveElementCount(zeroMask);
      break;
    } else if (Sve.MaskTestLastTrue(maskFfr)) {
      // Final bit in the FFR mask is set, therefore the load completed. Increment and continue.
      i += Sve.Count8BitElements();
    } else {
      // The load faulted. Reset the FFR, increment by the number of loaded elements, and continue.
      Sve.SetFFR();
      i += Sve.GetActiveElementCount(maskFfr);
    }
  }

  return i;
}
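
The control flow of the loop above can be modelled in scalar Python (hypothetical, for illustration only; memory is a dict of address to byte, and a missing key models an unmapped page):

```python
# Scalar model of the strlen loop: walk the buffer in fixed-size chunks,
# stopping a chunk early wherever a load would fault, just as a
# first-faulting vector load leaves the FFR clear past that point.

VL = 16  # stand-in for the vector length in bytes

def strlen_model(memory, buffer):
    i = 0
    while True:
        # Load up to VL bytes, stopping at the first unmapped address.
        chunk = []
        for j in range(VL):
            if buffer + i + j not in memory:
                break
            chunk.append(memory[buffer + i + j])

        if not chunk:
            # Even the first element would fault: a real first-faulting
            # load takes the fault here.
            raise MemoryError("fault on first element")

        if 0 in chunk:
            # Terminator found: count the elements before it and stop.
            return i + chunk.index(0)

        if len(chunk) == VL:
            # The whole chunk loaded with no zero: advance a full vector.
            i += VL
        else:
            # The load "faulted" part-way: advance by what did load and retry.
            i += len(chunk)
```

The model preserves the three-way split of the C# example: terminator found, full load completed, or partial load due to a would-be fault.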


a74nh commented Oct 26, 2023

/// Full API
public abstract partial class Sve : AdvSimd /// Feature: FEAT_SVE  Category: firstfaulting
{
    /// GatherVectorByteZeroExtendFirstFaulting : Load 8-bit data and zero-extend, first-faulting

    /// svint32_t svldff1ub_gather[_u32base]_s32(svbool_t pg, svuint32_t bases) : "LDFF1B Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<int> GatherVectorByteZeroExtendFirstFaulting(Vector<int> mask, Vector<uint> bases);

    /// svuint32_t svldff1ub_gather[_u32base]_u32(svbool_t pg, svuint32_t bases) : "LDFF1B Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<uint> GatherVectorByteZeroExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases);

    /// svint64_t svldff1ub_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1B Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorByteZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1ub_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1B Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorByteZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svint32_t svldff1ub_gather_[s32]offset_s32(svbool_t pg, const uint8_t *base, svint32_t offsets) : "LDFF1B Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<int> GatherVectorByteZeroExtendFirstFaulting(Vector<int> mask, byte* base, Vector<int> offsets);

    /// svuint32_t svldff1ub_gather_[s32]offset_u32(svbool_t pg, const uint8_t *base, svint32_t offsets) : "LDFF1B Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<uint> GatherVectorByteZeroExtendFirstFaulting(Vector<uint> mask, byte* base, Vector<int> offsets);

    /// svint32_t svldff1ub_gather_[u32]offset_s32(svbool_t pg, const uint8_t *base, svuint32_t offsets) : "LDFF1B Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<int> GatherVectorByteZeroExtendFirstFaulting(Vector<int> mask, byte* base, Vector<uint> offsets);

    /// svuint32_t svldff1ub_gather_[u32]offset_u32(svbool_t pg, const uint8_t *base, svuint32_t offsets) : "LDFF1B Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorByteZeroExtendFirstFaulting(Vector<uint> mask, byte* base, Vector<uint> offsets);

    /// svint64_t svldff1ub_gather_[s64]offset_s64(svbool_t pg, const uint8_t *base, svint64_t offsets) : "LDFF1B Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorByteZeroExtendFirstFaulting(Vector<long> mask, byte* base, Vector<long> offsets);

    /// svuint64_t svldff1ub_gather_[s64]offset_u64(svbool_t pg, const uint8_t *base, svint64_t offsets) : "LDFF1B Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorByteZeroExtendFirstFaulting(Vector<ulong> mask, byte* base, Vector<long> offsets);

    /// svint64_t svldff1ub_gather_[u64]offset_s64(svbool_t pg, const uint8_t *base, svuint64_t offsets) : "LDFF1B Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorByteZeroExtendFirstFaulting(Vector<long> mask, byte* base, Vector<ulong> offsets);

    /// svuint64_t svldff1ub_gather_[u64]offset_u64(svbool_t pg, const uint8_t *base, svuint64_t offsets) : "LDFF1B Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorByteZeroExtendFirstFaulting(Vector<ulong> mask, byte* base, Vector<ulong> offsets);

    /// svint32_t svldff1ub_gather[_u32base]_offset_s32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1B Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1B Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorByteZeroExtendFirstFaulting(Vector<int> mask, Vector<uint> bases, long offset);

    /// svuint32_t svldff1ub_gather[_u32base]_offset_u32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1B Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1B Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorByteZeroExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases, long offset);

    /// svint64_t svldff1ub_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1B Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1B Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorByteZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1ub_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1B Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1B Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorByteZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);


    /// GatherVectorFirstFaulting : Unextended load, first-faulting

    /// svfloat32_t svldff1_gather[_u32base]_f32(svbool_t pg, svuint32_t bases) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, Vector<uint> bases);

    /// svint32_t svldff1_gather[_u32base]_s32(svbool_t pg, svuint32_t bases) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, Vector<uint> bases);

    /// svuint32_t svldff1_gather[_u32base]_u32(svbool_t pg, svuint32_t bases) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, Vector<uint> bases);

    /// svfloat64_t svldff1_gather[_u64base]_f64(svbool_t pg, svuint64_t bases) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, Vector<ulong> bases);

    /// svint64_t svldff1_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svfloat32_t svldff1_gather_[s32]offset[_f32](svbool_t pg, const float32_t *base, svint32_t offsets) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, float* base, Vector<int> offsets);

    /// svint32_t svldff1_gather_[s32]offset[_s32](svbool_t pg, const int32_t *base, svint32_t offsets) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, int* base, Vector<int> offsets);

    /// svuint32_t svldff1_gather_[s32]offset[_u32](svbool_t pg, const uint32_t *base, svint32_t offsets) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, uint* base, Vector<int> offsets);

    /// svfloat32_t svldff1_gather_[u32]offset[_f32](svbool_t pg, const float32_t *base, svuint32_t offsets) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, float* base, Vector<uint> offsets);

    /// svint32_t svldff1_gather_[u32]offset[_s32](svbool_t pg, const int32_t *base, svuint32_t offsets) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, int* base, Vector<uint> offsets);

    /// svuint32_t svldff1_gather_[u32]offset[_u32](svbool_t pg, const uint32_t *base, svuint32_t offsets) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, uint* base, Vector<uint> offsets);

    /// svfloat64_t svldff1_gather_[s64]offset[_f64](svbool_t pg, const float64_t *base, svint64_t offsets) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, double* base, Vector<long> offsets);

    /// svint64_t svldff1_gather_[s64]offset[_s64](svbool_t pg, const int64_t *base, svint64_t offsets) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, long* base, Vector<long> offsets);

    /// svuint64_t svldff1_gather_[s64]offset[_u64](svbool_t pg, const uint64_t *base, svint64_t offsets) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, ulong* base, Vector<long> offsets);

    /// svfloat64_t svldff1_gather_[u64]offset[_f64](svbool_t pg, const float64_t *base, svuint64_t offsets) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, double* base, Vector<ulong> offsets);

    /// svint64_t svldff1_gather_[u64]offset[_s64](svbool_t pg, const int64_t *base, svuint64_t offsets) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, long* base, Vector<ulong> offsets);

    /// svuint64_t svldff1_gather_[u64]offset[_u64](svbool_t pg, const uint64_t *base, svuint64_t offsets) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, ulong* base, Vector<ulong> offsets);

    /// svfloat32_t svldff1_gather_[s32]index[_f32](svbool_t pg, const float32_t *base, svint32_t indices) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #2]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, float* base, Vector<int> indices);

    /// svint32_t svldff1_gather_[s32]index[_s32](svbool_t pg, const int32_t *base, svint32_t indices) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #2]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, int* base, Vector<int> indices);

    /// svuint32_t svldff1_gather_[s32]index[_u32](svbool_t pg, const uint32_t *base, svint32_t indices) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #2]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, uint* base, Vector<int> indices);

    /// svfloat32_t svldff1_gather_[u32]index[_f32](svbool_t pg, const float32_t *base, svuint32_t indices) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #2]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, float* base, Vector<uint> indices);

    /// svint32_t svldff1_gather_[u32]index[_s32](svbool_t pg, const int32_t *base, svuint32_t indices) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #2]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, int* base, Vector<uint> indices);

    /// svuint32_t svldff1_gather_[u32]index[_u32](svbool_t pg, const uint32_t *base, svuint32_t indices) : "LDFF1W Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #2]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, uint* base, Vector<uint> indices);

    /// svfloat64_t svldff1_gather_[s64]index[_f64](svbool_t pg, const float64_t *base, svint64_t indices) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, double* base, Vector<long> indices);

    /// svint64_t svldff1_gather_[s64]index[_s64](svbool_t pg, const int64_t *base, svint64_t indices) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, long* base, Vector<long> indices);

    /// svuint64_t svldff1_gather_[s64]index[_u64](svbool_t pg, const uint64_t *base, svint64_t indices) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, ulong* base, Vector<long> indices);

    /// svfloat64_t svldff1_gather_[u64]index[_f64](svbool_t pg, const float64_t *base, svuint64_t indices) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, double* base, Vector<ulong> indices);

    /// svint64_t svldff1_gather_[u64]index[_s64](svbool_t pg, const int64_t *base, svuint64_t indices) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, long* base, Vector<ulong> indices);

    /// svuint64_t svldff1_gather_[u64]index[_u64](svbool_t pg, const uint64_t *base, svuint64_t indices) : "LDFF1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, ulong* base, Vector<ulong> indices);

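The `offsets` overloads above interpret each vector element as a raw byte offset from `base`, while the `indices` overloads scale each element by the element size before adding it, which is what the `SXTW #2` / `LSL #3` shifts in their addressing modes express. A conceptual scalar sketch of the relationship (illustration only, not part of the proposed API surface):

```csharp
// For 8-byte elements, the index form addresses the same memory as the
// byte-offset form with each offset equal to index * sizeof(element).
long index = 5;
long byteOffset = index * sizeof(long);   // 40, i.e. index << 3 (the LSL #3 above)
```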
    /// svfloat32_t svldff1_gather[_u32base]_offset_f32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1W Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, Vector<uint> bases, long offset);

    /// svint32_t svldff1_gather[_u32base]_offset_s32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1W Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, Vector<uint> bases, long offset);

    /// svuint32_t svldff1_gather[_u32base]_offset_u32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1W Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, Vector<uint> bases, long offset);

    /// svfloat64_t svldff1_gather[_u64base]_offset_f64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1D Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, Vector<ulong> bases, long offset);

    /// svint64_t svldff1_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1D Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1D Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);

    /// svfloat32_t svldff1_gather[_u32base]_index_f32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #index * 4]" or "LDFF1W Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<float> GatherVectorFirstFaulting(Vector<float> mask, Vector<uint> bases, long index);

    /// svint32_t svldff1_gather[_u32base]_index_s32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #index * 4]" or "LDFF1W Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorFirstFaulting(Vector<int> mask, Vector<uint> bases, long index);

    /// svuint32_t svldff1_gather[_u32base]_index_u32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1W Zresult.S, Pg/Z, [Zbases.S, #index * 4]" or "LDFF1W Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorFirstFaulting(Vector<uint> mask, Vector<uint> bases, long index);

    /// svfloat64_t svldff1_gather[_u64base]_index_f64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #index * 8]" or "LDFF1D Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<double> GatherVectorFirstFaulting(Vector<double> mask, Vector<ulong> bases, long index);

    /// svint64_t svldff1_gather[_u64base]_index_s64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #index * 8]" or "LDFF1D Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index);

    /// svuint64_t svldff1_gather[_u64base]_index_u64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1D Zresult.D, Pg/Z, [Zbases.D, #index * 8]" or "LDFF1D Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index);

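A first-faulting gather faults only if the first active element faults; a fault on any later element instead clears the corresponding lanes of the first-fault register (FFR) and the load completes. Callers are expected to inspect the FFR after the load. A rough sketch of that pattern, assuming FFR helpers (`SetFfr`, `GetFfrInt32`) along the lines of the companion FFR proposal, which are not part of this issue:

```csharp
// Sketch only: SetFfr/GetFfrInt32 are assumptions from the companion FFR
// proposal; signatures here are illustrative, not approved API.
Sve.SetFfr(Sve.CreateTrueMaskInt32());                    // mark all FFR lanes good
Vector<int> v = Sve.GatherVectorFirstFaulting(mask, ptr, offsets);
Vector<int> ffr = Sve.GetFfrInt32();                      // lanes that actually loaded
// Lanes where `ffr` is false were not loaded (a fault was suppressed there);
// the caller narrows `mask` to those lanes and retries, or handles them with
// scalar fallback code.
```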

    /// GatherVectorInt16SignExtendFirstFaulting : Load 16-bit data and sign-extend, first-faulting

    /// svint32_t svldff1sh_gather[_u32base]_s32(svbool_t pg, svuint32_t bases) : "LDFF1SH Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, Vector<uint> bases);

    /// svuint32_t svldff1sh_gather[_u32base]_u32(svbool_t pg, svuint32_t bases) : "LDFF1SH Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases);

    /// svint64_t svldff1sh_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1SH Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1sh_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1SH Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svint32_t svldff1sh_gather_[s32]offset_s32(svbool_t pg, const int16_t *base, svint32_t offsets) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, short* base, Vector<int> offsets);

    /// svuint32_t svldff1sh_gather_[s32]offset_u32(svbool_t pg, const int16_t *base, svint32_t offsets) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, short* base, Vector<int> offsets);

    /// svint32_t svldff1sh_gather_[u32]offset_s32(svbool_t pg, const int16_t *base, svuint32_t offsets) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, short* base, Vector<uint> offsets);

    /// svuint32_t svldff1sh_gather_[u32]offset_u32(svbool_t pg, const int16_t *base, svuint32_t offsets) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, short* base, Vector<uint> offsets);

    /// svint64_t svldff1sh_gather_[s64]offset_s64(svbool_t pg, const int16_t *base, svint64_t offsets) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, short* base, Vector<long> offsets);

    /// svuint64_t svldff1sh_gather_[s64]offset_u64(svbool_t pg, const int16_t *base, svint64_t offsets) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, short* base, Vector<long> offsets);

    /// svint64_t svldff1sh_gather_[u64]offset_s64(svbool_t pg, const int16_t *base, svuint64_t offsets) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, short* base, Vector<ulong> offsets);

    /// svuint64_t svldff1sh_gather_[u64]offset_u64(svbool_t pg, const int16_t *base, svuint64_t offsets) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, short* base, Vector<ulong> offsets);

    /// svint32_t svldff1sh_gather_[s32]index_s32(svbool_t pg, const int16_t *base, svint32_t indices) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #1]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, short* base, Vector<int> indices);

    /// svuint32_t svldff1sh_gather_[s32]index_u32(svbool_t pg, const int16_t *base, svint32_t indices) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #1]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, short* base, Vector<int> indices);

    /// svint32_t svldff1sh_gather_[u32]index_s32(svbool_t pg, const int16_t *base, svuint32_t indices) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #1]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, short* base, Vector<uint> indices);

    /// svuint32_t svldff1sh_gather_[u32]index_u32(svbool_t pg, const int16_t *base, svuint32_t indices) : "LDFF1SH Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #1]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, short* base, Vector<uint> indices);

    /// svint64_t svldff1sh_gather_[s64]index_s64(svbool_t pg, const int16_t *base, svint64_t indices) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, short* base, Vector<long> indices);

    /// svuint64_t svldff1sh_gather_[s64]index_u64(svbool_t pg, const int16_t *base, svint64_t indices) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, short* base, Vector<long> indices);

    /// svint64_t svldff1sh_gather_[u64]index_s64(svbool_t pg, const int16_t *base, svuint64_t indices) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, short* base, Vector<ulong> indices);

    /// svuint64_t svldff1sh_gather_[u64]index_u64(svbool_t pg, const int16_t *base, svuint64_t indices) : "LDFF1SH Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, short* base, Vector<ulong> indices);

    /// svint32_t svldff1sh_gather[_u32base]_offset_s32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1SH Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1SH Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, Vector<uint> bases, long offset);

    /// svuint32_t svldff1sh_gather[_u32base]_offset_u32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1SH Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1SH Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases, long offset);

    /// svint64_t svldff1sh_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1SH Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1SH Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1sh_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1SH Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1SH Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);

    /// svint32_t svldff1sh_gather[_u32base]_index_s32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1SH Zresult.S, Pg/Z, [Zbases.S, #index * 2]" or "LDFF1SH Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorInt16SignExtendFirstFaulting(Vector<int> mask, Vector<uint> bases, long index);

    /// svuint32_t svldff1sh_gather[_u32base]_index_u32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1SH Zresult.S, Pg/Z, [Zbases.S, #index * 2]" or "LDFF1SH Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorInt16SignExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases, long index);

    /// svint64_t svldff1sh_gather[_u64base]_index_s64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1SH Zresult.D, Pg/Z, [Zbases.D, #index * 2]" or "LDFF1SH Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorInt16SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index);

    /// svuint64_t svldff1sh_gather[_u64base]_index_u64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1SH Zresult.D, Pg/Z, [Zbases.D, #index * 2]" or "LDFF1SH Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorInt16SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index);


    /// GatherVectorUInt16ZeroExtendFirstFaulting : Load 16-bit data and zero-extend, first-faulting

    /// svint32_t svldff1uh_gather[_u32base]_s32(svbool_t pg, svuint32_t bases) : "LDFF1H Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, Vector<uint> bases);

    /// svuint32_t svldff1uh_gather[_u32base]_u32(svbool_t pg, svuint32_t bases) : "LDFF1H Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases);

    /// svint64_t svldff1uh_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1H Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1uh_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1H Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svint32_t svldff1uh_gather_[s32]offset_s32(svbool_t pg, const uint16_t *base, svint32_t offsets) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, ushort* base, Vector<int> offsets);

    /// svuint32_t svldff1uh_gather_[s32]offset_u32(svbool_t pg, const uint16_t *base, svint32_t offsets) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, ushort* base, Vector<int> offsets);

    /// svint32_t svldff1uh_gather_[u32]offset_s32(svbool_t pg, const uint16_t *base, svuint32_t offsets) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, ushort* base, Vector<uint> offsets);

    /// svuint32_t svldff1uh_gather_[u32]offset_u32(svbool_t pg, const uint16_t *base, svuint32_t offsets) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, ushort* base, Vector<uint> offsets);

    /// svint64_t svldff1uh_gather_[s64]offset_s64(svbool_t pg, const uint16_t *base, svint64_t offsets) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, ushort* base, Vector<long> offsets);

    /// svuint64_t svldff1uh_gather_[s64]offset_u64(svbool_t pg, const uint16_t *base, svint64_t offsets) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, ushort* base, Vector<long> offsets);

    /// svint64_t svldff1uh_gather_[u64]offset_s64(svbool_t pg, const uint16_t *base, svuint64_t offsets) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, ushort* base, Vector<ulong> offsets);

    /// svuint64_t svldff1uh_gather_[u64]offset_u64(svbool_t pg, const uint16_t *base, svuint64_t offsets) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, ushort* base, Vector<ulong> offsets);

    /// svint32_t svldff1uh_gather_[s32]index_s32(svbool_t pg, const uint16_t *base, svint32_t indices) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #1]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, ushort* base, Vector<int> indices);

    /// svuint32_t svldff1uh_gather_[s32]index_u32(svbool_t pg, const uint16_t *base, svint32_t indices) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #1]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, ushort* base, Vector<int> indices);

    /// svint32_t svldff1uh_gather_[u32]index_s32(svbool_t pg, const uint16_t *base, svuint32_t indices) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #1]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, ushort* base, Vector<uint> indices);

    /// svuint32_t svldff1uh_gather_[u32]index_u32(svbool_t pg, const uint16_t *base, svuint32_t indices) : "LDFF1H Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #1]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, ushort* base, Vector<uint> indices);

    /// svint64_t svldff1uh_gather_[s64]index_s64(svbool_t pg, const uint16_t *base, svint64_t indices) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, ushort* base, Vector<long> indices);

    /// svuint64_t svldff1uh_gather_[s64]index_u64(svbool_t pg, const uint16_t *base, svint64_t indices) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, ushort* base, Vector<long> indices);

    /// svint64_t svldff1uh_gather_[u64]index_s64(svbool_t pg, const uint16_t *base, svuint64_t indices) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, ushort* base, Vector<ulong> indices);

    /// svuint64_t svldff1uh_gather_[u64]index_u64(svbool_t pg, const uint16_t *base, svuint64_t indices) : "LDFF1H Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #1]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, ushort* base, Vector<ulong> indices);

    /// svint32_t svldff1uh_gather[_u32base]_offset_s32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1H Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1H Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, Vector<uint> bases, long offset);

    /// svuint32_t svldff1uh_gather[_u32base]_offset_u32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1H Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1H Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases, long offset);

    /// svint64_t svldff1uh_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1H Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1H Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1uh_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1H Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1H Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);

    /// svint32_t svldff1uh_gather[_u32base]_index_s32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1H Zresult.S, Pg/Z, [Zbases.S, #index * 2]" or "LDFF1H Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<int> mask, Vector<uint> bases, long index);

    /// svuint32_t svldff1uh_gather[_u32base]_index_u32(svbool_t pg, svuint32_t bases, int64_t index) : "LDFF1H Zresult.S, Pg/Z, [Zbases.S, #index * 2]" or "LDFF1H Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases, long index);

    /// svint64_t svldff1uh_gather[_u64base]_index_s64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1H Zresult.D, Pg/Z, [Zbases.D, #index * 2]" or "LDFF1H Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index);

    /// svuint64_t svldff1uh_gather[_u64base]_index_u64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1H Zresult.D, Pg/Z, [Zbases.D, #index * 2]" or "LDFF1H Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index);


    /// GatherVectorInt32SignExtendFirstFaulting : Load 32-bit data and sign-extend, first-faulting

    /// svint64_t svldff1sw_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1SW Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1sw_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1SW Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svint64_t svldff1sw_gather_[s64]offset_s64(svbool_t pg, const int32_t *base, svint64_t offsets) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, int* base, Vector<long> offsets);

    /// svuint64_t svldff1sw_gather_[s64]offset_u64(svbool_t pg, const int32_t *base, svint64_t offsets) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, int* base, Vector<long> offsets);

    /// svint64_t svldff1sw_gather_[u64]offset_s64(svbool_t pg, const int32_t *base, svuint64_t offsets) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, int* base, Vector<ulong> offsets);

    /// svuint64_t svldff1sw_gather_[u64]offset_u64(svbool_t pg, const int32_t *base, svuint64_t offsets) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, int* base, Vector<ulong> offsets);

    /// svint64_t svldff1sw_gather_[s64]index_s64(svbool_t pg, const int32_t *base, svint64_t indices) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, int* base, Vector<long> indices);

    /// svuint64_t svldff1sw_gather_[s64]index_u64(svbool_t pg, const int32_t *base, svint64_t indices) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, int* base, Vector<long> indices);

    /// svint64_t svldff1sw_gather_[u64]index_s64(svbool_t pg, const int32_t *base, svuint64_t indices) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, int* base, Vector<ulong> indices);

    /// svuint64_t svldff1sw_gather_[u64]index_u64(svbool_t pg, const int32_t *base, svuint64_t indices) : "LDFF1SW Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, int* base, Vector<ulong> indices);

    /// svint64_t svldff1sw_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1SW Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1SW Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1sw_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1SW Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1SW Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);

    /// svint64_t svldff1sw_gather[_u64base]_index_s64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1SW Zresult.D, Pg/Z, [Zbases.D, #index * 4]" or "LDFF1SW Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index);

    /// svuint64_t svldff1sw_gather[_u64base]_index_u64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1SW Zresult.D, Pg/Z, [Zbases.D, #index * 4]" or "LDFF1SW Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index);


    /// GatherVectorInt32ZeroExtendFirstFaulting : Load 32-bit data and zero-extend, first-faulting

    /// svint64_t svldff1uw_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1W Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1uw_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1W Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svint64_t svldff1uw_gather_[s64]offset_s64(svbool_t pg, const uint32_t *base, svint64_t offsets) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, uint* base, Vector<long> offsets);

    /// svuint64_t svldff1uw_gather_[s64]offset_u64(svbool_t pg, const uint32_t *base, svint64_t offsets) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, uint* base, Vector<long> offsets);

    /// svint64_t svldff1uw_gather_[u64]offset_s64(svbool_t pg, const uint32_t *base, svuint64_t offsets) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, uint* base, Vector<ulong> offsets);

    /// svuint64_t svldff1uw_gather_[u64]offset_u64(svbool_t pg, const uint32_t *base, svuint64_t offsets) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, uint* base, Vector<ulong> offsets);

    /// svint64_t svldff1uw_gather_[s64]index_s64(svbool_t pg, const uint32_t *base, svint64_t indices) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, uint* base, Vector<long> indices);

    /// svuint64_t svldff1uw_gather_[s64]index_u64(svbool_t pg, const uint32_t *base, svint64_t indices) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, uint* base, Vector<long> indices);

    /// svint64_t svldff1uw_gather_[u64]index_s64(svbool_t pg, const uint32_t *base, svuint64_t indices) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, uint* base, Vector<ulong> indices);

    /// svuint64_t svldff1uw_gather_[u64]index_u64(svbool_t pg, const uint32_t *base, svuint64_t indices) : "LDFF1W Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #2]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, uint* base, Vector<ulong> indices);

    /// svint64_t svldff1uw_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1W Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1W Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1uw_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1W Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1W Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);

    /// svint64_t svldff1uw_gather[_u64base]_index_s64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1W Zresult.D, Pg/Z, [Zbases.D, #index * 4]" or "LDFF1W Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index);

    /// svuint64_t svldff1uw_gather[_u64base]_index_u64(svbool_t pg, svuint64_t bases, int64_t index) : "LDFF1W Zresult.D, Pg/Z, [Zbases.D, #index * 4]" or "LDFF1W Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index);


    /// GatherVectorSByteSignExtendFirstFaulting : Load 8-bit data and sign-extend, first-faulting

    /// svint32_t svldff1sb_gather[_u32base]_s32(svbool_t pg, svuint32_t bases) : "LDFF1SB Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<int> GatherVectorSByteSignExtendFirstFaulting(Vector<int> mask, Vector<uint> bases);

    /// svuint32_t svldff1sb_gather[_u32base]_u32(svbool_t pg, svuint32_t bases) : "LDFF1SB Zresult.S, Pg/Z, [Zbases.S, #0]"
  public static unsafe Vector<uint> GatherVectorSByteSignExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases);

    /// svint64_t svldff1sb_gather[_u64base]_s64(svbool_t pg, svuint64_t bases) : "LDFF1SB Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<long> GatherVectorSByteSignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases);

    /// svuint64_t svldff1sb_gather[_u64base]_u64(svbool_t pg, svuint64_t bases) : "LDFF1SB Zresult.D, Pg/Z, [Zbases.D, #0]"
  public static unsafe Vector<ulong> GatherVectorSByteSignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases);

    /// svint32_t svldff1sb_gather_[s32]offset_s32(svbool_t pg, const int8_t *base, svint32_t offsets) : "LDFF1SB Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<int> GatherVectorSByteSignExtendFirstFaulting(Vector<int> mask, sbyte* base, Vector<int> offsets);

    /// svuint32_t svldff1sb_gather_[s32]offset_u32(svbool_t pg, const int8_t *base, svint32_t offsets) : "LDFF1SB Zresult.S, Pg/Z, [Xbase, Zoffsets.S, SXTW]"
  public static unsafe Vector<uint> GatherVectorSByteSignExtendFirstFaulting(Vector<uint> mask, sbyte* base, Vector<int> offsets);

    /// svint32_t svldff1sb_gather_[u32]offset_s32(svbool_t pg, const int8_t *base, svuint32_t offsets) : "LDFF1SB Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<int> GatherVectorSByteSignExtendFirstFaulting(Vector<int> mask, sbyte* base, Vector<uint> offsets);

    /// svuint32_t svldff1sb_gather_[u32]offset_u32(svbool_t pg, const int8_t *base, svuint32_t offsets) : "LDFF1SB Zresult.S, Pg/Z, [Xbase, Zoffsets.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorSByteSignExtendFirstFaulting(Vector<uint> mask, sbyte* base, Vector<uint> offsets);

    /// svint64_t svldff1sb_gather_[s64]offset_s64(svbool_t pg, const int8_t *base, svint64_t offsets) : "LDFF1SB Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorSByteSignExtendFirstFaulting(Vector<long> mask, sbyte* base, Vector<long> offsets);

    /// svuint64_t svldff1sb_gather_[s64]offset_u64(svbool_t pg, const int8_t *base, svint64_t offsets) : "LDFF1SB Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorSByteSignExtendFirstFaulting(Vector<ulong> mask, sbyte* base, Vector<long> offsets);

    /// svint64_t svldff1sb_gather_[u64]offset_s64(svbool_t pg, const int8_t *base, svuint64_t offsets) : "LDFF1SB Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<long> GatherVectorSByteSignExtendFirstFaulting(Vector<long> mask, sbyte* base, Vector<ulong> offsets);

    /// svuint64_t svldff1sb_gather_[u64]offset_u64(svbool_t pg, const int8_t *base, svuint64_t offsets) : "LDFF1SB Zresult.D, Pg/Z, [Xbase, Zoffsets.D]"
  public static unsafe Vector<ulong> GatherVectorSByteSignExtendFirstFaulting(Vector<ulong> mask, sbyte* base, Vector<ulong> offsets);

    /// svint32_t svldff1sb_gather[_u32base]_offset_s32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1SB Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1SB Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<int> GatherVectorSByteSignExtendFirstFaulting(Vector<int> mask, Vector<uint> bases, long offset);

    /// svuint32_t svldff1sb_gather[_u32base]_offset_u32(svbool_t pg, svuint32_t bases, int64_t offset) : "LDFF1SB Zresult.S, Pg/Z, [Zbases.S, #offset]" or "LDFF1SB Zresult.S, Pg/Z, [Xoffset, Zbases.S, UXTW]"
  public static unsafe Vector<uint> GatherVectorSByteSignExtendFirstFaulting(Vector<uint> mask, Vector<uint> bases, long offset);

    /// svint64_t svldff1sb_gather[_u64base]_offset_s64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1SB Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1SB Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<long> GatherVectorSByteSignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset);

    /// svuint64_t svldff1sb_gather[_u64base]_offset_u64(svbool_t pg, svuint64_t bases, int64_t offset) : "LDFF1SB Zresult.D, Pg/Z, [Zbases.D, #offset]" or "LDFF1SB Zresult.D, Pg/Z, [Xoffset, Zbases.D]"
  public static unsafe Vector<ulong> GatherVectorSByteSignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset);


    /// GetFFR : Read FFR, returning predicate of successfully loaded elements

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<sbyte> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<short> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<int> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<long> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<byte> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<ushort> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<uint> GetFFR();

    /// svbool_t svrdffr() : "RDFFR Presult.B"
    /// svbool_t svrdffr_z(svbool_t pg) : "RDFFR Presult.B, Pg/Z"
  public static unsafe Vector<ulong> GetFFR();


    /// LoadVectorByteZeroExtendFirstFaulting : Load 8-bit data and zero-extend, first-faulting

    /// svint16_t svldff1ub_s16(svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.H, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.H, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<short> LoadVectorByteZeroExtendFirstFaulting(byte* address);

    /// svint32_t svldff1ub_s32(svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.S, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.S, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<int> LoadVectorByteZeroExtendFirstFaulting(byte* address);

    /// svint64_t svldff1ub_s64(svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.D, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.D, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<long> LoadVectorByteZeroExtendFirstFaulting(byte* address);

    /// svuint16_t svldff1ub_u16(svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.H, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.H, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<ushort> LoadVectorByteZeroExtendFirstFaulting(byte* address);

    /// svuint32_t svldff1ub_u32(svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.S, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.S, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<uint> LoadVectorByteZeroExtendFirstFaulting(byte* address);

    /// svuint64_t svldff1ub_u64(svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.D, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.D, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<ulong> LoadVectorByteZeroExtendFirstFaulting(byte* address);


    /// LoadVectorFirstFaulting : Unextended load, first-faulting

    /// svfloat32_t svldff1[_f32](svbool_t pg, const float32_t *base) : "LDFF1W Zresult.S, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1W Zresult.S, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<float> LoadVectorFirstFaulting(float* address);

    /// svfloat64_t svldff1[_f64](svbool_t pg, const float64_t *base) : "LDFF1D Zresult.D, Pg/Z, [Xarray, Xindex, LSL #3]" or "LDFF1D Zresult.D, Pg/Z, [Xbase, XZR, LSL #3]"
  public static unsafe Vector<double> LoadVectorFirstFaulting(double* address);

    /// svint8_t svldff1[_s8](svbool_t pg, const int8_t *base) : "LDFF1B Zresult.B, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.B, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<sbyte> LoadVectorFirstFaulting(sbyte* address);

    /// svint16_t svldff1[_s16](svbool_t pg, const int16_t *base) : "LDFF1H Zresult.H, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1H Zresult.H, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<short> LoadVectorFirstFaulting(short* address);

    /// svint32_t svldff1[_s32](svbool_t pg, const int32_t *base) : "LDFF1W Zresult.S, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1W Zresult.S, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<int> LoadVectorFirstFaulting(int* address);

    /// svint64_t svldff1[_s64](svbool_t pg, const int64_t *base) : "LDFF1D Zresult.D, Pg/Z, [Xarray, Xindex, LSL #3]" or "LDFF1D Zresult.D, Pg/Z, [Xbase, XZR, LSL #3]"
  public static unsafe Vector<long> LoadVectorFirstFaulting(long* address);

    /// svuint8_t svldff1[_u8](svbool_t pg, const uint8_t *base) : "LDFF1B Zresult.B, Pg/Z, [Xarray, Xindex]" or "LDFF1B Zresult.B, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<byte> LoadVectorFirstFaulting(byte* address);

    /// svuint16_t svldff1[_u16](svbool_t pg, const uint16_t *base) : "LDFF1H Zresult.H, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1H Zresult.H, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<ushort> LoadVectorFirstFaulting(ushort* address);

    /// svuint32_t svldff1[_u32](svbool_t pg, const uint32_t *base) : "LDFF1W Zresult.S, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1W Zresult.S, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<uint> LoadVectorFirstFaulting(uint* address);

    /// svuint64_t svldff1[_u64](svbool_t pg, const uint64_t *base) : "LDFF1D Zresult.D, Pg/Z, [Xarray, Xindex, LSL #3]" or "LDFF1D Zresult.D, Pg/Z, [Xbase, XZR, LSL #3]"
  public static unsafe Vector<ulong> LoadVectorFirstFaulting(ulong* address);


    /// LoadVectorInt16SignExtendFirstFaulting : Load 16-bit data and sign-extend, first-faulting

    /// svint32_t svldff1sh_s32(svbool_t pg, const int16_t *base) : "LDFF1SH Zresult.S, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1SH Zresult.S, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<int> LoadVectorInt16SignExtendFirstFaulting(short* address);

    /// svint64_t svldff1sh_s64(svbool_t pg, const int16_t *base) : "LDFF1SH Zresult.D, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1SH Zresult.D, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<long> LoadVectorInt16SignExtendFirstFaulting(short* address);

    /// svuint32_t svldff1sh_u32(svbool_t pg, const int16_t *base) : "LDFF1SH Zresult.S, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1SH Zresult.S, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<uint> LoadVectorInt16SignExtendFirstFaulting(short* address);

    /// svuint64_t svldff1sh_u64(svbool_t pg, const int16_t *base) : "LDFF1SH Zresult.D, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1SH Zresult.D, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<ulong> LoadVectorInt16SignExtendFirstFaulting(short* address);


    /// LoadVectorInt16ZeroExtendFirstFaulting : Load 16-bit data and zero-extend, first-faulting

    /// svint32_t svldff1uh_s32(svbool_t pg, const uint16_t *base) : "LDFF1H Zresult.S, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1H Zresult.S, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<int> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address);

    /// svint64_t svldff1uh_s64(svbool_t pg, const uint16_t *base) : "LDFF1H Zresult.D, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1H Zresult.D, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<long> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address);

    /// svuint32_t svldff1uh_u32(svbool_t pg, const uint16_t *base) : "LDFF1H Zresult.S, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1H Zresult.S, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<uint> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address);

    /// svuint64_t svldff1uh_u64(svbool_t pg, const uint16_t *base) : "LDFF1H Zresult.D, Pg/Z, [Xarray, Xindex, LSL #1]" or "LDFF1H Zresult.D, Pg/Z, [Xbase, XZR, LSL #1]"
  public static unsafe Vector<ulong> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address);


    /// LoadVectorInt32SignExtendFirstFaulting : Load 32-bit data and sign-extend, first-faulting

    /// svint64_t svldff1sw_s64(svbool_t pg, const int32_t *base) : "LDFF1SW Zresult.D, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1SW Zresult.D, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<long> LoadVectorInt32SignExtendFirstFaulting(int* address);

    /// svuint64_t svldff1sw_u64(svbool_t pg, const int32_t *base) : "LDFF1SW Zresult.D, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1SW Zresult.D, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<ulong> LoadVectorInt32SignExtendFirstFaulting(int* address);


    /// LoadVectorInt32ZeroExtendFirstFaulting : Load 32-bit data and zero-extend, first-faulting

    /// svint64_t svldff1uw_s64(svbool_t pg, const uint32_t *base) : "LDFF1W Zresult.D, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1W Zresult.D, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<long> LoadVectorInt32ZeroExtendFirstFaulting(uint* address);

    /// svuint64_t svldff1uw_u64(svbool_t pg, const uint32_t *base) : "LDFF1W Zresult.D, Pg/Z, [Xarray, Xindex, LSL #2]" or "LDFF1W Zresult.D, Pg/Z, [Xbase, XZR, LSL #2]"
  public static unsafe Vector<ulong> LoadVectorInt32ZeroExtendFirstFaulting(uint* address);


    /// LoadVectorSByteSignExtendFirstFaulting : Load 8-bit data and sign-extend, first-faulting

    /// svint16_t svldff1sb_s16(svbool_t pg, const int8_t *base) : "LDFF1SB Zresult.H, Pg/Z, [Xarray, Xindex]" or "LDFF1SB Zresult.H, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<short> LoadVectorSByteSignExtendFirstFaulting(sbyte* address);

    /// svint32_t svldff1sb_s32(svbool_t pg, const int8_t *base) : "LDFF1SB Zresult.S, Pg/Z, [Xarray, Xindex]" or "LDFF1SB Zresult.S, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<int> LoadVectorSByteSignExtendFirstFaulting(sbyte* address);

    /// svint64_t svldff1sb_s64(svbool_t pg, const int8_t *base) : "LDFF1SB Zresult.D, Pg/Z, [Xarray, Xindex]" or "LDFF1SB Zresult.D, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<long> LoadVectorSByteSignExtendFirstFaulting(sbyte* address);

    /// svuint16_t svldff1sb_u16(svbool_t pg, const int8_t *base) : "LDFF1SB Zresult.H, Pg/Z, [Xarray, Xindex]" or "LDFF1SB Zresult.H, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<ushort> LoadVectorSByteSignExtendFirstFaulting(sbyte* address);

    /// svuint32_t svldff1sb_u32(svbool_t pg, const int8_t *base) : "LDFF1SB Zresult.S, Pg/Z, [Xarray, Xindex]" or "LDFF1SB Zresult.S, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<uint> LoadVectorSByteSignExtendFirstFaulting(sbyte* address);

    /// svuint64_t svldff1sb_u64(svbool_t pg, const int8_t *base) : "LDFF1SB Zresult.D, Pg/Z, [Xarray, Xindex]" or "LDFF1SB Zresult.D, Pg/Z, [Xbase, XZR]"
  public static unsafe Vector<ulong> LoadVectorSByteSignExtendFirstFaulting(sbyte* address);


    /// SetFFR : Initialize the first-fault register to all-true

    /// void svsetffr() : "SETFFR"
  public static unsafe void SetFFR();


    /// WriteFFR : Write to the first-fault register

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<sbyte> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<short> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<int> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<long> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<byte> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<ushort> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<uint> value);

    /// void svwrffr(svbool_t op) : "WRFFR Pop.B"
  public static unsafe void WriteFFR(Vector<ulong> value);


  /// total method signatures: 209
  /// total method names:      17
}


a74nh commented Oct 26, 2023

  /// Rejected:
  ///   public static unsafe Vector<short> LoadVectorByteZeroExtendFirstFaulting(byte* address, long vnum); // svldff1ub_vnum_s16
  ///   public static unsafe Vector<int> LoadVectorByteZeroExtendFirstFaulting(byte* address, long vnum); // svldff1ub_vnum_s32
  ///   public static unsafe Vector<long> LoadVectorByteZeroExtendFirstFaulting(byte* address, long vnum); // svldff1ub_vnum_s64
  ///   public static unsafe Vector<ushort> LoadVectorByteZeroExtendFirstFaulting(byte* address, long vnum); // svldff1ub_vnum_u16
  ///   public static unsafe Vector<uint> LoadVectorByteZeroExtendFirstFaulting(byte* address, long vnum); // svldff1ub_vnum_u32
  ///   public static unsafe Vector<ulong> LoadVectorByteZeroExtendFirstFaulting(byte* address, long vnum); // svldff1ub_vnum_u64
  ///   public static unsafe Vector<float> LoadVectorFirstFaulting(float* address, long vnum); // svldff1_vnum[_f32]
  ///   public static unsafe Vector<double> LoadVectorFirstFaulting(double* address, long vnum); // svldff1_vnum[_f64]
  ///   public static unsafe Vector<sbyte> LoadVectorFirstFaulting(sbyte* address, long vnum); // svldff1_vnum[_s8]
  ///   public static unsafe Vector<short> LoadVectorFirstFaulting(short* address, long vnum); // svldff1_vnum[_s16]
  ///   public static unsafe Vector<int> LoadVectorFirstFaulting(int* address, long vnum); // svldff1_vnum[_s32]
  ///   public static unsafe Vector<long> LoadVectorFirstFaulting(long* address, long vnum); // svldff1_vnum[_s64]
  ///   public static unsafe Vector<byte> LoadVectorFirstFaulting(byte* address, long vnum); // svldff1_vnum[_u8]
  ///   public static unsafe Vector<ushort> LoadVectorFirstFaulting(ushort* address, long vnum); // svldff1_vnum[_u16]
  ///   public static unsafe Vector<uint> LoadVectorFirstFaulting(uint* address, long vnum); // svldff1_vnum[_u32]
  ///   public static unsafe Vector<ulong> LoadVectorFirstFaulting(ulong* address, long vnum); // svldff1_vnum[_u64]
  ///   public static unsafe Vector<int> LoadVectorInt16SignExtendFirstFaulting(short* address, long vnum); // svldff1sh_vnum_s32
  ///   public static unsafe Vector<long> LoadVectorInt16SignExtendFirstFaulting(short* address, long vnum); // svldff1sh_vnum_s64
  ///   public static unsafe Vector<uint> LoadVectorInt16SignExtendFirstFaulting(short* address, long vnum); // svldff1sh_vnum_u32
  ///   public static unsafe Vector<ulong> LoadVectorInt16SignExtendFirstFaulting(short* address, long vnum); // svldff1sh_vnum_u64
  ///   public static unsafe Vector<int> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address, long vnum); // svldff1uh_vnum_s32
  ///   public static unsafe Vector<long> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address, long vnum); // svldff1uh_vnum_s64
  ///   public static unsafe Vector<uint> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address, long vnum); // svldff1uh_vnum_u32
  ///   public static unsafe Vector<ulong> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address, long vnum); // svldff1uh_vnum_u64
  ///   public static unsafe Vector<long> LoadVectorInt32SignExtendFirstFaulting(int* address, long vnum); // svldff1sw_vnum_s64
  ///   public static unsafe Vector<ulong> LoadVectorInt32SignExtendFirstFaulting(int* address, long vnum); // svldff1sw_vnum_u64
  ///   public static unsafe Vector<long> LoadVectorInt32ZeroExtendFirstFaulting(uint* address, long vnum); // svldff1uw_vnum_s64
  ///   public static unsafe Vector<ulong> LoadVectorInt32ZeroExtendFirstFaulting(uint* address, long vnum); // svldff1uw_vnum_u64
  ///   public static unsafe Vector<short> LoadVectorSByteSignExtendFirstFaulting(sbyte* address, long vnum); // svldff1sb_vnum_s16
  ///   public static unsafe Vector<int> LoadVectorSByteSignExtendFirstFaulting(sbyte* address, long vnum); // svldff1sb_vnum_s32
  ///   public static unsafe Vector<long> LoadVectorSByteSignExtendFirstFaulting(sbyte* address, long vnum); // svldff1sb_vnum_s64
  ///   public static unsafe Vector<ushort> LoadVectorSByteSignExtendFirstFaulting(sbyte* address, long vnum); // svldff1sb_vnum_u16
  ///   public static unsafe Vector<uint> LoadVectorSByteSignExtendFirstFaulting(sbyte* address, long vnum); // svldff1sb_vnum_u32
  ///   public static unsafe Vector<ulong> LoadVectorSByteSignExtendFirstFaulting(sbyte* address, long vnum); // svldff1sb_vnum_u64
  ///   Total Rejected: 34

  /// Total ACLE covered across API:      251


a74nh commented Oct 26, 2023

This contributes to #93095

It covers instructions in FEAT_SVE related to first-faulting loads.

This list was auto-generated from the C ACLE for SVE, and is in three parts:

1. The methods list reduced down to Vector versions. All possible variants of T are given above each method.
2. The complete list of all methods. The corresponding ACLE methods and SVE instructions are given above each method.
3. All rejected ACLE methods. These are methods we have agreed do not need to be included in C#.

Where possible, existing C# naming conventions have been matched.

Many of the C functions take predicate argument(s) of type svbool_t as the first argument. These are omitted from the C# methods. It is expected that the JIT will create predicates where required, or combine the operation with uses of ConditionalSelect(). For more discussion see the comment on #88140.
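To illustrate the intended pattern (a hypothetical sketch against this proposed API — the helper name and merge behaviour are assumptions, and the folding is up to the JIT), a masked first-faulting load would be written by combining the unpredicated load with ConditionalSelect:

```csharp
// Hypothetical sketch against the proposed API (not compilable today).
// 'mask' selects the active lanes; inactive lanes take their value from
// 'merge'. The JIT is expected to fold this pattern into a single
// predicated LDFF1B instruction rather than a load followed by a select.
static unsafe Vector<byte> MaskedLoadFirstFaulting(Vector<byte> mask, byte* address, Vector<byte> merge)
{
    Vector<byte> loaded = Sve.LoadVectorFirstFaulting(address);
    return Sve.ConditionalSelect(mask, loaded, merge);
}
```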

@tannergooding tannergooding added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 26, 2023
@tannergooding tannergooding added this to the 9.0.0 milestone Oct 26, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 26, 2023
@tannergooding tannergooding added the needs-author-action An issue or pull request that requires more info or actions from the author. label Oct 26, 2023

ghost commented Oct 26, 2023

This issue has been marked needs-author-action and may be missing some important information.

@tannergooding
Member

Same general comments as the Gather/Load and Scatter/Store API proposals.

These probably need a small explanation of how they work and what SetFFR and WriteFFR do, how they are expected to be used, and the implications of a user not resetting the FFR when done.

@a74nh
Copy link
Contributor Author

a74nh commented Oct 31, 2023

I've taken an existing SVE implementation of C's strlen() and rewritten it in C#, both to show the use of the FFR and to check that it works with conditional selects.

// Look for the 0 at the end of the buffer
long strlen(byte* buffer)
{
  long i = 0;
  Sve.SetFFR();

  while (true)
  {
    // Load data from the buffer. maskffr contains the elements that were successfully loaded.
    Vector<byte> data = Sve.LoadVectorFirstFaulting(buffer + i);
    Vector<byte> maskffr = Sve.GetFFR();

    // Look for zeros in the loaded data
    Vector<byte> zeromask = Sve.ConditionalSelect(maskffr, Sve.CompareEqual(data, Vector<byte>.Zero), Vector<byte>.Zero);

    if (Sve.svptest_any(zeromask)) {  // No API for svptest_any yet
      // There was a zero in the loaded data. Increment up to the zero and exit the loop.
      zeromask = BreakBefore(zeromask);
      i += GetActiveElementCount(zeromask);
      break;
    } else if (Sve.svptest_last(maskffr)) { // No API for svptest_last yet
      // Final bit in the FFR mask is set, therefore the load completed. Increment and continue.
      i += Count8BitElements();
    } else {
      // The load faulted. Reset the FFR, increment by the number of loaded elements and continue.
      Sve.SetFFR();
      i += GetActiveElementCount(maskffr);
    }
  }

  return i;
}

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Oct 31, 2023
@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Oct 31, 2023

terrajobst commented Feb 1, 2024

  • T* base => T* address
  • Vector<T> bases => Vector<T> addresses
  • Instructions taking offset(s) need WithByteOffset
  • The instructions with long offset are removed in favor of existing API
  • Int16ZeroExtend (et al) become UInt16ZeroExtend (et al)
  • GatherVectorSByteSignExtendFirstFaulting with an indices/offsets parameter: make sure these are consistently named (either indices or offsets as they're the same for 8-bit) across all 8-bit loading functions.
  • GetFFR => GetFfr. The initialism is preserved because in the domain where someone will use SVE directly, they will know the register as "FFR", not "First Fault Register"
  • WriteFFR(value) => SetFfr(value).
  • SetFFR() would be SetFfr(), but it was erased in favor of JIT/AoT pattern matching.
namespace System.Runtime.Intrinsics.Arm;

/// VectorT Summary
public abstract partial class Sve : AdvSimd /// Feature: FEAT_SVE  Category: firstfaulting
{

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1B

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T> addresses); // LDFF1B

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorByteWithByteOffsetZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T> offsets); // LDFF1B

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteWithByteOffsetZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> addresses); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorWithByteOffsetFirstFaulting(Vector<T> mask, T* address, Vector<T2> offsets); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorWithByteOffsetFirstFaulting(Vector<T> mask, T* address, Vector<T> offsets); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* address, Vector<T2> indices); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* address, Vector<T> indices); // LDFF1W or LDFF1D

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> addresses); // LDFF1SH

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16WithByteOffsetSignExtendFirstFaulting(Vector<T> mask, short* address, Vector<T> offsets); // LDFF1SH

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16WithByteOffsetSignExtendFirstFaulting(Vector<T> mask, short* address, Vector<T2> offsets); // LDFF1SH

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* address, Vector<T> indices); // LDFF1SH

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* address, Vector<T2> indices); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> addresses); // LDFF1H

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorUInt16WithByteOffsetZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T> offsets); // LDFF1H

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorUInt16WithByteOffsetZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> offsets); // LDFF1H

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T> indices); // LDFF1H

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> indices); // LDFF1H

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> addresses); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> addresses); // LDFF1SW

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32WithByteOffsetSignExtendFirstFaulting(Vector<T> mask, int* address, Vector<T> offsets); // LDFF1SW

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32WithByteOffsetSignExtendFirstFaulting(Vector<T> mask, int* address, Vector<T2> offsets); // LDFF1SW

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* address, Vector<T> indices); // LDFF1SW

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* address, Vector<T2> indices); // LDFF1SW

  public static unsafe Vector<long> GatherVectorUInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> addresses); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorUInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> addresses); // LDFF1W

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorUInt32WithByteOffsetZeroExtendFirstFaulting(Vector<T> mask, uint* address, Vector<T> offsets); // LDFF1W

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorUInt32WithByteOffsetZeroExtendFirstFaulting(Vector<T> mask, uint* address, Vector<T2> offsets); // LDFF1W

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorUInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* address, Vector<T> indices); // LDFF1W

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorUInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* address, Vector<T2> indices); // LDFF1W

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1SB

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T> addresses); // LDFF1SB

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* address, Vector<T> offsets); // LDFF1SB

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* address, Vector<T2> offsets); // LDFF1SB

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> GetFfr(); // RDFFR // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorByteZeroExtendFirstFaulting(byte* address); // LDFF1B // predicated

  /// T: float, double, sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorFirstFaulting(T* address); // LDFF1W or LDFF1D or LDFF1B or LDFF1H // predicated

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorInt16SignExtendFirstFaulting(short* address); // LDFF1SH // predicated

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorUInt16ZeroExtendFirstFaulting(ushort* address); // LDFF1H // predicated

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorInt32SignExtendFirstFaulting(int* address); // LDFF1SW // predicated

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorUInt32ZeroExtendFirstFaulting(uint* address); // LDFF1W // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorSByteSignExtendFirstFaulting(sbyte* address); // LDFF1SB // predicated

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe void SetFfr(Vector<T> value); // WRFFR

  /// total method signatures: 72

}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 1, 2024
@a74nh
Contributor Author

a74nh commented Feb 5, 2024

One missing change: LoadVectorInt16ZeroExtendFirstFaulting needs a U; it should be LoadVectorUInt16ZeroExtendFirstFaulting

@ghost

ghost commented Feb 8, 2024

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details
namespace System.Runtime.Intrinsics.Arm;

/// VectorT Summary
public abstract partial class Sve : AdvSimd /// Feature: FEAT_SVE  Category: firstfaulting
{

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1B

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1B

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* base, Vector<T> offsets); // LDFF1B

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* base, Vector<T2> offsets); // LDFF1B

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1B

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1B

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T2> offsets); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T> offsets); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T2> indices); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* base, Vector<T> indices); // LDFF1W or LDFF1D

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1W or LDFF1D

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> bases, long index); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> bases, long index); // LDFF1W or LDFF1D

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1SH

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T> offsets); // LDFF1SH

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T2> offsets); // LDFF1SH

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T> indices); // LDFF1SH

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* base, Vector<T2> indices); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long index); // LDFF1SH

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long index); // LDFF1SH

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1H

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T> offsets); // LDFF1H

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T2> offsets); // LDFF1H

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T> indices); // LDFF1H

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* base, Vector<T2> indices); // LDFF1H

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1H

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long index); // LDFF1H

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long index); // LDFF1H

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases); // LDFF1SW

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T> offsets); // LDFF1SW

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T2> offsets); // LDFF1SW

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T> indices); // LDFF1SW

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* base, Vector<T2> indices); // LDFF1SW

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset); // LDFF1SW

  public static unsafe Vector<long> GatherVectorInt32SignExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index); // LDFF1SW

  public static unsafe Vector<ulong> GatherVectorInt32SignExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index); // LDFF1SW

  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases); // LDFF1W

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T> offsets); // LDFF1W

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T2> offsets); // LDFF1W

  /// T: long, ulong
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T> indices); // LDFF1W

  /// T: [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* base, Vector<T2> indices); // LDFF1W

  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long offset); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long offset); // LDFF1W

  public static unsafe Vector<long> GatherVectorInt32ZeroExtendFirstFaulting(Vector<long> mask, Vector<ulong> bases, long index); // LDFF1W

  public static unsafe Vector<ulong> GatherVectorInt32ZeroExtendFirstFaulting(Vector<ulong> mask, Vector<ulong> bases, long index); // LDFF1W

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases); // LDFF1SB

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T> bases); // LDFF1SB

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* base, Vector<T> offsets); // LDFF1SB

  /// T: [uint, int], [int, uint], [ulong, long], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* base, Vector<T2> offsets); // LDFF1SB

  /// T: [int, uint], [long, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T2> bases, long offset); // LDFF1SB

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T> bases, long offset); // LDFF1SB

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> GetFFR(); // RDFFR // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorByteZeroExtendFirstFaulting(byte* address); // LDFF1B // predicated

  /// T: float, double, sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorFirstFaulting(T* address); // LDFF1W or LDFF1D or LDFF1B or LDFF1H // predicated

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorInt16SignExtendFirstFaulting(short* address); // LDFF1SH // predicated

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorInt16ZeroExtendFirstFaulting(ushort* address); // LDFF1H // predicated

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorInt32SignExtendFirstFaulting(int* address); // LDFF1SW // predicated

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorInt32ZeroExtendFirstFaulting(uint* address); // LDFF1W // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorSByteSignExtendFirstFaulting(sbyte* address); // LDFF1SB // predicated

  public static unsafe void SetFFR(); // SETFFR

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe void WriteFFR(Vector<T> value); // WRFFR

  /// total method signatures: 72

}

First Faulting Loads

Each First Faulting LoadVector method matches a non-faulting version in #94006.

Loads a vector element by element. If loading an element would cause a fault, the load stops and the first-faulting register (FFR) records each element that completed. If no element faults, the load completes as in the non-faulting version.

After calling a first-faulting load, the FFR mask should always be read to determine whether the load completed.

GetFFR(), SetFFR(), and WriteFFR() are used to access the FFR mask for the last first-faulting load.

// Look for the 0 at the end of the buffer
long strlen(byte* buffer)
{
  long i = 0;
  Sve.SetFFR();

  while (true)
  {
    // Load data from the buffer. maskffr contains the elements that were loaded.
    Vector<byte> data = Sve.LoadVectorFirstFaulting(buffer + i);
    Vector<byte> maskffr = Sve.GetFFR();

    // Look for zeros in the loaded data
    Vector<byte> zeromask = Sve.ConditionalSelect(maskffr, Sve.CompareEqual(data, Vector<byte>.Zero), Vector<byte>.Zero);

    if (Sve.MaskTestAnyTrue(zeromask)) {
      // There was a zero in the loaded data. Increment up to the zero and exit the loop.
      zeromask = Sve.BreakBefore(zeromask);
      i += Sve.GetActiveElementCount(zeromask);
      break;
    } else if (Sve.MaskTestLastTrue(maskffr)) {
      // Final bit in the FFR mask is set, therefore the load completed. Increment and continue.
      i += Sve.Count8BitElements();
    } else {
      // The load faulted. Reset the FFR, increment by the number of loaded elements, and continue.
      Sve.SetFFR();
      i += Sve.GetActiveElementCount(maskffr);
    }
  }

  return i;
}
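The first-faulting behavior this loop relies on can be modeled in ordinary code. Below is a hypothetical Python sketch (not the .NET API; it also omits the persistent FFR register, so each load returns a fresh completion mask) that simulates a first-faulting load over a dictionary-backed memory with a fixed 8-element vector:

```python
VL = 8  # assumed vector length in elements

def load_first_faulting(memory, base):
    """Load up to VL elements starting at `base`; stop before the first
    inaccessible element. Returns (data, ffr) where ffr[i] is True for
    every element that was actually loaded."""
    data, ffr = [0] * VL, [False] * VL
    for i in range(VL):
        if base + i not in memory:   # this element would fault
            break                    # stop; remaining ffr lanes stay False
        data[i] = memory[base + i]
        ffr[i] = True
    return data, ffr

def strlen(memory, base):
    """Find the zero terminator, mirroring the C# loop above."""
    i = 0
    while True:
        data, ffr = load_first_faulting(memory, base + i)
        # Look for a zero among the elements that were actually loaded.
        zeros = [ffr[j] and data[j] == 0 for j in range(VL)]
        if any(zeros):
            return i + zeros.index(True)   # elements before the zero
        if ffr[-1]:
            i += VL                        # full load: advance a whole vector
        else:
            i += ffr.count(True)           # partial load: advance past loaded lanes

# A 10-byte accessible region holding "hello\0" plus padding:
mem = dict(enumerate(b"hello\0\0\0\0\0"))
```

A load that reaches an unmapped address reports only the lanes that completed, and the loop advances by that count, which is exactly the fallback branch in the C# example.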
Author: a74nh
Assignees: -
Labels:

api-approved, area-System.Runtime.Intrinsics

Milestone: 9.0.0

@tannergooding
Member

Fixed the missing U

@a74nh
Contributor Author

a74nh commented Feb 9, 2024

After looking at the gather section, I think these are wrong in first faulting:

  • GatherVector*WithByteOffsetZeroExtendFirstFaulting needs removing
  • Some of the loads need additional int and uint versions
  • Masks need adding to LoadVector*ExtendFirstFaulting

My scripting thinks it should now look like:

(collapsed section)
namespace System.Runtime.Intrinsics.Arm;

/// VectorT Summary
public abstract partial class Sve : AdvSimd /// Feature: FEAT_SVE  Category: firstfaulting
{

  /// T: [int, uint], [uint, uint], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1B

  /// T: [int, int], [uint, int], [int, uint], [uint, uint], [long, long], [ulong, long], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B

  /// T: [float, uint], [int, uint], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1W or LDFF1D

  /// T: uint, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, Vector<T> addresses); // LDFF1W or LDFF1D

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* address, Vector<T2> indices); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorFirstFaulting(Vector<T> mask, T* address, Vector<T> indices); // LDFF1W or LDFF1D

  /// T: [int, uint], [uint, uint], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1SH

  /// T: [int, int], [uint, int], [int, uint], [uint, uint], [long, long], [ulong, long], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* address, Vector<T2> indices); // LDFF1SH

  /// T: [int, int], [uint, int], [int, uint], [uint, uint], [long, long], [ulong, long], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorInt16WithByteOffsetsSignExtendFirstFaulting(Vector<T> mask, short* address, Vector<T2> offsets); // LDFF1SH

  /// T: [long, ulong], [int, uint], [ulong, ulong], [uint, uint]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1SW

  /// T: [long, long], [int, int], [ulong, long], [uint, int], [long, ulong], [int, uint], [ulong, ulong], [uint, uint]
  public static unsafe Vector<T> GatherVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* address, Vector<T2> indices); // LDFF1SW

  /// T: [long, long], [int, int], [ulong, long], [uint, int], [long, ulong], [int, uint], [ulong, ulong], [uint, uint]
  public static unsafe Vector<T> GatherVectorInt32WithByteOffsetsSignExtendFirstFaulting(Vector<T> mask, int* address, Vector<T2> offsets); // LDFF1SW

  /// T: [int, uint], [uint, uint], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1SB

  /// T: [int, int], [uint, int], [int, uint], [uint, uint], [long, long], [ulong, long], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* address, Vector<T2> offsets); // LDFF1SB

  /// T: [int, int], [uint, int], [int, uint], [uint, uint], [long, long], [ulong, long], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorUInt16WithByteOffsetsZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> offsets); // LDFF1H

  /// T: [int, uint], [uint, uint], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1H

  /// T: [int, int], [uint, int], [int, uint], [uint, uint], [long, long], [ulong, long], [long, ulong], [ulong, ulong]
  public static unsafe Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> indices); // LDFF1H

  /// T: [long, long], [int, int], [ulong, long], [uint, int], [long, ulong], [int, uint], [ulong, ulong], [uint, uint]
  public static unsafe Vector<T> GatherVectorUInt32WithByteOffsetsZeroExtendFirstFaulting(Vector<T> mask, uint* address, Vector<T2> offsets); // LDFF1W

  /// T: [long, ulong], [int, uint], [ulong, ulong], [uint, uint]
  public static unsafe Vector<T> GatherVectorUInt32ZeroExtendFirstFaulting(Vector<T> mask, Vector<T2> addresses); // LDFF1W

  /// T: [long, long], [int, int], [ulong, long], [uint, int], [long, ulong], [int, uint], [ulong, ulong], [uint, uint]
  public static unsafe Vector<T> GatherVectorUInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* address, Vector<T2> indices); // LDFF1W

  /// T: [float, int], [uint, int], [float, uint], [int, uint], [double, long], [ulong, long], [double, ulong], [long, ulong]
  public static unsafe Vector<T> GatherVectorWithByteOffsetFirstFaulting(Vector<T> mask, T* address, Vector<T2> offsets); // LDFF1W or LDFF1D

  /// T: int, uint, long, ulong
  public static unsafe Vector<T> GatherVectorWithByteOffsetFirstFaulting(Vector<T> mask, T* address, Vector<T> offsets); // LDFF1W or LDFF1D

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> GetFfr(); // RDFFR // predicated

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address); // LDFF1B

  /// T: float, double, sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorFirstFaulting(Vector<T> mask, T* address); // LDFF1W or LDFF1D or LDFF1B or LDFF1H

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorInt16SignExtendFirstFaulting(Vector<T> mask, short* address); // LDFF1SH

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorInt32SignExtendFirstFaulting(Vector<T> mask, int* address); // LDFF1SW

  /// T: short, int, long, ushort, uint, ulong
  public static unsafe Vector<T> LoadVectorSByteSignExtendFirstFaulting(Vector<T> mask, sbyte* address); // LDFF1SB

  /// T: int, long, uint, ulong
  public static unsafe Vector<T> LoadVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address); // LDFF1H

  /// T: long, ulong
  public static unsafe Vector<T> LoadVectorUInt32ZeroExtendFirstFaulting(Vector<T> mask, uint* address); // LDFF1W

  /// T: sbyte, short, int, long, byte, ushort, uint, ulong
  public static unsafe void SetFfr(Vector<T> value); // WRFFR

  /// total method signatures: 31

}

@tannergooding
Member

tannergooding commented Feb 9, 2024

My scripting thinks it should now look like:

I expect we are going to need to manually validate the correct combinations as they are added. While the tool is convenient for getting a general API shape emitted, I believe it is also missing some nuance in places, which makes it hard to know how correct we are.

For example, LDFF1B has seemingly 3 versions:

  • LDFF1B (scalar plus scalar)
  • LDFF1B (scalar plus vector)
  • LDFF1B (vector plus immediate)

LDFF1B (scalar plus scalar)

Contiguous load with first-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.

The general signature is LDFF1B { <Zt>.* }, <Pg>/Z, [<Xn|SP>{, <Xm>}] where

  • <Zt> - The register to merge with, available via ConditionalSelect(mask, GatherVectorByteFirstFaulting(mask, address), Zt)
  • <Pg> - mask
  • <Xn|SP> - address
  • <Xm> - offset

It has encodings for 8, 16, 32, and 64-bit elements. It functionally looks like:

/// T: byte
public static Vector<byte> LoadVectorByteFirstFaulting(Vector<byte> mask, byte* address, nuint offset = 0); // LDFF1B: 8-bit

/// T: short, ushort
public static Vector<T> LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0); // LDFF1B: 16-bit

/// T: int, uint
public static Vector<T> LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0); // LDFF1B: 32-bit

/// T: long, ulong
public static Vector<T> LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0); // LDFF1B: 64-bit
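As the ConditionalSelect expression above suggests, the <Zt> merge register is not a separate parameter in the proposed API: inactive lanes are filled by a per-lane select. A minimal Python sketch of that select (illustration only, not the .NET API):

```python
def conditional_select(mask, on_true, on_false):
    """Per-lane select: take on_true where the mask lane is set,
    otherwise keep the on_false (merge/Zt) value."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]

# Lanes 0 and 2 take the loaded values; lanes 1 and 3 keep the merge register.
merged = conditional_select([True, False, True, False],
                            [10, 11, 12, 13],   # freshly loaded lanes
                            [0, 1, 2, 3])       # previous contents (Zt)
# merged == [10, 1, 12, 3]
```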

LDFF1B (scalar plus vector)

Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended
from 32 to 64 bits. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero
in the destination vector.

The general signature is LDFF1B { <Zt>.* }, <Pg>/Z, [<Xn|SP>, <Zm>.*, <mod>] where

  • <Zt> - The register to merge with, available via ConditionalSelect(mask, GatherVectorByteFirstFaulting(mask, address, offsets), Zt)
  • <Pg> - mask
  • <Xn|SP> - address
  • <Zm> - offsets
  • mod - 0 for UXTW, 1 for SXTW (indicating whether offsets is signed or unsigned)

It has encodings for 32-bit unpacked unscaled offset, 32-bit unscaled offset, and 64-bit unscaled offset. It functionally looks like:

/// [T, T2]: [long, int], [long, uint], [ulong, int], [ulong, uint]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 32-bit unpacked unscaled offset

/// [T, T2]: [int, int], [int, uint], [uint, int], [uint, uint]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 32-bit unscaled offset

/// [T, T2]: [long, long], [long, ulong], [ulong, long], [ulong, ulong]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 64-bit unscaled offset

I do not see an encoding that supports loading an int using long offsets.
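The <mod> field above determines how each 32-bit offset lane is widened to 64 bits before being added to the base. A hypothetical Python model of the two extensions (the function name is mine; the behavior mirrors the SXTW/UXTW definitions):

```python
def extend_offset_32_to_64(off32, signed):
    """Widen a 32-bit offset lane to 64 bits: SXTW (sign-extend) when
    signed is True, UXTW (zero-extend) otherwise."""
    off32 &= 0xFFFFFFFF              # keep only the low 32 bits
    if signed and off32 & 0x80000000:
        return off32 - (1 << 32)     # propagate the sign bit
    return off32

# 0xFFFFFFFF is -1 under SXTW but 4294967295 under UXTW.
```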

LDFF1B (vector plus immediate)

Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will
not cause a read from Device memory or signal faults, and are set to zero in the destination vector.

The general signature is LDFF1B { <Zt>.* }, <Pg>/Z, [<Zn>.*{, #<imm>}] where

  • <Zt> - The register to merge with, available via ConditionalSelect(mask, GatherVectorByteFirstFaulting(mask, address), Zt)
  • <Pg> - mask
  • <Zn> - addresses
  • <imm> - offset (must be 0-31, inclusive)

It has encodings for 32 and 64-bit elements. It functionally looks like:

/// T: int, uint
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<uint> addresses, [ConstantExpected] byte offset = 0); // LDFF1B: 32-bit element

/// T: long, ulong
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<ulong> addresses, [ConstantExpected] byte offset = 0); // LDFF1B: 64-bit element

I do not see an encoding that supports using 64-bit addresses for zero-extending to 32-bit results.
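The vector-plus-immediate form computes each lane's address from the address vector plus the small constant. A hypothetical Python sketch (the name is mine) including the 0-31 range restriction from the encoding above:

```python
def addrs_vector_plus_imm(addresses, imm):
    """Per-lane address generation for LDFF1B (vector plus immediate):
    each lane's address is that lane's base address plus the immediate."""
    assert 0 <= imm <= 31, "LDFF1B immediate must be in the range 0-31"
    return [a + imm for a in addresses]
```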

Additional notes

All three of these forms have the comment:

This instruction is illegal when executed in Streaming SVE mode, unless FEAT_SME_FA64 is implemented and
enabled at the current Exception level.

So, depending on the desired shape for streaming mode, we may need to put these under a nested class indicating whether or not they are supported.

For LDFF1B in particular, since we are loading from a byte* address, indices and offsets are the same.

Once we get to LDFF1H or larger we get the names GatherVectorInt16FirstFaulting, GatherVectorUInt16ZeroExtendFirstFaulting, and GatherVectorUInt16FirstFaulting:

  • the scalar plus scalar overload only allows for indices (it always does bits(64) addr = base + (UInt(offset) + e) * mbytes)
  • the scalar plus vector overload gets some new encodings
    • 32-bit scaled offset - uses indices rather than offsets, supported for T: int, uint, T2: int, uint
    • 32-bit unpacked scaled offset - uses indices rather than offsets, supported for T: long, ulong, T2: int, uint
    • 64-bit scaled offset - uses indices rather than offsets, supported for T: long, ulong, T2: long, ulong
  • the vector plus immediate overload adjusts <imm> to be a multiple of 2 in the range 0-62, inclusive

And so on....
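The scaled-vs-unscaled distinction in the list above comes down to whether the per-element value is multiplied by the memory access size. A scalar model, in illustrative Python (assuming mbytes = 2 for LDFF1H; the function names are made up):

```python
MBYTES = 2  # memory access size in bytes for LDFF1H

def addr_from_index(base, index):
    # Scaled-offset encodings take *indices*:
    # addr = base + index * mbytes
    return base + index * MBYTES

def addr_from_offset(base, offset):
    # Unscaled-offset encodings take raw *byte offsets*:
    # addr = base + offset
    return base + offset

# For LDFF1B, mbytes == 1, so the two computations coincide and
# indices and offsets are interchangeable.
```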

So, for LDFF1H I think that'd give us:

/// T: ushort
public static Vector<ushort> GatherVectorUInt16FirstFaulting(Vector<ushort> mask, ushort* address, nuint offset = 0); // LDFF1H: 16-bit

/// T: int, uint
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, nuint offset = 0); // LDFF1H: 32-bit

/// T: long, ulong
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, nuint offset = 0); // LDFF1H: 64-bit

/// [T, T2]: [int, int], [int, uint], [uint, int], [uint, uint]
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> indices); // LDFF1H: 32-bit scaled offset

/// [T, T2]: [long, int], [long, uint], [ulong, int], [ulong, uint]
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> indices); // LDFF1H: 32-bit unpacked scaled offset

/// [T, T2]: [long, int], [long, uint], [ulong, int], [ulong, uint]
public static Vector<T> GatherVectorUInt16WithByteOffsetsZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> offsets); // LDFF1H: 32-bit unpacked unscaled offset

/// [T, T2]: [int, int], [int, uint], [uint, int], [uint, uint]
public static Vector<T> GatherVectorUInt16WithByteOffsetsZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> offsets); // LDFF1H: 32-bit unscaled offset

/// [T, T2]: [long, long], [long, ulong], [ulong, long], [ulong, ulong]
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> indices); // LDFF1H: 64-bit scaled offset

/// [T, T2]: [long, long], [long, ulong], [ulong, long], [ulong, ulong]
public static Vector<T> GatherVectorUInt16WithByteOffsetsZeroExtendFirstFaulting(Vector<T> mask, ushort* address, Vector<T2> offsets); // LDFF1H: 64-bit unscaled offset

/// T: int, uint
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<uint> addresses, [ConstantExpected] byte index = 0); // LDFF1H: 32-bit element

/// T: long, ulong
public static Vector<T> GatherVectorUInt16ZeroExtendFirstFaulting(Vector<T> mask, Vector<ulong> addresses, [ConstantExpected] byte index = 0); // LDFF1H: 64-bit element

We can expose nuint offset = 0 for the scalar plus scalar versions as it avoids additional overloads while still providing the convenience of a direct scalar (not forcing users to do Vector<T2>.Zero)

@a74nh
Contributor Author

a74nh commented Feb 13, 2024

My scripting thinks it should now look like:

I expect we are going to need to manually validate the correct combinations as they are added. While the tool is convenient for getting a general API shape emitted, I believe it is also missing some nuance in places and it makes it hard to know how correct we are or not.

Agreed. The tool is still useful for spotting inconsistencies, and it's being used to auto-generate the tables, templates, and the full expanded API.

For example, LDFF1B has seemingly 3 versions:

  • LDFF1B (scalar plus scalar)
  • LDFF1B (scalar plus vector)
  • LDFF1B (vector plus immediate)

LDFF1B (scalar plus scalar)

Contiguous load with first-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.

The general signature is LDFF1B { <Zt>.* }, <Pg>/Z, [<Xn|SP>{, <Xm>}] where

  • <Zt> - The register to merge with, available via ConditionalSelect(mask, GatherVectorByteFirstFaulting(mask, address), Zt)
  • <Pg> - mask
  • <Xn|SP> - address
  • <Xm> - offset

It has encodings for 8, 16, 32, and 64-bit elements. It functionally looks like:

/// T: byte
public static Vector<byte> GatherVectorByteFirstFaulting(Vector<byte> mask, byte* address, nuint offset = 0); // LDFF1B: 8-bit

/// T: short, ushort
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0); // LDFF1B: 16-bit

/// T: int, uint
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0); // LDFF1B: 32-bit

/// T: long, ulong
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0); // LDFF1B: 64-bit

This one isn't a gather (it's just a single address)

So they should all just be LoadVectorByteZeroExtendFirstFaulting(), which we have:

/// T: short, int, long, ushort, uint, ulong
public static unsafe Vector<T> LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address); // LDFF1B

Looking in the rejected pile, there are versions with an offset:

  ///   public static unsafe Vector<short> LoadVectorByteZeroExtendFirstFaulting(Vector<short> mask, byte* address, long vnum); // svldff1ub_vnum_s16
  ///   public static unsafe Vector<int> LoadVectorByteZeroExtendFirstFaulting(Vector<int> mask, byte* address, long vnum); // svldff1ub_vnum_s32
  ///   public static unsafe Vector<long> LoadVectorByteZeroExtendFirstFaulting(Vector<long> mask, byte* address, long vnum); // svldff1ub_vnum_s64
  ///   public static unsafe Vector<ushort> LoadVectorByteZeroExtendFirstFaulting(Vector<ushort> mask, byte* address, long vnum); // svldff1ub_vnum_u16
  ///   public static unsafe Vector<uint> LoadVectorByteZeroExtendFirstFaulting(Vector<uint> mask, byte* address, long vnum); // svldff1ub_vnum_u32
  ///   public static unsafe Vector<ulong> LoadVectorByteZeroExtendFirstFaulting(Vector<ulong> mask, byte* address, long vnum); // svldff1ub_vnum_u64

I thought in the reviews we decided to drop any addresses with an offset? Happy to re-include them.
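As a side note on the semantics themselves, the first-faulting contiguous form quoted above can be sketched as a scalar model. This is illustrative Python with deliberately simplified FFR handling, not a specification: a dict stands in for mapped memory, and a missing key stands in for a faulting access.

```python
def load_vector_byte_first_faulting(mask, memory, base, offset=0):
    """Simplified model of LDFF1B (scalar plus scalar).

    Returns (result, ffr). A fault on the first active element raises;
    a fault on any later element instead zeroes that element and all
    following ones, clearing their FFR bits.
    """
    result, ffr = [], []
    seen_first_active = False
    faulted = False
    for e, active in enumerate(mask):
        addr = base + offset + e  # mbytes == 1 for byte loads
        if faulted:
            result.append(0)
            ffr.append(False)
        elif not active:
            result.append(0)  # inactive elements are zeroed
            ffr.append(True)
        elif addr in memory:
            result.append(memory[addr])
            ffr.append(True)
            seen_first_active = True
        elif not seen_first_active:
            raise MemoryError(f"fault on first active element at {addr:#x}")
        else:
            faulted = True
            result.append(0)
            ffr.append(False)
    return result, ffr
```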

LDFF1B (scalar plus vector)

Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended
from 32 to 64 bits. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero
in the destination vector.

The general signature is LDFF1B { <Zt>.* }, <Pg>/Z, [<Xn|SP>, <Zm>.*, <mod>] where

  • <Zt> - The register to merge with, available via ConditionalSelect(mask, GatherVectorByteFirstFaulting(mask, address, offsets), Zt)
  • <Pg> - mask
  • <Xn|SP> - address
  • <Zm> - offsets
  • <mod> - 0 for UXTW, 1 for SXTW (indicating whether offsets is unsigned or signed)

It has encodings for 32-bit unpacked unscaled offset, 32-bit unscaled offset, and 64-bit unscaled offset. It functionally looks like:

/// [T, T2]: [long, int], [long, uint], [ulong, int], [ulong, uint]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 32-bit unpacked unscaled offset

/// [T, T2]: [int, int], [int, uint], [uint, int], [uint, uint]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 32-bit unscaled offset

/// [T, T2]: [long, long], [long, ulong], [ulong, long], [ulong, ulong]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 64-bit unscaled offset

I do not see an encoding that supports loading an int using long offsets.

/// [T, T2]: [long, int], [long, uint], [ulong, int], [ulong, uint]
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, Vector<T2> offsets); // LDFF1B: 32-bit unpacked unscaled offset

These versions aren't valid. T and T2 must have the same size: https://docsmirror.github.io/A64/2023-09/ldff1b_z_p_bz.html

We could support it, but it would break down to two instructions (a sign extend of the offsets, then the gather load). So, I'd recommend not having it.

LDFF1B (vector plus immediate)

Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will
not cause a read from Device memory or signal faults, and are set to zero in the destination vector.

The general signature is LDFF1B { <Zt>.* }, <Pg>/Z, [<Zn>.*{, #<imm>}] where

  • <Zt> - The register to merge with, available via ConditionalSelect(mask, GatherVectorByteFirstFaulting(mask, address), Zt)
  • <Pg> - mask
  • <Zn> - addresses
  • <imm> - offset (must be 0-31, inclusive)

It has encodings for 32 and 64-bit elements. It functionally looks like:

/// T: int, uint
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<uint> addresses, [ConstantExpected] byte offset = 0); // LDFF1B: 32-bit element

/// T: long, ulong
public static Vector<T> GatherVectorByteZeroExtendFirstFaulting(Vector<T> mask, Vector<ulong> addresses, [ConstantExpected] byte offset = 0); // LDFF1B: 64-bit element

I do not see an encoding that supports using 64-bit addresses for zero-extending to 32-bit results.

Again, T and T2 must be the same size:
https://docsmirror.github.io/A64/2023-09/ldff1b_z_p_ai.html

Additional notes

All three of these forms have the comment:

This instruction is illegal when executed in Streaming SVE mode, unless FEAT_SME_FA64 is implemented and
enabled at the current Exception level.

So, depending on the desired shape for streaming mode, we may need to put these under a nested class indicating whether or not they are supported.

This becomes easier if the decision is to not support any SVE instructions while SME is enabled. I need to figure out if that's a viable option though.

@tannergooding
Member

tannergooding commented Feb 13, 2024

This one isn't a gather (it's just a single address)

Ah right. I read it initially and then didn't actually do it right when writing it down. Fixed

I thought in the reviews we decided to drop any addresses with an offset? Happy to re-include them.

The consideration is we don't want to provide unnecessary API overloads when there is a trivially recognizable pattern. This is primarily because it reduces the number of APIs needed by several hundred across all of SVE/SVE2.

However, if there is a way to trivially support both via things like optional parameters instead, then we should feel free to propose that. This is because it allows us to do something like LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0) and cover both the offset provided and no offset provided scenario with a single API. So we provide the convenience without exploding the API surface.

These versions aren't valid. T and T2 must have the same size

I think we're both "right" here. This one is a little confusing and it comes down to how we would want to expose the encoding to the user.

The page there shows that for 32-bit unpacked unscaled offset the esize == 64 (element size), msize == 8 (memory size), and offs_size == 32 (offset size). -- This is in contrast to 32-bit unscaled offset where esize == 32, msize == 8, and offs_size == 32.

In the Operation for all encodings it then shows we use esize for accessing the mask. For offset we then use esize to determine which element to access, but then only read the lower offs_size bits of each esize element. Which is to say, it effectively takes Vector<int> but in an even layout.

Giving user access to that particular encoding of the instruction is then "complicated". We would either need the signature to take Vector<int> and have users understand that it takes it in an even layout. Or we'd need a separately named API that takes Vector<long> and where users understand it is zero/sign-extending the lower 32-bits, not the full long/ulong.

Given that it is taking only the lower 32-bits (long) or even elements (int) (really depending on how you look at the API), it's not something we can trivially pattern match outside of constant inputs. However, for constant inputs, there isn't really an optimization to make since you need the same number of bits regardless.
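The "only read the lower offs_size bits of each esize element" behavior described above can be expressed directly. This is an illustrative Python sketch; `extend_low32` is a made-up name:

```python
def extend_low32(value64, signed):
    # Take only the low 32 bits of a 64-bit offset element, then
    # sign-extend (SXTW) or zero-extend (UXTW) back to 64 bits.
    low = value64 & 0xFFFF_FFFF
    if signed and (low & 0x8000_0000):
        low -= 1 << 32  # bit 31 set: propagate the sign upward
    return low
```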

This becomes easier if the decision is to not support any SVE instructions while SME is enabled. I need to figure out if that's viable option though.

Are you suggesting that it would duplicate the surface area for APIs like Add in both SVE and SME?

@a74nh
Contributor Author

a74nh commented Feb 14, 2024

This one isn't a gather (it's just a single address)

Ah right. I read it initially and then didn't actually do it right when writing it down. Fixed

I thought in the reviews we decided to drop any addresses with an offset? Happy to re-include them.

The consideration is we don't want to provide unnecessary API overloads when there is a trivially recognizable pattern. This is primarily because it reduces the number of APIs needed by several hundred across all of SVE/SVE2.

However, if there is a way to trivially support both via things like optional parameters instead, then we should feel free to propose that. This is because it allows us to do something like LoadVectorByteZeroExtendFirstFaulting(Vector<T> mask, byte* address, nuint offset = 0) and cover both the offset provided and no offset provided scenario with a single API. So we provide the convenience without exploding the API surface.

This seems reasonable. We'll have to revisit this for all the load and store APIs then, as I think most have base+offset options.

These versions aren't valid. T and T2 must have the same size

I think we're both "right" here. This one is a little confusing and it comes down to how we would want to expose the encoding to the user.

The page there shows that for 32-bit unpacked unscaled offset the esize == 64 (element size), msize == 8 (memory size), and offs_size == 32 (offset size). -- This is in contrast to 32-bit unscaled offset where esize == 32, msize == 8, and offs_size == 32.

In the Operation for all encodings it then shows we use esize for accessing the mask. For offset we then use esize to determine which element to access, but then only read the lower offs_size bits of each esize element. Which is to say, it effectively takes Vector<int> but in an even layout.

You're right, it is the top one of the three encodings on that page.

Giving user access to that particular encoding of the instruction is then "complicated". We would either need the signature to take Vector<int> and have users understand that it takes it in an even layout. Or we'd need a separately named API that takes Vector<long> and where users understand it is zero/sign-extending the lower 32-bits, not the full long/ulong.

Given that it is taking only the lower 32-bits (long) or even elements (int) (really depending on how you look at the API), it's not something we can trivially pattern match outside of constant inputs. However, for constant inputs, there isn't really an optimization to make since you need the same number of bits regardless.

Interesting, C does not have this as a function.
On https://dougallj.github.io/asil/ you can see that LDFF1B { Zt.D }, Pg/Z, [Xn, Zm.D, {S,U}XTW] has no matching C function underneath.

Checking with the team, it seems that they hit the same issue and decided that it's simpler to not have a specific API call and let the user separate it out to an extension and load call, which can then be optimised by the compiler:

https://godbolt.org/z/7K74h9Erf

#include <arm_sve.h>

svint64_t f(uint8_t *ptr, svint64_t offs) {
    return svld1ub_gather_offset_s64(svptrue_b64(), ptr, svextw_x(svptrue_b64(), offs));
}

        ptrue   p0.b, all
        ld1b    z0.d, p0/z, [x0, z0.d, sxtw]
        ret

@tannergooding
Member

If there is a simple pattern we can use, then that works. I forgot that most of the SVE methods work in even/odd pairs so it should be natural to recognize Gather(..., WidenEven(...)) (or w/e the appropriate name is here).
