Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Fix method names of hardware intrinsic APIs #25965

Closed
wants to merge 1 commit into from
Closed

Fix method names of hardware intrinsic APIs #25965

wants to merge 1 commit into from

Conversation

fiigii
Copy link
Contributor

@fiigii fiigii commented Dec 17, 2017

Matching the CoreCLR change dotnet/coreclr#15471

cc @jkotas @eerhardt @tannergooding

@@ -198,7 +197,7 @@ public static class Avx
public static Vector256<float> Set(float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0) { throw null; }
public static Vector256<double> Set(double e3, double e2, double e1, double e0) { throw null; }
public static Vector256<T> Set1<T>(T value) where T : struct { throw null; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this rather be called SetOne ? For consistency with SetZero, Vector<T>.One, Vector<T>.Zero.

Copy link
Contributor Author

@fiigii fiigii Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that Set1 should not be consistent with SetZero or Vector<T>.One.

  • SetZero stands for "set all elements to value of zero";
  • Vector<T>.One stands for "set all elements to value of one";
  • Set1 stands for "set all elements to one value of XX";
  • Set stands for "set elements to multiple values of XX, YY, ZZ, ...";

But I know Set1 is not a good name that just follows C++ Intel intrinsic naming. Do you have suggestions for a better name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just Set or SetAll might make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just Set or SetAll might make sense.

We have Set for multi-value initialization. SetAll makes sense to me. Thank you.

@@ -169,8 +169,7 @@ public static class Avx
public static Vector128<double> Permute(Vector128<double> value, byte control) { throw null; }
public static Vector256<float> Permute(Vector256<float> value, byte control) { throw null; }
public static Vector256<double> Permute(Vector256<double> value, byte control) { throw null; }
public static Vector256<float> Permute2x128(Vector256<float> left, Vector256<float> right, byte control) { throw null; }
public static Vector256<double> Permute2x128(Vector256<double> left, Vector256<double> right, byte control) { throw null; }
public static Vector256<T> Permute2x128<T>(Vector256<T> left, Vector256<T> right, byte control) where T : struct { throw null; }
public static Vector128<float> PermuteVar(Vector128<float> left, Vector128<float> mask) { throw null; }
Copy link
Member

@jkotas jkotas Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called PermuteVariable? For consitency with BlendVariable.

Or do these two methods need the Var/Variable suffix at all? Would overload be sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the Variable suffix only for v-suffixed instructions, e.g., BlendVariable -> vblendvp*, ShiftLeftLogicalVariable->vpsllv*, etc.
PermuteVar and PermuteVar8x32 is a special case that will generate vpermilp* and vperm*, which breaks the above convention, so it is following C++ Intel intrinsic naming.
We can change it to Variable suffix, I have no strong preference here.

@@ -731,7 +730,7 @@ public static class Sse
public static Vector128<float> Multiply(Vector128<float> left, Vector128<float> right) { throw new NotImplementedException(); }
public static Vector128<float> Or(Vector128<float> left, Vector128<float> right) { throw new NotImplementedException(); }
public static Vector128<float> Reciprocal(Vector128<float> value) { throw new NotImplementedException(); }
public static Vector128<float> ReciprocalSquareRoot(Vector128<float> value) { throw new NotImplementedException(); }
public static Vector128<float> ReciprocalSqrt(Vector128<float> value) { throw new NotImplementedException(); }
public static Vector128<float> Set(float e3, float e2, float e1, float e0) { throw new NotImplementedException(); }
public static Vector128<float> Set1(float value) { throw new NotImplementedException(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@jkotas
Copy link
Member

jkotas commented Dec 17, 2017

For my education, what was the rule used to make the method generic vs. non-generic?

For example, I wondering about these:

  • Vector256<T> Set1<T>(T value) is generic, Vector128<int> Set1 is a series of non-generic methods. Should it be the same?

  • What is the point of e.g. float ExtractSingle<T>(Vector128<T> value, byte index) being generic? If I want to extract single, it should better be Vector128<float>.

@jkotas
Copy link
Member

jkotas commented Dec 17, 2017

I am closing this because of I have cherry picked this into #25969. We can continue the discussion about the naming though.

@jkotas jkotas closed this Dec 17, 2017
@fiigii
Copy link
Contributor Author

fiigii commented Dec 17, 2017

Vector256<T> Set1<T>(T value) is generic, Vector128<int> Set1 is a series of non-generic methods. Should it be the same?

Because SSE only has float instructions, so other types (Vector128<int> Set1, Vector128<double> Set1, ...) have to be in Sse2.
There is another solution would make it more consistent that we can provide generic Vector256<T> Set1<T>(T value) in Sse2 and Vector128<float> Set1 in Sse. Thoughts?

@fiigii
Copy link
Contributor Author

fiigii commented Dec 17, 2017

What is the point of e.g. float ExtractSingle(Vector128 value, byte index) being generic? If I want to extract single, it should better be Vector128

Sometimes, ExtractSingle<T> can save one cast than ExtractSingle.

@jkotas
Copy link
Member

jkotas commented Dec 18, 2017

Sometimes, ExtractSingle<T> can save one cast than ExtractSingle.

Is this a common pattern? It does not sound right to be optimizing for case where folks have e.g. Vector128<int> and they want to extract the int as float.

@fiigii
Copy link
Contributor Author

fiigii commented Dec 18, 2017

Vector128 and they want to extract the int as float.

@jkotas Ok, I will fix this and Set1.

@4creators
Copy link
Contributor

4creators commented Dec 18, 2017

It does not sound right to be optimizing for case where folks have e.g. Vector128 and they want to extract the int as float.

@jkotas @fiigii
This is not an optimization. This is feature of underlying processor instructions which cannot discern if operand xmm is of float or int type. I do not think we should change that behavior. It would be OK for System.Numerics.Vector but intrinsics are just raw, very advanced API to underlying processor instructions and IMO we should just give them as they are.

@tannergooding
Copy link
Member

@4creators, it is an optimization. It just avoids the consumer needing to insert a StaticCast (which will be no-op anyways). ExtractSingle should just take Vector128<float>

@fiigii
Copy link
Contributor Author

fiigii commented Dec 18, 2017

I believe that these generic intrinsic is a "legacy" design from the long design process, and the original motivation has gone due to other changes. I will fix it soon, thanks for pointing out!

@4creators
Copy link
Contributor

@tannergooding my point is that (E)(V)EXTRACTPS instruction does not check if operand is of xmm packed float type and it does not throw any type of floating point exception either, therefore, it can be treated as a general extraction of 32 bits from xmm vector. If we introduce any type of limitations which limit pure instruction functionality it is wrong for raw API.

@4creators
Copy link
Contributor

4creators commented Dec 18, 2017

float ExtractSingle<T>(Vector128<T> value, byte index) where T : struct { throw null; }
Vector128<Int64> src = //... //
var f = Sse41.ExtractSingle<Int64>(src, 3);

Should be perfectly legal - it saves 1 CPU cycle by doing extraction and cast (strange binary one but still), and I may want to use Int64 for loading to get atomic load for adjacent two 32bit values and do some precalculation step with it (again at binary level).

extractps_1

extractps_2

@tannergooding
Copy link
Member

We are not providing a raw api, however. We are providing a managed abstraction over the underlying hardware instructions.

Because it is an abstraction and not raw access to the underlying instructions, there are some helper functions being provided (like static cast, set 1, etc) and other by design limitations set forth (such as no MMX instructions being exposed).

@fiigii
Copy link
Contributor Author

fiigii commented Dec 18, 2017

@4creators Thank you for explaining my original design proposal 😄 . However, after I added StaticCast<T,U>, this kind of "raw operation" can be unified.

Vector128<Int64> src = //... //
Vector128<Single> srcFloat = Sse.StaticCast<Int64, Single>(src);
var f = Sse41.ExtractSingle(srcFloat, 3);

StaticCast<T,U> does not generate any runtime code (just makes the type system happy), so do not worry about "1 CPU cycle by doing extraction and cast".

@fiigii
Copy link
Contributor Author

fiigii commented Dec 18, 2017

Updated the above code example.

@4creators
Copy link
Contributor

Ahh my ... that was a good one on my side 😆

@karelz karelz added this to the 2.1.0 milestone Dec 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants