
Missing C++11 type traits for half #628

Closed
lgritz opened this issue Dec 3, 2019 · 19 comments
lgritz (Contributor) commented Dec 3, 2019

half.h lacks the bits of code necessary for certain C++11 type traits. Users should expect that std::is_floating_point&lt;half&gt;::value is true, as should be std::is_arithmetic&lt;half&gt;::value. (Currently, they default to false, which can screw up your clever metaprogramming when using half.) There may be others we should set as well.

(Question above my C++ pay grade: should vector & matrix types also have arithmetic and floating point traits?)

In retrospect, I can see that this was mentioned in #253, and we closed that ticket without fully addressing it.

lgritz (Contributor, Author) commented Dec 3, 2019

oy, excuse the bad formatting that removed the <>.

std::is_floating_point<half>::value

std::is_arithmetic<half>::value

meshula (Contributor) commented Dec 3, 2019

cppreference.com offers an annoyingly terse explanation of std::is_floating_point (https://en.cppreference.com/w/cpp/types/is_floating_point) --- it specifically names float, double, and long double as returning true, otherwise false. Does that mean that it can only be true for float, double, and long double, or does it mean that in the C++ specification that's simply how it evaluates for the built-in types? If the spec is restrictive, we'd probably need our own is_floating_point that is a superset of the kinda-maybe useless official one. Same question for is_arithmetic, whose definition is possibly over-restrictive for our uses. I'm leaving this note here in the spirit of "above my pay grade" ;) In other words, I haven't been able to find clarification in the spec, due no doubt to terrible internet-search skills.
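For reference, the standard does reserve these templates: a program that adds its own specializations of the traits in &lt;type_traits&gt; has undefined behavior, so a project-local superset trait is the conforming route. A sketch, with a hypothetical trait name and a stub in place of the real half:

```cpp
#include <type_traits>

struct half { unsigned short _h; };  // hypothetical stand-in for Imath's half

// Project-local trait (illustrative name): true for the built-in floating
// point types, plus any type we explicitly opt in, such as half.
template <typename T>
struct is_floating_point_ext : std::is_floating_point<T> {};

template <>
struct is_floating_point_ext<half> : std::true_type {};

static_assert(is_floating_point_ext<float>::value, "built-ins still pass");
static_assert(is_floating_point_ext<half>::value, "half is opted in");
static_assert(!is_floating_point_ext<int>::value, "int is not floating point");
```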

lgritz (Contributor, Author) commented Dec 3, 2019

For half, I think the spirit is clearly that types that act like floating point scalars -- regardless of bit width -- should be included. If it were just those three types now and forever, you could just write if (std::is_same<T,float>::value || std::is_same<T,double>::value || ...). The trait exists as a separate test so that custom types that behave in a semantically isomorphic way can include themselves.

A trickier question is whether it should be true for vector floating point types. I found the guidance there to be lacking.

meshula (Contributor) commented Dec 4, 2019

If we go with the spirit (not very language-lawyer-y ;) ) then std::is_floating_point should be true for half, and if std::is_arithmetic is implemented in terms of is_floating_point, then that will follow for free.

"In the spirit", vector and matrix qualify, marginally:

Arithmetic types are the built-in types for which the arithmetic operators (+, -, *, /) are defined (possibly in combination with the usual arithmetic conversions)

I say marginally, because the usual arithmetic conversions don't really apply in a sensible way - we can't convert a matrix to a floating point number. The spec says "possibly", but that is worrying, since we allow things like Mat44(3), which fills the matrix with 3s. Other math libraries treat that constructor as multiplying the identity by 3, or disallow it entirely. Sure, the spec says "possibly."

We have defined all four operators, if not in an algebraic way (matrix-matrix divide is per-component, while multiply is matrix multiplication; vector-vector divide and multiply are both per-component). Good thing it is not called std::is_algebraic!

is_floating_point and is_arithmetic will likely be used in SFINAE trickery, and I can easily imagine a can of worms, since the meaning of the multiplication and division operators is not consistent across mat, vec, and scalar values, nor does the mere "possible" existence of arithmetic conversions help the case. So I think I am leaning towards "no" on vec and mat.

lgritz (Contributor, Author) commented Dec 4, 2019

It's actually hard to know what the right thing is here. Some implementations of half seem to define these type traits, others do not.

meshula (Contributor) commented Dec 4, 2019

That's a very nice half they've got there, hmmm. It might be time for us to go analytic instead of lookup... @pixelfinger, @romainguy, have you guys benchmarked your half.h versus a table lookup like we have here in OpenEXR?

romainguy commented

We did not benchmark against a LUT. The main reasons for us are:

  • Binary size is very important (interestingly we switched from openexr to tinyexr because of size/compile times)
  • ARM offers instructions to convert to/from fp16
  • Half floats are mostly meant as a storage format for us. We try to do conversions offline or as early as possible

meshula (Contributor) commented Dec 4, 2019

Good to know, thanks!

For further reference on half: tinyexr uses conversion code modified from this code by @rygorous : https://gist.github.com/rygorous/2156668 which is further derived from reference to the ISPC standard library.

lgritz (Contributor, Author) commented Dec 4, 2019

I've used some of that ryg code as well in OIIO -- it can be SIMD-ized for multi-half/float conversions when f16c instructions are not available. The native OpenEXR table-based half conversion is fine for scalars, but doing 4 or 8 table lookups is terrible compared to actual SIMD ops. See, for example, https://github.com/OpenImageIO/oiio/blob/master/src/include/OpenImageIO/simd.h#L6540

I will post timings in a minute...

lgritz (Contributor, Author) commented Dec 4, 2019

On my work Linux workstation (Xeon Silver 4110 CPU, with OIIO compiled for AVX2+f16c):

Using OpenEXR table-based lookups (inherently one at a time), I can load half values and convert to float (in registers) at a rate of 1550 Mvals/sec.

Using the ryg half->float sse code (no f16c intrinsics), I can load 4 half values into a 4-wide float SSE register at 2900 Mvals/sec.

Using f16c intrinsics, I can half->float 4 at a time to sse registers at 9300 Mvals/sec, and 8 at a time to avx2 registers at 18500 Mvals/sec.

Going the other way, i.e. converting floats in registers -> half in memory, I get a rate of 490 Mvals/sec for OpenEXR. With f16c intrinsics I can convert+store 4 at a time at 10500 Mvals/sec and 8 at a time at 15400 Mvals/sec. I don't seem to have implemented an SSE-based version of ryg's float-to-half; I just always use OpenEXR... I don't remember why -- maybe I tried it and it wasn't appreciably faster than OpenEXR's tables?

(This is all timed based on OIIO's simd.h vfloat4 and vfloat8, using load(const half*) and store(half*) methods.)

lgritz (Contributor, Author) commented Dec 4, 2019

Those benchmarks are in tight loops where OpenEXR's conversion tables easily fit into cache. I would expect OpenEXR's tables to fare even worse in real world code when the conversions are scattered among other code and the table values get evicted from cache frequently.

lgritz (Contributor, Author) commented Dec 4, 2019

The bottom line is that if you are converting many values (like images), you REALLY want to be using f16c intrinsics and simd registers, the combo can get you 12x speedup for half->float and 30x speedup for float->half.
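A sketch of the batched shape this takes (illustrative function names, not OpenEXR or OIIO API): one vcvtph2ps instruction converts four halfs at once when the compiler targets F16C (e.g. -mf16c or -mavx2), with a scalar analytic conversion as the portable fallback.

```cpp
#include <cstdint>
#include <cstring>
#include <cmath>
#if defined(__F16C__)
#include <immintrin.h>
#endif

// Scalar analytic half -> float, used as the fallback path.
static float half_to_float1(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t e = (h >> 10) & 0x1F, m = h & 0x3FF, bits;
    if (e == 0 && m == 0) {
        bits = sign;                                    // signed zero
    } else if (e == 0) {                                // subnormal
        float f = std::ldexp((float)m, -24);
        std::memcpy(&bits, &f, 4);
        bits |= sign;
    } else if (e == 31) {
        bits = sign | 0x7F800000u | (m << 13);          // Inf / NaN
    } else {
        bits = sign | ((e + 112) << 23) | (m << 13);    // normal (rebias 15->127)
    }
    float out;
    std::memcpy(&out, &bits, 4);
    return out;
}

// Convert 4 packed halfs to 4 floats.
void half4_to_float4(const uint16_t h[4], float f[4]) {
#if defined(__F16C__)
    __m128i hv = _mm_loadl_epi64((const __m128i*)h);    // 4 halfs = 64 bits
    _mm_storeu_ps(f, _mm_cvtph_ps(hv));                 // one vcvtph2ps
#else
    for (int i = 0; i < 4; ++i)
        f[i] = half_to_float1(h[i]);
#endif
}
```

The same pattern extends to 8-wide with _mm256_cvtph_ps on AVX2 targets.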

xlietz (Contributor) commented Dec 5, 2019 via email

lgritz (Contributor, Author) commented Dec 5, 2019

std::vector<float> isn't really a fair comparison, because it doesn't in any way act like an arithmetic quantity (no operator+, for example).

But Vec3f does! You use it algebraically much like it's a float. We define all the overloads specifically to make this the case.

I thought I would clinch my argument by testing the clang/gcc built-in simd types:

typedef float float4 __attribute__((ext_vector_type(4)));
std::cout << "float4 " << std::is_floating_point<float4>::value
            << ' ' << std::is_arithmetic<float4>::value << "\n";

But alas, they say 0!

lgritz (Contributor, Author) commented Dec 5, 2019

If you think about it, that's ironic and weird because there is no such thing as a 'float' on this CPU, only __m128 and __m256. When you ask for 'float', it simulates it by doing SIMD math but only loads/saves one lane to memory.

xlietz (Contributor) commented Dec 5, 2019 via email

lgritz (Contributor, Author) commented Dec 5, 2019

Well, they say it just tersely enough that I can't quite tell the intent. Do they mean, "out of the built-in types we document in the spec, is_floating_point is true for float, double, and long double [but we make no claims about other types]?" Or do they mean, "is_floating_point should only ever be true for these built-in types that we name explicitly, not any other type (including compiler extensions or user type) no matter how semantically similar they are to float?"

But the answers here seem very confident about these type traits truly meaning just the built-in types.

OK, fine, I will withdraw the issue.

@lgritz lgritz closed this as completed Dec 5, 2019
xlietz (Contributor) commented Dec 5, 2019

Just to be clear, I was agreeing with you that I thought is_floating_point and is_arithmetic should be true for half (just not for vectors and matrices). But those answers are compelling in the link you provided.

@cary-ilm cary-ilm added this to the v2.5.0 milestone Apr 27, 2020