Skip to content

[SYCL] C++ User-Literals Broken #2187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ax3l opened this issue Jul 28, 2020 · 19 comments · Fixed by #2729
Closed

[SYCL] C++ User-Literals Broken #2187

ax3l opened this issue Jul 28, 2020 · 19 comments · Fixed by #2729
Assignees
Labels
bug Something isn't working

Comments

@ax3l
Copy link

ax3l commented Jul 28, 2020

Hi,

our code-base defines a C++ user-defined literal of the form

// myreal.H snippet
    using Real = float; // somewhere

    /** @{
      C++ user literals ``_rt`` &  ``_prt`` for short-hand notations
      Use this to properly add types to constant such as
      ```C++
      auto const mypi = 3.14_rt;
      auto const sphere_volume = 4_rt / 3_rt * pow(r, 3) * mypi;
      ```
    */
    constexpr Real
    operator"" _rt( long double x )
    {
        return Real( x );
    }

    constexpr Real
    operator"" _rt( unsigned long long int x )
    {
        return Real( x );
    }
    /// @}

prior to beta08 of oneAPI this worked quite well, but now it leads to a

dpcpp <...> -O3 -DNDEBUG -pthread -Wno-error=sycl-strict -fsycl -fsycl-unnamed-lambda -fsycl-device-code-split=per_kernel -o myfile.cpp.o -c myfile.cpp
myfile.cpp: error: 'operator""_rt' requires 128 bit size 'long double' type support, but device 'spir64-unknown-unknown-sycldevice' does not support it
    auto rand = 0.0_rt;
                   ^
myreal.H: note: 'operator""_rt' defined here
    operator"" _rt( long double x )

Specifically for C++ user-defined literals this is odd, since they are only defined with long double as floating point type:

Only the following parameter lists are allowed on literal operators : 
( const char * ) | (1) |  
( unsigned long long int ) | (2) |  
( long double ) | (3)
...

Overloading user-defined literals with double does not work:

myreal.H: error: invalid literal operator parameter type 'double', did you mean 'long double'?
    operator"" _rt( double x )
                    ^~~~~~~~~~~~~~~~~~~

Tried as a work-around -mlong-double-64, but this seems to have no effect (still reported as 128 bit).

Seen with beta08 on Ubuntu 18.04.

ax3l added a commit to ax3l/amrex that referenced this issue Jul 28, 2020
Work-around for DPC++ beta08 bug in:
  intel/llvm#2187
ax3l added a commit to ax3l/amrex that referenced this issue Jul 28, 2020
Work-around for DPC++ beta08 bug in:
  intel/llvm#2187
ax3l added a commit to ax3l/amrex that referenced this issue Jul 28, 2020
Work-around for DPC++ beta08 bug in:
  intel/llvm#2187
ax3l added a commit to ax3l/amrex that referenced this issue Jul 28, 2020
Work-around for DPC++ beta08 bug in:
  intel/llvm#2187
@ax3l ax3l changed the title [SYCL] User-Literal on unknown-sycldevice [SYCL] C++ User-Literals Broken Jul 28, 2020
@AlexeySachkov
Copy link
Contributor

This is probably caused by #1512. The PR doesn't contain a lot of info about the motivation: the one reason I can guess is that if long double is actually larger than 64 bits then we cannot represent it in device code, because OpenCL doesn't support such data types.

However, I don't know why we are not mapping long double onto just double if it is just 64 bits - bug in #1786?

@cperkinsintel
Copy link
Contributor

cperkinsintel commented Jul 29, 2020

I've found it. It was introduced here in this commit:
cf6cc66#diff-2856c75f844d1133220fc68717852bf1R1727

For SYCL, those lines (1728 - 1737 of the link above) are redundant. We already have checks for illegal __float128 etc over here: https://github.com/intel/llvm/blob/sycl/clang/lib/Sema/SemaSYCL.cpp#L253

The difference is that the checks in Sema.cpp that are causing the issue are performed early, during parsing, I believe. But the ones performed in SemaSYCL.cpp are being performed later, after there is more type information in the QualTypes. The lifecycle stage is important, because otherwise types get missed or mis-interpreted.

I haven't looked at that type block in Sema.cpp carefully yet, but if it is redundant, then at least for SYCL we could check getLangOpts().SYCLIsDevice and skip the problematic checks .

Question: does OpenMP have the same issue? I would think it should.

@Fznamznon @AlexeySachkov @bader - what do you think?

I will investigate a bit more to double check.

@Fznamznon
Copy link
Contributor

Fznamznon commented Jul 29, 2020

For SYCL, those lines (1728 - 1737 of the link above) are redundant. We already have checks for illegal __float128 etc over here

AFAIK they works only for variable declarations inside device code. The checks that you claim about is added by me through review with community. They have other underlying idea - diagnose on attempt to use (not declare!) something in device code, so they work for cases like captured illegal variables in the device code and calling a function with illegal type in parameters and return value. The community's idea was to replace illegal types with byte arrays in resulting code and diagnose when such replacement is not possible. Right now replacement doesn't happen, so we still need our checks in SemaSYCL.cpp to keep SPIR-V valid.

The difference is that the checks in Sema.cpp that are causing the issue are performed early, during parsing, I believe.

I believe they happen a bit later, on attempt to use some declaration. See (diagnoseUseOfDecl Sema function).

Question: does OpenMP have the same issue? I would think it should.

Probably yes. But is it an issue?
I mean, there is nothing wrong with this diagnostic. The function with illegal type in argument is called - the diagnostic is emitted.

Although I would expect that everything should be fine if 64 bit long type is requested, since this check (cf6cc66#diff-2856c75f844d1133220fc68717852bf1R1727) should claim only about 128 bit size.

Other interesting question is "Does SYCL device code has support for long double at all?".
Because SYCL 1.2.1 is a bit unclear about this. First, is says:

SYCL as a C++11 programming model supports the C++11 ISO standard data types (pt 4.10)

After that I would expect that all types that are supported in C++11 also supported in SYCL.
But, later it says:

The fundamental C++ data types which are supported in SYCL are described in Table 6.1.

And there is no long double type in this table at all.

@cperkinsintel
Copy link
Contributor

In this particular case long double yes in SYCL or no in SYCL doesn't matter. In the top post ax3l uses User Defined Literal functions to return floats. The signature of one of those functions is long double but it's replaced at compile time ( I'm not sure when Clang replaces it). My current theory is that the community code check is being run before Clang replaces it.
But you are right in that long double isn't expressly in the list. Maybe it's getting scooped up by (Ty->isRealFloatingType() && Context.getTypeSize(Ty) == 128)) ? I will check.

But, regardless, would it be ok with you if I bracketed that code with a SYCL check so it's not run with SYCL, and added a new test for UDL functions? That would fix this bug in the short term, and the UDL test could be sent back to the community because I think, ultimately, they'll need to address this problem as well. Let me know your thoughts.

@Fznamznon
Copy link
Contributor

The signature of one of those functions is long double but it's replaced at compile time ( I'm not sure when Clang replaces it).

Why it is replaced? I see a function with long double argument in resulting LLVM IR for such literal, see https://godbolt.org/z/xPGW57 . Am I misunderstanding something?

@cperkinsintel
Copy link
Contributor

cperkinsintel commented Jul 29, 2020

You are correct, apparently UDL functions can be runtime, have their address taken, etc. But my understanding is that they are normally compiled away. And, in your example, the _rt() function does not end up in the machine code ( but, a() b() and main() do ).

Hmm - so maybe the problem isn't a lifecycle problem like I originally posited.

@erichkeane
Copy link
Contributor

Unfortunately, the two standards place us in a position where we cannot allow these when the target size long-double is an int-128 (like linux). The host and target size of variables HAS to remain the same.

In the example, the variable 'rand' is not evaluated in a constant expression, so the frontend HAS to treat it like it is going to be emitted to machine code (because it WILL when optimizations are omitted).

That said, there are actually TWO other bugs with the 128-bit-size 'long double' support that need to be fixed.

First, when the variable is compelled to be a constexpr, we shouldn't be diagnosing (such as when you change your code to be constexpr auto rand = 3.14_rt). So the fix there is likely that we have to not-descend the call-graph in the case where the call is constant-evaluated (or not ODR used).

SECOND: (and @Fznamznon may need to look into this), the diagnostic doesn't seem to be paying attention to -mlong-double-64, which configures 'long double' to be a 64 bit type. (mlong-double-80 and -mlong-double-128 do the obvious thing though).

@erichkeane
Copy link
Contributor

Hmm... This seems to affect all of the deferred diagnostic targets in clang (that is, assigning to a constexpr). I filed a bug against openMP where I think the maintainer did a lot of work on it (https://bugs.llvm.org/show_bug.cgi?id=46910), to see if he can figure out a good way to solve THAT.

sayerhs added a commit to Exawind/amr-wind that referenced this issue Jul 31, 2020
marchdf pushed a commit to marchdf/amr-wind that referenced this issue Aug 4, 2020
@Fznamznon Fznamznon assigned Fznamznon and unassigned cperkinsintel Aug 10, 2020
@Fznamznon
Copy link
Contributor

(and @Fznamznon may need to look into this), the diagnostic doesn't seem to be paying attention to -mlong-double-64, which configures 'long double' to be a 64 bit type.

I'll take a look.

@Fznamznon
Copy link
Contributor

This one #2295 should fix the -mlong-double-64 option.

@ax3l
Copy link
Author

ax3l commented Sep 13, 2020

@Fznamznon we are currently validating the fixed -mlong-double-64 option from #2295 in oneAPI beta09. Unfortunately, dpcpp is still not able to compile the above C++ code example with user-provided literals, even when setting the flag.

The error in beta09 is still:

error: 'operator""_rt' requires 128 bit size 'long double' type support, but device 'spir64-unknown-unknown-sycldevice' does not support it

(C++ user literals worked with dpcpp prior to the beta08 release.)

Further help is greatly appreciated, because user-literals is literally (no pun intended) how we express mixed precision in our downstream, modern C++ code bases. One idea could also be to add the above example to the dpcpp tests?

@Fznamznon
Copy link
Contributor

@Fznamznon we are currently validating the fixed -mlong-double-64 option from #2295 in oneAPI beta09. Unfortunately, dpcpp is still not able to compile the above C++ code example with user-provided literals, even when setting the flag.

The error in beta09 is still:

error: 'operator""_rt' requires 128 bit size 'long double' type support, but device 'spir64-unknown-unknown-sycldevice' does not support it

(C++ user literals worked with dpcpp prior to the beta08 release.)

Further help is greatly appreciated, because user-literals is literally (no pun intended) how we express mixed precision in our downstream, modern C++ code bases. One idea could also be to add the above example to the dpcpp tests?

So, when command line like "clang++ -fsycl -mlong-double-64 test.cppis used, It seems likemlong-double-64option is not passed to device front-end by the driver (I cannot see it in-### output). So, if you pass it directly through -Xclang -mlong-double-64` to the front-end, the case described above is compiled successfully. @mdtoguchi , @AGindinson , any ideas why this option is not passed to device front-end compilation?

@ax3l
Copy link
Author

ax3l commented Sep 15, 2020

Thanks for the hint with the pass-through issue. Now only the public apt-repos for oneAPI beta09 need to become un-broken (AMReX-Codes/amrex#1366 // supporttickets.intel.com no. 04801443).
Update: package renamed from intel-oneapi-dpcpp-compiler to intel-oneapi-dpcpp-cpp-compiler.

Please keep us posted when the pass-through of -mlong-double-64 receives further updates.

@mdtoguchi
Copy link
Contributor

@Fznamznon we are currently validating the fixed -mlong-double-64 option from #2295 in oneAPI beta09. Unfortunately, dpcpp is still not able to compile the above C++ code example with user-provided literals, even when setting the flag.
The error in beta09 is still:

error: 'operator""_rt' requires 128 bit size 'long double' type support, but device 'spir64-unknown-unknown-sycldevice' does not support it

(C++ user literals worked with dpcpp prior to the beta08 release.)
Further help is greatly appreciated, because user-literals is literally (no pun intended) how we express mixed precision in our downstream, modern C++ code bases. One idea could also be to add the above example to the dpcpp tests?

So, when command line like "clang++ -fsycl -mlong-double-64 test.cppis used, It seems likemlong-double-64option is not passed to device front-end by the driver (I cannot see it in-### output). So, if you pass it directly through -Xclang -mlong-double-64` to the front-end, the case described above is compiled successfully. @mdtoguchi , @AGindinson , any ideas why this option is not passed to device front-end compilation?

Looks like -mlong-double is being restricted for the device compilation (only goes through for x86, ppc)

@AlexeySachkov AlexeySachkov added the bug Something isn't working label Oct 13, 2020
@ax3l
Copy link
Author

ax3l commented Nov 2, 2020

Hi everyone, is there any progress on this?

@erichkeane
Copy link
Contributor

It looks like -mlong-double-64 should work through our driver now, based on my downstream's nightly. Were you able to give it a try?

@mdtoguchi
Copy link
Contributor

The issue of restricting -mlong-double from being passed to device compilations is being addressed here: #2729. It should be noted that there is a general filter that is done when processing arguments for the device compilations that prevents any -m* grouped option from being passed if the architecture does not match (host vs device). Using the patch specified (#2729) and using -Xsycl-target-frontend "-mlong-double-64" the option should be processed correctly.

@ax3l
Copy link
Author

ax3l commented Jun 30, 2021

Hi @bader @mdtoguchi, I just realized that even with the latest oneAPI release (2021.3.0), we still need to pass -mlong-double-64 -Xclang -mlong-double-64 to our compile logic to avoid hitting this bug when offloading to Intel GPUs.

Is this the permanent solution or will this still be relaxed/adjusted?

@bader
Copy link
Contributor

bader commented Jul 14, 2021

@ax3l, sorry for the delay.
I'm not aware of plans to adjust/relax diagnostics for long double data type.
Probably we can discuss this with @AaronBallman, @erichkeane, @premanandrao, @elizabethandrews if it can be achieved.
The reason to have this diagnostics is to make sure that device code produced by clang will not use long double data type - this data type is not supported by accelerators.
The example provided in the description downcasts long double to supported float "at compile time" (i.e. before we invoke device specific back-end compiler).
It might be possible to relax the diagnostics if we can reliably separate cases when produced device code will have long double data type operations/storage from the cases when all instances of long double are converted to supported types. I'm not sure if it's possible.
@AaronBallman, @erichkeane, @premanandrao, @elizabethandrews any thoughts on that?

WeiqunZhang pushed a commit to AMReX-Codes/amrex that referenced this issue Jul 19, 2021
## Summary

`IntelLLVM` is now a recognized CMake compiler, so we don't need to add a Clang-ish identification anymore.

Also fix a CMake -Wdev warning (DPC++/SYCL)

-  beta08 work-around: `-mlong-double-64` (intel/llvm#2187) -> still needed!
- beta09 work-around: `-fno-sycl-early-optimizations` (link to upstream issue?) -> keep for now, RT tests pending to verify fix

## Additional background

setting the CXX compiler ID was a work-around for the 2021.1 DPC++ release / CMake 3.19.0-3.19.1
  https://gitlab.kitware.com/cmake/cmake/-/issues/21551#note_869580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants