First steps towards implementing execution par_unseq #3063
hpx/parallel/util/loop.hpp
#elif defined(HPX_INTEL_VERSION) || defined(HPX_CLANG_VERSION)
#pragma ivdep
#pragma omp simd
#endif
I'd like to leave OpenMP out of the picture here, since we'd then require users to compile with -fopenmp.
What about other compilers, should we issue a warning?
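To make the question concrete, here is a minimal sketch of how a vectorization hint could be selected per compiler without requiring -fopenmp. The function name `scale_all_sketch` is hypothetical and not part of HPX; the pragmas are the vendor-specific loop hints of each compiler, and unknown compilers simply get no hint (no warning is issued in this sketch).

```cpp
#include <cstddef>

// Hypothetical helper (not part of HPX): multiplies n floats in place.
// Each branch uses that compiler's own loop hint, so no OpenMP flag is needed.
inline void scale_all_sketch(float* data, std::size_t n, float factor)
{
    // Intel and Clang also define __GNUC__, so they are tested first.
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__clang__)
#pragma clang loop vectorize(enable)
#elif defined(__GNUC__)
#pragma GCC ivdep
#elif defined(_MSC_VER)
#pragma loop(ivdep)
#endif
    for (std::size_t i = 0; i < n; ++i)
    {
        data[i] *= factor;
    }
}
```

Whether any of these hints actually improves code generation would still have to be checked in the generated assembly or the compiler's vectorization report.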
Would be nice to have some confidence if it really has the expected effects (better code gen) before pushing it further.
I agree, but how can we get confidence without actually trying out things?
> Would be nice to have some confidence if it really has the expected effects (better code gen) before pushing it further.
> I agree, but how can we get confidence without actually trying out things?
Trying it out doesn't mean it has to be merged ;)
85ae248 to 5409978
Summary
You can't use OpenMP
+#if defined(HPX_HAVE_OPENMP_SIMD)
+#pragma omp simd
+#endif
+ for (/**/; it != end; ++it)
+ {
+ f(it);
+ }
OpenMP 5 is expected to fix this. I don't know the precise status of that proposal but can look if you want. I recall that Intel 18 compilers do not object to I would not expect a compiler to attempt to vectorize container accesses if a random access iterator is not supported...
Details
The relevant portions of the OpenMP 4.5 specification are:
@jeffhammond thanks for this information. This branch is really meant to be used for experimentation towards implementing
find_package(OpenMP QUIET)
if(OPENMP_FOUND)
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel")
hpx_add_compile_flag_if_available(-openmp-simd)
-openmp-simd is deprecated (https://software.intel.com/en-us/node/693432). It was replaced by -qopenmp-simd in version 16 or 17.
I recommend that CMake test for -fopenmp-simd, then -qopenmp-simd, then -openmp-simd. ICC supports many of the GCC flags for compatibility...
Ok, thanks.
@@ -253,9 +253,9 @@ namespace hpx { namespace parallel { namespace util
 HPX_FORCEINLINE void prefetch_addresses(T const& ... ts)
 {
 int const sequencer[] = {
-(_mm_prefetch(
+0, (_mm_prefetch(
For the non-x86 code path, you may want to use __builtin_prefetch, which both GCC and Clang support.
We can do that; MSVC, however, supports only _mm_prefetch.
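A minimal sketch of how the two intrinsics could be combined: `_mm_prefetch` on x86 targets (including MSVC), `__builtin_prefetch` on other GCC/Clang targets, and a no-op elsewhere. The names `prefetch_sketch`, `prefetch_addresses_sketch` and `SKETCH_PREFETCH_X86` are hypothetical, and the `_MM_HINT_T0` locality hint is an assumption rather than what the PR uses.

```cpp
#include <memory>

// x86 path: MSVC targeting x86/x64, or any compiler with SSE enabled.
#if (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))) || \
    defined(__SSE__)
#include <xmmintrin.h>
#define SKETCH_PREFETCH_X86 1
#endif

// Hypothetical wrapper: issues a prefetch hint for the given address.
inline void prefetch_sketch(void const* p) noexcept
{
#if defined(SKETCH_PREFETCH_X86)
    _mm_prefetch(static_cast<char const*>(p), _MM_HINT_T0);
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p);    // non-x86 path for GCC and Clang
#else
    (void) p;                 // unknown compiler/target: no hint
#endif
}

// Prefetches the address of every argument. The leading 0 keeps the
// array non-empty when the pack is empty, as in the diff above.
template <typename... T>
void prefetch_addresses_sketch(T const&... ts)
{
    int const sequencer[] = {0, (prefetch_sketch(std::addressof(ts)), 0)...};
    (void) sequencer;
}
```

A prefetch is only a hint, so the wrapper never changes the values it is given; on targets without a known intrinsic it compiles to nothing.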
@hkaiser These are important experiments, in part because the OpenMP standard working group recognizes that support for C++ in OpenMP is lacking. While 5.0 is pretty close to finished, your experience in implementing PSTL will be useful in identifying gaps that should be addressed in the next iteration. I don't know what code you allow yourself to look at (due to licensing), but both RAJA and Intel PSTL may be useful. In particular, RAJA supports a bunch of different pragmas for persuading compilers to vectorize inner loops.
Closing this since it's not actively being worked on, but if someone feels like picking this up again, feel free to do so!
This is related to #2271