Add vectorization to the par_vec (aka par_unseq) implementations of the parallel algorithms #2271

Open
brycelelbach opened this issue Jul 29, 2016 · 6 comments

Comments

@brycelelbach
Member

brycelelbach commented Jul 29, 2016

The par_vec (aka par_unseq) policy allows interleaving of element access functions, i.e. it is safe to vectorize the iterations of the algorithm.

Explicit engagement of compiler vectorizers through pragmas is probably the best way to ensure this occurs (e.g. #pragma simd, #pragma omp simd).
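As a minimal sketch of what such explicit engagement could look like, the loop below is annotated with `#pragma omp simd`. The function `axpy_simd` is a hypothetical helper made up for illustration, not part of HPX. The pragma is only honored when the compiler's OpenMP SIMD support is enabled (for example `-fopenmp-simd` with GCC or Clang); otherwise it is ignored and the loop runs as ordinary scalar code with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an HPX function): computes y[i] += a * x[i].
void axpy_simd(double a, std::vector<double> const& x, std::vector<double>& y)
{
    assert(x.size() == y.size());

    double const* xp = x.data();
    double* yp = y.data();
    std::size_t const n = x.size();

    // Ask the compiler to vectorize this loop. Iterations are independent,
    // which is the kind of loop par_unseq permits to be interleaved.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
    {
        yp[i] += a * xp[i];
    }
}
```

Whether the compiler actually emits vector instructions still depends on the target flags and the loop body, so inspecting the generated code or the vectorization report is advisable.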

I will probably take a look into doing this myself while preparing my CppCon talk on parallel algorithms.

@diehlpk
Member

diehlpk commented Jan 24, 2017

@brycelelbach @hkaiser Could you please add a project description here https://github.com/STEllAR-GROUP/hpx/wiki/GSoC-2017-Project-Ideas

@Johan511
Contributor

Johan511 commented Mar 1, 2023

I am interested in working on this project. I have seen that previous PRs added OpenMP pragmas for vectorization and parallelisation of loops. Can someone guide me on how to get started on this issue?

@hkaiser
Member

hkaiser commented Mar 1, 2023

> I am interested in working on this project. I have seen that previous PRs added OpenMP pragmas for vectorization and parallelisation of loops. Can someone guide me on how to get started on this issue?

Yes, we have implemented this for the first batch of algorithms. There are still algorithms left that have not been touched, though. Also, we would need a thorough performance analysis of the existing implementation, combined with improvements, if needed.

@Johan511
Contributor

Johan511 commented Mar 12, 2023

Status of the par_unseq implementation for each algorithm (checking all of them, work in progress):

  • adjacent_difference
  • inner_product (does it support any execution policy? I could not find documentation. Do we implement it using transform_reduce?)
  • adjacent_find
  • all_of any_of none_of
  • copy copy_if copy_n (copy uses memmove, copy_if has unseq)
  • move (uses memmove)
  • count count_if
  • equal mismatch (unable to trace a breakpoint in loop.hpp; likely does not support par_unseq)
  • exclusive_scan inclusive_scan
  • reduce transform
  • fill fill_n
  • find find_end find_first_of find_if find_if_not (yet to check)
  • for_each for_each_n
  • generate generate_n
  • is_heap is_heap_until (falls back to seq or par)
  • is_partitioned is_sorted is_sorted_until
  • lexicographical_compare
  • max_element min_element minmax_element
  • make_heap
  • partial_sort (implemented using async)
  • partial_sort_copy nth_element (implemented using async futures)

  • sort (parallel async implementation)
  • stable_sort
  • partition
  • partition_copy
  • stable_partition
  • remove remove_if (conditional in loop body)
  • remove_copy remove_copy_if (conditional in loop body)
  • replace replace_copy replace_copy_if replace_if (conditional in loop body)
  • reverse reverse_copy
  • rotate rotate_copy
  • search search_n (conditional in loop body, cannot vectorize)
  • set_difference set_intersection set_symmetric_difference set_union includes
  • inplace_merge
  • merge
  • swap_ranges
  • uninitialized_copy uninitialized_copy_n
  • uninitialized_fill uninitialized_fill_n
  • uninitialized_default_construct uninitialized_default_construct_n
  • uninitialized_value_construct uninitialized_value_construct_n
  • uninitialized_move uninitialized_move_n
  • destroy destroy_n
  • unique
  • unique_copy
  • transform_reduce
  • transform_exclusive_scan transform_inclusive_scan
  • shift_left shift_right
  • starts_with ends_with
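Several entries above are flagged with "conditional in loop body". As a hedged sketch (hypothetical helpers, not HPX internals), a simple conditional can sometimes be expressed so that every iteration does the same work, which tends to make the loop easier for a vectorizer to handle. This is an assumption about one possible approach; it does not cover loops with early exit such as search or search_n.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; these are not HPX functions.

// count_if-style loop: the condition becomes a 0/1 value that is summed.
// The reduction clause tells the compiler how to combine partial counts.
std::size_t count_greater(std::vector<int> const& v, int threshold)
{
    int const* p = v.data();
    std::size_t const n = v.size();
    std::size_t count = 0;

    #pragma omp simd reduction(+ : count)
    for (std::size_t i = 0; i < n; ++i)
    {
        count += (p[i] > threshold) ? 1 : 0;
    }
    return count;
}

// replace_if-style loop: every element is written unconditionally and the
// condition only selects which value is stored.
void replace_greater(std::vector<int>& v, int threshold, int new_value)
{
    int* p = v.data();
    std::size_t const n = v.size();

    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
    {
        p[i] = (p[i] > threshold) ? new_value : p[i];
    }
}
```

Note that the unconditional store in `replace_greater` writes to every element, so it is only a fair substitute when writing back the old value is harmless for the element type.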

@trkk28097402

Hello @hkaiser, I am interested in this topic for GSoC 2024 and I have a question.
Is this restricted to using #pragma omp simd to vectorize, or can we also use something like __m128d and __m256d, even though some SIMD intrinsics are unreadable?

@hkaiser
Member

hkaiser commented Mar 9, 2024

> Hello @hkaiser, I am interested in this topic for GSoC 2024 and I have a question. Is this restricted to using #pragma omp simd to vectorize, or can we also use something like __m128d and __m256d, even though some SIMD intrinsics are unreadable?

Everything is possible, I guess - as long as it is portable across architectures (beyond x86), at least in the long run.
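To illustrate the portability concern, here is a small sketch (a hypothetical helper, not HPX code) that uses the x86 `__m256d` intrinsics only when the compiler reports AVX support and falls back to a plain loop everywhere else. The fallback is what keeps the function usable on non-x86 targets; a real implementation would more likely hide this behind an abstraction layer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not an HPX function): out[i] = a[i] + b[i].
void add_arrays(std::vector<double> const& a, std::vector<double> const& b,
    std::vector<double>& out)
{
    assert(a.size() == b.size() && a.size() == out.size());

    std::size_t const n = a.size();
    std::size_t i = 0;

#if defined(__AVX__)
    // x86-only path: process four doubles per iteration with __m256d.
    for (; i + 4 <= n; i += 4)
    {
        __m256d va = _mm256_loadu_pd(a.data() + i);
        __m256d vb = _mm256_loadu_pd(b.data() + i);
        _mm256_storeu_pd(out.data() + i, _mm256_add_pd(va, vb));
    }
#endif

    // Portable path: handles the remainder on x86 and the whole range on
    // architectures without AVX.
    for (; i < n; ++i)
    {
        out[i] = a[i] + b[i];
    }
}
```

Each additional architecture would need its own guarded block, which is the maintenance cost that pragma-based or library-based vectorization avoids.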

6 participants