-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added language standard parallelism to dot product #251
base: main
Are you sure you want to change the base?
Changes from 3 commits
d2ae43f
dfaf869
6574231
b6cb626
c3f4957
2b8f143
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,7 @@ | |
// Make mdspan less verbose | ||
using std::experimental::mdspan; | ||
using std::experimental::extents; | ||
using std::experimental::dynamic_extent; | ||
using std::dynamic_extent; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This change suggests that CI doesn't actually build the examples. Should we consider fixing that? This change might actually need to depend on the C++ version, as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. CI builds the examples; I just wasn't building them on my PC. I was unaware of the macros change. Can you give me an example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Using these macros might require updating the version of the reference mdspan implementation that github's CI pulls in. |
||
|
||
int main(int argc, char* argv[]) { | ||
std::cout << "Scale" << std::endl; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,9 +5,9 @@ | |
// Make mdspan less verbose | ||
using std::experimental::mdspan; | ||
using std::experimental::extents; | ||
using std::experimental::dynamic_extent; | ||
using std::dynamic_extent; | ||
using std::experimental::submdspan; | ||
using std::experimental::full_extent; | ||
using std::full_extent; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please see above comment on |
||
|
||
int main(int argc, char* argv[]) { | ||
std::cout << "Matrix Vector Product MixedPrec" << std::endl; | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -43,6 +43,7 @@ | |||||
#ifndef LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_ | ||||||
#define LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_ | ||||||
|
||||||
#include <ranges> | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure! Can you point me to an example of using a feature test macro? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure! This web page gives a good summary. // Ensure that the feature test macro __cpp_lib_ranges is available;
// <version> also defines this macro, but that is a C++20 header.
#include <algorithm>
#if defined(__cpp_lib_ranges)
# include <ranges>
#endif
void some_function() {
#if defined(__cpp_lib_ranges_iota)
// ... code using views::iota ...
#else
// ... fall-back code ...
#endif
} The point of using two different macros -- one for the header, and one for the specific feature |
||||||
#include <type_traits> | ||||||
|
||||||
namespace std { | ||||||
|
@@ -90,7 +91,7 @@ template<class ElementType1, | |||||
class Accessor2, | ||||||
class Scalar> | ||||||
Scalar dot( | ||||||
std::experimental::linalg::impl::inline_exec_t&& /* exec */, | ||||||
std::experimental::linalg::impl::inline_exec_t&& exec, | ||||||
std::experimental::mdspan<ElementType1, std::experimental::extents<SizeType1, ext1>, Layout1, Accessor1> v1, | ||||||
std::experimental::mdspan<ElementType2, std::experimental::extents<SizeType2, ext2>, Layout2, Accessor2> v2, | ||||||
Scalar init) | ||||||
|
@@ -100,10 +101,18 @@ Scalar dot( | |||||
v1.static_extent(0) == v2.static_extent(0)); | ||||||
|
||||||
using size_type = std::common_type_t<SizeType1, SizeType2>; | ||||||
for (size_type k = 0; k < v1.extent(0); ++k) { | ||||||
init += v1(k) * v2(k); | ||||||
} | ||||||
return init; | ||||||
using scalar_type = std::common_type_t<ElementType1, ElementType2, Scalar>; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not necessary the right type. For example, if |
||||||
using std::ranges::iota_view; | ||||||
using std::ranges::begin; | ||||||
using std::ranges::end; | ||||||
|
||||||
iota_view range{size_type{}, v1.extent(0)}; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The preferred way to use range factories is |
||||||
|
||||||
Scalar sum = std::transform_reduce(exec, begin(range), end(range), init, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The "inline" executor should be executing inline. This means that, while it can use Standard Library algorithms, it shouldn't be passing along any execution policy (even if that execution policy is "sequential" -- please see my comments below). Would you consider changing this to remove the execution policy argument?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. What do we need to add to handle other execution policies? |
||||||
std::plus<scalar_type> {}, | ||||||
amklinv-nnl marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
[=](size_type i) { return v1[i] * v2[i]; }); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks correct. Another approach would be to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the input. Can you show me what that would look like? |
||||||
|
||||||
return sum; | ||||||
} | ||||||
|
||||||
template<class ExecutionPolicy, | ||||||
|
@@ -155,7 +164,7 @@ Scalar dot(std::experimental::mdspan<ElementType1, std::experimental::extents<Si | |||||
std::experimental::mdspan<ElementType2, std::experimental::extents<SizeType2, ext2>, Layout2, Accessor2> v2, | ||||||
Scalar init) | ||||||
{ | ||||||
return dot(std::experimental::linalg::impl::default_exec_t(), v1, v2, init); | ||||||
return dot(std::experimental::linalg::impl::default_exec(), v1, v2, init); | ||||||
} | ||||||
|
||||||
template<class ElementType1, | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,11 +12,12 @@ inline namespace __p1673_version_0 { | |
namespace linalg { | ||
namespace impl { | ||
// the execution policy used for default serial inline implementations | ||
struct inline_exec_t {}; | ||
using inline_exec_t = std::execution::sequenced_policy; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please see note above about the "inline" executor. Any Standard Algorithm overload that takes an Therefore, it's really important that the "inline" executor not use any parallel algorithm, even if |
||
|
||
// The execution policy used when no execution policy is provided | ||
// It must be remapped to some other execution policy, which the default mapper does | ||
struct default_exec_t {}; | ||
using default_exec_t = std::execution::parallel_policy; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I remember correctly, the execpolicy_mapper did not like this default. I can restore it and show you the error it gave me. |
||
auto default_exec() { return std::execution::par; }; | ||
|
||
// helpers | ||
template<class T> struct is_inline_exec : std::false_type{}; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,7 +87,7 @@ TEST_F(blas2_signed_float_fixture, kokkos_matrix_frob_norm_trivial_empty) | |
{ | ||
std::vector<value_type> v; | ||
|
||
constexpr auto de = std::experimental::dynamic_extent; | ||
constexpr auto de = std::dynamic_extent; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does CI not actually test this test? (Please see comment above about |
||
using s_t = std::experimental::mdspan<value_type, std::experimental::extents<de, de>>; | ||
s_t M(v.data(), 0, 0); | ||
namespace stdla = std::experimental::linalg; | ||
|
@@ -109,7 +109,7 @@ TEST_F(blas2_signed_float_fixture, kokkos_matrix_frob_norm_trivial_one_element) | |
constexpr value_type myvalue = static_cast<value_type>(-1.5); | ||
std::vector<value_type> v = {myvalue}; | ||
|
||
constexpr auto de = std::experimental::dynamic_extent; | ||
constexpr auto de = std::dynamic_extent; | ||
using s_t = std::experimental::mdspan<value_type, std::experimental::extents<de, de>>; | ||
s_t M(v.data(), 1, 1); | ||
namespace stdla = std::experimental::linalg; | ||
|
@@ -143,7 +143,7 @@ TEST_F(blas2_signed_double_fixture, kokkos_matrix_frob_norm_trivial_empty) | |
{ | ||
std::vector<value_type> v; | ||
|
||
constexpr auto de = std::experimental::dynamic_extent; | ||
constexpr auto de = std::dynamic_extent; | ||
using s_t = std::experimental::mdspan<value_type, std::experimental::extents<de, de>>; | ||
s_t M(v.data(), 0, 0); | ||
namespace stdla = std::experimental::linalg; | ||
|
@@ -165,7 +165,7 @@ TEST_F(blas2_signed_double_fixture, kokkos_matrix_frob_norm_trivial_one_element) | |
constexpr value_type myvalue = static_cast<value_type>(-1.5); | ||
std::vector<value_type> v = {myvalue}; | ||
|
||
constexpr auto de = std::experimental::dynamic_extent; | ||
constexpr auto de = std::dynamic_extent; | ||
using s_t = std::experimental::mdspan<value_type, std::experimental::extents<de, de>>; | ||
s_t M(v.data(), 1, 1); | ||
namespace stdla = std::experimental::linalg; | ||
|
@@ -202,7 +202,7 @@ TEST_F(blas2_signed_complex_double_fixture, kokkos_matrix_frob_norm_trivial_empt | |
using stdc_t = value_type; | ||
if constexpr (alignof(value_type) == alignof(kc_t)){ | ||
std::vector<value_type> v; | ||
constexpr auto de = std::experimental::dynamic_extent; | ||
constexpr auto de = std::dynamic_extent; | ||
using s_t = std::experimental::mdspan<value_type, std::experimental::extents<de, de>>; | ||
s_t M(v.data(), 0, 0); | ||
namespace stdla = std::experimental::linalg; | ||
|
@@ -229,7 +229,7 @@ TEST_F(blas2_signed_complex_double_fixture, kokkos_matrix_frob_norm_trivial_one_ | |
constexpr value_type myvalue{-1.5, 2.2}; | ||
std::vector<value_type> v = {myvalue}; | ||
|
||
constexpr auto de = std::experimental::dynamic_extent; | ||
constexpr auto de = std::dynamic_extent; | ||
using s_t = std::experimental::mdspan<value_type, std::experimental::extents<de, de>>; | ||
s_t M(v.data(), 1, 1); | ||
namespace stdla = std::experimental::linalg; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More recent versions of GCC (for instance) shouldn't require TBB for
std::execution::par
to work. Furthermore, nvc++ comes with its own, non-TBB implementation ofstd::execution::par
. Would you consider gating this on compiler version, instead of requiring a third-party library? That could even be done in the code -- the code would just need to test the appropriate compiler version to ensure that the C++ algorithms are available (see https://en.cppreference.com/w/cpp/compiler_support/17 and search for "Parallel algorithms and execution policies").