diff --git a/README.md b/README.md
index c620df1..9e1e52a 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
- Document Number: N4670
- Date: 2017-06-19
+ Document Number: N4699
+ Date: 2017-10-16
 Revises:
 Project: Programming Language C++
 Project Number: TS 19570
@@ -7,20 +7,22 @@
 NVIDIA Corporation
 jhoberock@nvidia.com
-# Parallelism TS Editor's Report, pre-Toronto mailing
+# Parallelism TS Editor's Report, pre-Albuquerque mailing
-N4669 is the proposed working draft of Parallelism TS Version 2. It contains editorial changes to the Parallelism TS to make it consistent with new ISO directives concerning "Scope", "Normative references", and "Terms and definitions" clauses.
+N4698 is the proposed working draft of Parallelism TS Version 2. It contains changes to the Parallelism TS as directed by the committee at the Toronto meeting, and editorial changes.
-N4669 updates the previous draft, N4579, published in the pre-Jacksonville mailing.
+N4698 updates the previous draft, N4669, published in the pre-Toronto mailing.
 # Technical Changes
-None.
+* Apply P0076R4 - Vector and Wavefront Policies.
 # Editorial Changes
-* Introduced new Clause 1 - Scope
-* Introduced new Clause 2 - Normative references
-* Introduced new Clause 3 - Terms and definitions
-* Incremented the numbers of all existing clauses
+* Reformat Table 1 - Feature Test Macro(s), to match the style of the Library Fundamentals TS.
+
+# Notes
+
+* The pre-existing content of N4698 has not yet been harmonized with C++17. As a result, this content is named and namespaced inconsistently with the newly applied content of P0076R4. We anticipate that these inconsistencies will be harmonized by a future revision.
+* N4698 contains forward references to `for_loop` and `for_loop_strided`. We anticipate their introduction in a future revision.
diff --git a/algorithms.html b/algorithms.html
index d644d30..275c08a 100644
--- a/algorithms.html
+++ b/algorithms.html
@@ -2,36 +2,45 @@

Parallel algorithms

-

In general

+

In general

+ This clause describes components that C++ programs may use to perform operations on containers and other sequences in parallel. + -

Requirements on user-provided function objects

+

Requirements on user-provided function objects

-

- Function objects passed into parallel algorithms as objects of type BinaryPredicate, - Compare, and BinaryOperation shall not directly or indirectly modify - objects via their arguments. + +

+ Function objects passed into parallel algorithms as objects of type BinaryPredicate, + Compare, and BinaryOperation shall not directly or indirectly modify + objects via their arguments.

+
-

Effect of execution policies on algorithm execution

+

Effect of execution policies on algorithm execution

+

Parallel algorithms have template parameters named ExecutionPolicy which describe the manner in which the execution of these algorithms may be parallelized and the manner in which they apply the element access functions.

+
+

The invocations of element access functions in parallel algorithms invoked with an execution policy object of type sequential_execution_policy execute in sequential order in the calling thread.

+
+

The invocations of element access functions in parallel algorithms invoked with an execution policy object of type parallel_execution_policy are permitted to execute in an @@ -44,7 +53,9 @@

Effect of execution policies on algorithm execution

not introduce data races or deadlocks.

+
+
using namespace std::experimental::parallel;
 int a[] = {0,1};
 std::vector<int> v;
@@ -55,9 +66,10 @@ 

Effect of execution policies on algorithm execution

The program above has a data race because of the unsynchronized access to the container v. -
+      
 
+
 using namespace std::experimental::parallel;
 std::atomic<int> x = 0;
@@ -70,9 +82,10 @@ 

Effect of execution policies on algorithm execution

The above example depends on the order of execution of the iterations, and is therefore undefined (may deadlock). -
+      
 
+
 using namespace std::experimental::parallel;
 int x=0;
@@ -82,12 +95,39 @@ 

Effect of execution policies on algorithm execution

m.lock(); ++x; m.unlock(); -});
+});
The above example synchronizes access to object x ensuring that it is incremented correctly. +

+ The invocations of element access functions in parallel algorithms invoked with an + execution policy of type unsequenced_policy are permitted to execute + in an unordered fashion in the calling thread, unsequenced with respect to one another + within the calling thread. + + + This means that multiple function object invocations may be interleaved on a single thread. + +

+
+ + + This overrides the usual guarantee from the C++ standard, Section 1.9 [intro.execution] that + function executions do not interleave with one another. + +

+ +

+ The invocations of element access functions in parallel algorithms invoked with an + execution policy of type vector_policy are permitted to execute + in an unordered fashion in the calling thread, unsequenced with respect to one another + within the calling thread, subject to the sequencing constraints of wavefront application + () for the last argument to + for_loop or for_loop_strided. +

+

The invocations of element access functions in parallel algorithms invoked with an execution policy of type parallel_vector_execution_policy @@ -163,9 +203,109 @@

Effect of execution policies on algorithm execution

+ +

Wavefront Application

+

+ For the purposes of this section, an evaluation is a value computation or side effect of + an expression, or an execution of a statement. Initialization of a temporary object is considered a + subexpression of the expression that necessitates the temporary object. +

+ +

+ An evaluation A contains an evaluation B if: + +

    +
  • A and B are not potentially concurrent ([intro.races]); and
  • +
  • the start of A is the start of B or the start of A is sequenced before the start of B; and
  • +
  • the completion of B is the completion of A or the completion of B is sequenced before the completion of A.
  • +
+ + This includes evaluations occurring in function invocations. +

+ +

+ An evaluation A is ordered before an evaluation B if A is deterministically + sequenced before B. If A is indeterminately sequenced with respect to B + or A and B are unsequenced, then A is not ordered before B and B is not ordered + before A. The ordered before relationship is transitive. +

+ +

+ For an evaluation A ordered before an evaluation B, both contained in the same + invocation of an element access function, A is a vertical antecedent of B if: + +

    +
  • there exists an evaluation S such that: +
      +
    • S contains A, and
    • +
    • S contains all evaluations C (if any) such that A is ordered before C and C is ordered before B,
    • +
    • but S does not contain B, and
    • +
    +
  • +
  • + control reached B from A without executing any of the following: +
      +
    • a goto statement or asm declaration that jumps to a statement outside of S, or
    • +
    • a switch statement executed within S that transfers control into a substatement of a nested selection or iteration statement, or
    • +
    • a throw even if caught, or
    • +
    • a longjmp. +
    +
  • +
+ + + Vertical antecedent is an irreflexive, antisymmetric, nontransitive relationship between two evaluations. + Informally, A is a vertical antecedent of B if A is sequenced immediately before B or A is nested zero or + more levels within a statement S that immediately precedes B. + +

+ +

+ In the following, Xi and Xj refer to evaluations of the same expression + or statement contained in the application of an element access function corresponding to the ith and + jth elements of the input sequence. There might be several evaluations Xk, + Yk, etc. of a single expression or statement in application k, for example, if the + expression or statement appears in a loop within the element access function. +

+ +

+ Horizontally matched is an equivalence relationship between two evaluations of the same expression. An + evaluation Bi is horizontally matched with an evaluation Bj if: + +

    +
  • both are the first evaluations in their respective applications of the element access function, or
  • +
  • there exist horizontally matched evaluations Ai and Aj that are vertical antecedents of evaluations Bi and Bj, respectively. +
+ + + Horizontally matched establishes a theoretical lock-step relationship between evaluations in different applications of an element access function. + +

+ +

+ Let f be a function called for each argument list in a sequence of argument lists. + Wavefront application of f requires that evaluation Ai be sequenced + before evaluation Bj if i < j and: + +

    +
  • Ai is sequenced before some evaluation Bi and Bi is horizontally matched with Bj, or
  • +
  • Ai is horizontally matched with some evaluation Aj and Aj is sequenced before Bj.
  • +
+ + + Wavefront application guarantees that parallel applications i and j execute such that progress on application j never gets ahead of application i. + + + + The relationships between Ai and Bi and between Aj and Bj are sequenced before, not vertical antecedent. + +

+
+ -

ExecutionPolicy algorithm overloads

+

ExecutionPolicy algorithm overloads

+

The Parallel Algorithms Library provides overloads for each of the algorithms named in Table 1, corresponding to the algorithms with the same name in the C++ Standard Algorithms Library. @@ -179,19 +319,25 @@

ExecutionPolicy algorithm overloads

In addition, each such overload shall have the new function parameter as the first function parameter of type ExecutionPolicy&&.

+
+

Unless otherwise specified, the semantics of ExecutionPolicy algorithm overloads are identical to their overloads without.

+
+

Parallel algorithms shall not participate in overload resolution unless is_execution_policy<decay_t<ExecutionPolicy>>::value is true.
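The overload-resolution constraint above can be illustrated with a minimal sketch. Everything named `*_sketch` below is hypothetical and invented for illustration; it is not the TS's `is_execution_policy` or any real policy type, only a stand-in showing how an overload can be removed from consideration when its first argument is not a policy.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical policy type and trait (assumptions, not the TS's names).
struct seq_policy_sketch {};

template<class T> struct is_policy_sketch : std::false_type {};
template<> struct is_policy_sketch<seq_policy_sketch> : std::true_type {};

// Policy overload: drops out of overload resolution unless the decayed
// first argument type satisfies the trait.
template<class Policy, class It, class F,
         class = std::enable_if_t<
             is_policy_sketch<std::decay_t<Policy>>::value>>
void for_each_sketch(Policy&&, It first, It last, F f) {
  for (; first != last; ++first) f(*first);
}

// Detection helper: true if for_each_sketch(Args...) is a valid call.
template<class, class... Args>
struct accepts_sketch : std::false_type {};

template<class... Args>
struct accepts_sketch<
    std::void_t<decltype(for_each_sketch(std::declval<Args>()...))>,
    Args...> : std::true_type {};
```

With these definitions, a call whose first argument is `seq_policy_sketch` is accepted, while a call whose first argument is, say, an `int` finds no viable overload instead of producing an error inside the function body.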

+
+ - + @@ -313,16 +459,20 @@

ExecutionPolicy algorithm overloads

Table of parallel algorithmsTable of parallel algorithms
adjacent_difference adjacent_find
+
+ Not all algorithms in the Standard Library have counterparts in . +
-

Definitions

+

Definitions

+

Define GENERALIZED_SUM(op, a1, ..., aN) as follows: @@ -340,7 +490,9 @@

Definitions

+
+

Define GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aN) as follows: @@ -353,6 +505,7 @@

Definitions

+
@@ -362,10 +515,12 @@

Non-Numeric Parallel Algorithms

Header <experimental/algorithm> synopsis

-namespace std {
-namespace experimental {
-namespace parallel {
-inline namespace v2 {
+#include <algorithm>
+
+namespace std {
+namespace std::experimental {
+inline namespace parallelism_v2 {
+inline namespace v2 {
   template<class ExecutionPolicy,
            class InputIterator, class Function>
     void for_each(ExecutionPolicy&& exec,
@@ -378,66 +533,103 @@ 

Header <experimental/algorithm> synopsis

class InputIterator, class Size, class Function> InputIterator for_each_n(ExecutionPolicy&& exec, InputIterator first, Size n, - Function f); -} + Function f);
+ +namespace execution { + + template<class F> + auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)()); + + + template<class T> + class ordered_update_t; + + + template<class T> + ordered_update_t<T> ordered_update(T& ref) noexcept; } +} } } +}
-

For each

+

For each

- template<class ExecutionPolicy, + template<class ExecutionPolicy, class InputIterator, class Function> void for_each(ExecutionPolicy&& exec, InputIterator first, InputIterator last, - Function f); + Function f); + + Applies f to the result of dereferencing every iterator in the range [first,last). If the type of first satisfies the requirements of a mutable iterator, f may apply nonconstant functions through the dereferenced iterator. + + + - Applies f exactly last - first times. + Applies f exactly last - first times. + + - If f returns a result, the result is ignored. + If f returns a result, the result is ignored. + + + Unlike its sequential form, the parallel overload of for_each does not return a copy of its Function parameter, since parallelization may not permit efficient state accumulation. + + + + + Unlike its sequential form, the parallel overload of for_each requires Function to meet the requirements of CopyConstructible. + + - template<class InputIterator, class Size, class Function> + template<class InputIterator, class Size, class Function> InputIterator for_each_n(InputIterator first, Size n, - Function f); +Function f); + + Function shall meet the requirements of MoveConstructible Function need not meet the requirements of CopyConstructible. + + + + Applies f to the result of dereferencing every iterator in the range [first,first + n), starting from first and proceeding to first + n - 1. @@ -445,25 +637,37 @@

For each

If the type of first satisfies the requirements of a mutable iterator, f may apply nonconstant functions through the dereferenced iterator. +
+
+ + first + n for non-negative values of n and first for negative values. + + + + If f returns a result, the result is ignored. + +
- template<class ExecutionPolicy, + template<class ExecutionPolicy, class InputIterator, class Size, class Function> InputIterator for_each_n(ExecutionPolicy && exec, InputIterator first, Size n, - Function f); + Function f); + + Applies f to the result of dereferencing every iterator in the range [first,first + n), starting from first and proceeding to first + n - 1. @@ -471,30 +675,174 @@

For each

If the type of first satisfies the requirements of a mutable iterator, f may apply nonconstant functions through the dereferenced iterator. +
+
+ - first + n for non-negative values of n and first for negative values. + first + n for non-negative values of n and first for negative values. + + - If f returns a result, the result is ignored. + If f returns a result, the result is ignored. + + + Unlike its sequential form, the parallel overload of for_each_n requires Function to meet the requirements of CopyConstructible. + + + +
+
+ + +

No vec

+ + + template<class F> +auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)()); + + + Evaluates std::forward<F>(f)(). When invoked within an element access function + in a parallel algorithm using vector_policy, if two calls to no_vec are + horizontally matched within a wavefront application of an element access function over input + sequence S, then the execution of f in the application for one element in S is + sequenced before the execution of f in the application for a subsequent element in + S; otherwise, there is no effect on sequencing. + + + + the result of f. + + + + + If f returns a result, the result is ignored. + + + + + If f exits via an exception, then terminate will be called, consistent + with all other potentially-throwing operations invoked with vector_policy execution. + + +
extern int* p;
+for_loop(vec, 0, n, [&](int i) {
+  y[i] += y[i+1];
+  if(y[i] < 0) {
+    no_vec([&]{
+      *p++ = i;
+    });
+  }
+});
+ + The updates *p++ = i will occur in the same order as if the policy were seq. +
+ + +

Ordered update class

+ +
+template<class T>
+class ordered_update_t {
+  T& ref_; // exposition only
+public:
+  ordered_update_t(T& loc) noexcept
+    : ref_(loc) {}
+  ordered_update_t(const ordered_update_t&) = delete;
+  ordered_update_t& operator=(const ordered_update_t&) = delete;
+
+  template <class U>
+    auto operator=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ = std::move(rhs); }); }
+  template <class U>
+    auto operator+=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ += std::move(rhs); }); }
+  template <class U>
+    auto operator-=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ -= std::move(rhs); }); }
+  template <class U>
+    auto operator*=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ *= std::move(rhs); }); }
+  template <class U>
+    auto operator/=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ /= std::move(rhs); }); }
+  template <class U>
+    auto operator%=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ %= std::move(rhs); }); }
+  template <class U>
+    auto operator>>=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ >>= std::move(rhs); }); }
+  template <class U>
+    auto operator<<=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ <<= std::move(rhs); }); }
+  template <class U>
+    auto operator&=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ &= std::move(rhs); }); }
+  template <class U>
+    auto operator^=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ ^= std::move(rhs); }); }
+  template <class U>
+    auto operator|=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ |= std::move(rhs); }); }
+
+  auto operator++() const noexcept
+    { return no_vec([&]{ return ++ref_; }); }
+  auto operator++(int) const noexcept
+    { return no_vec([&]{ return ref_++; }); }
+  auto operator--() const noexcept
+    { return no_vec([&]{ return --ref_; }); }
+  auto operator--(int) const noexcept
+    { return no_vec([&]{ return ref_--; }); }
+};
+
+ +

+ An object of type ordered_update_t<T> is a proxy for an object of type T + intended to be used within a parallel application of an element access function using a + policy object of type vector_policy. Simple increments, assignments, and compound + assignments to the object are forwarded to the proxied object, but are sequenced as though + executed within a no_vec invocation. + + + The return-value deduction of the forwarded operations results in these operations returning by + value, not reference. This formulation prevents accidental collisions on accesses to the return + value. + +
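The by-value return noted above can be seen in a reduced sketch. This is a sequential stand-in only: `no_vec_sketch` and `ordered_update_sketch` are hypothetical names, `no_vec_sketch` simply invokes its argument, and none of the wavefront sequencing guarantees of `vector_policy` are modeled.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Sequential stand-in for no_vec (assumption): just calls f and
// returns whatever f returns.
template<class F>
auto no_vec_sketch(F&& f) noexcept -> decltype(std::forward<F>(f)()) {
  return std::forward<F>(f)();
}

// Reduced proxy with two of the forwarded operations.
template<class T>
class ordered_update_sketch {
  T& ref_;
public:
  explicit ordered_update_sketch(T& loc) noexcept : ref_(loc) {}

  // The lambda's deduced return type decays T& to T, so the
  // operator returns a copy rather than a reference.
  template<class U>
  auto operator+=(U rhs) const noexcept {
    return no_vec_sketch([&] { return ref_ += std::move(rhs); });
  }

  auto operator++() const noexcept {
    return no_vec_sketch([&] { return ++ref_; });
  }
};
```

Because the lambdas use plain return-type deduction, `decltype(++proxy)` is `T` and not `T&`, which matches the note that callers never hold a reference into the proxied object through the return value.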

+
+ + +

Ordered update function template

+ + + template<class T> +ordered_update_t<T> ordered_update(T& loc) noexcept; + + + { loc }. + + + +
-

Numeric Parallel Algorithms

+

Numeric Parallel Algorithms

-

Header <experimental/numeric> synopsis

+

Header <experimental/numeric> synopsis

+
 namespace std {
 namespace experimental {
@@ -658,267 +1006,364 @@ 

Header <experimental/numeric> synopsis

} }
+
-

Reduce

+

Reduce

- template<class InputIterator> + template<class InputIterator> typename iterator_traits<InputIterator>::value_type - reduce(InputIterator first, InputIterator last); + reduce(InputIterator first, InputIterator last); + - Same as reduce(first, last, typename iterator_traits<InputIterator>::value_type{}). + Same as reduce(first, last, typename iterator_traits<InputIterator>::value_type{}). + - template<class InputIterator, class T> -T reduce(InputIterator first, InputIterator last, T init); + template<class InputIterator, class T> +T reduce(InputIterator first, InputIterator last, T init); + - Same as reduce(first, last, init, plus<>()). + Same as reduce(first, last, init, plus<>()). + - template<class InputIterator, class T, class BinaryOperation> + template<class InputIterator, class T, class BinaryOperation> T reduce(InputIterator first, InputIterator last, T init, - BinaryOperation binary_op); + BinaryOperation binary_op); + - GENERALIZED_SUM(binary_op, init, *first, ..., *(first + (last - first) - 1)). + GENERALIZED_SUM(binary_op, init, *first, ..., *(first + (last - first) - 1)). + + - binary_op shall not invalidate iterators or subranges, nor modify elements in the - range [first,last). + binary_op shall not invalidate iterators or subranges, nor modify elements in the + range [first,last). + + - O(last - first) applications of binary_op. + O(last - first) applications of binary_op. + + + The primary difference between reduce and accumulate is that the behavior of reduce may be non-deterministic for non-associative or non-commutative binary_op. + +
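To make the contrast between `reduce` and `accumulate` concrete, here is a small sketch. It assumes a C++17 standard library and uses `std::reduce` from `<numeric>` as a stand-in for the TS's `reduce`; the helper names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// accumulate always folds left to right: ((init - a1) - a2) - ...
// so a non-associative operation such as minus is still well defined.
inline int fold_in_order(const std::vector<int>& v, int init) {
  return std::accumulate(v.begin(), v.end(), init, std::minus<>{});
}

// reduce is allowed to regroup and reorder its operands, so it is
// only used here with plus, which is associative and commutative.
inline int sum_any_order(const std::vector<int>& v, int init) {
  return std::reduce(v.begin(), v.end(), init, std::plus<>{});
}
```

Passing `std::minus<>` to `reduce` instead would be the kind of call whose result may vary between runs or implementations, which is the non-determinism the note describes.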
-

Exclusive scan

+

Exclusive scan

- template<class InputIterator, class OutputIterator, class T> + template<class InputIterator, class OutputIterator, class T> OutputIterator exclusive_scan(InputIterator first, InputIterator last, OutputIterator result, - T init); + T init); + - Same as exclusive_scan(first, last, result, init, plus<>()). + Same as exclusive_scan(first, last, result, init, plus<>()). + - template<class InputIterator, class OutputIterator, class T, class BinaryOperation> + template<class InputIterator, class OutputIterator, class T, class BinaryOperation> OutputIterator exclusive_scan(InputIterator first, InputIterator last, OutputIterator result, - T init, BinaryOperation binary_op); + T init, BinaryOperation binary_op); + + Assigns through each iterator i in [result,result + (last - first)) the value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, *first, ..., *(first + (i - result) - 1)). + + + - The end of the resulting range beginning at result. + The end of the resulting range beginning at result. + + + binary_op shall not invalidate iterators or subranges, nor modify elements in the ranges [first,last) or [result,result + (last - first)). + + + - O(last - first) applications of binary_op. + O(last - first) applications of binary_op. + + + The difference between exclusive_scan and inclusive_scan is that exclusive_scan excludes the ith input element from the ith sum. If binary_op is not mathematically associative, the behavior of exclusive_scan may be non-deterministic. + +
-

Inclusive scan

+

Inclusive scan

- template<class InputIterator, class OutputIterator> + template<class InputIterator, class OutputIterator> OutputIterator inclusive_scan(InputIterator first, InputIterator last, - OutputIterator result); + OutputIterator result); + + Same as inclusive_scan(first, last, result, plus<>()). + + - template<class InputIterator, class OutputIterator, class BinaryOperation> + template<class InputIterator, class OutputIterator, class BinaryOperation> OutputIterator inclusive_scan(InputIterator first, InputIterator last, OutputIterator result, - BinaryOperation binary_op); - template<class InputIterator, class OutputIterator, class BinaryOperation, class T> + BinaryOperation binary_op); + template<class InputIterator, class OutputIterator, class BinaryOperation, class T> OutputIterator inclusive_scan(InputIterator first, InputIterator last, OutputIterator result, - BinaryOperation binary_op, T init); + BinaryOperation binary_op, T init); + + Assigns through each iterator i in [result,result + (last - first)) the value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, *first, ..., *(first + (i - result))) or GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, *first, ..., *(first + (i - result))) if init is provided. + + + + The end of the resulting range beginning at result. + + + + binary_op shall not invalidate iterators or subranges, nor modify elements in the ranges [first,last) or [result,result + (last - first)). + + + - O(last - first) applications of binary_op. + O(last - first) applications of binary_op. + + + The difference between exclusive_scan and inclusive_scan is that inclusive_scan includes the ith input element in the ith sum. If binary_op is not mathematically associative, the behavior of inclusive_scan may be non-deterministic. + +
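The difference between the two scans is easiest to see on a short input. The sketch below assumes a C++17 standard library and uses `std::exclusive_scan` and `std::inclusive_scan` from `<numeric>` as stand-ins for the TS's algorithms; the wrapper names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Output element i is init + in[0] + ... + in[i-1]:
// the ith input is excluded from the ith sum.
inline std::vector<int> exclusive_sums(const std::vector<int>& in, int init) {
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init,
                      std::plus<>{});
  return out;
}

// Output element i is in[0] + ... + in[i]:
// the ith input is included in the ith sum.
inline std::vector<int> inclusive_sums(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<>{});
  return out;
}
```

For the input `{1, 2, 3, 4}` with `init` equal to 0, the exclusive scan yields `{0, 1, 3, 6}` and the inclusive scan yields `{1, 3, 6, 10}`.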
-

Transform reduce

+

Transform reduce

- template<class InputIterator, class UnaryFunction, class T, class BinaryOperation> + template<class InputIterator, class UnaryFunction, class T, class BinaryOperation> T transform_reduce(InputIterator first, InputIterator last, - UnaryOperation unary_op, T init, BinaryOperation binary_op); + UnaryOperation unary_op, T init, BinaryOperation binary_op); + + GENERALIZED_SUM(binary_op, init, unary_op(*first), ..., unary_op(*(first + (last - first) -
1))). +
+
+ - Neither unary_op nor binary_op shall invalidate subranges, or modify elements in the range [first,last) + Neither unary_op nor binary_op shall invalidate subranges, or modify elements in the range [first,last) + + - O(last - first) applications each of unary_op and binary_op. + O(last - first) applications each of unary_op and binary_op. + + - transform_reduce does not apply unary_op to init. + transform_reduce does not apply unary_op to init. +
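The remark that `unary_op` is not applied to `init` can be checked with a sum of squares. This sketch assumes a C++17 standard library and uses `std::transform_reduce` as a stand-in; note that the C++17 overload takes `init`, then the binary operation, then the unary operation, which differs from the parameter order in the TS declaration above. The helper name is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Computes init + sum of x*x over v. Only the elements are squared;
// init enters the sum unchanged.
inline int sum_of_squares(const std::vector<int>& v, int init) {
  return std::transform_reduce(v.begin(), v.end(), init, std::plus<>{},
                               [](int x) { return x * x; });
}
```

With `v = {1, 2, 3}` and `init = 10`, the result is 10 + 1 + 4 + 9 = 24; had `init` been squared as well, it would have been 114.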
-

Transform exclusive scan

+

Transform exclusive scan

- template<class InputIterator, class OutputIterator, + template<class InputIterator, class OutputIterator, class UnaryOperation, class T, class BinaryOperation> OutputIterator transform_exclusive_scan(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation unary_op, - T init, BinaryOperation binary_op); + T init, BinaryOperation binary_op); + + Assigns through each iterator i in [result,result + (last - first)) the value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, unary_op(*first), ..., unary_op(*(first + (i
- result) - 1))). +
+
+ - The end of the resulting range beginning at result. + The end of the resulting range beginning at result. + + + Neither unary_op nor binary_op shall invalidate iterators or subranges, or modify elements in the ranges [first,last) or [result,result + (last - first)). + + + - O(last - first) applications each of unary_op and binary_op. + O(last - first) applications each of unary_op and binary_op. + + + The difference between transform_exclusive_scan and transform_inclusive_scan is that transform_exclusive_scan excludes the ith input element from the ith sum. If binary_op is not mathematically associative, the behavior of transform_exclusive_scan may be non-deterministic. transform_exclusive_scan does not apply unary_op to init. + +
-

Transform inclusive scan

+

Transform inclusive scan

- template<class InputIterator, class OutputIterator, + template<class InputIterator, class OutputIterator, class UnaryOperation, class BinaryOperation> OutputIterator transform_inclusive_scan(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation unary_op, - BinaryOperation binary_op); + BinaryOperation binary_op); - template<class InputIterator, class OutputIterator, + template<class InputIterator, class OutputIterator, class UnaryOperation, class BinaryOperation, class T> OutputIterator transform_inclusive_scan(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation unary_op, - BinaryOperation binary_op, T init); + BinaryOperation binary_op, T init); + + Assigns through each iterator i in [result,result + (last - first)) the value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, unary_op(*first), ..., unary_op(*(first + (i -
result)))) or GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, unary_op(*first), ..., unary_op(*(first + (i
- result)))) if init is provided. +
+
+ + - The end of the resulting range beginning at result. + The end of the resulting range beginning at result. + + + Neither unary_op nor binary_op shall invalidate iterators or subranges, or modify elements in the ranges [first,last) or [result,result + (last - first)). + + + - O(last - first) applications each of unary_op and binary_op. + O(last - first) applications each of unary_op and binary_op. + + + The difference between transform_exclusive_scan and transform_inclusive_scan is that transform_inclusive_scan includes the ith input element in the ith sum. If binary_op is not mathematically associative, the behavior of transform_inclusive_scan may be non-deterministic. transform_inclusive_scan does not apply unary_op to init. + +
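The two transform scans can be compared side by side on a small input. As before, this is a sketch assuming a C++17 standard library, with `std::transform_exclusive_scan` and `std::transform_inclusive_scan` standing in for the TS's algorithms (the C++17 parameter order places the unary operation last); the wrapper names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Output element i is init + 2*in[0] + ... + 2*in[i-1].
// The unary operation is applied to the elements, never to init.
inline std::vector<int> doubled_exclusive(const std::vector<int>& in,
                                          int init) {
  std::vector<int> out(in.size());
  std::transform_exclusive_scan(in.begin(), in.end(), out.begin(), init,
                                std::plus<>{},
                                [](int x) { return 2 * x; });
  return out;
}

// Output element i is 2*in[0] + ... + 2*in[i].
inline std::vector<int> doubled_inclusive(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  std::transform_inclusive_scan(in.begin(), in.end(), out.begin(),
                                std::plus<>{},
                                [](int x) { return 2 * x; });
  return out;
}
```

For the input `{1, 2, 3}` and `init = 100`, the exclusive form yields `{100, 102, 106}` and the inclusive form yields `{2, 6, 12}`.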
diff --git a/cxx17_index.json b/cxx17_index.json new file mode 100644 index 0000000..bf5a888 --- /dev/null +++ b/cxx17_index.json @@ -0,0 +1,5 @@ +{ + "intro.execution": "4.6", + "library": "20" +} + diff --git a/cxx_N3797_index.json b/cxx_N3797_index.json deleted file mode 100644 index 1535b70..0000000 --- a/cxx_N3797_index.json +++ /dev/null @@ -1,49 +0,0 @@ -{ - "basic.stc.dynamic.deallocation": "3.7.4.2", - "expr": "5", - "expr.call": "5.2.2", - "expr.cond": "5.16", - "dcl.constexpr": "7.1.5", - "over.match.best": "13.3.3", - "temp.deduct": "14.8.2", - "library": "17", - "hash.requirements": "17.6.3.4", - "allocator.requirements": "17.6.3.5", - "syserr": "19.5", - "utility.swap": "20.2.2", - "tuple.helper": "20.4.2.5", - "allocator.uses": "20.7.7", - "allocator.uses.construction": "20.7.7.2", - "util.smartptr": "20.8.2", - "util.smartptr.shared": "20.8.2.2", - "util.smartptr.shared.const": "20.8.2.2.1", - "util.smartptr.shared.obs": "20.8.2.2.5", - "util.smartptr.shared.cast": "20.8.2.2.9", - "util.smartptr.weak": "20.8.2.3", - "util.smartptr.weak.const": "20.8.2.3.1", - "function.objects": "20.9", - "bind": "20.9.9", - "func.wrap.func": "20.9.11.2", - "func.wrap.func.con": "20.9.11.2.1", - "func.wrap.func.mod": "20.9.11.2.2", - "unord.hash": "20.9.12", - "meta.rqmts": "20.10.1", - "meta.unary.cat": "20.10.4.1", - "meta.unary.comp": "20.10.4.2", - "meta.unary.prop": "20.10.4.3", - "meta.unary.prop.query": "20.10.5", - "meta.rel": "20.10.6", - "meta.trans.other": "20.10.7.6", - "ratio.comparison": "20.11.5", - "time.traits": "20.12.4", - "strings.general": "21.1", - "char.traits": "21.2", - "container.requirements": "23.2", - "iterator.traits": "24.4.1", - "iterator.range": "24.7", - "algorithms.general": "25.1", - "rand.req.urng": "26.5.1.3", - "futures.promise": "30.6.5", - "futures.task": "30.6.9", - "futures.task.nonmembers": "30.6.9.2" -} diff --git a/exceptions.html b/exceptions.html index 2e6bb4c..a3e19aa 100644 --- a/exceptions.html +++ b/exceptions.html 
@@ -1,7 +1,8 @@

Parallel exceptions

-

Exception reporting behavior

+

Exception reporting behavior

+

During the execution of a standard parallel algorithm, if temporary memory resources are required and none are available, @@ -14,7 +15,7 @@

Exception reporting behavior

  • - If the execution policy object is of type class parallel_vector_execution_policy, + If the execution policy object is of type parallel_vector_execution_policy, unsequenced_policy, or vector_policy, std::terminate shall be called.
  • @@ -49,20 +50,22 @@

    Exception reporting behavior

+

Header <experimental/exception_list> synopsis

 
-namespace std {
-namespace experimental {
-namespace parallel {
-inline namespace v2 {
+namespace std {
+namespace std::experimental {
+inline namespace parallelism_v2 {
+inline namespace v2 {
 
   class exception_list : public exception
   {
     public:
-      typedef unspecified iterator;
+      typedef unspecified iterator;
+      using iterator = unspecified;
   
       size_t size() const noexcept;
       iterator begin() const noexcept;
@@ -70,16 +73,16 @@ 

Header <experimental/exception_list> synopsis

const char* what() const noexcept override; }; +} } } -} -} +}

- The class exception_list owns a sequence of exception_ptr objects. The parallel + The class exception_list owns a sequence of exception_ptr objects. The parallel algorithms may use the exception_list to communicate uncaught exceptions encountered during parallel execution to the - caller of the algorithm. + caller of the algorithm.

diff --git a/execution_policies.html b/execution_policies.html index 13fbe0a..0ee67d7 100644 --- a/execution_policies.html +++ b/execution_policies.html @@ -1,6 +1,7 @@

Execution policies

- + +

In general

This clause describes classes that are execution policy types. An object @@ -44,15 +45,18 @@

In general

may provide additional execution policies to those described in this Technical Specification as extensions. +
-

Header <experimental/execution_policy> synopsis

+

Header <experimental/execution_policy> synopsis

-namespace std {
-namespace experimental {
-namespace parallel {
-inline namespace v2 {
+#include <execution>
+
+namespace std {
+namespace std::experimental {
+inline namespace parallelism_v2 {
+inline namespace v2 {
   
   template<class T> struct is_execution_policy;
   template<class T> constexpr bool is_execution_policy_v = is_execution_policy<T>::value;
@@ -63,18 +67,31 @@ 

Header <experimental/execution_policy> synopsis

class parallel_execution_policy; - + class parallel_vector_execution_policy; class execution_policy; +
+namespace execution { + + class unsequenced_policy; + + + class vector_policy; + + + inline constexpr unsequenced_policy unseq{ unspecified }; + inline constexpr parallel_policy par{ unspecified }; } +} } } -} +}
- + +

Execution policy type trait

@@ -93,9 +110,11 @@ 

Execution policy type trait

The behavior of a program that adds specializations for is_execution_policy is undefined.
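The sketch below illustrates how a trait of this shape can constrain a function template. It deliberately defines its own hypothetical trait (`is_policy_sketch`) and policy types instead of specializing `is_execution_policy`, since adding specializations to the real trait is undefined. All names in the block are assumptions made for illustration.

```cpp
#include <type_traits>

// Hypothetical policy types standing in for the TS's policy classes.
struct seq_policy_sketch {};
struct par_policy_sketch {};

// Primary template: arbitrary types are not execution policies.
template<class T> struct is_policy_sketch : std::false_type {};
template<> struct is_policy_sketch<seq_policy_sketch> : std::true_type {};
template<> struct is_policy_sketch<par_policy_sketch> : std::true_type {};

template<class T>
constexpr bool is_policy_sketch_v = is_policy_sketch<T>::value;

// Participates in overload resolution only when the first argument
// is one of the policy types recognized by the trait above.
template<class ExecutionPolicy,
         class = std::enable_if_t<is_policy_sketch_v<std::decay_t<ExecutionPolicy>>>>
constexpr int sum_first_n(ExecutionPolicy&&, int n)
{
  return n * (n + 1) / 2;  // the policy argument only selects the overload
}
```

With this constraint, a call such as `sum_first_n(par_policy_sketch{}, 4)` compiles, while `sum_first_n(3.5, 4)` is rejected at overload resolution instead of failing inside the function body.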

+ - + +

Sequential execution policy

@@ -104,8 +123,10 @@ 

Sequential execution policy

The class sequential_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized.

+ - + +

Parallel execution policy

@@ -114,8 +135,10 @@ 

Parallel execution policy

The class parallel_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.

+ - + +

Parallel+Vector execution policy

@@ -124,9 +147,44 @@ 

Parallel+Vector execution policy

The class parallel_vector_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized and parallelized.

+ + + + +

Unsequenced execution policy

+ +
+class unsequenced_policy{ unspecified };
+
+ +

The class unsequenced_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items.

+ +

The invocations of element access functions in parallel algorithms invoked with an execution policy of type unsequenced_policy are permitted to execute in an unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread. + This means that multiple function object invocations may be interleaved on a single thread.

+ +

This overrides the usual guarantee from the C++ Standard, [intro.execution], that function executions do not overlap with one another.

+ +

During the execution of a parallel algorithm with the experimental::execution::unsequenced_policy policy, if the invocation of an element access function exits via an uncaught exception, terminate() shall be called.
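To make "interleaved on a single thread" concrete, here is a hand-written simulation. It does not use `unsequenced_policy`; the function names are hypothetical, and the two-step element access function is an assumption chosen for illustration. The first function runs each invocation to completion before starting the next. The second performs step 1 of every invocation before step 2 of any, which is one ordering a vectorizing implementation might produce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequenced: each invocation finishes both steps before the next begins.
inline std::vector<int> run_sequenced(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    int tmp = in[i] * 2;  // step 1 of invocation i
    out[i] = tmp + 1;     // step 2 of invocation i
  }
  return out;
}

// Simulated interleaving: step 1 of all invocations, then step 2 of all.
inline std::vector<int> run_interleaved(const std::vector<int>& in)
{
  std::vector<int> tmp(in.size());
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = in[i] * 2;  // all step 1
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = tmp[i] + 1; // all step 2
  return out;
}

// The invocations touch disjoint elements, so both orderings agree.
inline void check_interleaving_equivalence(const std::vector<int>& in)
{
  assert(run_sequenced(in) == run_interleaved(in));
}
```

The results match here only because the invocations are independent. An element access function that acquires a lock or relies on one invocation completing before the next starts would not be safe under this kind of interleaving.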

+ +
+ + +

Vector execution policy

+ +
+class vector_policy{ unspecified };
+
+ +

The class vector_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized. Additionally, such vectorization will result in an execution that respects the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]). The implementation thus makes stronger guarantees than for unsequenced_policy, for example.

+ +

The invocations of element access functions in parallel algorithms invoked with an execution policy of type vector_policy are permitted to execute in an unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread, subject to the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]) for the last argument to for_loop or for_loop_strided.

+ +

During the execution of a parallel algorithm with the experimental::execution::vector_policy policy, if the invocation of an element access function exits via an uncaught exception, terminate() shall be called.

+
- + +

Dynamic execution policy

@@ -164,47 +222,56 @@ 

Dynamic execution policy

Objects of type execution_policy shall be constructible and assignable from objects of type T for which is_execution_policy<T>::value is true.

+ -

execution_policy construct/assign

+

execution_policy construct/assign

- template<class T> execution_policy(const T& exec); + template<class T> execution_policy(const T& exec); - Constructs an execution_policy object with a copy of exec's state. + Constructs an execution_policy object with a copy of exec's state. + + This constructor shall not participate in overload resolution unless is_execution_policy<T>::value is true. + + - template<class T> execution_policy& operator=(const T& exec); + + template<class T> execution_policy& operator=(const T& exec); - Assigns a copy of exec's state to *this. + Assigns a copy of exec's state to *this. - *this. + *this. +
-

execution_policy object access

+

execution_policy object access

- const type_info& type() const noexcept; + const type_info& type() const noexcept; - typeid(T), such that T is the type of the execution policy object contained by *this. + typeid(T), such that T is the type of the execution policy object contained by *this. - template<class T> T* get() noexcept; - template<class T> const T* get() const noexcept; + template<class T> T* get() noexcept; + template<class T> const T* get() const noexcept; - If target_type() == typeid(T), a pointer to the stored execution policy object; otherwise a null pointer. + If target_type() == typeid(T), a pointer to the stored execution policy object; otherwise a null pointer. + - is_execution_policy<T>::value is true. + is_execution_policy<T>::value is true. +
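The construct/assign and object access members above describe a type-erased holder. The following is a minimal sketch of that idea under stated assumptions: `dynamic_policy_sketch`, the two policy structs, and `is_policy_sketch` are hypothetical names, and storing the state in a `shared_ptr<void>` is one possible design, not the TS's specified implementation.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <typeinfo>

// Hypothetical policy types and trait, standing in for the TS's own.
struct seq_policy_sketch {};
struct par_policy_sketch {};

template<class T> struct is_policy_sketch : std::false_type {};
template<> struct is_policy_sketch<seq_policy_sketch> : std::true_type {};
template<> struct is_policy_sketch<par_policy_sketch> : std::true_type {};

class dynamic_policy_sketch
{
  public:
    // Constrained so that only policy types select this constructor.
    template<class T, class = std::enable_if_t<is_policy_sketch<T>::value>>
    dynamic_policy_sketch(const T& exec)
      : type_(&typeid(T)), state_(std::make_shared<T>(exec)) {}

    template<class T, class = std::enable_if_t<is_policy_sketch<T>::value>>
    dynamic_policy_sketch& operator=(const T& exec)
    {
      type_ = &typeid(T);
      state_ = std::make_shared<T>(exec);
      return *this;
    }

    // typeid of the policy object currently held.
    const std::type_info& type() const noexcept { return *type_; }

    // Pointer to the held policy if it has type T, otherwise null.
    template<class T> T* get() noexcept
    {
      assert(state_ != nullptr);  // invariant: a policy is always held
      return type() == typeid(T) ? static_cast<T*>(state_.get()) : nullptr;
    }

  private:
    const std::type_info* type_;
    std::shared_ptr<void> state_;  // copies of the holder share this state
};
```

For example, after `dynamic_policy_sketch p = par_policy_sketch{};`, `p.get<par_policy_sketch>()` is non-null and `p.get<seq_policy_sketch>()` is null; assigning `seq_policy_sketch{}` reverses that.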
@@ -214,11 +281,13 @@

execution_policy object access

Execution policy objects

-constexpr sequential_execution_policy      seq{};
+constexpr sequential_execution_policy      seq{};
 constexpr parallel_execution_policy        par{};
-constexpr parallel_vector_execution_policy par_vec{};
+constexpr parallel_vector_execution_policy par_vec{};
+constexpr execution::unsequenced_policy unseq{};
+constexpr execution::vector_policy vec{};
 
-

The header <experimental/execution_policy> declares a global object associated with each type of execution policy defined by this Technical Specification.

+

The header <experimental/execution_policy> declares a global object associated with each type of execution policy defined by this Technical Specification.
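The global policy objects exist so that a call site can select an overload by passing a value. The sketch below shows that tag-dispatch pattern with hypothetical types, objects, and a hypothetical algorithm (`sum_sketch`); it is not the TS's algorithm set, and the "parallel" overload computes sequentially to keep the example small.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical policy types and their associated global objects.
struct seq_policy_sketch {};
struct par_policy_sketch {};

constexpr seq_policy_sketch seq_sketch{};
constexpr par_policy_sketch par_sketch{};

struct sum_result
{
  int value;         // the computed sum
  std::string path;  // which overload ran
};

// Overload chosen when the caller passes seq_sketch.
inline sum_result sum_sketch(seq_policy_sketch, const std::vector<int>& v)
{
  return { std::accumulate(v.begin(), v.end(), 0), "sequential" };
}

// Overload chosen when the caller passes par_sketch. A real implementation
// could split the work across threads; this sketch does not.
inline sum_result sum_sketch(par_policy_sketch, const std::vector<int>& v)
{
  return { std::accumulate(v.begin(), v.end(), 0), "parallel" };
}

// Usage: the policy object's type alone picks the overload.
inline void dispatch_demo()
{
  const std::vector<int> v{1, 2, 3};
  assert(sum_sketch(seq_sketch, v).path == "sequential");
  assert(sum_sketch(par_sketch, v).path == "parallel");
}
```

Because the policy types are distinct and empty, the choice is made entirely at compile time and passing the object costs nothing at run time.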

diff --git a/front_matter.html b/front_matter.html index 931dd7e..9a1ad15 100644 --- a/front_matter.html +++ b/front_matter.html @@ -1,8 +1,8 @@ -4669 +N4698 19570 - - N4579 + + N4669 Jared Hoberock
NVIDIA Corporation
diff --git a/general.html b/general.html index c3e9ac0..6c82000 100644 --- a/general.html +++ b/general.html @@ -7,7 +7,7 @@

Namespaces and headers

experimental and not part of the C++ Standard Library, they should not be declared directly within namespace std. Unless otherwise specified, all components described in this Technical Specification are declared in namespace - std::experimental::parallel::v2.

+ std::experimental::parallelism_v2parallel::v2.

Once standardized, the components described by this Technical Specification are expected to be promoted to namespace std. @@ -15,7 +15,7 @@

Namespaces and headers

Unless otherwise specified, references to such entities described in this Technical Specification are assumed to be qualified with - std::experimental::parallel::v2, and references to entities described in the C++ + std::experimental::parallelism_v2parallel::v2, and references to entities described in the C++ Standard Library are assumed to be qualified with std::.

Extensions that are expected to eventually be added to an existing header @@ -36,27 +36,50 @@

Feature-testing recommendations

- Name + Doc. No. + Title + Primary Section + Macro Name Value Header - __cpp_lib_experimental_parallel_algorithm - 201505 + N4505 + Working Draft, Technical Specification for C++ Extensions for Parallelism + + __cpp_lib_experimental_parallel_algorithm + 201505 + <experimental/algorithm>
<experimental/exception_list>
<experimental/execution_policy>
<experimental/numeric> +
+ P0155R0 + Task Block R5 + __cpp_lib_experimental_parallel_task_block - 201510 + 201711201510 + <experimental/exception_list>
<experimental/task_block>
+ + P0076R4 + Vector and Wavefront Policies + , + __cpp_lib_experimental_execution_vector_policy + 201711201707 + + <experimental/algorithm>
+ <experimental/execution>
+ + diff --git a/normative_references.html b/normative_references.html index 4aad4a1..a082cbf 100644 --- a/normative_references.html +++ b/normative_references.html @@ -7,19 +7,19 @@

Normative references

of the referenced document (including any amendments) applies.

    -
  • ISO/IEC 14882:—To be published. Section references are relative to N3937., +
  • ISO/IEC 14882:2017To be published. Section references are relative to N3937., Programming Languages — C++ -
  • +
-

ISO/IEC 14882:— is herein called the C++ Standard. - The library described in ISO/IEC 14882:— clauses 17-30 is herein called +

ISO/IEC 14882:2017 is herein called the C++ Standard. + The library described in ISO/IEC 14882:2017 clauses 20-3317-30 is herein called the C++ Standard Library. The C++ Standard Library components described in - ISO/IEC 14882:— clauses 25, 26.7 and 20.7.2 are herein called the C++ Standard + ISO/IEC 14882:2017 clauses 28, 29.8 and 23.10.1025, 26.7 and 20.7.2 are herein called the C++ Standard Algorithms Library.

Unless otherwise specified, the whole of the C++ Standard's Library - introduction () is included into this + introduction (C++14 §20) is included into this Technical Specification by reference.

diff --git a/task_block.html b/task_block.html index e006c52..4855e55 100644 --- a/task_block.html +++ b/task_block.html @@ -5,10 +5,10 @@

Task Block

Header <experimental/task_block> synopsis

-namespace std {
-namespace experimental {
-namespace parallel {
-inline namespace v2 {
+namespace std {
+namespace std::experimental {
+inline namespace parallelism_v2 {
+inline namespace v2 {
   class task_cancelled_exception;
 
   class task_block;
@@ -18,10 +18,10 @@ 

Header <experimental/task_block> synopsis

template<class F> void define_task_block_restore_thread(F&& f); +} } } -} -} +}
@@ -29,21 +29,21 @@

Header <experimental/task_block> synopsis

Class task_cancelled_exception

 
-namespace std {
-namespace experimental {
-namespace parallel
-inline namespace v2 {
+namespace std {
+namespace std::experimental {
+inline namespace parallelism_v2 {
+inline namespace v2 {
 
   class task_cancelled_exception : public exception
   {
     public:
       task_cancelled_exception() noexcept;
-      virtual const char* what() const noexcept;
+      virtual const char* what() const noexcept override;
   };
+}
 }
 }
-}
-}
+}
      

@@ -69,10 +69,10 @@

task_cancelled_exception member function what

Class task_block

 
-namespace std {
-namespace experimental {
-namespace parallel {
-inline namespace v2 {
+namespace std {
+namespace std::experimental {
+inline namespace parallelism_v2 {
+inline namespace v2 {
 
   class task_block
   {
@@ -89,10 +89,10 @@ 

Class task_block

void wait(); }; +} } } -} -} +}

diff --git a/terms_and_definitions.html b/terms_and_definitions.html index 22aa5e4..86a659d 100644 --- a/terms_and_definitions.html +++ b/terms_and_definitions.html @@ -1,6 +1,18 @@

Terms and definitions

+ +
    +
  • No terms and definitions are listed in this document.
  • +
  • ISO and IEC maintain terminological databases for use in standardization at the following addresses:
  • + +
+
+ +

For the purposes of this document, the terms and definitions given in the C++ Standard and the following apply.

A parallel algorithm is a function template described by this Technical Specification declared in namespace std::experimental::parallel::v2 with a formal template parameter named ExecutionPolicy.

@@ -50,5 +62,6 @@

Terms and definitions

+