algorithms.html

<cxx-clause id="parallel.alg">
  <h1>Parallel algorithms</h1>

  <cxx-section id="parallel.alg.general">
    <h1>In general</h1>

    This clause describes components that C++ programs may use to perform operations on containers
    and other sequences in parallel.

    <cxx-section id="parallel.alg.general.exec">
      <h1>Effect of execution policies on algorithm execution</h1>

      <p>
        Parallel algorithms have template parameters named <code>ExecutionPolicy</code> which describe
        the manner in which the execution of these algorithms may be parallelized and the manner in
        which they apply user-provided function objects.
      </p>

      <p>
        The applications of function objects in parallel algorithms invoked with an execution policy
        object of type <code>sequential_execution_policy</code> execute in sequential order in the
        calling thread.
      </p>

      <p>
        The applications of function objects in parallel algorithms invoked with an execution policy
        object of type <code>parallel_execution_policy</code> are permitted to execute in an unordered
        fashion in unspecified threads, and indeterminately sequenced within each thread. 

        <cxx-note>
          It is the caller's responsibility to ensure correctness, for example that the invocation does
          not introduce data races or deadlocks.
        </cxx-note>
      </p>

      <cxx-example><pre>using namespace std::experimental::parallel<ins2>_v1</ins2>;
int a[] = {0,1};
std::vector&lt;int&gt; v;
for_each(par, std::begin(a), std::end(a), [&amp;](int i) {
  v.push_back(i*2+1);
});
<del2>foo bar</del2></pre>

        The program above has a data race because of the unsynchronized access to the container
        <code>v</code>.
      </cxx-example><pre>
</pre>

      <cxx-example><pre>
using namespace std::experimental::parallel<ins2>_v1</ins2>;
std::atomic<ins2>&lt;int></ins2> x = 0;
int a[] = {1,2};
for_each(par, std::begin(a), std::end(a), [<ins2>&amp;</ins2>](int n) {
  x.fetch_add(1, std::memory_order_relaxed);
  // spin wait for another iteration to change the value of x
  while (x.load(std::memory_order_relaxed) == 1) { }
});</pre>

        The above example depends on the order of execution of the iterations, and is therefore
        undefined (may deadlock).
      </cxx-example><pre>
</pre>

      <cxx-example><pre>
using namespace std::experimental::parallel<ins2>_v1</ins2>;
int x<ins2>=0</ins2>;
std::mutex m;
int a[] = {1,2};
for_each(par, std::begin(a), std::end(a), [&amp;](int) {
  m.lock();
  ++x;
  m.unlock();
});</pre>

        The above example synchronizes access to object <code>x</code> ensuring that it is
        incremented correctly.
      </cxx-example>

      <p>
        The applications of function objects in parallel algorithms invoked with an execution policy
        of type <code><del2>vector_execution_policy</del2><ins2>parallel_vector_execution_policy</ins2></code> are permitted to execute in an unordered fashion
        in unspecified threads, and unsequenced within each thread.
        <ins2>
        <cxx-note>
          This means that multiple function-object invocations may be interleaved on a single thread.
        </cxx-note>
        </ins2>
        <cxx-note>
          As a consequence, function objects governed by the <code><del2>vector_execution_policy</del2><ins2>parallel_vector_execution_policy</ins2></code>
          policy must not synchronize with each other. Specifically, they must not acquire locks.
        </cxx-note>
      </p>

      <cxx-example><pre>
using namespace std::experimental::parallel<ins2>_v1</ins2>;
int x<ins2>=0</ins2>;
std::mutex m;
int a[] = {1,2};
for_each(par_vec, std::begin(a), std::end(a), [&amp;](int) {
  m.lock();
  ++x;
  m.unlock();
});</pre>

        The above program is invalid because the applications of the function object are not
        guaranteed to run on different threads.
      </cxx-example><pre>
</pre>

      <cxx-note>
        The application of the function object may result in two consecutive calls to
        <code>m.lock</code> on the same thread, which may deadlock.
      </cxx-note><pre>
</pre>

      <cxx-note>
        The semantics of the <code>parallel_execution_policy</code> or the
        <code><del2>vector_execution_policy</del2><ins2>parallel_vector_execution_policy</ins2></code> invocation allow the implementation to fall back to
        sequential execution if the system cannot parallelize an algorithm invocation due to lack of
        resources.
      </cxx-note>

      <p>
        <del2>If they exist, a</del2><ins2>A</ins2> parallel algorithm invoked with an execution policy object of type
        <code>parallel_execution_policy</code> or <code><del2>vector_execution_policy</del2><ins2>parallel_vector_execution_policy</ins2></code> may apply
        iterator member functions of a stronger category than its specification requires<ins2>, if such iterators exist</ins2>. In this
        case, the application of these member functions are subject to provisions 3. and 4. above,
        respectively.
      </p>

      <cxx-note>
        For example, an algorithm whose specification requires <code>InputIterator</code> but
        receives a concrete iterator of the category <code>RandomAccessIterator</code> may use
        <code>operator[]</code>. In this case, it is the algorithm caller's responsibility to ensure
        <code>operator[]</code> is race-free.
      </cxx-note>

      <p>
        Algorithms invoked with an execution policy object of type <code>execution_policy</code>
        execute internally as if invoked with <del>instances of type <code>sequential_execution_policy</code>,
        <code>parallel_execution_policy</code>, or an implementation-defined execution policy type depending
        on the dynamic value of the <code>execution_policy</code> object.</del>
        <ins>the contained execution policy object.</ins>
      </p>

      <p>
        The semantics of parallel algorithms invoked with an execution policy object of
        implementation-defined type are <del2>unspecified</del2><ins2>implementation-defined</ins2>.
      </p>
    </cxx-section>

    <cxx-section id="parallel.alg.overloads">
      <h1><code>ExecutionPolicy</code> algorithm overloads</h1>

      <p>
        <del2>
        Parallel algorithms coexist alongside their sequential counterparts as overloads
        distinguished by a formal template parameter named <code>ExecutionPolicy</code>. This
        <del>template parameter corresponds to the parallel algorithm's first function parameter, whose
        type is <code>ExecutionPolicy</code></del>
        <insdel>is the first template parameter and corresponds to the parallel algorithm's first function
        parameter, whose type is <code>ExecutionPolicy&amp;&amp;</code></insdel>.</del2>
        <ins2>
The Parallel Algorithms Library provides overloads for each of the algorithms named in
Table 1, corresponding to the algorithms with the same name in the C++ Standard Algorithms Library.
For each algorithm in <cxx-ref to="tab.parallel.algorithms"></cxx-ref>, there shall be overloads with an additional
template type parameter named <code>ExecutionPolicy</code>, which is the first template parameter.
In addition, each such overload shall have the new function parameter as the
first function parameter of type <code>ExecutionPolicy&amp;&amp;</code>.
        </ins2>

      </p>

      <p>
        Unless otherwise specified, the semantics of <code>ExecutionPolicy</code> algorithm overloads
        are identical to their overloads without.
      </p>
        
      <p>
        Parallel algorithms
        <del>have the requirement <code>is_execution_policy&lt;ExecutionPolicy&gt;::value</code> is <code>true</code></del>
        <ins>shall not participate in overload resolution unless
        <code>is_execution_policy&lt;ExecutionPolicy&gt;::value</code> is <code>true</code></ins>.
      </p>
        
      <del2><p>The algorithms listed in <cxx-ref to="tab.parallel.algorithms"></cxx-ref> shall have <code>ExecutionPolicy</code> overloads.</p></del2>

      <table is="cxx-table" id="tab.parallel.algorithms" class="list">
        <caption>Table of parallel algorithms</caption>
          <tr>
            <td><ins><code>adjacent_difference</code></ins></td>
            <td><code>adjacent_find</code></td>
            <td><code>all_of</code></td>
            <td><code>any_of</code></td>
          </tr>
          <tr>
            <td><code>copy</code></td>
            <td><code>copy_if</code></td>
            <td><code>copy_n</code></td>
            <td><code>count</code></td>
          </tr>
          <tr>
            <td><code>count_if</code></td>
            <td><code>equal</code></td>
            <td><code>exclusive_scan</code></td>
            <td><code>fill</code></td>
          </tr>
          <tr>
            <td><code>fill_n</code></td>
            <td><code>find</code></td>
            <td><code>find_end</code></td>
            <td><code>find_first_of</code></td>
          </tr>
          <tr>
            <td><code>find_if</code></td>
            <td><code>find_if_not</code></td>
            <td><code>for_each</code></td>
            <td><code>for_each_n</code></td>
          </tr>
          <tr>
            <td><code>generate</code></td>
            <td><code>generate_n</code></td>
            <td><code>includes</code></td>
            <td><code>inclusive_scan</code></td>
          </tr>
          <tr>
            <td><ins><code>inner_product</code></ins></td>
            <td><code>inplace_merge</code></td>
            <td><code>is_heap</code></td>
            <td><code>is_heap_until</code></td>
          </tr>
          <tr>
            <td><code>is_partitioned</code></td>
            <td><code>is_sorted</code></td>
            <td><code>is_sorted_until</code></td>
            <td><code>lexicographical_compare</code></td>
          </tr>
          <tr>
            <td><code>max_element</code></td>
            <td><code>merge</code></td>
            <td><code>min_element</code></td>
            <td><code>minmax_element</code></td>
          </tr>
          <tr>
            <td><code>mismatch</code></td>
            <td><code>move</code></td>
            <td><code>none_of</code></td>
            <td><code>nth_element</code></td>
          </tr>
          <tr>
            <td><code>partial_sort</code></td>
            <td><code>partial_sort_copy</code></td>
            <td><code>partition</code></td>
            <td><code>partition_copy</code></td>
          </tr>
          <tr>
            <td><code>reduce</code></td>
            <td><code>remove</code></td>
            <td><code>remove_copy</code></td>
            <td><code>remove_copy_if</code></td>
          </tr>
          <tr>
            <td><code>remove_if</code></td>
            <td><code>replace</code></td>
            <td><code>replace_copy</code></td>
            <td><code>replace_copy_if</code></td>
          </tr>
          <tr>
            <td><code>replace_if</code></td>
            <td><code>reverse</code></td>
            <td><code>reverse_copy</code></td>
            <td><code>rotate</code></td>
          </tr>
          <tr>
            <td><code>rotate_copy</code></td>
            <td><code>search</code></td>
            <td><code>search_n</code></td>
            <td><code>set_difference</code></td>
          </tr>
          <tr>
            <td><code>set_intersection</code></td>
            <td><code>set_symmetric_difference</code></td>
            <td><code>set_union</code></td>
            <td><code>sort</code></td>
          </tr>
          <tr>
            <td><code>stable_partition</code></td>
            <td><code>stable_sort</code></td>
            <td><code>swap_ranges</code></td>
            <td><code>transform</code></td>
          </tr>
          <tr>
            <td><code>uninitialized_copy</code></td>
            <td><code>uninitialized_copy_n</code></td>
            <td><code>uninitialized_fill</code></td>
            <td><code>uninitialized_fill_n</code></td>
          </tr>
          <tr>
            <td><code>unique</code></td>
            <td><code>unique_copy</code></td>
            <td></td>
            <td></td>
          </tr>
      </table>
    </cxx-section>
  </cxx-section>

  <cxx-section id="parallel.alg.defns">
    <h1>Definitions</h1>

    <p>
      Define <code><em>GENERALIZED_SUM</em>(op, a1, ..., aN)</code> as follows:

      <ul>
        <li><code>a1</code> when <code>N</code> is <code>1</code></li>
      
        <li>
          <code>op(<em>GENERALIZED_SUM</em>(op, b1, ..., b<del2>M</del2><ins2>K</ins2>)</code>, <code><em>GENERALIZED_SUM</em>(op, bM, ..., bN))</code> where
           
          <ul>
            <li><code>b1, ..., bN</code> may be any permutation of <code>a1, ..., aN</code> and</li>

            <del2><li><code>0 &lt; M &lt; N</code>.</li></del2>
            <ins2><li><code>1 &lt; K+1 = M &le; N</code>.</li></ins2>
          </ul>
        </li>
      </ul>
    </p>

    <p>
      Define <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(op, a1, ..., aN)</code> as follows:

      <ul>
        <li><code>a1</code> when <code>N</code> is <code>1</code></li>

        <li>
          <code>op(<em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(op, a1, ..., a<del2>M</del2><ins2>K</ins2>), <em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(op, aM, ..., aN)</code> where <del2><code>0 &lt; M &lt; N</code></del2> <ins2><code>1 &lt; K+1 = M &le; N</code></ins2>.
        </li>
      </ul>
    </p>
  </cxx-section>

  <cxx-section id="parallel.alg.added">
    <h1>Novel algorithms</h1>

    This subclause describes novel algorithms introduced by this Technical Specification.

    <cxx-section id="parallel.alg.added.algorithms.synop">
      <h1>Header <code>&lt;experimental/algorithm&gt;</code> synopsis</h1>

      <pre>
namespace std {
namespace experimental {
namespace parallel<ins2>_v1</ins2> {
  template&lt;class ExecutionPolicy,
           class InputIterator, class Function&gt;
    void for_each(ExecutionPolicy&amp;&amp; exec,
                  InputIterator first, InputIterator last,
                  Function f);
  template&lt;class InputIterator, class Size, class Function&gt;
    InputIterator for_each_n(InputIterator first, Size n,
                             Function f);
}
}
}
</pre>
    </cxx-section>

    <cxx-section id="parallel.alg.added.foreach">
      <h1>For each</h1>

      <cxx-function>
        <cxx-signature>
          template&lt;class ExecutionPolicy,
                   class InputIterator, class Function&gt;
            void for_each(ExecutionPolicy&amp;&amp; exec,
                          InputIterator first, InputIterator last,
                          Function f);
        </cxx-signature>

        <cxx-effects>
          Applies <code>f</code> to the result of dereferencing every iterator in the range <code>[first,last)</code>.

          <cxx-note>
            If the type of <code>first</code> satisfies the requirements of a mutable iterator, <code>f</code> may
            apply nonconstant functions through the dereferenced iterator.
          </cxx-note>
        </cxx-effects>

        <cxx-complexity>
          Applies <code>f</code> exactly <code>last - first</code> times.
        </cxx-complexity>
        
        <cxx-remarks>
          If <code>f</code> returns a result, the result is ignored.
        </cxx-remarks>

        <cxx-notes>
          Unlike its sequential form, the parallel overload of <code>for_each</code> does not return a copy of
          its <code>Function</code> parameter, since parallelization may not permit efficient state
          accumulation.
        </cxx-notes>
        <cxx-requires>
          <ins>
            Unlike its sequential form, the parallel overload of <code>for_each</code> requires
            <code>Function</code> to meet the requirements of <code>CopyConstructible</code><insdel>, but not
            <code>MoveConstructible</code></insdel>.
          </ins>
        </cxx-requires>
      </cxx-function>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class Size, class Function&gt;
            InputIterator for_each_n(InputIterator first, Size n,
                                     Function f);
        </cxx-signature>

        <cxx-requires>
          <code>Function</code> shall meet the requirements of <code>MoveConstructible</code>
          
          <cxx-note>
            <code>Function</code> need not meet the requirements of <code>CopyConstructible</code>.
          </cxx-note>
        </cxx-requires>

        <cxx-effects>
          Applies <code>f</code> to the result of dereferencing every iterator in the range
          <code>[first,first + n)</code>, starting from <code>first</code> and proceeding to <code>first + n - 1</code>.

          <cxx-note>
            If the type of <code>first</code> satisfies the requirements of a mutable iterator,
            <code>f</code> may apply nonconstant functions through the dereferenced iterator.
          </cxx-note>
        </cxx-effects>

        <cxx-returns>
          <code>first + n</code><ins> for non-negative values of <code>n</code> and <code>first</code> for negative values</ins>.
        </cxx-returns>

        <cxx-remarks>
          If <code>f</code> returns a result, the result is ignored.
        </cxx-remarks>
      </cxx-function>

      <ins>
        <cxx-function>
          <cxx-signature>
          template&lt;class ExecutionPolicy,
                   class InputIterator, class Size, class Function&gt;
                   InputIterator for_each_n(ExecutionPolicy &amp;&amp; exec,
                                            InputIterator first, Size n,
                                            Function f);
          </cxx-signature>

          <cxx-effects>
            Applies <code>f</code> to the result of dereferencing every iterator in the range
            <code>[first,first + n)</code>, starting from <code>first</code> and proceeding to <code>first + n - 1</code>.

            <cxx-note>
              If the type of <code>first</code> satisfies the requirements of a mutable iterator,
              <code>f</code> may apply nonconstant functions through the dereferenced iterator.
            </cxx-note>
          </cxx-effects>

          <cxx-returns>
          <code>first + n</code><ins> for non-negative values of <code>n</code> and <code>first</code> for negative values.
          </cxx-returns>

          <cxx-remarks>
            If <code>f</code> returns a result, the result is ignored.
          </cxx-remarks>

          <cxx-notes>
            Unlike its sequential form, the parallel overload of <code>for_each_n</code> requires
            <code>Function</code> to meet the requirements of <code>CopyConstructible</code>, but not
            <code>MoveConstructible</code>.
          </cxx-notes>
        </cxx-function>
      </ins>
    </cxx-section>

    <cxx-section id="parallel.alg.added.numeric.synop">
      <h1>Header <code>&lt;experimental/numeric&gt;</code></h1>

      <pre>
namespace std {
namespace experimental {
namespace parallel<ins2>_v1</ins2> {
  template&lt;class InputIterator&gt;
    typename iterator_traits&lt;InputIterator&gt;::value_type
      reduce(InputIterator first, InputIterator last);
  template&lt;class InputIterator, class T&gt;
    T reduce(InputIterator first, InputIterator last<ins2>,</ins2> T init);
  template&lt;class InputIterator, class T, class BinaryOperation&gt;
    T reduce(InputIterator first, InputIterator last, T init,
             BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator&gt;
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result);
  template&lt;class InputIterator, class OutputIterator,
           class T&gt;
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init);
  template&lt;class InputIterator, class OutputIterator,
           class T, class BinaryOperation&gt;
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init, BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator&gt;
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result);
  template&lt;class InputIterator, class OutputIterator,
           class BinaryOperation&gt;
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     <del2>BinaryOperation binary_op</del2><ins2>T init</ins2>); <!-- Jared to verify! -->
  template&lt;class InputIterator, class OutputIterator,
           class T, class BinaryOperation&gt;
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init, BinaryOperation binary_op);
}
}
}
</pre>
    </cxx-section>

    <cxx-section id="parallel.alg.added.reduce">
      <h1>Reduce</h1>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator&gt;
            typename iterator_traits&lt;InputIterator&gt;::value_type
              reduce(InputIterator first, InputIterator last);
        </cxx-signature>

        <cxx-returns>
          <code>reduce(first, last, typename iterator_traits&lt;InputIterator&gt;::value_type{})</code>
        </cxx-returns>

        <cxx-requires>
          <code>typename iterator_traits&lt;InputIterator&gt;::value_type{}</code>
          shall be a valid expression. The <code>operator+</code> function associated with
          <code>iterator_traits&lt;InputIterator&gt;::value_type</code> shall not invalidate iterators or
          subranges, nor modify elements in the range <code>[first,last)</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>operator+</code>.
        </cxx-complexity>

        <cxx-notes>
          The primary difference between <code>reduce</code> and <code>accumulate</code> is that the behavior
          of <code>reduce</code> may be non-deterministic for non-associative or non-commutative
          <code>operator+</code>.
        </cxx-notes>
      </cxx-function>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class T&gt;
            T reduce(InputIterator first, InputIterator last, T init);
        </cxx-signature>

        <cxx-returns>
          <code>reduce(first, last, init, plus&lt;&gt;())</code>
        </cxx-returns>

        <cxx-requires>
          The <code>operator+</code> function associated with <code>T</code> shall not invalidate iterators
          or subranges, nor modify elements in the range <code>[first,last)</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>operator+</code>.
        </cxx-complexity>

        <cxx-notes>
          The primary difference between <code>reduce</code> and <code>accumulate</code> is that the behavior
          of <code>reduce</code> may be non-deterministic for non-associative or non-commutative <code>operator+</code>.
        </cxx-notes>
      </cxx-function>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class T, class BinaryOperation&gt;
            T reduce(InputIterator first, InputIterator last, T init,
                     BinaryOperation binary_op);
        </cxx-signature>

        <cxx-returns>
          <code><em>GENERALIZED_SUM</em>(binary_op, init, *first, ..., <del2>*(first + last - first - 1)</del2><ins2>*(first + (last - first) - 1)</ins2>)</code>.
        </cxx-returns>

        <cxx-requires>
          <code>binary_op</code> shall not invalidate iterators or subranges, nor modify elements in the
          range <code>[first,last)</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The primary difference between <code>reduce</code> and <code>accumulate</code> is that the behavior
          of <code>reduce</code> may be non-deterministic for non-associative or non-commutative <del><code>operator+</code></del><ins><code>binary_op</code></ins>.
        </cxx-notes>
      </cxx-function>
    </cxx-section>

    <cxx-section id="parallel.alg.added.exclusive.scan">
      <h1>Exclusive scan</h1>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class OutputIterator,
                   class T&gt;
            OutputIterator
              exclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             T init);
        </cxx-signature>

        <cxx-returns>
          <code>exclusive_scan(first, last, result, init, plus&lt;&gt;())</code>
        </cxx-returns>

        <cxx-requires>
          The <code>operator+</code> function associated with <code>iterator_traits&lt;InputIterator&gt;::value_type</code> shall
          not invalidate iterators or subranges, nor modify elements in the ranges <code>[first,last)</code> or
          <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>operator+</code>.
        </cxx-complexity>

        <cxx-notes>
          The primary difference between <code>exclusive_scan</code> and <code>inclusive_scan</code> is that
          <code>exclusive_scan</code> excludes the <code>i</code>th input element from the <code>i</code>th sum.
          If the <code>operator+</code> function is not mathematically associative, the behavior of
          <code>exclusive_scan</code> may be non-deterministic.
        </cxx-notes>
      </cxx-function>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class OutputIterator,
                   class T, class BinaryOperation&gt;
            OutputIterator
              exclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             T init, BinaryOperation binary_op);
        </cxx-signature>

        <cxx-effects>
          Assigns through each iterator <code>i</code> in <code>[result,result + (last - first))</code> the
          value of <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, init, *first, ..., <del2>(*first + i - result - 1)</del2><ins2>*(first + (i - result) - 1)</ins2>)</code>.
        </cxx-effects>

        <cxx-returns>
          The end of the resulting range beginning at <code>result</code>.
        </cxx-returns>

        <cxx-requires>
          <code>binary_op</code> shall not invalidate iterators or subranges, nor modify elements in the
          ranges <code>[first,last)</code> or <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The primary difference between <code>exclusive_scan</code> and <code>inclusive_scan</code> is that
          <code>exclusive_scan</code> excludes the <code>i</code>th input element from the <code>i</code>th
          sum. If <code>binary_op</code> is not mathematically associative, the behavior of
          <code>exclusive_scan</code> may be non-deterministic.
        </cxx-notes>
      </cxx-function>
    </cxx-section>

    <cxx-section id="parallel.alg.added.inclusive.scan">
      <h1>Inclusive scan</h1>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class OutputIterator&gt;
            OutputIterator
              inclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result);
        </cxx-signature>

        <cxx-returns>
          <code>inclusive_scan(first, last, result, plus&lt;&gt;())</code>
        </cxx-returns>

        <cxx-requires>
          The <code>operator+</code> function associated with
          <code>iterator_traits&lt;InputIterator&gt;::value_type</code> shall not invalidate iterators or
          subranges, nor modify elements in the ranges <code>[first,last)</code> or
          <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>operator+</code>.
        </cxx-complexity>

        <cxx-notes>
          The <del2>primary</del2> difference between <code>exclusive_scan</code> and <code>inclusive_scan</code> is that
          <code>exclusive_scan</code> excludes the <code>i</code>th input element from the <code>i</code>th sum.
          If the <code>operator+</code> function is not mathematically associative, the behavior of
          <code>inclusive_scan</code> may be non-deterministic.
        </cxx-notes>
      </cxx-function>

      <cxx-function>
        <cxx-signature>
          template&lt;class InputIterator, class OutputIterator,
                   class BinaryOperation&gt;
            OutputIterator
              inclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             BinaryOperation binary_op);
          
          template&lt;class InputIterator, class OutputIterator,
                   class T, class BinaryOperation&gt;
            OutputIterator
              inclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             T init, BinaryOperation binary_op);
        </cxx-signature>

        <cxx-effects>
          Assigns through each iterator <code>i</code> in <code>[result,result + (last - first))</code> the value of
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, *first, ..., <del2>(*first + i - result)</del2><ins2>*(first + (i - result))</ins2>)</code> or
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, init, *first, ..., <del2>(*first + i - result)</del2><ins2>*(first + (i - result))</ins2>)</code>
          if <code>init</code> is provided.
        </cxx-effects>

        <cxx-returns>
          The end of the resulting range beginning at <code>result</code>.
        </cxx-returns>

        <cxx-requires>
          <code>binary_op</code> shall not invalidate iterators or subranges, nor modify elements in the 
          ranges <code>[first,last)</code> or <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first)</code> applications of <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The <del2>primary</del2> difference between <code>exclusive_scan</code> and <code>inclusive_scan</code> is that
          <code>inclusive_scan</code> includes the <code>i</code>th input element in the <code>i</code>th sum.
          If <code>binary_op</code> is not mathematically associative, the behavior of
          <code>inclusive_scan</code> may be non-deterministic.
        </cxx-notes>
      </cxx-function>
    </cxx-section>
  </cxx-section>
</cxx-clause>