2 changes: 2 additions & 0 deletions doc/algorithm.qbk
@@ -233,6 +233,8 @@ Convert a sequence of hexadecimal characters into a sequence of integers or char
Convert a sequence of integral types into a lower case hexadecimal sequence of characters
[endsect:hex_lower]

[include indirect_sort.qbk]

[include is_palindrome.qbk]

[include is_partitioned_until.qbk]
71 changes: 71 additions & 0 deletions doc/indirect_sort.qbk
@@ -0,0 +1,71 @@
[/ File indirect_sort.qbk]

[section:indirect_sort indirect_sort ]

[/license
Copyright (c) 2023 Marshall Clow

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

There are times that you want a sorted version of a sequence, but for some reason or another, you don't really want to sort them. Maybe the elements in the sequence are non-copyable (or non-movable), or the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.

Nevertheless, you might want to sort them. That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns to you a "permutation" of the elements that, when applied, will leave the elements in the sequence in a sorted order.
Contributor
How about this slightly shorter wording, which in particular avoids mentioning the need to sort twice:

Suggested change
There are times that you want a sorted version of a sequence, but for some reason or another, you don't really want to sort them. Maybe the elements in the sequence are non-copyable (or non-movable), or the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.
Nevertheless, you might want to sort them. That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns to you a "permutation" of the elements that, when applied, will leave the elements in the sequence in a sorted order.
There are times that you want a sorted version of a sequence, but for some reason you don't want to modify it. Maybe the elements in the sequence can't be moved/copied, e.g. the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.
That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns a "permutation" of the elements that, when applied, will put the elements in the sequence in a sorted order.

Are the double-spaces after each sentence intended?
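
To make the motivation in the description concrete, here is a minimal standard-library-only sketch, not the code proposed in this PR. The `Record` type and the `sorted_order` helper are hypothetical names introduced purely for illustration. It shows that a sequence of non-copyable, non-movable elements can still be ordered, because only the indices are rearranged.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical record type: neither copyable nor movable, so std::sort
// cannot be applied to a sequence of these directly.
struct Record {
    Record(int k) : key(k) {}
    Record(const Record&) = delete;
    Record& operator=(const Record&) = delete;
    int key;
};

// Returns the indices of [records, records + n) ordered by ascending key.
// Only the indices are rearranged; the records themselves are never touched.
std::vector<std::size_t> sorted_order(const Record* records, std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::sort(order.begin(), order.end(),
              [records](std::size_t a, std::size_t b) {
                  return records[a].key < records[b].key;
              });
    return order;
}
```

For an array holding the keys 30, 10, 20, the returned order is the index sequence 1, 2, 0, and the array itself is left unchanged.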


Say you have a sequence `[first, last)` of 1000 items that are expensive to swap:
Contributor
Suggested change
Say you have a sequence `[first, last)` of 1000 items that are expensive to swap:
Assume a sequence `[first, last)` of 1000 items that are expensive to swap:

```
std::sort(first, last); // ['O(N lg N)] comparisons and ['O(N lg N)] swaps (of the element type).
```

On the other hand, using indirect sorting:
```
auto permutation = boost::algorithm::indirect_sort(first, last); // ['O(N lg N)] comparisons and ['O(N lg N)] swaps (of size_t).
boost::algorithm::apply_permutation(first, last, permutation.begin(), permutation.end()); // ['O(N)] swaps (of the element type)
```
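
The ['O(N)] bound on element swaps comes from following each cycle of the permutation once. The sketch below illustrates that idea using only the standard library. It is not Boost's `apply_permutation`, the name `apply_permutation_sketch` is hypothetical, and it assumes the convention that position `i` of the result receives the element that was originally at index `perm[i]`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Rearranges [first, first + perm.size()) in place so that the new element at
// position i is the old element at position perm[i]. The permutation is taken
// by value because it is used as scratch space to mark finished positions.
template <typename RAIterator>
void apply_permutation_sketch(RAIterator first, std::vector<std::size_t> perm) {
    const std::size_t n = perm.size();
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t current = i;
        // Walk the cycle that starts at i; each step puts one element in place.
        while (perm[current] != i) {
            const std::size_t next = perm[current];
            std::swap(first[current], first[next]);
            perm[current] = current;  // mark this position as done
            current = next;
        }
        perm[current] = current;
    }
}
```

Each swap places at least one element in its final position, so at most N - 1 element swaps are performed in total.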

If the element type is sufficiently expensive to swap, then 10,000 swaps of size_t + 1000 swaps of the element_type could be cheaper than 10,000 swaps of the element_type.
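
The figures above follow from N lg N being roughly 10,000 for N = 1000. As a back-of-the-envelope illustration, the sketch below models both strategies. The `SwapEstimate` struct, the `estimate_swap_cost` function, and the per-swap cost parameters are all hypothetical, and the model ignores constant factors and the cost of comparisons.

```cpp
#include <cmath>

// Estimated total swap cost for each strategy, in arbitrary cost units.
struct SwapEstimate {
    double direct;    // sort the elements themselves
    double indirect;  // sort indices, then apply the permutation once
};

// n          : number of elements
// elem_cost  : assumed cost of swapping two elements
// index_cost : assumed cost of swapping two size_t indices
SwapEstimate estimate_swap_cost(double n, double elem_cost, double index_cost) {
    const double n_lg_n = n * std::log2(n);  // roughly 10,000 for n = 1000
    SwapEstimate result = { n_lg_n * elem_cost,
                            n_lg_n * index_cost + n * elem_cost };
    return result;
}
```

Under this model the indirect approach wins once an element swap costs noticeably more than an index swap. When the two costs are equal, the extra N element swaps make it slightly worse.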

Or maybe you don't need the elements to actually be sorted - you just want to traverse them in a sorted order:
```
auto permutation = boost::algorithm::indirect_sort(first, last);
for (size_t idx: permutation)
    std::cout << first[idx] << std::endl;
```


More to come here ....

[heading interface]

The function `indirect_sort` returns a `vector<size_t>` containing the permutation necessary to put the input sequence into a sorted order. One version uses `std::less` to do the comparisons; the other lets the caller pass a predicate to do the comparisons.

```
template <typename RAIterator>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last);

template <typename RAIterator, typename BinaryPredicate>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, BinaryPredicate pred);
```
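
For readers without the header at hand, the behaviour of the two overloads can be approximated with the standard library alone. This is a minimal sketch under that assumption, using the hypothetical name `indirect_sort_sketch`. It mirrors the signatures above but is not the implementation in this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Returns the indices 0..N-1 ordered so that first[result[0]], first[result[1]], ...
// is sorted according to pred. The input range itself is not modified.
template <typename RAIterator, typename Pred>
std::vector<std::size_t> indirect_sort_sketch(RAIterator first, RAIterator last, Pred pred) {
    std::vector<std::size_t> perm(static_cast<std::size_t>(std::distance(first, last)));
    std::iota(perm.begin(), perm.end(), std::size_t(0));
    std::sort(perm.begin(), perm.end(),
              [first, pred](std::size_t a, std::size_t b) {
                  return pred(first[a], first[b]);  // compare elements, not indices
              });
    return perm;
}

// Convenience overload that compares with std::less.
template <typename RAIterator>
std::vector<std::size_t> indirect_sort_sketch(RAIterator first, RAIterator last) {
    typedef typename std::iterator_traits<RAIterator>::value_type value_type;
    return indirect_sort_sketch(first, last, std::less<value_type>());
}
```

Because the range is only read, the sketch accepts pointers to const data as well as mutable iterators.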

[heading Examples]

[heading Iterator Requirements]

`indirect_sort` requires random-access iterators.

[heading Complexity]

Both of the variants of `indirect_sort` run in ['O(N lg N)] time; they are not more (or less) efficient than `std::sort`. There is an extra layer of indirection on each comparison, but all of the swaps are done on values of type `size_t`.

[heading Exception Safety]

[heading Notes]

[endsect]

[/ File indirect_sort.qbk
Copyright 2023 Marshall Clow
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt).
]
83 changes: 83 additions & 0 deletions include/boost/algorithm/indirect_sort.hpp
@@ -0,0 +1,83 @@
/*
Copyright (c) Marshall Clow 2023.

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

*/

/// \file indirect_sort.hpp
/// \brief indirect sorting algorithms
/// \author Marshall Clow
///

#ifndef BOOST_ALGORITHM_IS_INDIRECT_SORT
#define BOOST_ALGORITHM_IS_INDIRECT_SORT
Contributor
Unusual include guard. Why not BOOST_ALGORITHM_INDIRECT_SORT?


#include <algorithm> // for std::sort (and others)
#include <functional> // for std::less
#include <vector> // for std:;vector
Contributor
Typo:

Suggested change
#include <vector> // for std:;vector
#include <vector> // for std::vector

But is that comment really required?


#include <boost/algorithm/cxx11/iota.hpp>

namespace boost { namespace algorithm {

namespace detail {

template <class Predicate, class Iter>
struct indirect_predicate {
    indirect_predicate (Predicate pred, Iter iter)
    : pred_(pred), iter_(iter) {}

    bool operator ()(size_t a, size_t b) const {
        return pred_(iter_[a], iter_[b]);
    }

    Predicate pred_;
    Iter iter_;
};

}

typedef std::vector<size_t> Permutation;

// ===== sort =====

/// \fn indirect_sort (RAIterator first, RAIterator last, Pred pred)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, Pred pred) {
Contributor
Suggested change
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, Pred pred) {
Permutation indirect_sort (RAIterator first, RAIterator last, Pred pred) {

    Permutation ret(std::distance(first, last));
    boost::algorithm::iota(ret.begin(), ret.end(), size_t(0));
    std::sort(ret.begin(), ret.end(),
              detail::indirect_predicate<Pred, RAIterator>(pred, first));
    return ret;
}

/// \fn indirect_sort (RAIterator first, RAIterator las )
Contributor
Suggested change
/// \fn indirect_sort (RAIterator first, RAIterator las )
/// \fn indirect_sort (RAIterator first, RAIterator last)

/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.
Contributor
Suggested change
/// the result is sorted according to the predicate pred.
/// the result is sorted in non-descending order.

///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last) {
Contributor
Suggested change
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last) {
Permutation indirect_sort (RAIterator first, RAIterator last) {

    return indirect_sort(first, last,
        std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== stable_sort =====
// ===== partial_sort =====
// ===== nth_element =====
}}

#endif // BOOST_ALGORITHM_IS_INDIRECT_SORT
4 changes: 4 additions & 0 deletions test/Jamfile.v2
@@ -88,6 +88,10 @@ alias unit_test_framework

# Apply_permutation tests
[ run apply_permutation_test.cpp unit_test_framework : : : : apply_permutation_test ]

# Indirect_sort tests
[ run indirect_sort_test.cpp unit_test_framework : : : : indirect_sort_test ]

# Find tests
[ run find_not_test.cpp unit_test_framework : : : : find_not_test ]
[ run find_backward_test.cpp unit_test_framework : : : : find_backward_test ]
100 changes: 100 additions & 0 deletions test/indirect_sort_test.cpp
@@ -0,0 +1,100 @@
/*
Copyright (c) Marshall Clow 2011-2012.
Contributor
Suggested change
Copyright (c) Marshall Clow 2011-2012.
Copyright (c) Marshall Clow 2023.


Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

For more information, see http://www.boost.org
*/

#include <boost/config.hpp>
#include <boost/algorithm/indirect_sort.hpp>
#include <boost/algorithm/apply_permutation.hpp>
#include <boost/algorithm/cxx11/is_sorted.hpp>

#define BOOST_TEST_MAIN
#include <boost/test/unit_test.hpp>

#include <iostream>
#include <string>
#include <vector>
#include <list>

typedef std::vector<size_t> Permutation;

// A permutation of size N is a sequence of values in the range [0..N)
// such that no value appears more than once in the permutation.
bool isa_permutation(Permutation p, size_t N) {
Contributor
Suggested change
bool isa_permutation(Permutation p, size_t N) {
bool is_a_permutation(Permutation p, size_t N) {

is more readable.

    if (p.size() != N) return false;

    // Sort the permutation, and ensure that each value appears exactly once.
    std::sort(p.begin(), p.end());
    for (size_t i = 0; i < N; ++i)
        if (p[i] != i) return false;
    return true;
}
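
The definition in the comment above (every value in [0..N), none repeated) can also be checked in linear time without sorting a copy. The following is a hedged alternative sketch and not part of the PR. The name `is_permutation_of_indices` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns true if p contains each value in [0, n) exactly once.
// Runs in O(n) time using one flag per value instead of sorting.
bool is_permutation_of_indices(const std::vector<std::size_t>& p, std::size_t n) {
    if (p.size() != n) return false;
    std::vector<bool> seen(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t v = p[i];
        if (v >= n || seen[v]) return false;  // out of range or duplicate
        seen[v] = true;
    }
    return true;
}
```

Since `p` has exactly `n` entries, all in range and with no duplicates, every value in [0, n) must appear, so no separate completeness pass is needed.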

template <typename Iter,
          typename Comp = typename std::less<typename std::iterator_traits<Iter>::value_type> >
struct indirect_comp {
    indirect_comp (Iter it, Comp c = Comp())
    : iter_(it), comp_(c) {}

    bool operator ()(size_t a, size_t b) const { return comp_(iter_[a], iter_[b]);}

    Iter iter_;
    Comp comp_;
};

template <typename Iter>
void test_one_sort(Iter first, Iter last) {
    Permutation perm = boost::algorithm::indirect_sort(first, last);
    BOOST_CHECK (isa_permutation(perm, std::distance(first, last)));
    BOOST_CHECK (boost::algorithm::is_sorted(perm.begin(), perm.end(), indirect_comp<Iter>(first)));

    // Make a copy of the data, apply the permutation, and ensure that it is sorted.
    std::vector<typename std::iterator_traits<Iter>::value_type> v(first, last);
    boost::algorithm::apply_permutation(v.begin(), v.end(), perm.begin(), perm.end());
    BOOST_CHECK (boost::algorithm::is_sorted(v.begin(), v.end()));
}

template <typename Iter, typename Comp>
void test_one_sort(Iter first, Iter last, Comp comp) {
    Permutation perm = boost::algorithm::indirect_sort(first, last, comp);
    BOOST_CHECK (isa_permutation(perm, std::distance(first, last)));
    BOOST_CHECK (boost::algorithm::is_sorted(perm.begin(), perm.end(),
                 indirect_comp<Iter, Comp>(first, comp)));

    // Make a copy of the data, apply the permutation, and ensure that it is sorted.
    std::vector<typename std::iterator_traits<Iter>::value_type> v(first, last);
    boost::algorithm::apply_permutation(v.begin(), v.end(), perm.begin(), perm.end());
    BOOST_CHECK (boost::algorithm::is_sorted(v.begin(), v.end(), comp));
}


void test_sort () {
BOOST_CXX14_CONSTEXPR int num[] = { 1,3,5,7,9, 2, 4, 6, 8, 10 };
Contributor
Suggested change
BOOST_CXX14_CONSTEXPR int num[] = { 1,3,5,7,9, 2, 4, 6, 8, 10 };
int num[] = { 1,3,5,7,9, 2, 4, 6, 8, 10 };

Otherwise int *first = &num[0]; is invalid, isn't it?

    const int sz = sizeof (num)/sizeof(num[0]);
    int *first = &num[0];
    int const *cFirst = &num[0];

    // Test subsets
    for (size_t i = 0; i <= sz; ++i) {
        test_one_sort(first, first + i);
        test_one_sort(first, first + i, std::greater<int>());

        // test with constant inputs
        test_one_sort(cFirst, cFirst + i);
        test_one_sort(cFirst, cFirst + i, std::greater<int>());
    }

    // make sure we work with iterators as well as pointers
    std::vector<int> v(first, first + sz);
    test_one_sort(v.begin(), v.end());
    test_one_sort(v.begin(), v.end(), std::greater<int>());
}

BOOST_AUTO_TEST_CASE( test_main )
Contributor
Why that extra method and not using BOOST_AUTO_TEST_CASE(test_sort) directly?

Collaborator Author
Because I expect there to be more test cases in the future.

Contributor
But the whole idea of BOOST_AUTO_TEST_CASE is that you simply "decorate" each test case with it and do NOT have a "main" function. By default it runs each such function sequentially, and it even lets you filter tests by name from the CLI.

-->

BOOST_AUTO_TEST_CASE( test_sort ){
...
}

BOOST_AUTO_TEST_CASE( test_indirect_stable_sort ){
...
}

{
    test_sort ();
}