revise sort.md and docstrings in sort.jl, take 1 (part of PR #48363)
Lilith Hafner authored and Lilith Hafner committed Jan 28, 2023
1 parent bd217d4 commit 7256a03
Showing 2 changed files with 61 additions and 78 deletions.
30 changes: 26 additions & 4 deletions base/sort.jl
@@ -1978,16 +1978,37 @@ struct MergeSortAlg <: Algorithm end
"""
PartialQuickSort{T <: Union{Integer,OrdinalRange}}
Indicate that a sorting function should use the partial quick sort
algorithm. Partial quick sort returns the smallest `k` elements sorted from smallest
to largest, finding them and sorting them using [`QuickSort`](@ref).
Indicate that a sorting function should use the partial quick sort algorithm.
Partial quick sort is like quick sort, but is only required to find and sort the
elements that would end up in `v[k]` were `v` fully sorted.
Characteristics:
* *not stable*: does not preserve the ordering of elements that
compare equal (e.g. "a" and "A" in a sort of letters that
ignores case).
* *in-place* in memory.
* *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref).
Note that `PartialQuickSort(k)` does not necessarily sort the whole array. For example,
```jldoctest
julia> x = rand(100);
julia> k = 50:100;
julia> s1 = sort(x; alg=QuickSort);
julia> s2 = sort(x; alg=PartialQuickSort(k));
julia> map(issorted, (s1, s2))
(true, false)
julia> map(x->issorted(x[k]), (s1, s2))
(true, true)
julia> s1[k] == s2[k]
true
```
"""
struct PartialQuickSort{T <: Union{Integer,OrdinalRange}} <: Algorithm
k::T
@@ -2022,7 +2043,8 @@ Characteristics:
* *stable*: preserves the ordering of elements that compare
equal (e.g. "a" and "A" in a sort of letters that ignores
case).
* *not in-place* in memory.
* *not in-place* in memory — requires a temporary
array of half the size of the input array.
* *divide-and-conquer* sort strategy.
"""
const MergeSort = MergeSortAlg()
109 changes: 35 additions & 74 deletions doc/src/base/sort.md
@@ -1,7 +1,7 @@
# Sorting and Related Functions

Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays of
values. By default, Julia picks reasonable algorithms and sorts in standard ascending order:
Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays
of values. By default, Julia picks reasonable algorithms and sorts in ascending order:

```jldoctest
julia> sort([2,3,1])
@@ -11,7 +11,7 @@ julia> sort([2,3,1])
3
```

You can easily sort in reverse order as well:
You can sort in reverse order as well:

```jldoctest
julia> sort([2,3,1], rev=true)
@@ -21,7 +21,8 @@ julia> sort([2,3,1], rev=true)
1
```

To sort an array in-place, use the "bang" version of the sort function:
`sort` constructs a sorted copy, leaving its input unchanged. Use the "bang" version of
the sort function to mutate an existing array:

```jldoctest
julia> a = [2,3,1];
@@ -35,8 +36,8 @@ julia> a
3
```

Instead of directly sorting an array, you can compute a permutation of the array's indices that
puts the array into sorted order:
Instead of directly sorting an array, you can compute a permutation of the array's
indices that puts the array into sorted order:

```julia-repl
julia> v = randn(5)
@@ -64,7 +65,7 @@ julia> v[p]
0.382396
```

Arrays can easily be sorted according to an arbitrary transformation of their values:
Arrays can be sorted according to an arbitrary transformation of their values:

```julia-repl
julia> sort(v, by=abs)
@@ -100,9 +101,12 @@ julia> sort(v, alg=InsertionSort)
0.382396
```

All the sorting and order related functions rely on a "less than" relation defining a total order
All the sorting and order-related functions rely on a "less than" relation defining a
[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order)
on the values to be manipulated. The `isless` function is invoked by default, but the relation
can be specified via the `lt` keyword.
can be specified via the `lt` keyword, a function that takes two array elements and returns `true`
if and only if the first argument is "less than" the second. See [Alternate Orderings](@ref) for
more information.
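
As a minimal sketch of the `lt` keyword (the sample data and the `shorter` helper are illustrative assumptions, not part of the manual), the following sorts strings by length instead of alphabetically:

```julia
# Illustrative data; any vector of strings would do.
words = ["pear", "fig", "banana", "kiwi"]

# Hypothetical helper: `a` is "less than" `b` when it is strictly shorter.
shorter(a, b) = length(a) < length(b)

# `sort` returns a new vector and leaves `words` untouched.
by_length = sort(words; lt=shorter)
```

Strings of equal length, such as `"pear"` and `"kiwi"`, compare as neither less than the other under `shorter`, so a stable algorithm keeps them in their original relative order.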

## Sorting Functions

@@ -134,88 +138,45 @@ Base.Sort.partialsortperm!

## Sorting Algorithms

There are currently four sorting algorithms available in base Julia:
There are currently four sorting algorithms publicly available in base Julia:

* [`InsertionSort`](@ref)
* [`QuickSort`](@ref)
* [`PartialQuickSort(k)`](@ref)
* [`MergeSort`](@ref)

`InsertionSort` is an O(n²) stable sorting algorithm. It is efficient for very small `n`,
and is used internally by `QuickSort`.
By default, the `sort` family of functions uses stable sorting algorithms that are fast
on most inputs. The exact algorithm choice is an implementation detail to allow for
future performance improvements. Currently, a hybrid of `RadixSort`, `ScratchQuickSort`,
`InsertionSort`, and `CountingSort` is used based on input type, size, and composition.
Implementation details are subject to change but currently available in the extended help
of `??Base.DEFAULT_STABLE` and the docstrings of internal sorting algorithms listed there.
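
To illustrate what stability means in practice, here is a small sketch (the `records` data is made up for this example). Sorting by the first tuple component leaves records with equal keys in their original relative order:

```julia
# Made-up records: (key, label). Two records share key 1 and two share key 2.
records = [(2, "b"), (1, "x"), (2, "a"), (1, "y")]

# Only the key takes part in the comparison; labels are ignored.
by_key = sort(records; by=first)
```

With a stable algorithm, `(1, "x")` stays ahead of `(1, "y")` and `(2, "b")` stays ahead of `(2, "a")`, exactly as they appeared in the input.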

`QuickSort` is a very fast sorting algorithm with an average-case time complexity of
O(n log n). `QuickSort` is stable, i.e., elements considered equal will remain in the same
order. Notice that O(n²) is worst-case complexity, but it gets vanishingly unlikely as the
pivot selection is randomized.

`PartialQuickSort(k::OrdinalRange)` is similar to `QuickSort`, but the output array is only
sorted in the range of `k`. For example:

```jldoctest
julia> x = rand(1:500, 100);
julia> k = 50:100;
julia> s1 = sort(x; alg=QuickSort);
julia> s2 = sort(x; alg=PartialQuickSort(k));
julia> map(issorted, (s1, s2))
(true, false)
julia> map(x->issorted(x[k]), (s1, s2))
(true, true)
julia> s1[k] == s2[k]
true
```

!!! compat "Julia 1.9"
The `QuickSort` and `PartialQuickSort` algorithms are stable since Julia 1.9.

`MergeSort` is an O(n log n) stable sorting algorithm but is not in-place – it requires a temporary
array of half the size of the input array – and is typically not quite as fast as `QuickSort`.
It is the default algorithm for non-numeric data.

The default sorting algorithms are chosen on the basis that they are fast and stable.
Usually, `QuickSort` is selected, but `InsertionSort` is preferred for small data.
You can also explicitly specify your preferred algorithm, e.g.
`sort!(v, alg=PartialQuickSort(10:20))`.

The mechanism by which Julia picks default sorting algorithms is implemented via the
`Base.Sort.defalg` function. It allows a particular algorithm to be registered as the
default in all sorting functions for specific arrays. For example, here is the default
method from [`sort.jl`](https://github.com/JuliaLang/julia/blob/master/base/sort.jl):

```julia
defalg(v::AbstractArray) = DEFAULT_STABLE
```

You may change the default behavior for specific types by defining new methods for `defalg`.
You can explicitly specify your preferred algorithm with the `alg` keyword
(e.g. `sort!(v, alg=PartialQuickSort(10:20))`) or reconfigure the default sorting algorithm
for custom types by adding a specialized method to the `Base.Sort.defalg` function.
For example, [InlineStrings.jl](https://github.com/JuliaStrings/InlineStrings.jl/blob/v1.3.2/src/InlineStrings.jl#L903)
defines the following method:
```julia
Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = InlineStringSort
```
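
The same pattern can be sketched for a user-defined element type. The `Celsius` type below is hypothetical, and choosing `MergeSort` for it is an arbitrary assumption made only to show the mechanism. Note that `Base.Sort.defalg` is an internal function whose behavior may change between releases:

```julia
# Hypothetical element type, used only for illustration.
struct Celsius
    degrees::Float64
end

# Give the type an ordering so the sorting functions can compare values.
Base.isless(a::Celsius, b::Celsius) = isless(a.degrees, b.degrees)

# Assumption for this sketch: arrays of `Celsius` should default to `MergeSort`.
Base.Sort.defalg(::AbstractArray{Celsius}) = MergeSort

temps = [Celsius(21.5), Celsius(-3.0), Celsius(10.0)]
sorted = sort(temps)  # no `alg` keyword, so `defalg` picks the algorithm
```

Because the method is defined on a type owned by the caller, it does not change the default for any other array type.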

!!! compat "Julia 1.9"
The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed
to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays.
The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to be stable
since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays.

## Alternate Orderings

By default, `sort` and related functions use [`isless`](@ref) to compare two
elements in order to determine which should come first. The
[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining
alternate orderings on the same set of elements. Instances of `Ordering` define
a [total order](https://en.wikipedia.org/wiki/Total_order) on a set of elements,
so that for any elements `a`, `b`, `c` the following hold:

* Exactly one of the following is true: `a` is less than `b`, `b` is less than
`a`, or `a` and `b` are equal (according to [`isequal`](@ref)).
* The relation is transitive - if `a` is less than `b` and `b` is less than `c`
then `a` is less than `c`.
By default, `sort`, `searchsorted`, and related functions use [`isless`](@ref) to compare
two elements in order to determine which should come first. The
[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate
orderings on the same set of elements. Instances of `Ordering` define a
[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order).
To be a strict partial order, for any elements `a`, `b`, `c` the following hold:

* if `a == b`, then `lt(a, b) == false`;
* `(lt(a, b) && lt(b, a)) == false`; and
* if `(lt(a, b) && lt(b, c)) == true`, then `lt(a, c) == true`.
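
The three conditions can be checked by brute force on a small sample. The helper below, `is_strict_partial_order`, is a hypothetical name introduced for this sketch and is not part of Base. It can only refute the property on the given sample, not prove it in general:

```julia
# Hypothetical helper: test the three conditions for `lt` on every
# combination of elements drawn from the sample `xs`.
function is_strict_partial_order(lt, xs)
    # Equal elements must never be "less than" each other.
    irreflexive = all(!lt(a, b) for a in xs, b in xs if a == b)
    # `a < b` and `b < a` must never both hold.
    asymmetric = all(!(lt(a, b) && lt(b, a)) for a in xs, b in xs)
    # `a < b` and `b < c` must imply `a < c`.
    transitive = all((!(lt(a, b) && lt(b, c)) || lt(a, c))
                     for a in xs, b in xs, c in xs)
    return irreflexive && asymmetric && transitive
end

strict_ok = is_strict_partial_order(<, 1:4)
nonstrict_ok = is_strict_partial_order(<=, 1:4)
abs_ok = is_strict_partial_order((a, b) -> abs(a) < abs(b), -2:2)
```

Here `<` and the comparison of absolute values pass, while `<=` fails the first condition because `a <= a` is `true`.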

The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to
test whether `a` is less than `b` according to a given order.