Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise sort.md and docstrings in sort.jl #48363

Merged
merged 11 commits into from
Jul 10, 2023
Merged

Revise sort.md and docstrings in sort.jl #48363

merged 11 commits into from
Jul 10, 2023

Conversation

LilithHafner
Copy link
Member

@LilithHafner LilithHafner commented Jan 20, 2023

Fixes #47789.

There are a lot of small changes here, some of which reflect changes in functionality since 1.8, others of which fix errors in the documentation of behavior in 1.8. I've used quoting review comments to provide point by point justification.

@LilithHafner LilithHafner added docs This change adds or pertains to documentation sorting Put things in order backport 1.9 Change should be backported to release-1.9 labels Jan 20, 2023
base/sort.jl Outdated
Comment on lines 1884 to 1886
Indicate that a sorting function should use the partial quick sort algorithm.
Partial quick sort is like quick sort, but is only required to find and sort the
elements that would end up in `v[k]` were `v` fully sorted.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug fix. In 1.8 we have

julia> sort(rand(100), alg=PartialQuickSort(70))
100-element Vector{Float64}:
 0.08554408610058495
 0.11471083183551722
 0.22328988544282513
 0.14111179015558972
 
 0.7224556463142775
 0.9572936633487816
 0.8133704423642836

Which
a) returns the whole array and
b) gets the kth element right, but does not sort the elements before index k

base/sort.jl Outdated Show resolved Hide resolved
base/sort.jl Outdated Show resolved Hide resolved
Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays of
values. By default, Julia picks reasonable algorithms and sorts in standard ascending order:
Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays
of values. By default, Julia picks reasonable algorithms and sorts in ascending order:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"standard" is a bit presumptuous imo. I hope what we chose as Julia's standard is also what the user things of as standard, but maybe they are used to seeing big things first. "ascending" is the more useful and unambiguous descriptor here.

Comment on lines 24 to 25
`sort` constructs a sorted copy leaving its input unchanged. Use the "bang" version of
the sort function to mutate an existing array:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort! was not always in-place by default in 1.8, it is even less often in place by default in 1.9. The foo! convention is defined by mutation, not allocation.

1.8:

julia> @btime sort!(x) setup=(x=rand(300)) evals=1;
  8.784 μs (0 allocations: 0 bytes)

julia> @btime sort!(x; alg=QuickSort) setup=(x=rand(300)) evals=1;
  8.855 μs (0 allocations: 0 bytes)

julia> @btime sort!(x) setup=(x=rand(1:100, 300)) evals=1;
  1.363 μs (1 allocation: 896 bytes)

julia> @btime sort!(x; alg=QuickSort) setup=(x=rand(1:100, 300)) evals=1;
  1.480 μs (1 allocation: 896 bytes)

1.9:

julia> @btime sort!(x) setup=(x=rand(300)) evals=1;
  5.661 μs (1 allocation: 2.50 KiB)

julia> @btime sort!(x; alg=QuickSort) setup=(x=rand(300)) evals=1;
  8.472 μs (0 allocations: 0 bytes)

julia> @btime sort!(x) setup=(x=rand(1:100, 300)) evals=1;
  1.309 μs (1 allocation: 896 bytes)

julia> @btime sort!(x; alg=QuickSort) setup=(x=rand(1:100, 300)) evals=1;
  7.965 μs (0 allocations: 0 bytes)

doc/src/base/sort.md Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
two elements in order to determine which should come first. The
[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate
orderings on the same set of elements. Instances of `Ordering` define a
[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, strict because we are talking about < not <=. Partial because that is all that we can actually guarantee. By(abs) is an example of a strict partial order that is not a strict total order because none of lt(By(abs), -1, 1), lt(By(abs), 1, -1), and isequal(-1, 1) hold.

Weakening the guarantee of totality is a bug fix because we never honored it nor could reasonably have honored it.

Copy link
Member

@knuesel knuesel Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sorting algorithms often require something a bit stronger: a strict weak ordering. Is it not the case in Julia?

(Compared to a strict partial ordering, a strict weak ordering also requires that !lt(a,b) && !lt(b,a) together with !lt(b,c) && !lt(c,b) imply !lt(a,c) && !lt(c,a), meaning that the "non-comparability" is transitive.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If !isless(b,a) and !isless(c,b), then sort([a,b,c]) will call isless twice and return [a,b,c] without checking isless(c,a). So yes, a strict weak ordering is required. Thank you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC it is equally powerful to state that !lt(a,b) && !lt(b,c) implies !lt(a,c).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC it is equally powerful to state that !lt(a,b) && !lt(b,c) implies !lt(a,c).

I don't think that's right, for example consider the following (invalid) order on complex numbers:

lt(a,b) = isreal(a) && isreal(b) ? a<b : abs(a)<abs(b)

Then with a, b, c = 1, im, -1:

julia> !lt(a,b) && !lt(b,c)
true

julia> !lt(a,c)
true

but

incomparable(a,b) = !lt(a,b) && !lt(b,a)

julia> incomparable(a,b) && incomparable(b,c)
true

julia> incomparable(a,c)
false

I think even if we find a way to reduce the conditions to something simpler, it's probably a bad idea: better give the conditions that are standard in the literature and that make the point clear, for example here the point is "transitiveness of incomparability", which would not be obvious if we gave a kind of reduced condition...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a valid counterexample because for a, b, c = 0, -1, im we have

julia> !lt(a,b) && !lt(b,c)
true

julia> !lt(a,c)
true

To prove that transitivity of not less than implies transitivity of incomparability, take arbitrary lt which satisfies !lt(a,b) && !lt(b,c) implies !lt(a,c). Take arbitrary a, b, c with incomparable(a, b) && incomparable(b, c), expand to (!lt(a,b) && !lt(b,a)) && (!lt(b,c) && !lt(c,b)), re-arrange to (!lt(a,b) && !lt(b,c)) && (!lt(c,b) && !lt(b,a)), and apply transitivity of not less than to each half to get !lt(a,c) && !lt(c,a) which is incomparable(a,c). Because a, b, and c were arbitrary that satisfy incomparable(a, b) && incomparable(b, c), we know lt satisfies transitivity of incomparability. Because lt was and arbitrary function with !lt(a,b) && !lt(b,c) implies !lt(a,c), we have that transitivity of not less than implies transitivity of incomparability.

The converse is more difficult. Take arbitrary lt which satisfies transitivity and transitivity of incomparability. Take arbitrary a, b, c with !lt(a, b) && !lt(b, c). Assume to reach a contradiction that lt(a,c). By antisymetry, we have !lt(c,a). If lt(b, a), then we could apply transitivity to claim lt(b,c) which is false, and similarly if lt(c,b) then we could apply transitivity to claim lt(c,a) which is also false. Consequuently, we have !lt(b,a) and !lt(c,b). We can now apply transitivity of incomaprability to claim a and c are incomparable which contradicts lt(a,c). Therefore our original assumption was false so !lt(a,c). Because a, b, and c were arbitrary with !lt(a, b) && !lt(b, c), we know that lt satisfies transitivity of not less than. Because lt was arbitrary with transitivity and transitivity of incomparability, we know that transitivity together with transitivity of incomparability implies transitivity of not less than.

In conclusion, if we already have transitivity (which we do) "!lt(a,b) && !lt(b,c) implies !lt(a,c)" and "incomparable(a,b) && incomarable(b,c) implies incomarable(a,c)" are mathematically interchangeable.

It is equally valid to think of the criterion as "transitivity of incomparability" and "transitivity of not less than", though switching back and forth rigorously is nontrivial.

That said, I agree with you that incomparability is a better criterion to pick for intuition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the correction, my counterexample was wrong indeed. And I understand now that you were talking of equivalence in the context where the other conditions (in practice, transitivity) are satisfied.

@LilithHafner LilithHafner changed the title Revise sort.md and docstrings in sort.jl [WIP] Revise sort.md and docstrings in sort.jl Jan 20, 2023
Comment on lines 177 to 179
* if `a == b`, then `lt(a, b) == false`;
* `lt(a, b) && lt(b, a) == false`; and
* if `lt(a, b) && lt(b, c) == true`, then `lt(a, c) == true`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is == used in the sorting algorithms? I'm a bit surprised to see it here while the documentation of isless mentions isequal:

Test whether x is less than y, according to a fixed total order (defined together with isequal).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that neither == nor isequal are used in sorting to compare elements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe the first point should read like the following?

  • lt(a, a) == false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted it entirely because it is now a strict subset of what was point two.

Copy link
Member

@knuesel knuesel Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #48387 it might be good to leave the irreflexivity condition even if it's implied by the asymmetric condition (to follow the usual definition and help the reader get an important point)...

doc/src/base/sort.md Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
base/sort.jl Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
on the values to be manipulated. The `isless` function is invoked by default, but the relation
can be specified via the `lt` keyword.
can be specified via the `lt` keyword, a function that takes two array elements and returns true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
can be specified via the `lt` keyword, a function that takes two array elements and returns true
can be specified via the `lt` or `order` keyword, a function that takes two array elements and returns true

doc/src/base/sort.md Outdated Show resolved Hide resolved
on most inputs. The exact algorithm choice is an implementation detail to allow for
future performance improvements. Currently, a hybrid of `RadixSort`, `ScratchQuickSort`,
`InsertionSort`, and `CountingSort` is used based on input type, size, and composition.
Implementation details are subject to change but currently availible in the extended help
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Implementation details are subject to change but currently availible in the extended help
Implementation details are subject to change but currently available in the extended help

doc/src/base/sort.md Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
@LilithHafner
Copy link
Member Author

Hold for #48440

@LilithHafner LilithHafner removed the backport 1.9 Change should be backported to release-1.9 label Jan 28, 2023
@LilithHafner LilithHafner marked this pull request as draft January 28, 2023 15:45
@LilithHafner LilithHafner marked this pull request as ready for review July 7, 2023 23:16
@LilithHafner LilithHafner added the backport 1.10 Change should be backported to the 1.10 release label Jul 7, 2023
Copy link
Member

@knuesel knuesel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, except for the strict weak order conditions (see suggestion to simply refer to the sort! documentation).

doc/src/base/sort.md Outdated Show resolved Hide resolved
doc/src/base/sort.md Outdated Show resolved Hide resolved
LilithHafner and others added 4 commits July 10, 2023 11:23
Co-authored-by: Jeremie Knuesel <knuesel@gmail.com>
In an effort to clarify that it's `Base.Order.lt`'s behavior on custom orders that users need to be worried about, not the function itself
@LilithHafner LilithHafner merged commit a134076 into master Jul 10, 2023
@LilithHafner LilithHafner deleted the sort-doc branch July 10, 2023 22:30
IanButterworth pushed a commit that referenced this pull request Aug 19, 2023
Co-authored-by: Jeremie Knuesel <knuesel@gmail.com>
(cherry picked from commit a134076)
KristofferC added a commit that referenced this pull request Oct 2, 2023
Backported PRs:
- [x] #48625 <!-- add replace(io, str, patterns...) -->
- [x] #48387 <!-- Improve documentation of sort-related functions -->
- [x] #48363 <!-- Revise sort.md and docstrings in sort.jl -->
- [x] #48977 <!-- Update SparseArrays.jl stdlib for SuiteSparse 7 -->
- [x] #50719 <!-- fix `CyclePadding(::DataType)` -->
- [x] #50694 <!-- inference: permit recursive type traits -->
- [x] #50860 <!-- Add `Base.get_extension` to docs/API -->
- [x] #50594 <!-- Disallow non-index Integer types in isassigned -->
- [x] #50802 <!-- Makes IntrusiveLinkedListSynchronized mutable to avoid
allocation on poptask -->
- [x] #50858 <!-- Add a `threadpool` parameter to `Channel` constructor
-->
- [x] #50874 <!-- Restrict COFF to a single thread when symbol count is
high -->
- [x] #50822 <!-- Add default method for setmodifiers! -->
- [x] #50730 <!-- Fix integer overflow in `isapprox` -->
- [x] #50850 <!-- Remove weird Rational dispatch and add pi functions to
list -->
- [x] #50809 <!-- Limit type-printing in MethodError -->
- [x] #50915 <!-- Add note the `Task` about sticky bit -->
- [x] #50929 <!-- when widening tuple types in tmerge, only widen the
complex parts -->
- [x] #50928 <!-- Bump JuliaSyntax to 0.4.6 -->
- [x] #50959 <!-- Update libssh2 patches -->
- [x] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [x] #48542 <!-- Add docs on task-specific buffering using
multithreading -->
- [x] #50912 <!-- Separate foreign threads into a :foreign threadpool
-->
- [x] #51010 <!-- Add ORIGIN to SuiteSparse rpath on Linux/FreeBSD -->
- [x] #50753 <!-- cat: remove unused promote_eltype methods that confuse
inference -->
- [x] #51027 <!-- Implement realloc accounting correctly -->
- [x] #51019 <!-- fix a case of potentially use of undefined variable
when handling error in distributed message processing -->
- [x] #51039 <!-- Revert "Optimize findall(f, ::AbstractArray{Bool})
(#42202)" -->
- [x] #51036 <!-- add missing invoke edge for nospecialize targets -->
- [x] #51042 <!-- inference: fix return_type_tfunc modeling of concrete
functions -->
- [x] #51114 <!-- Workaround upstream FreeBSD issue #272992 -->
- [x] #50892 <!-- Add `JL_DLLIMPORT` to `small_typeof` declaration -->
- [x] #51154 <!-- broadcast: use recursion rather than ntuple to map
over a tuple -->
- [x] #51153 <!-- fix debug typo in "add missing invoke edge for
nospecialize targets (#51036)" -->
- [x] #51222 <!-- Check again if the tty is open inside the IO lock -->
- [x] #51236 <!-- Add lock around uv_unref during init -->
- [x] #51243 <!-- GMP: Gracefully handle more overflows. -->
- [x] #51254 <!-- Ryu: make sure adding zeros does not overwrite
trailing dot -->
- [x] #51175 <!-- shorten stale_age for cachefile lock -->
- [x] #51300 <!-- fix method definition error for bad vararg -->
- [x] #51307 <!-- fix force-throw ctrl-C on Windows -->
- [x] #51303 <!-- ensure revising structs is safe -->
- [x] #51393 
- [x] #51403 

Need manual backport:
- [x] #51009 <!-- fix #50562, regression in `in` of tuple of Symbols -->
- [x] #51053 <!-- Bump Statistics stdlib -->
- [x] #51013 <!-- fix #50709, type bound error due to free typevar in
sparam env -->
- [x] #51305 <!-- reduce test time for rounding and floatfuncs -->

Contains multiple commits, manual intervention needed:
- [ ] #50663 <!-- Fix Expr(:loopinfo) codegen -->
- [ ] #51035 <!-- refactor GC scanning code to reflect jl_binding_t are
now first class -->
- [ ] #51092 <!-- inference: fix bad effects for recursion -->
- [x] #51247 <!-- Check if malloc has succeeded before incrementing gc
stats -->
- [x] #51294 <!-- use LibGit2_jll for LibGit2 library -->

Non-merged PRs with backport label:
- [ ] #51132 <!-- Handle `AbstractQ` in concatenation -->
- [x] #51029 <!-- testdefs: make sure that if a test set changes the
active project, they change it back when they're done -->
- [ ] #50919 <!-- Code loading: do the "skipping mtime check for stdlib"
check regardless of the value of `ispath(f)` -->
- [ ] #50824 <!-- Add some aliasing warnings to docstrings for mutating
functions -->
- [x] #50385 <!-- Precompile pidlocks: add to NEWS and docs -->
- [ ] #49805 <!-- Limit TimeType subtraction to AbstractDateTime -->
@KristofferC KristofferC removed the backport 1.10 Change should be backported to the 1.10 release label Oct 2, 2023
nsajko added a commit to nsajko/julia that referenced this pull request Oct 12, 2023
KristofferC pushed a commit that referenced this pull request Oct 17, 2023
"Total order" -> "strict weak order".
nalimilan pushed a commit that referenced this pull request Nov 5, 2023
Co-authored-by: Jeremie Knuesel <knuesel@gmail.com>
(cherry picked from commit a134076)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation sorting Put things in order
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revise sort.md
3 participants