Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Mar 17, 2017

Tracking issue for the feature: #40585

This is essentially integration of pdqsort into libcore.

There's plenty of unsafe blocks to review. The heart of pdqsort is fn partition_in_blocks and is probably the most challenging function to understand. It requires some patience, but let me know if you find it too difficult - comments could always be improved.

Changes

  • Added sort_unstable feature.
  • Tweaked insertion sort constants for stable sort. Sorting integers is now up to 5% slower, but sorting big elements is much faster (in particular, sort_large_big_random is 35% faster). The old constants were highly optimized for sorting integers, so overall the configuration is more balanced now. A minor regression in case of integers is forgivable as we recently had performance improvements (Slightly optimize slice::sort #39538) that completely make up for it.
  • Removed some uninteresting sort benchmarks.
  • Added a new sort benchmark for string sorting.

Benchmarks

The following table compares stable and unstable sorting:

name                                 stable ns/iter        unstable ns/iter     diff ns/iter   diff %
slice::sort_large_ascending          7,240 (11049 MB/s)    7,380 (10840 MB/s)            140    1.93%
slice::sort_large_big_random         1,454,138 (880 MB/s)  910,269 (1406 MB/s)      -543,869  -37.40%
slice::sort_large_descending         13,450 (5947 MB/s)    10,895 (7342 MB/s)         -2,555  -19.00%
slice::sort_large_mostly_ascending   204,041 (392 MB/s)    88,639 (902 MB/s)        -115,402  -56.56%
slice::sort_large_mostly_descending  217,109 (368 MB/s)    99,009 (808 MB/s)        -118,100  -54.40%
slice::sort_large_random             477,257 (167 MB/s)    346,028 (231 MB/s)       -131,229  -27.50%
slice::sort_large_random_expensive   21,670,537 (3 MB/s)   22,710,238 (3 MB/s)     1,039,701    4.80%
slice::sort_large_strings            6,284,499 (38 MB/s)   6,410,896 (37 MB/s)       126,397    2.01%
slice::sort_medium_random            3,515 (227 MB/s)      3,327 (240 MB/s)             -188   -5.35%
slice::sort_small_ascending          42 (1904 MB/s)        41 (1951 MB/s)                 -1   -2.38%
slice::sort_small_big_random         503 (2544 MB/s)       514 (2490 MB/s)                11    2.19%
slice::sort_small_descending         72 (1111 MB/s)        69 (1159 MB/s)                 -3   -4.17%
slice::sort_small_random             369 (216 MB/s)        367 (217 MB/s)                 -2   -0.54%

Interesting cases:

  • Expensive comparison function and string sorting - it's a really close race, but timsort performs a slightly smaller number of comparisons. This is a natural difference of bottom-up merging versus top-down partitioning.
  • large_descending - unstable sort is slightly faster. Both just check whether the slice is descending and if so, they reverse it. I believe the slight difference in performance is because unstable sort traces the slice forwards, while stable sort goes backwards.

r? @alexcrichton

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not added or anything. It was just reordered with sort_by_key.
Previous order: sort, sort_by_key, sort_by.
Current order: sort, sort_by, sort_by_key.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME because we shouldn't mention unstable functions in the documentation for stable functions.... right?
I assume we'll fix this as soon as the feature gets stabilized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that seems reasonable to me, punting on this for now until the function is stabilized.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's worthwhile differentiating between big and small Ts - it doesn't have a large effect.

Instead, it's better to focus on some middle ground that works well all around: big/small Ts and expensive/cheap comparison functions as well. I took a look at a std::stable_sort implementation in C++ and it has similar constants.

@bors
Copy link
Collaborator

bors commented Mar 17, 2017

☔ The latest upstream changes (presumably #40598) made this pull request unmergeable. Please resolve the merge conflicts.

@ghost
Copy link
Author

ghost commented Mar 17, 2017

I have troubles making Travis tests pass - the problem is in src/libcollections/lib.rs, line #![feature(sort_unstable)].

The compiler warns about unused future, and warnings are configured to be denied.
I tried changing to #![cfg_attr(not(test), feature(sort_unstable))], but then doctests complain that the feature is not enabled.

Whatever I do, the compiler complains one way or another... Any idea how to fix that? :/

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome implementation, thanks for the PR!

I didn't review the sorting itself in too close detail, but everything I looked at seemed quite solid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Travis is failing on this line during tests: https://travis-ci.org/rust-lang/rust/jobs/212204751

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that seems reasonable to me, punting on this for now until the function is stabilized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll need to add #![feature] to this doctest to get past bors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That worked, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok to omit this annotation I believe as it's a private module anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@alexcrichton
Copy link
Member

Whatever I do, the compiler complains one way or another... Any idea how to fix that? :/

Oh your fix for lib.rs is correct I believe, you'll just need to add #![feature(...)] to the doctests themselves I believe.

@ghost
Copy link
Author

ghost commented Mar 17, 2017

cc @steveklabnik

Not long ago we discussed what guarantees sort should provide.
This unstable sort is somewhat relevant: you might want to take a look at documentation for sort_unstable, which lives in src/libcollections/slice.rs.

@ghost
Copy link
Author

ghost commented Mar 18, 2017

I managed to close the gap between stable and unstable sorting on ascending & descending inputs (see the new benchmarks). :)

One more thing: pdqsort falls back to heapsort after several imbalanced partitions. I'm pretty sure we never hit this case in our tests, so it'd be wise to write tests for heapsort (fn heapsort in sort.rs).
However, I don't know how to call that function from libcoretest without making it public. Suggestions?

@bors
Copy link
Collaborator

bors commented Mar 19, 2017

☔ The latest upstream changes (presumably #40538) made this pull request unmergeable. Please resolve the merge conflicts.

@orlp
Copy link
Contributor

orlp commented Mar 19, 2017

@stjepang Perhaps you can trigger the worst case using a modification of the technique described in the 1999 paper "A Killer Adversary for Quicksort" by M. D. McIlroy.

I used it a couple years ago to prove that libc++'s sort routine is quadratic.

@orlp
Copy link
Contributor

orlp commented Mar 19, 2017

Also, for those interested in a deeper explanation in the how/why of pattern-defeating quicksort, a (nearly finished) draft of my paper is available here: https://drive.google.com/file/d/0B1-vl-dPgKm_T0Fxeno1a0lGT0E/view

There are some subtle differences between my C++ implementation and @stjepang's version due to design goals, so some parts might not apply, but most of it does.

@arthurprs
Copy link
Contributor

@stjepang I guess the new pub(...) syntax doesn't help here since code/test is in different crates?

@ghost
Copy link
Author

ghost commented Mar 21, 2017

@arthurprs No, doesn't help unfortunately. They are totally separate crates.

However, I did a workaround @alexcrichton suggested to me over IRC - expose the heapsort function publicly, but gate it through a dummy feature. Can't say I feel good about this hack, but apparently there's no better option at the moment.

@alexcrichton
Copy link
Member

@bors: r+

Yeah perhaps one day we'll be able to move tests back into core (maybe selectively this time) but for now this is functionally the same (an internal function to the library)

@bors
Copy link
Collaborator

bors commented Mar 21, 2017

📌 Commit 038d605 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Mar 21, 2017

🔒 Merge conflict

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Collaborator

bors commented Mar 21, 2017

📌 Commit a718051 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Mar 21, 2017

⌛ Testing commit a718051 with merge cab4bff...

bors added a commit that referenced this pull request Mar 21, 2017
Implement feature sort_unstable

Tracking issue for the feature: #40585

This is essentially integration of [pdqsort](https://github.com/stjepang/pdqsort) into libcore.

There's plenty of unsafe blocks to review. The heart of pdqsort is `fn partition_in_blocks` and is probably the most challenging function to understand. It requires some patience, but let me know if you find it too difficult - comments could always be improved.

#### Changes

* Added `sort_unstable` feature.
* Tweaked insertion sort constants for stable sort. Sorting integers is now up to 5% slower, but sorting big elements is much faster (in particular, `sort_large_big_random` is 35% faster). The old constants were highly optimized for sorting integers, so overall the configuration is more balanced now. A minor regression in case of integers is forgivable as we recently had performance improvements (#39538) that completely make up for it.
* Removed some uninteresting sort benchmarks.
* Added a new sort benchmark for string sorting.

#### Benchmarks

The following table compares stable and unstable sorting:
```
name                                 stable ns/iter        unstable ns/iter     diff ns/iter   diff %
slice::sort_large_ascending          7,240 (11049 MB/s)    7,380 (10840 MB/s)            140    1.93%
slice::sort_large_big_random         1,454,138 (880 MB/s)  910,269 (1406 MB/s)      -543,869  -37.40%
slice::sort_large_descending         13,450 (5947 MB/s)    10,895 (7342 MB/s)         -2,555  -19.00%
slice::sort_large_mostly_ascending   204,041 (392 MB/s)    88,639 (902 MB/s)        -115,402  -56.56%
slice::sort_large_mostly_descending  217,109 (368 MB/s)    99,009 (808 MB/s)        -118,100  -54.40%
slice::sort_large_random             477,257 (167 MB/s)    346,028 (231 MB/s)       -131,229  -27.50%
slice::sort_large_random_expensive   21,670,537 (3 MB/s)   22,710,238 (3 MB/s)     1,039,701    4.80%
slice::sort_large_strings            6,284,499 (38 MB/s)   6,410,896 (37 MB/s)       126,397    2.01%
slice::sort_medium_random            3,515 (227 MB/s)      3,327 (240 MB/s)             -188   -5.35%
slice::sort_small_ascending          42 (1904 MB/s)        41 (1951 MB/s)                 -1   -2.38%
slice::sort_small_big_random         503 (2544 MB/s)       514 (2490 MB/s)                11    2.19%
slice::sort_small_descending         72 (1111 MB/s)        69 (1159 MB/s)                 -3   -4.17%
slice::sort_small_random             369 (216 MB/s)        367 (217 MB/s)                 -2   -0.54%
```

Interesting cases:
* Expensive comparison function and string sorting - it's a really close race, but timsort performs a slightly smaller number of comparisons. This is a natural difference of bottom-up merging versus top-down partitioning.
* `large_descending` - unstable sort is faster, but both sorts should have equivalent performance. Both just check whether the slice is descending and if so, they reverse it. I blame LLVM for the discrepancy.

r? @alexcrichton
@bors
Copy link
Collaborator

bors commented Mar 21, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing cab4bff to master...

@bors bors merged commit a718051 into rust-lang:master Mar 21, 2017
@bors bors mentioned this pull request Mar 21, 2017
@ghost ghost deleted the sort-unstable branch March 21, 2017 22:42
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Mar 27, 2017
libcore: fix compilation on 16bit target (MSP430).

Since PR rust-lang#40601 has been merged, libcore no longer compiles on MSP430.
The reason is this code in `break_patterns`:
```rust
 let mut random = len;
 random ^= random << 13;
 random ^= random >> 17;
 random ^= random << 5;
 random &= modulus - 1;
```
It assumes that `len` is at least a 32 bit integer.
As a workaround replace `break_patterns` with an empty function for 16bit targets.

cc @stjepang
cc @alexcrichton
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 29, 2017
libcore: fix compilation on 16bit target (MSP430).

Since PR rust-lang#40601 has been merged, libcore no longer compiles on MSP430.
The reason is this code in `break_patterns`:
```rust
 let mut random = len;
 random ^= random << 13;
 random ^= random >> 17;
 random ^= random << 5;
 random &= modulus - 1;
```
It assumes that `len` is at least a 32 bit integer.
As a workaround replace `break_patterns` with an empty function for 16bit targets.

cc @stjepang
cc @alexcrichton
bors added a commit that referenced this pull request Jul 24, 2018
Prefer sort_unstable*() over sort*()

Since `sort_unstable` is considered typically faster than the regular `sort` ([benchmarks](#40601 (comment))), it might be a good idea to check for potential wins in the compiler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants