Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for sorting by expensive-to-compute keys (feature slice_sort_by_cached_key) #34447

Closed
dato opened this issue Jun 24, 2016 · 19 comments · Fixed by #58074
Closed
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@dato
Copy link
Contributor

dato commented Jun 24, 2016

Hi—

Ideally, the implementation of sort_by_key() would invoke the received key function exactly once per item. This is highly beneficial when a costly computation (e.g., a distance function) needs to be used for sorting.

But Rust’s implementation as of 1.9 (link) calls the key function each time:

 pub fn sort_by_key(&mut self, mut f: F)
{
    self.sort_by(|a, b| f(a).cmp(&f(b)))
}

For comparison, on its day Python highlighted this in the release notes for 2.4. As per their current Sorting HOW TO:

The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

Many thanks for considering. It’d be great if Rust could behave the same way.

@Thiez
Copy link
Contributor

Thiez commented Jun 24, 2016

I suspect the method works this way to avoid allocation. Most people would expect this method not to allocate, and to sort in place. If you want to calculate keys only once, perhaps you could introduce some kind of caching inside f. This should be possible because sort_by_key takes a FnMut.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jun 24, 2016

@Thiez [T]::sort_by uses merge sort and thus already allocates.

@Thiez
Copy link
Contributor

Thiez commented Jun 24, 2016

I stand corrected :-)

@ExpHP
Copy link
Contributor

ExpHP commented Jun 24, 2016

I imagine that the current method may still be more optimal for simple key functions like |obj| obj.some_member which lend themselves well to further optimization.

This is a marked difference from Python's case, where even the simplest key function still has overhead (making a Schwartzian transform-style sort the clear winner).

@DemiMarie
Copy link
Contributor

I think @ExpHP is correct: CPython's string sort, which is written in C, is much faster than any key function written in Python. Note that this is not necessarily true for PyPy (which has a JIT), and is definitely not true in Rust.

@arthurprs
Copy link
Contributor

arthurprs commented Jul 1, 2016

The lambda is just passed down as a comparator, you'd be surprised by how much the optimizer can do with that. The name is misleading but this is not the key argument python in python sort(ed) this would actually be the cmp argument

The behavior you suggest has it's uses but it's probably something that belong to an external crate.

@ExpHP
Copy link
Contributor

ExpHP commented Jul 1, 2016

The name is misleading but this is not the key argument python in python sort(ed) this would actually be the cmp argument

To play devil's advocate a bit, this is not entirely a fair assessment; rust already provides the capability of cmp via sort_by. So to me it does not seem unreasonable for some to expect that sort_by_key might be more than just a convenience method.

Actually, for that reason, I was surprised to learn that a sort_by_key method had been added in the first place! As somebody coming from Python, prior to 1.7.0 I was always frustrated by having to write things like |a,b| a.member.cmp(&b.member), and often dearly wished for a convenience method like the sort_by_key that exists today.

Then one day while converting some Python code I came across a sort with an expensive key method, and suddenly it all made sense. There are two different idioms to sorting a list by a key, each suited to different use cases. At the time, I concluded that this must have been the reason why rust had no sort_by_key.

@steveklabnik steveklabnik added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-libs labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 25, 2017
kennytm added a commit to kennytm/rust that referenced this issue Mar 27, 2018
Add slice::sort_by_cached_key as a memoised sort_by_key

At present, `slice::sort_by_key` calls its key function twice for each comparison that is made. When the key function is expensive (which can often be the case when `sort_by_key` is chosen over `sort_by`), this can lead to very suboptimal behaviour.

To address this, I've introduced a new slice method, `sort_by_cached_key`, which has identical semantic behaviour to `sort_by_key`, except that it guarantees the key function will only be called once per element.

Where there are `n` elements and the key function is `O(m)`:
- `slice::sort_by_cached_key` time complexity is `O(m n log m n)`, compared to `slice::sort_by_key`'s `O(m n + n log n)`.
- `slice::sort_by_cached_key` space complexity remains at `O(n + m)`. (Technically, it now reserves a slice of size `n`, whereas before it reserved a slice of size `n/2`.)

`slice::sort_unstable_by_key` has not been given an analogue, as it is important that unstable sorts are in-place, which is not a property that is guaranteed here. However, this also means that `slice::sort_unstable_by_key` is likely to be slower than `slice::sort_by_cached_key` when the key function does not have negligible complexity. We might want to explore this trade-off further in the future.

Benchmarks (for a vector of 100 `i32`s):
```
# Lexicographic: `|x| x.to_string()`
test bench_sort_by_key ... bench:      112,638 ns/iter (+/- 19,563)
test bench_sort_by_cached_key ... bench:       15,038 ns/iter (+/- 4,814)

# Identity: `|x| *x`
test bench_sort_by_key ... bench:        1,346 ns/iter (+/- 238)
test bench_sort_by_cached_key ... bench:        1,839 ns/iter (+/- 765)

# Power: `|x| x.pow(31)`
test bench_sort_by_key ... bench:        3,624 ns/iter (+/- 738)
test bench_sort_by_cached_key ... bench:        1,997 ns/iter (+/- 311)

# Abs: `|x| x.abs()`
test bench_sort_by_key ... bench:        1,546 ns/iter (+/- 174)
test bench_sort_by_cached_key ... bench:        1,668 ns/iter (+/- 790)
```
(So it seems functions that are single operations do perform slightly worse with this method, but for pretty much any more complex key, you're better off with this optimisation.)

I've definitely found myself using expensive keys in the past and wishing this optimisation was made (e.g. for rust-lang#47415). This feels like both desirable and expected behaviour, at the small cost of slightly more stack allocation and minute degradation in performance for extremely trivial keys.

Resolves rust-lang#34447.
@varkor
Copy link
Member

varkor commented Mar 28, 2018

There now exists an unstable method, sort_by_cached_key that achieves this purpose. A new method was added to make sure performance was never sacrificed. Essentially,sort_by_cached_key should be more efficient on anything but extremely simple key functions (e.g. property accesses).

Note that, when this is stabilised, we should add a clippy lint to suggest using this method over sort_by_key in appropriate circumstances, if possible.

@scottmcm
Copy link
Member

Reopening as the unstable tag on the method is pointing here as a tracking issue (https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.sort_by_cached_key).

@scottmcm scottmcm reopened this Apr 25, 2018
@scottmcm scottmcm changed the title slice::sort_by_key() computes each key more than once Tracking issue for sorting by expensive-to-compute keys (feature slice_sort_by_cached_key) Apr 25, 2018
@varkor
Copy link
Member

varkor commented Aug 19, 2018

If/when this is stabilised, it is worth considering whether adding is_sorted_by_cached_key would be a good idea, if sort_by_key is: #53485.

@varkor
Copy link
Member

varkor commented Sep 16, 2018

Also partition_dedup_by_cached_key (#54279), though this might be a bit niche.

@ghost
Copy link

ghost commented Sep 16, 2018

@varkor Can't we just cache the key for the last visited element in is_sorted_by_key and the last unique element in partition_dedup_by_key?

@varkor
Copy link
Member

varkor commented Sep 17, 2018

Yes, there's so little key recalculation in the two methods that there's probably no need for specific cached methods. I'll admit I hadn't thought too much about them when I referenced them here: I was just making a quick note to remind myself later! You're right, though!

@Kerollmops
Copy link
Contributor

Kerollmops commented Sep 17, 2018

partition_dedup_by_cached_key could make the difference when you have extensive key computation. For example you will have to compute for 100 elements that are the same 200 times the same keys and only 101 times for the cached version, according to the current algorithm which computes the first key two times.

There is something we must be careful with, the current function that is used by partition_dedup_by_key is FnMut(&mut T) -> K. We must be sure that the user know that the function will only be called once on the last unique element. Or we can use another function signature that doesn't let the user change the value.

@lcnr
Copy link
Contributor

lcnr commented Dec 14, 2018

Should we add sort_unstable_by_cached_key as well?

@scottmcm
Copy link
Member

@lcnr sort_unstable is in libcore because it doesn't use additional memory, but _by_cached_key needs additional memory, so if you need cached keys you're probably fine with stable sort.

@varkor
Copy link
Member

varkor commented Dec 14, 2018

In addition, the stability of the current sort_by_cached_key is a side-effect of the implementation. It's unclear to me how you would implement an unstable version that had any benefits over the stable version.

@lcnr
Copy link
Contributor

lcnr commented Dec 14, 2018

@varkor I would add an unstable version that currently has no benefit over the stable version for the sake of completeness/to express intent. If someone does not need the stability of sort_by_cached_key, he could use sort_unstable_by_cached_key instead, even if the function is implemented like this:

#[inline(always)]
pub fn sort_unstable_by_cached_key<K, F>(&mut self, f: F)
    where F: FnMut(&T) -> K, K: Ord
{
    self.sort_by_cached_key(f)
}

In case we are able to think of a slightly faster unstable version in the far distant future, there would be no need to check every single invocation of sort_by_cached_key to check if the stability is actually needed.

kennytm added a commit to kennytm/rust that referenced this issue Feb 16, 2019
…ey, r=SimonSapin

Stabilize slice_sort_by_cached_key

I was going to ask on the tracking issue (rust-lang#34447), but decided to just send this and hope for an FCP here.  The method was added last March by rust-lang#48639.

Signature: https://doc.rust-lang.org/std/primitive.slice.html#method.sort_by_cached_key
```rust
impl [T] {
    pub fn sort_by_cached_key<K, F>(&mut self, f: F)
        where F: FnMut(&T) -> K, K: Ord;
}
```

That's an identical signature to the existing `sort_by_key`, so I think the questions are just naming, implementation, and the usual "do we want this?".

The implementation seems to have proven its use in rustc at least, which many uses: https://github.com/rust-lang/rust/search?l=Rust&q=sort_by_cached_key

(I'm asking because it's exactly what I just needed the other day:
```rust
    all_positions.sort_by_cached_key(|&n|
        data::CITIES.iter()
            .map(|x| *metric_closure.get_edge(n, x.pos).unwrap())
            .sum::<usize>()
    );
```
since caching that key is a pretty obviously good idea.)

Closes rust-lang#34447
@scottmcm
Copy link
Member

And it's in! Anyone want to update https://en.wikipedia.org/wiki/Schwartzian_transform#Comparison_to_other_languages? 😏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.