Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort_by takes a comparison fn, but min_by/max_by take a "scoring" fn #15311

Closed
ben0x539 opened this issue Jul 1, 2014 · 21 comments
Closed

sort_by takes a comparison fn, but min_by/max_by take a "scoring" fn #15311

ben0x539 opened this issue Jul 1, 2014 · 21 comments
Labels
P-medium Medium priority

Comments

@ben0x539
Copy link
Contributor

ben0x539 commented Jul 1, 2014

This is sorta inconsistent :(

fn sort_by(self, compare: |&T, &T| -> Ordering)

fn max_by<B: Ord>(&mut self, f: |&A| -> B) -> Option<A>

Having a comparison function is kinda more useful because then you could use min_by/max_by with non-total Ord types more easily, but it's also a bit more effort in the easy case. I dunno. fwiw, ruby's sort_by takes a "scoring"-like block and haskell's sortBy takes a comparison function.

@alexcrichton
Copy link
Member

cc @aturon

One possibility could be that foo_by takes T to U and works on U, where foo_with takes (T, T) and goes to whatever's being expected (in these cases an Ordering).

@ftxqxd
Copy link
Contributor

ftxqxd commented Jan 25, 2015

Has there been any progress on this? I only recently noticed this inconsistency, and it looks like sort_by has been stabilised even though {min,max}_by have not (due to this issue). I like @alexcrichton’s suggestion about _by v.s. _with, but the stable sort_by does the opposite, using a comparison function even though it ends in _by.

Python 2 provides both cmp (basically, (T, T) -> Ordering) and key (T -> U) parameters to its sorted built-in, but Python 3 switched to key alone, removing the old cmp parameter. Perhaps we should follow Python’s lead and switch to key functions, although I’m not entirely sure what the motivation for them is (other than simplicity in some cases).

@mpdn
Copy link
Contributor

mpdn commented Feb 1, 2015

Key functions would not work well for min/max things without a total order like floats. Ideally you would use comparison based {min,max}_by and then use a function like Haskells comparing, as @kmcallister mentioned in #17648, for when you only want to compare keys.

Unfortunately, currently Rust functions cannot return a closure, so creating a comparing-like function seems impossible right now. So if it is comparison based it ends up being unergonomic for key comparison, and if its key based it ends up being difficult or impossible to express min or max for things without a total order.

@mpdn
Copy link
Contributor

mpdn commented Feb 11, 2015

Scratch that, you actually CAN make a comparing-like function:

#![feature(unboxed_closures)]
#![feature(core)]

use std::cmp::Ordering;

pub fn comparing<F,T,B>(f: F) -> Comparing<F> where
    F: Fn(&T) -> B,
    B: Ord
{
    Comparing(f)
}

pub struct Comparing<F>(F);

impl<'a,'b,F,T,B> Fn<(&'a T, &'b T)> for Comparing<F> where
    F: Fn(&T) -> B,
    B: Ord
{
    type Output = Ordering;

    extern "rust-call" fn call(&self, args: (&T, &T)) -> Ordering {
        self.0(args.0).cmp(&self.0(args.1))
    }
}

#[derive(Debug)]
struct A(u8);

fn main() {
    let mut a = [A(4), A(2), A(6), A(9)];
    a.sort_by(comparing(|x: &A| x.0));
    println!("{:?}", a);
}

I can't get it to properly infer the parameter to the closure though. This would make it a lot more ergonomic to use a comparator based min_by and max_by.

@pnkfelix
Copy link
Member

1.0 beta, P-backcompat-libs, I-needs-decision.

@pnkfelix pnkfelix added P-backcompat-libs I-needs-decision Issue: In need of a decision. labels Feb 19, 2015
@pnkfelix pnkfelix added this to the 1.0 beta milestone Feb 19, 2015
@shepmaster
Copy link
Member

I'm ✨ 👍 ✨ of having {sort,min,max}_{by,with}, where *_by takes a Fn(U) -> T and *_with takes a Fn(U, U) -> Ordering, more-or-less.

@kmcallister
Copy link
Contributor

2×3 = 6 functions isn't too bad as combinatorial explosion goes, but it still indicates that an important abstraction is missing. For example Option's methods had the same issue before we worked out as_ref() and such.

In this case the clear alternative is Haskell's comparing, which transforms the _with forms into the _by forms. I'm not certain that the ergonomics are acceptable in Rust, but I think they probably could be.

@shepmaster
Copy link
Member

It looks like comparing can call the conversion function multiple times for the same item, is that correct? That might be undesired if the conversion is expensive.

@mpdn
Copy link
Contributor

mpdn commented Feb 22, 2015

@shepmaster Yes, but you can avoid that cost by using a map, though somewhat less ergonomic. Let's say we want to find the max of some computationally expensive f(x). We could then just call i.map(|x| (x, f(x))).max_by(comparing(|(_,h)| h)).

@shepmaster
Copy link
Member

I agree with the "less ergonomic" comment - there's yet another map to get back to the original object:

i.map(|x| (x, f(x))).max_with(comparing(|(_, h)| h)).map(|(v, _)| v)

Maybe if nothing else, maybe this bug can be to just rename and normalize the *_by methods to *_with to prevent misunderstanding. *_with is probably the more powerful option, and future work could add comparing or _*by as determined.

@kmcallister
Copy link
Contributor

Yeah,

i.map(|x| (x, f(x))).max_with(comparing(|(_, h)| h)).map(|(v, _)| v)

looks a lot better in Haskell:

map fst . maximumBy (comparing snd) $ map (\x -> (x, f x)) i

Although the right-to-left data flow always bugged me. Naturally, this code can be made even shorter and less clear.

@alexcrichton
Copy link
Member

Upon thinking more on this, I'm starting to convince myself that the scoring fn variant is not all that useful for Rust unfortunately. For example if you have a struct Person which has a name: String and you want to sort an array of people by their name, you'd in theory want to do:

people.sort_by(|person| &person.name);

Unfortunately you cannot return a reference into the argument passed into the closure, so this will not compile. This means you'll need to either call clone (bad) or move into a wrapper struct (bad). This somewhat indicates to me that we want |A, B| -> C for these functions.

Additionally, the interaction between sort and max/min is interesting in that sort can greatly benefit from the closure returning an Ordering (e.g. the item in question could go either left or right in the sort). The min/max functions, however, only really need a bool return value indicating whether A is less than B (or greater than). For this case, returning an Ordering is probably overkill and may even be more inefficient in some cases.

There is a downside, however, to not returning Ordering, which is that it's not super clear what the bool returned means. Does it mean that A should be accepted? Does it mean that A is the smallest? Does it mean that B is the smallest? This is somewhat countered by the fact though that the documentation could clearly say to call a < b which may solve this problem.

All this is basically leading me to:

  1. Scoring functions may not be too useful, so comparison functions should probably be favored.
  2. Comparison functions may only want to return Ordering when required, otherwise a bool should be favored for performance reasons.

How does that sound to others?

@EdorianDark
Copy link
Contributor

For the min/max funcitons it is probably better to introduce a new enum {smaller, greatorOrEqual} or to return an bool.

The sort_by function is not very to discover. In servo there is even a other sort function https://github.com/servo/rust-selectors/blob/master/src/quicksort.rs .
They are also discussing the api servo/rust-selectors#11 .

I think it would be better to leave the api unstable and clean it with an rfc.

@aturon
Copy link
Member

aturon commented Mar 24, 2015

@alexcrichton You've raised some excellent points -- in particular about scoring not being as usable as one might imagine at first.

I'm going to take this off the milestone; I think the sort_by API, which is stable, is fine. We can work out exactly which API to use for max_by/min_by when we stabilize them, but there's not a great deal of pressure to stabilize them right this second.

triage: P-high ()

@rust-highfive rust-highfive added P-medium Medium priority and removed P-backcompat-libs labels Mar 24, 2015
@aturon aturon removed this from the 1.0 beta milestone Mar 24, 2015
@aturon aturon removed the I-needs-decision Issue: In need of a decision. label Mar 24, 2015
@gavinb
Copy link
Contributor

gavinb commented Apr 9, 2015

It would be great if max_by returned an Ordering like sort_by. This would enable comparisons with more than simply values, such as ordering by a value but returning a tuple so that you can get the index of the maximum value.

Nemikolh added a commit to oil-lang/oil-rs that referenced this issue Apr 18, 2015
Once max_by will be updated, this wrapper will be removed.
See rust-lang/rust#15311
@yongqli
Copy link

yongqli commented Jun 26, 2015

I'd like to see this issue resolved as well. It might also make sense to implement max_by for f64, like how there is a f64::max. We've currently had to write our own max_by for f64s.

@kornelski
Copy link
Contributor

I wouldn't mind if the function was renamed (max_with, max_using or just inconsistent name are all fine by me), but I like its simplicity. It's a very common pattern in my programs, so I like having a shortcut for this.

However, I wouldn't want to it to require Ordering, or some gymnastics with extra closures, since that would make it verbose to the point it's easier to write a for loop that does the same.

Clash with Ord-less f64 is a problem indeed, but it's a pervasive problem that affects all such functions in Rust, so I think it should be dealt with for all functions (on language level), not just by tweaking this one function.

@photino
Copy link

photino commented Sep 13, 2015

I would like to see this issue resolved. Usability, clarity, and consistency is more important. It will be great to implement max_by/min_by like sort_by using a comparison function returning an Ordering. Not just f32 and f64, there are many other types that don't have an Ord.

@cuviper
Copy link
Member

cuviper commented Sep 14, 2015

I do find scoring useful in practice, personally. I can see how the example "score" of person.name is bad, requiring cloning and all, but it's still useful when you're dealing with something that may be expensive to compute. Suppose I have a list of directories, and I want the one with the biggest total size -- that recursive traversal is potentially expensive, but the resulting score is a value I can just Copy. Yes, the map workaround is feasible, but a scoring max_by or max_with or whatever will still be nicer.

@huonw
Copy link
Member

huonw commented Jan 5, 2016

We decided in #27724 to go with max_by_key for the existing max_by functionality (resp. min_by_key & min_by), with max_by probably eventually taking the -> Ordering comparison function.

@huonw huonw closed this as completed Jan 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-medium Medium priority
Projects
None yet
Development

No branches or pull requests