Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort items in rustdoc with integers in their names more naturally #39596

Closed
clarfonthey opened this issue Feb 6, 2017 · 19 comments
Closed

Sort items in rustdoc with integers in their names more naturally #39596

clarfonthey opened this issue Feb 6, 2017 · 19 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@clarfonthey
Copy link
Contributor

Right now, Trait10 will be shown before Trait2 in rustdoc. It'd be nice if this could be changed to sort numbers in order by value whenever they're encountered.

@steveklabnik steveklabnik added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Feb 7, 2017
@phungleson
Copy link
Contributor

Hey guys, wanted to help with this, but this is the first time i look into rustdoc, any suggestion where i can start?

@clarfonthey
Copy link
Contributor Author

When I took a look myself, it was just a call to slice::sort that did the sorting, so, if you make a slice sorting function that works and give it a few tests, someone should be able to point you to the place to call it.

@phungleson
Copy link
Contributor

phungleson commented Feb 11, 2017

Thanks @clarcharr any place in the code you can put here?

I tried to search the whole project for Trait10 Trait2 and slice::sort but couldn't find anything.

@keeperofdakeys
Copy link
Contributor

I suspect the sorting is done here (https://github.com/rust-lang/rust/blob/master/src/librustdoc/html/render.rs#L1707), but I may be wrong.

@phungleson
Copy link
Contributor

Yep thanks @keeperofdakeys looks like so, i will try to make changes around it.

Thanks again ;)

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Feb 20, 2017

So, I found out a relatively easy way to do this type of sorting and decided to make a crate for it, which I've called version-sort. I'm not sure how we'd add this type of functionality to rustdoc specifically, but I figured that I'd mention that I have created the code that would do this sort of sorting. (heh)

@phungleson
Copy link
Contributor

Hehe I think this functionality can be used for more than just in version number, but file name etc.., not sure how to call it though

@Boddlnagg
Copy link
Contributor

This is known as "natural sort order", and there are at least two crates for this, though I don't know about their quality.

@ranma42
Copy link
Contributor

ranma42 commented Feb 20, 2017

Notice that although for human-written code the suggested order would in general be better, it might be a regression if the "number" is not decimal (which might happen for example for machine-generated types) as T000 T010 T0A0 T110 would then not be considered sorted.

@clarfonthey
Copy link
Contributor Author

@ranma42 Check out the crate I linked; I specifically track leading zeroes so that 001 comes before 01, before 1, before 10.

@clarfonthey
Copy link
Contributor Author

Floating point strings don't sort by value (i.e. 2.2 < 2.10) which is why I'd consider this a version sort, not a numerical sort. For rustdoc this doesn't matter either way because periods can't be in identifiers.

@ranma42
Copy link
Contributor

ranma42 commented Feb 20, 2017

@clarcharr The issue is not with leading zeros, but with non-digit characters. For example, would T100 sort before T1A0? (if you interpret the numbers in hexadecimal, i.e. as 0x100 and 0x1A0, this is the obvious sorting, but if you split at every non-digit, then it is not).

EDIT: oops, I changed the example. in the original one, would T010 be sorted before T0A0?

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Feb 21, 2017

Digits come before letters lexically, so, that'd still work as expected. Although I think that automatically coercing a-f as numbers has the ability for too many false positives.

For example, "String1" < "StringArgument" < "String10" just doesn't make any sense.

@ranma42
Copy link
Contributor

ranma42 commented Feb 21, 2017

Even though digits come before letters in the lexicographic order, this breaks when you group them. Specifically, the version-sort crate seem to tokenize the strings in numeric/non-numeric parts depending on decimal digits. This is generally useful, but it might not always be the correct choice. For example, according to version-sort, Version::new("T010") < Version::new("T0A0") is false.

I agree that the comparison "String1" < "StringArgument" < "String10" is meaningless, but the comparison "T000" < "T010" < "T0A0" < "T110" is rather reasonable to me, especially under the assumption that the 3-digit hexadecimal number has not been written by a human but automatically emitted by some code generator.

A better way to choose how the items should be sorted would involve looking at all of them together and trying to understand which patterns have been used to construct them, but this is obviously much more complicated than just comparing them in a pairwise fashion.

@ranma42
Copy link
Contributor

ranma42 commented Feb 21, 2017

My concern might actually not be much of a problem in practice (especially if it only affects auto-generated code, as that will often miss documentation).

So far, the main example of documentation that would be affected by this change (that I know of) are the 8/16/32/64 types in the std docs; they would certainly benefit from being sorted naturally.

@keeperofdakeys
Copy link
Contributor

If this is only for Trait1, Trait2, etc. Could it work with these rules?

  1. Only apply to numbers at the end of a string
  2. The prefix must match
  3. Don't apply to numbers starting with 0

This would mean T1A0 and T120 would compare the same as they do today, but T140 and T8 would compare naturally.

@ranma42
Copy link
Contributor

ranma42 commented Feb 21, 2017

@keeperofdakeys yes, your ruleset would definitely work (at least it would cover both my examples and the ones by the OP), assuming that it is applied to all the items as a whole. Otherwise you would end up using two different comparison functions when sorting T1A0 T120 and T140.

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Feb 21, 2017

@keeperofdakeys I like this idea! It would also make the sorting substantially easier to implement:

fn sort_key(s: &str) -> (&str, u64) {
    let split = s.bytes().rposition(|b| b < b'0' || b'9' < b).unwrap_or(0);
    // get rid of leading zeroes
    let split = split + s[split..].bytes().position(|b| b != b'0').unwrap_or(0);
    let strkey = &s[..split];
    let intkey = s[split..].parse().unwrap_or(0);
    (strkey, intkey)
}

clarfonthey added a commit to clarfonthey/rust that referenced this issue Mar 23, 2017
frewsxcv added a commit to frewsxcv/rust that referenced this issue Mar 24, 2017
Fix for rust-lang#39596: sort Trait2 before Trait10.

This is a change discussed in rust-lang#39596. Essentially, item names will be sorted as if they're (&str, u64) pairs instead of just `&str`, meaning that `"Apple" < "Banana"` and also `"Fruit10" > "Fruit2"`.

Sample sorting:

1. Apple
2. Banana
3. Fruit
4. Fruit0
5. Fruit00
6. Fruit1
7. Fruit01
8. Fruit2
9. Fruit02
10. Fruit20
11. Fruit100
12. Pear

Examples of generated documentation:
https://docs.charr.xyz/before-doc/test_sorting/
https://docs.charr.xyz/after-doc/test_sorting/

Screenshots of generated documentation:
Before: http://imgur.com/Ktb10ti
After: http://imgur.com/CZJjqIN
@GuillaumeGomez
Copy link
Member

Seems this issue has been fixed. Closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants