Align explorer sorting with platform sorting #27759

Open
bpasero opened this issue May 31, 2017 · 39 comments
Labels: feature-request (request for new features or functionality), file-explorer (Explorer widget issues)

@bpasero
Member

bpasero commented May 31, 2017

It looks like our file sorting in the explorer does not match platform behaviour in some cases.

Windows:

  • a file foo.ts is sorted before foo_test.ts but we sort it the other way around

Linux:

  • a file foo.ts is sorted before foo_test.ts but we sort it the other way around
  • a lowercase file seems to be sorted before an uppercase file, but we seem to mix the sorting independent of the casing (e.g. folders [out, outb, outd, Outa, Outc] are showing up as [out, Outa, outb, Outc, outd])

macOS:

  • seems to be OK

We use a JavaScript Intl.Collator for the comparison.

Unfortunately I am not able to tweak the Collator options to bring me the desired result...
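
To make the difference concrete, here is a minimal sketch that sorts the names from this report two ways: with an Intl.Collator and with a plain UTF-16 code unit comparison. The "en" locale and the helper name compareCodeUnits are assumptions for illustration; this is not the explorer's actual comparer.

```typescript
// Names taken from the report above.
const files: string[] = ["foo.ts", "foo_test.ts"];
const folders: string[] = ["out", "outb", "outd", "Outa", "Outc"];

// Locale-aware comparison; the "en" locale is an assumption for this sketch.
const enCollator = new Intl.Collator("en");

// Plain comparison by UTF-16 code units (hypothetical helper).
function compareCodeUnits(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

const filesByCollator = files.slice().sort(enCollator.compare);
const filesByCodeUnit = files.slice().sort(compareCodeUnits);
const foldersByCollator = folders.slice().sort(enCollator.compare);
const foldersByCodeUnit = folders.slice().sort(compareCodeUnits);

console.log(filesByCollator);   // [ 'foo_test.ts', 'foo.ts' ]  ("_" sorts before "." in the collator)
console.log(filesByCodeUnit);   // [ 'foo.ts', 'foo_test.ts' ]  ("." is 0x2E, "_" is 0x5F)
console.log(foldersByCollator); // [ 'out', 'Outa', 'outb', 'Outc', 'outd' ]
console.log(foldersByCodeUnit); // [ 'Outa', 'Outc', 'out', 'outb', 'outd' ]
```

The collator treats case as a tie-breaker only, which is why upper and lower case names end up interleaved, while the code unit comparison puts every uppercase name first.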

@comerc

comerc commented Aug 13, 2017

I want sort order like GitHub repo. Please!

@NickWest-appuri

[Screenshot taken 2017-08-29 at 1:20:55 pm]
I have some YML files whose names are based on GUIDs. They aren't even close to being alpha sorted.

vscode 1.15.1, macOS Sierra 10.12.6

@dlech
Contributor

dlech commented Jan 8, 2018

Linux: a lowercase file seems to be sorted before an upper case file

Umm... I think it is the other way around, upper case first. Basically, just sorting using the ASCII value of each character.

@wildcart

@dlech on Linux, at least, it depends on the chosen locale, and the environment variable LC_COLLATE can be used to influence this behaviour (it affects, for example, the ls command).

If you are considering implementing or supporting platform-specific behaviour, you may want to evaluate the locale setting on Linux, specifically LC_COLLATE (if set; otherwise fall back to the configured locale).

user@host -- ~/tmp/casetest $ LC_COLLATE='en_GB.UTF-8' ls -1
2
5
a
A
Aa
aB
AZ
C
user@host -- ~/tmp/casetest $ LC_COLLATE='en_EN.UTF-8' ls -1
2
5
A
AZ
Aa
C
a
aB
user@host -- ~/tmp/casetest $ LC_COLLATE='C' ls -1
2
5
A
AZ
Aa
C
a
aB
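
For comparison, the same eight names can be sorted in JavaScript. The sketch below is only an approximation: Intl.Collator follows ICU/CLDR rules rather than the system's LC_COLLATE tables, and the "en" locale is an assumption. It suggests that the default collator reproduces the en_GB listing, that a code unit sort reproduces the C listing, and that the caseFirst option alone does not turn one into the other.

```typescript
// The file names from the ls listings above, in arbitrary order.
const lsNames: string[] = ["aB", "C", "2", "AZ", "a", "Aa", "5", "A"];

// Default locale-aware order; case only breaks ties, lowercase first.
const localeOrder = lsNames.slice().sort(new Intl.Collator("en").compare);

// Same collator, but uppercase wins ties; letters are still interleaved.
const upperFirstOrder = lsNames
  .slice()
  .sort(new Intl.Collator("en", { caseFirst: "upper" }).compare);

// Array.prototype.sort without a comparer orders strings by UTF-16 code units.
const codeUnitOrder = lsNames.slice().sort();

console.log(localeOrder.join(" "));     // 2 5 a A Aa aB AZ C
console.log(upperFirstOrder.join(" ")); // 2 5 A a Aa aB AZ C
console.log(codeUnitOrder.join(" "));   // 2 5 A AZ Aa C a aB
```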

@leidegre

leidegre commented Jan 21, 2019

It's called shortlex, or lexicographic sort order. The only thing you need to tweak is the string length: if A is shorter than B, then A is smaller than B. This is not going to be covered by any collation. An alternative is to introduce padding (pad the shorter sort key so that it compares as less), but I don't think it's reasonable to do that due to the extra garbage generated.

To be more specific:

For two strings a and b of unequal length, take the first Math.min(a.length, b.length) characters of each and compare those prefixes using whatever comparison you like. If the prefixes compare equal (the result is 0), use the string length to finalize the sort order: if a and b share a common prefix but a is shorter, then a is smaller.

@bpasero something like this:

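// Body of a comparer; `one` and `other` are the two names being compared.
const a = one || "";
const b = other || "";

// Compare only the common-length prefixes with the collator.
const minLen = Math.min(a.length, b.length);

const result = intlFileNameCollator
  .getValue()
  .collator.compare(a.slice(0, minLen), b.slice(0, minLen));

// On a prefix tie, the shorter name sorts first.
if (result === 0) {
  if (a.length < b.length) {
    return -1;
  }
  if (b.length < a.length) {
    return +1;
  }
  return 0;
}

return result;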

I dropped the collatorIsNumeric stuff because it just adds confusion.

@egonelbre

egonelbre commented Jan 21, 2019

Shortlex orders primarily by length; the code presented implements a different ordering. The example:

aggregate.go
aggregate_registry.go
event.go
event_registry.go
README.md
rehydration.go

in shortlex order would be:

event.go
README.md
aggregate.go
rehydration.go
event_registry.go
aggregate_registry.go
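
A minimal sketch of a shortlex comparer, to show where that listing comes from. The function name compareShortlex is made up for this example, and ties on length are broken by plain code unit order, which is an assumption (none of the names above share a length, so the tie-break is never used here).

```typescript
// Shortlex: shorter strings first; equal lengths fall back to code unit order.
function compareShortlex(a: string, b: string): number {
  if (a.length !== b.length) {
    return a.length - b.length;
  }
  return a < b ? -1 : a > b ? 1 : 0;
}

const goFiles: string[] = [
  "aggregate.go",
  "aggregate_registry.go",
  "event.go",
  "event_registry.go",
  "README.md",
  "rehydration.go",
];

const shortlexOrder = goFiles.slice().sort(compareShortlex);
console.log(shortlexOrder);
// [ 'event.go', 'README.md', 'aggregate.go', 'rehydration.go',
//   'event_registry.go', 'aggregate_registry.go' ]
```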

Windows Explorer uses Natural Sort, because length is not sufficient for good order, as an example:

action_5_example.txt
action_10_ex.txt

@leidegre

@egonelbre then I misunderstood the meaning of shortlex; I should have just said lexicographic. But the code does what it is supposed to do: first sort on the first X characters, then use the length as a discriminator.

@leidegre

Why complicate this, though? Lexicographic order (not shortlex, as I incorrectly called it at first) is easy to implement and understand. This is not going to get done if we insist on extra work to align with something that is highly specific to Windows Explorer. That should not be the high-water mark here.

@egonelbre

egonelbre commented Jan 22, 2019

It's not really Explorer specific, you can read more about it in https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/. I mentioned it because your other issue showed that as an example.

The reason you don't want to use lexicographic first is due to the last example.

action_5_example.txt
action_10_ex.txt

Sorted lexicographically is:

action_10_ex.txt
action_5_example.txt
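
The two orderings can be reproduced with a short sketch. Intl.Collator's numeric option is one way to get a natural-style order in JavaScript; whether it matches Windows Explorer in every case is not something this example claims, and the "en" locale is an assumption.

```typescript
const actionFiles: string[] = ["action_5_example.txt", "action_10_ex.txt"];

// Plain lexicographic order by code units: "1" sorts before "5".
const lexicographicOrder = actionFiles.slice().sort();

// Natural-style order: digit runs are compared by numeric value, so 5 < 10.
const naturalCollator = new Intl.Collator("en", { numeric: true });
const naturalOrder = actionFiles.slice().sort(naturalCollator.compare);

console.log(lexicographicOrder); // [ 'action_10_ex.txt', 'action_5_example.txt' ]
console.log(naturalOrder);       // [ 'action_5_example.txt', 'action_10_ex.txt' ]
```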

@leidegre

@egonelbre as a programmer, I don't care. But as a user of Windows Explorer, I could see why adhering to the natural sort order would seem more natural.

isidorn self-assigned this on Jan 28, 2019
isidorn added this to the On Deck milestone on Jan 28, 2019
@fenchu

fenchu commented Jun 14, 2019

I tried opening #75415 for this, but it was closed. There is currently no option to make the explorer sort natively.

And no one writes 5, 10; we write 05, 10. This is a fundamental fact.

@AnyhowStep

AnyhowStep commented Mar 1, 2020

vs code sorts like this,

log-2
log-10

I'd rather it sort like this,

log-10
log-2

It is very jarring for me when vs code does not sort lexicographically.

Almost every other tool I use sorts lexicographically. When vs code tries to be different, it just confuses me for a short moment. Repeatedly. And it adds up.

Like @fenchu said, if I wanted to sort by numeric values, I'd zero-pad those numbers to the desired length.

@leilapearson
Contributor

@bpasero I'm hoping you can advise and save me some time if this doesn't make sense or is unlikely to be accepted as a pull request.

I'm considering creating a pull request that would add a new setting - explorer.sortCaseSensitive with a default value of false.

I considered adding one or more options to the existing explorer.sortOrder setting, but case sensitivity seems to be orthogonal to those options - and (nearly) doubling the number of options to add case sensitive versions doesn't seem like the best idea:

'explorer.sortOrder': {
  'type': 'string',
  'enum': [SortOrder.Default, SortOrder.Mixed, SortOrder.FilesFirst, SortOrder.Type, SortOrder.Modified],
  'default': SortOrder.Default,
  'enumDescriptions': [
    nls.localize('sortOrder.default', 'Files and folders are sorted by their names, in alphabetical order. Folders are displayed before files.'),
    nls.localize('sortOrder.mixed', 'Files and folders are sorted by their names, in alphabetical order. Files are interwoven with folders.'),
    nls.localize('sortOrder.filesFirst', 'Files and folders are sorted by their names, in alphabetical order. Files are displayed before folders.'),
    nls.localize('sortOrder.type', 'Files and folders are sorted by their extensions, in alphabetical order. Folders are displayed before files.'),
    nls.localize('sortOrder.modified', 'Files and folders are sorted by last modified date, in descending order. Folders are displayed before files.')
  ],
  'description': nls.localize('sortOrder', "Controls sorting order of files and folders in the explorer.")
},

This change would only partially address this open issue, since it:

  • only addresses case sensitivity
  • doesn't default to align with platform case sensitivity

I think it would probably satisfy a lot of people though, and TBH I'm not sure that aligning with platform case sensitivity is the best option. For example, I noticed a number of comments on this and related issues were asking to align the file sort order with github, which always does a case sensitive sort.

Anyway, by providing a setting, people can choose, and by defaulting to the current behavior nobody will be affected by the change unless they want to be.

Also, if the future default behavior is changed, this setting will still be useful for people who want to override that default behavior.

What do you think? Should I go ahead?

If yes, is this the right issue to reference in the PR or should I create a separate issue that just links to this one?

Thanks in advance!

@isidorn
Contributor

isidorn commented Apr 15, 2020

@leilapearson thanks for the offer.
Ideally we would just align explorer sorting with platform sorting without any option. @bpasero already provided a code pointer where he is doing the comparing.

If that is not possible only then we can look into adding more settings.

An alternative is to look to open this up to extensions and then extensions could control this and satisfy the 20 different sorting styles that users want.

@leilapearson
Contributor

leilapearson commented Apr 15, 2020

@isidorn thanks for the reply. That's why I asked before doing anything other than taking a look at the code.

Opening this to extensions is an interesting option. At the same time, I would still think that offering control over whether the sort is case sensitive or not should be a core option and not require an extension.

It isn't easy on some platforms to adjust the sort order - and having to figure out how to get your whole platform to sort case sensitive in order for VS Code to sort case sensitive seems a bit awkward? Especially if you primarily develop on one platform and only spend a bit of time developing on other platforms.

Also, I find that programming on a platform is a different context than using a platform for office work. Having different sort orders apply to the different contexts often makes sense.

For example, I tend to sort things by most recently modified when I'm working on documents and the like - so this is my default in file explorer and google docs. On the other hand, I don't want my code sorted by modification date and I'm happy with how things are sorted in my terminal - but unfortunately not so happy with how they are sorted in VS Code.

I do agree that too many settings can be a bad thing, but I'm curious if you agree or not that a setting to control case sensitivity would make sense regardless?

@leilapearson
Contributor

P.S. An example of how hard it can be to change the sort (collate) order on a platform is OSX - which doesn't expose any nice way to do that it seems:

https://apple.stackexchange.com/questions/34054/case-insensitive-ls-sorting-in-mac-osx

@Sytten

Sytten commented Jul 7, 2020

Any progress?

@leilapearson
Contributor

@Sytten some sort order edge cases were addressed in #97200 and that change is available in vscode 1.46.0. See the PR for a detailed description.

I also have an open PR #97272 - old now and sure to need an update - to add some additional lexicographic options to allow sorting in unicode order, locale order with uppercase first, or locale order with lowercase first.

Which specific functionality were you hoping to see addressed?

@Sytten

Sytten commented Jul 7, 2020

On macOS the files are sorted case-insensitively, and I didn't find a way to sort them case-sensitively (without affecting the order of files/folders).

@leilapearson
Contributor

PR #97272 adds the option to group files and folders by case, but that PR was submitted at a time when the reviewers weren't available and is out of date now. I'll take a look at resurrecting it.

@leilapearson
Contributor

Just wanted to provide a couple of quick updates for anyone watching this issue.

  1. PR #104528 ("Compare full filenames") was recently merged. This PR changes how aggregate.go and aggregate_repo.go are sorted. See issue #99955 ("Folders in the Explorer with . are not sorted in alphabetical order") if you want more details.
  2. PR #97272 ("New Sort Order Lexicographic Options setting for Explorer"), which adds settings to group names by case and to sort in unicode order, has been updated, but will be left on hold for now.

@Papooch

Papooch commented Jul 21, 2021

These discussions have been going on for years, and new sort order proposals keep coming up. Most sort options are so niche that I think it would be better for an extension to provide them instead of putting them in the core. Will there ever be an option to provide a custom sorting algorithm, or a way for an extension to add one?

@leilapearson
Contributor

Just a quick mention that PR #97272 was merged and released a while back.

This may resolve your issue @Sytten

@Sytten

Sytten commented Jul 26, 2021

Thanks @leilapearson I will try it 👍

@ALIENQuake

Having explorer sorting aligned with platform sorting is not 'niche'.

Another example:
[screenshot]

Please fix this!

@Mushr0000m

[Screenshot taken 2022-02-14 at 16:02:35]

Same problem here: shorter names are not sorted first. prospect-edit.scss should come before prospect-edit-user.scss.

@leilapearson
Contributor

leilapearson commented Feb 17, 2022

I agree it would be nice to be able to match the Windows file explorer order when desired (and probably by default when on Windows). It's complicated though and would require a function that can translate from unicode values to Windows Explorer sort order values - taking locale into account if necessary.

I wasn't able to find such a function, and implementing one without actually having the code used in Windows Explorer seems like it would require a lot of time-consuming experimentation with different characters and locales. That said, vscode is made by Microsoft, so someone on the team may be able to use their contacts to get access to the relevant Windows Explorer function to make this easier :-)

For my own possible future reference, and maybe your interest, I've copied in a table showing one person's investigation.

The table comes from:

https://superuser.com/questions/681322/windows-explorer-sorting-order-for-special-characters

Characters Allowed in File Names (sorted in File Explorer collating order)

           Unicode
Character  Hex Value     Description
---------  ------------  ----------------------------------------
!          0021          exclamation mark
#          0023          number sign
$          0024          dollar sign
%          0025          percent sign
&          0026          ampersand
(          0028          left parenthesis
)          0029          right parenthesis
,          002C          comma
.          002E          full stop, period
'          0027          apostrophe
-          002D          hyphen, minus
;          003B          semicolon
@          0040          commercial at sign
[          005B          left square bracket
]          005D          right square bracket
^          005E          circumflex accent
_          005F          low line, underscore
`          0060          grave accent
{          007B          left curly bracket
}          007D          right curly bracket
~          007E          tilde
+          002B          plus sign
=          003D          equal sign
0-9        0030 – 0039   digit zero through digit nine
A-z        0041 – 005A,  capital letter A through Z
           0061 – 007A   small letter a through z

Note: File Explorer does not differentiate between capital and small letters in file names, i.e. ‘A’ and ‘a’ are considered to be the same character.

You can see in the table above that the Windows order is . - _ while the unicode order is - . _. I also know that my locale order is _ - . so it is different yet again. I'm not sure whether the actual order of punctuation characters in Windows Explorer also varies by locale.
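
The code unit order and a locale order for these three characters can be checked directly. This is a small sketch, assuming the "en" locale and the ICU data that ships with the JavaScript runtime; it says nothing about what Windows Explorer does.

```typescript
const punctuation: string[] = [".", "_", "-"];

// Code unit order: "-" is 0x2D, "." is 0x2E, "_" is 0x5F.
const punctByCodeUnit = punctuation.slice().sort();

// Locale order from Intl.Collator (the "en" locale is an assumption).
const punctByCollator = punctuation.slice().sort(new Intl.Collator("en").compare);

console.log(punctByCodeUnit.join(" ")); // - . _
console.log(punctByCollator.join(" ")); // _ - .
```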

Anyway, I'm open to helping out with this if someone can find me a function for reference :-)

@rodneyazev

rodneyazev commented Mar 16, 2022

The main problem with not having these files sorted (or nested) is having to maintain or support thousands of code and configuration files and folders, which makes adequate support using vscode difficult as projects become bigger and more complex.

An easy way to build this algorithm would be to use a "weight" which would be defined by the sum of the char code or a numbered alphabetic list of each letter of the word. This sum would define who goes up and down. The "weight" of directories would always be greater than the "weight" of files.

@omBratteng

I've noticed that, with the default explorer.sortOrder settings, files starting with aa come after az.
Example:

touch {a,aa,ab,az}

[screenshot]

Using macOS with Norwegian locale. But in Finder, aa comes before ab, as expected.
[screenshot]

@leilapearson
Contributor

vscode uses JavaScript's String.localeCompare() function, and it does behave oddly in Norwegian. I'm not familiar with Norwegian, but from what I'm reading it seems like the function may be treating "aa" as something like "å", which comes at the end of the Norwegian alphabet. It is not exactly equivalent, though, based on my quick testing:

// English locale
console.log('a'.localeCompare('aa', 'en')); // -1
console.log('a'.localeCompare('z', 'en')); // -1
console.log('aa'.localeCompare('z', 'en')); // -1
console.log('aaa'.localeCompare('z', 'en')); // -1
console.log('aa'.localeCompare('az', 'en')); // -1
console.log('å'.localeCompare('aa', 'en')); // -1
console.log('å'.localeCompare('z', 'en')); // -1

// Norwegian locale
console.log('a'.localeCompare('aa', 'no')); // -1
console.log('a'.localeCompare('z', 'no')); // -1
console.log('aa'.localeCompare('z', 'no')); // 1
console.log('aaa'.localeCompare('z', 'no')); // 1
console.log('aa'.localeCompare('az', 'no')); // 1
console.log('å'.localeCompare('aa', 'no')); // -1
console.log('å'.localeCompare('z', 'no')); // 1

Interestingly, I noticed that if you set your primary language to Norwegian and sort your sample files above by file type in Finder - instead of by name - you will see the same order that you get using localeCompare() and in vscode explorer. So even in Finder the behaviour is inconsistent. You would expect all files of the same type to sort by name, and for the order to be consistent with what you see when sorting directly by name, but that isn't the case (at least not on my system).

[screenshot]

I'll dig a bit further.

@omBratteng

That’s correct, aa is pronounced as å. It wasn’t until 1917 that we converted from aa to å; Denmark did so in 1948.
I bet if you did the same with Danish, you would get the same results as with Norwegian.

Technically, it is correct. Aa is Å, and should be sorted with Å.

Correct alphabetization in Danish and Norwegian places Å as the last letter in the alphabet, the sequence being Æ, Ø, Å. This is also true for the alternative spelling "Aa". Unless manually corrected, sorting algorithms of programs localised for Danish or Norwegian will place e.g., Aaron after Zorro.

I just encountered this when I had a directory named aad-pod-identity, and couldn’t understand why it ended up under wordpress in the list.

Perhaps VS Code could have a setting that lets me override which locale to sort with. Otherwise I can just use the "explorer.sortOrderLexicographicOptions": "unicode" setting, which puts them in the “correct” order as intended.
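
As a rough illustration of what a locale override would change, the sketch below sorts the same names with an explicitly chosen locale and in code unit order. The helper sortWithLocale is hypothetical and not a VS Code API, and the Norwegian result depends on the ICU data bundled with the runtime, so no output is listed for it.

```typescript
const dirNames: string[] = ["az", "aa", "ab", "a"];

// Hypothetical helper: sort a copy of the names with a collator for one locale.
function sortWithLocale(names: string[], locale: string): string[] {
  return names.slice().sort(new Intl.Collator(locale).compare);
}

const englishOrder = sortWithLocale(dirNames, "en");
const norwegianOrder = sortWithLocale(dirNames, "no");
// Code unit order, independent of any locale.
const unicodeOrder = dirNames.slice().sort();

console.log(englishOrder.join(" "));   // a aa ab az
console.log(norwegianOrder.join(" ")); // depends on the runtime's Norwegian collation data
console.log(unicodeOrder.join(" "));   // a aa ab az
```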
