-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Misc sort improvements #1706
base: master
Are you sure you want to change the base?
Misc sort improvements #1706
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your PR submission. Setting aside performance, I think it is a good idea to use this collate
package anyway as it makes the code more readable.
That being said, I tried your branch very briefly and found that if dirfirst
is disabled then the sorting is skipped entirely since sort.SliceStable
is never called. Please do more extensive testing to ensure the various sorting options are covered.
0ea3551
to
51144aa
Compare
Thanks for the quick review, and you're right, I was a bit tired when I made this. |
nav.go
Outdated
case naturalSort: | ||
sort.SliceStable(dir.files, func(i, j int) bool { | ||
s1, s2 := normalize(dir.files[i].Name(), dir.files[j].Name(), dir.ignorecase, dir.ignoredia) | ||
if !dir.reverse { | ||
return naturalLess(s1, s2) | ||
} else { | ||
return naturalLess(s2, s1) | ||
} | ||
}) | ||
lessfun = func(i, j int) bool { | ||
return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1 | ||
} | ||
case nameSort: | ||
sort.SliceStable(dir.files, func(i, j int) bool { | ||
s1, s2 := normalize(dir.files[i].Name(), dir.files[j].Name(), dir.ignorecase, dir.ignoredia) | ||
if !dir.reverse { | ||
return s1 < s2 | ||
} else { | ||
return s2 < s1 | ||
} | ||
}) | ||
lessfun = func(i, j int) bool { | ||
return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two cases have the same code, so you should be able to combine them:
case nameSort, naturalSort:
lessfun = func(i, j int) bool {
return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
}
It's probably also worth putting a comment stating that naturalSort
is already handled above.
This actually raises the question of whether natural sorting should have its own option, instead of being lumped under sortby
, since it is really a sorting modifier and not a property (e.g. name, size, time) that is used as a basis for sorting. But changing the configuration options is a breaking change, so I think it is not worth worrying about for now.
nav.go
Outdated
// Finally sort after filtering it all | ||
sort.SliceStable(dir.files, lessfun) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought this through, but does it make sense to do all the filtering (i.e. dironly
, hidden
, filter
) first before doing any sorting? It would make the logic cleaner if all of the filtering could be moved to the top of this function, but I don't know if it will cause any issues.
Since you are pretty much rewriting the entire function, it's probably a good time to clean it up as much as possible.
I bumped some dependencies yesterday, looks like it causes a merge conflict. |
OK so I decided to play around with this a bit more, and I wrote the following code in a rush, didn't test it too much yet. I wonder if this might be more desirable: func (dir *dir) sort() {
dir.sortby = getSortBy(dir.path)
dir.dirfirst = getDirFirst(dir.path)
dir.dironly = getDirOnly(dir.path)
dir.hidden = getHidden(dir.path)
dir.reverse = getReverse(dir.path)
dir.hiddenfiles = gOpts.hiddenfiles
dir.ignorecase = gOpts.ignorecase
dir.ignoredia = gOpts.ignoredia
dir.files = dir.allFiles
filterFiles := func(files []*file, f func(*file) bool) (result []*file) {
for _, file := range files {
if f(file) {
result = append(result, file)
}
}
return result
}
if dir.dironly {
dir.files = filterFiles(dir.files, func(file *file) bool {
return file.IsDir()
})
}
if !dir.hidden {
dir.files = filterFiles(dir.files, func(file *file) bool {
return !isHidden(file, dir.path, dir.hiddenfiles)
})
}
if len(dir.filter) != 0 {
dir.files = filterFiles(dir.files, func(file *file) bool {
return !isFiltered(file, dir.filter)
})
}
collopts := []collate.Option{}
if dir.ignorecase {
collopts = append(collopts, collate.IgnoreCase)
}
if dir.ignoredia {
collopts = append(collopts, collate.IgnoreDiacritics)
}
if dir.sortby == naturalSort {
collopts = append(collopts, collate.Numeric)
}
coll := collate.New(language.Und, collopts...)
var lessfun func(i, j int) bool
switch dir.sortby {
case nameSort, naturalSort:
lessfun = func(i, j int) bool {
return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
}
case sizeSort:
lessfun = func(i, j int) bool {
return dir.files[i].TotalSize() < dir.files[j].TotalSize()
}
case timeSort:
lessfun = func(i, j int) bool {
return dir.files[i].ModTime().Before(dir.files[j].ModTime())
}
case atimeSort:
lessfun = func(i, j int) bool {
return dir.files[i].accessTime.Before(dir.files[j].accessTime)
}
case ctimeSort:
lessfun = func(i, j int) bool {
return dir.files[i].changeTime.Before(dir.files[j].changeTime)
}
case extSort:
lessfun = func(i, j int) bool {
cmp := coll.CompareString(dir.files[i].ext, dir.files[j].ext)
return cmp == -1 || cmp == 0 && coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
}
}
if dir.reverse {
oldlessfun := lessfun
lessfun = func(i, j int) bool {
return oldlessfun(j, i)
}
}
if dir.dirfirst {
oldlessfun := lessfun
lessfun = func(i, j int) bool {
if dir.files[i].IsDir() == dir.files[j].IsDir() {
return oldlessfun(i, j)
}
return dir.files[i].IsDir()
}
}
sort.SliceStable(dir.files, lessfun)
dir.ind = max(dir.ind, 0)
dir.ind = min(dir.ind, len(dir.files)-1)
} |
Thanks, this was actually in my head at some point, but I was a bit wary of doing changes too big with my lack of knowledge of Go and its slices. Good thinking about I included your changes except that I made filtering go through the file list only once. |
f138d49
to
abfb1fd
Compare
I didn't realize it was possible to combine filters in the way you have done, nice. Anyway the code looks quite good now There's a minor issue with formatting that is causing the CI build to fail, but otherwise it is probably close to being in a state where it can be merged. Just out of interest how much of an increase in performance is there with the new changes? |
Not very different: Before:
After:
Basically, a little overhead for small directories, but in the absolute, nothing. Significant gains for the big ones (not even counting the dirfirst skipped with dironly trick or less to sort if a lot is filtered). |
abfb1fd
to
c152029
Compare
Sorry to trouble you, but I did some more testing and I found that the The following examples all print c := collate.New(language.Und)
fmt.Println(c.CompareString("a", "B")) // "B" should come first since it is uppercase c := collate.New(language.Und, collate.IgnoreCase)
fmt.Println(c.CompareString("a", "B")) c := collate.New(language.Und)
fmt.Println(c.CompareString("é", "f")) // "é" should come last since it is a special character c := collate.New(language.Und, collate.IgnoreDiacritics)
fmt.Println(c.CompareString("é", "f")) I'm not sure how to go about solving this though, I'm haven't used the |
Not troubling me, I'm quite happy you're doing so much testing (I should be doing!). Filed golang/go#67296 for now, because it sure as hell looks like a bug to me. |
Seeing the helpful comments in the issue I opened, I'm thinking about reverting all collate changes and doing something like this instead:
The goal being to avoid the massive consing of the current solution. Does it make sense to still do it in that PR? |
For As for |
Hello and sorry for the (very) long delay, my energy is a bit low these days and that I reverted to our own Tested extSort a bit more (because I didn't remember why the "\x00" thing got removed) and everything seems to work. I get the following in a test dir:
|
In fact, sort after all the filtering is done, makes more sense
Mostly authored by @joelim-work
And add the following optimizations: * Hoist the branching out of normalize() * In extSort, normalize the names only when extensions are equal * In extSort, use strings.Compare that does a proper 3-way comparison in Go 1.23
5f6eae2
to
c8f195e
Compare
So it's been a while since I last looked at this too. TBH, given that the In the meantime I think there are smaller things that can be improved, and I would accept them if each are submitted separately:
|
It's a bit understandable, but I wouldn't call it merely different, personally; especially since ~40 lines are removed and performance certainly improved. But oh well, that's how it is, and I agree that I did let my functional side bleed a bit too much into a very imperative language. |
I'm not completely against functional programming - for instance I can see that reversing the sort comparator function at the end saves the need to specify Maybe case sizeSort:
sort.SliceStable(dir.files, func(i, j int) bool {
if dir.reverse {
i, j = j, i
}
return dir.files[i].TotalSize() < dir.files[j].TotalSize()
}) As for performance, I don't notice any difference, and I don't recall seeing any other issues raised about sorting speed. I haven't done any benchmarking myself, but if it's generally in the nanoseconds then I don't think it's an important factor compared to the actual code. |
Well, the entire point of it was to avoid a branch per comparison and to copy paste code for each sort; but knowing how bad Go's inliner is with closures, who knows... Sorry for wasting your time, I guess. |
Hello, first contribution and first time touching Go here, so don't be nice and roast me well if I did some dumb things.
I was looking at the sort function because I want to implement some features and noticed some less than optimal code, be it in performance or "looking nice to me".
I didn't add any dependency (though
text
isn't indirect anymore), but https://github.com/jeandeaual/go-locale might be a good idea to use the system locale instead of hardcoding English. The other improvement that might be a good idea is to useslices.SortStableFunc
, as recommended by Go; though it's only builtin starting with 1.22.Here are the performance improvements I measured by timing sort and logging
Before:
After:
The large test directory contains
touch test/{1..100000}
as a more throughput oriented benchmark, since the other directories are basically instantaneously sorted.