Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sorting for the file picker #1066

Closed
wants to merge 2 commits into from

Conversation

Triton171
Copy link
Contributor

@Triton171 Triton171 commented Nov 10, 2021

The main motivation for this change is that files under the current working directory should be prioritized over files that are somewhere else in the same workspace.

Matches of the file picker are now sorted by:

  • The closeness of the file name to the search with a small boost being given to files in the current directory
  • As a tie-breaker, the timestamp of the file is used as before

This sort of sorting could now similiarly be implemented for other pickers (for example for buffer access recency).

EDIT: I've previously used a strictly hierarchical sorting method which didn't work as well, I've updated this text to avoid any confusion.

@archseer
Copy link
Member

So I was imagining this would be more like a composite score where closer files are given a small boost to their match score. (Though I think the algorithm we use now already does this by giving a bonus to matches at the start of the string, if I search for "ca" the root Cargo.toml comes up, followed by ones in sub-directories.) In most cases the current working directory is set to the workspace root, because we want to pick files anywhere in the workspace. If you're trying to pick a file in the current directory, there's always :open that will do shell-like completion.

Similarly when I'm using the picker I want my query to matter the most, because that's what I'm searching for -- the current implementation will prioritize bad matches just because they happen to sit in the workspace root.

@Triton171
Copy link
Contributor Author

I think there is a misunderstanding: What I implemented doesn't change anything if the current directory is the workspace root (except fix a bug where previously the entries were sometimes ordered arbitrarily when they had the same match score).

The only thing it changes, is that if the current directory is not the workspace root, it prioritizes files that are (directly or indirectly) contained in the current directory.
For example, in the helix source code, if the current directory is helix/helix-term:

  • helix/helix-term/Cargo.toml will be prioritized over helix/Cargo.toml (and all other matches that are not contained in helix/helix-term
  • helix/helix-term/src/ui/mod.rs will be prioritized over all matching files that are not contained in helix/helix-term

The current master branch actually prioritizes helix/Cargo.toml over helix/helix-term/Cargo.toml which I think doesn't make sense: If I spawn helix in some subfolder of the workspace root, I expect to work primarily on files contained there, otherwise I could've just spawned it in the workspace root. This also makes it easier to use for people who've used different fuzzy finders that just search the current directory.

There are some cases where this leads to (probably) undesired sorting: Searching "mod" in the above example will display helix/helix-term/src/commands.rs before helix/helix-tui/src/backend/mod.rs. I'm not sure how this could be fixed, but I'd expect that boosting the score by some fixed amount will probably lead to weird behavior as well (and also depend on specifics of the fuzzy matcher implementation), I'm open to suggestions though.

@archseer
Copy link
Member

There are some cases where this leads to (probably) undesired sorting: Searching "mod" in the above example will display helix/helix-term/src/commands.rs before helix/helix-tui/src/backend/mod.rs.

Yeah this was my point. You don't want to do strict sorting (first sort by A, then B then C), instead you want a composite weighed score where the fuzzy matcher score is ~70% of the score, and the directory is 30% of the score.

@archseer
Copy link
Member

If I spawn helix in some subfolder of the workspace root, I expect to work primarily on files contained there, otherwise I could've just spawned it in the workspace root.

That's because helix will automatically try and find a workspace root and use it instead (we could make this configurable). I had the opposite experience of accidentally opening the editor in a subfolder and wondering why I can only see a part of my codebase.

@Triton171
Copy link
Contributor Author

Okay that makes sense.

If that's the direction we want to go, I can experiment a bit with a composite score. I wanted to avoid it because then we rely on the details of the fuzzy matcher (for example if they change the default values of SkimScoreConfig, we could suddenly get weird results, even though the order that the scores induce stays the same), but maybe that's the only way to get good results.

@Triton171
Copy link
Contributor Author

I've changed it to just add a score boost of 5 to files in the file picker that are in the current working directory. 5 should be fairly conservative, let me know what you think, though. The only arguably weird thing I noticed is that files in the root directory are still prioritized over files in the current directory:

helix/Cargo.toml is prioritized over helix/helix-term/Cargo.toml, even when helix/helix-term is the current directory.
In that case I'd say it mostly depends on personal preference and it's not a big deal either way.

@@ -2783,6 +2786,7 @@ fn symbol_picker(cx: &mut Context) {
));
Some((path, line))
},
|_, l_score, _, r_score| l_score.cmp(&r_score).reverse(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this everywhere? Why not just have it sorted in the order it will be displayed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort function specifies the order it is displayed in. If we'd replace this with a different sort function, the matches would no longer be sorted by their matching score, so we need this line to mimick the old behavior.
I thought about making the sort function parameter an Option and using this as a default if None is given. I don't think that's a great idea though, especially because most pickers can probably be improved by adding additional criteria to their sort functions.

@pickfire
Copy link
Contributor

I have this idea as well. Nice to see you implementing it. But what @archseer said is correct, it should be a composite score, it should depends on multiple factor with each factor having a different weight rather than getting score by order.

@Triton171
Copy link
Contributor Author

Yes, the composite score definitely worked better than my first attempt at it.
For the file picker, it is now essentially a composite score of the matching score and whether it is in the current directory and for other pickers, the same could be done.

In the code, the composite score is modelled by adding a constant bonus (5) to the score, but this is equivalent to for example having a composite score where being in the current directory has a score of 1 and then composing the scores with weights 5/6 and 1/6. The advantage is that this way we only have one constant instead of 2 redundant ones.

The timestamp is just a tie-breaker, basically the same as before. I don't think there's a reasonable way to assign a score bonus to a timestamp difference so the old behavior is good here.

Implement sorting for the file picker such that:
* Sorting is primarily based on the matching score with files that are in the current directory being given a small boost
* Recency of access/modification/creating is used as a tie-breaker, similar to the previous implementation
@pickfire
Copy link
Contributor

I believe you might want to put in some comments regarding the current algorithm for ranking. The one you just explained should be good enough.

Copy link
Contributor

@pickfire pickfire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (not tested). I think this is good but I think I would need to have thumbs up from another maintainer.

@poliorcetics
Copy link
Contributor

I don't have time to test/review right now so apologies if this is a non-existent concern, but has this been tested with symlinks ?

I don't know what the expected behavior should be, but I wonder how it currently behaves.

@Triton171
Copy link
Contributor Author

The file picker in helix doesn't follow symlinks so there is not really anything to test. I'm also not sure in what scenario you'd want the file picker to follow symlinks.

@poliorcetics
Copy link
Contributor

That's ok, I never tested the picker with symlinks so I wondered how it behaved. Not following them at all is fine to me, thanks for the answer !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants