Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tokenized search support to Quick Open dialog and FileSystem filter #88660

Merged
merged 1 commit into from
Apr 19, 2024

Conversation

MajorMcDoom
Copy link
Contributor

@MajorMcDoom MajorMcDoom commented Feb 22, 2024

Closes #9145.

Tokenization

  • You can now search for files in the Quick Open dialog and the FileSystem filter using multiple tokens instead of a single continuous string.
  • This is useful for when you know multiple parts that the file path or name contains, but not the order, or what is between them.
  • This also means you don't have to use accurate naming conventions like snake_case or PascalCase or dash-case.
    TokenizedSearch

Smart Sorting for Quick Open

  • Prioritizes paths in which search tokens appear in order.
  • Prioritizes paths where the search tokens appear in the file name as opposed to folders.
  • Prioritizes paths in which search tokens appear at the front of folder or file names as opposed to the middle.
    TokenizedSearch_Order

Notes

  • This required a rewrite of the Quick Open dialog's search functionality, as there was no easy way to reconcile a tokenized approach with the previous approach, which pretty much assumed the user will be typing out actual paths with separators. There was even a case for exact match, which is highly improbable, and definitely not "Quick" as the name suggests.
  • The previous approach did allow some leniency for typos, due to its use of Sorensen-Dice similarity. In practice, however, it is unpredictable and can be either too lenient or too punishing. Typos are not accounted for by most search functionality in the editor, so it was strange to include it in this one feature, especially since it also required you to be flawlessly accurate in other ways.

@Cammymoop
Copy link
Contributor

related #82200

@AdriaandeJongh
Copy link
Contributor

As nobody else has commented about this: I would appreciate a bit of leniency towards typos. It's not as 'strange' of a feature if it helps people find the things they are looking for.

@MajorMcDoom
Copy link
Contributor Author

As nobody else has commented about this: I would appreciate a bit of leniency towards typos. It's not as 'strange' of a feature if it helps people find the things they are looking for.

That's totally fair, and my wording was not the greatest. I do think that my gripe was with the current implementation, which felt like a strange combination of high demand and leniency, and is also an odd exception amongst other instances of searching in the editor which do not suffer from the same issues despite not having typo-protection.

I do think that the impact of typos is already greatly reduced with the leniency introduced by tokenization. And one must also consider that a search function needs to also eliminate results that do not match. False positives are also likely to slow down the search, particularly if the file you are searching for is not the first result and you have to look through a list of results.

That said, something like what @Cammymoop linked could work well! I'll give it a try soon.

@Mickeon
Copy link
Contributor

Mickeon commented Feb 23, 2024

Ah, "multiple tokens". Is that how they're called? I referred to them as "terms" in both #65315 and #65352 .
I approve of this already because of consistency, but the more this logic is used, the more it should be unified together.

@MajorMcDoom
Copy link
Contributor Author

MajorMcDoom commented Feb 23, 2024

but the more this logic is used, the more it should be unified together.

I totally thought this too, but after looking at the different use cases, it turned out to be more different than similar. Handling paths for example is very different from handling just names. And handling search could also be different from handling filtering, depending on the UX requirements. For example, you might want typo-forgiveness in search, but you wouldn't want that in filtering.

The only similarity turned out to be looping through an array of tokens obtained from (String::split), and imo that's not enough to justify unification, just yet. But I agree, if we do end up with identical use cases, we should unify.

@MajorMcDoom
Copy link
Contributor Author

@Mickeon Oh hmmm, you know, upon closer inspection, it looks like the FileSystem filter could definitely use your utility logic!
It is gonna be a bigger task though. Perhaps that could be another PR. My current PR was a straight-forward integration into the current system, so it wasn't a huge task. It might be a good stop-gap?

@Mickeon
Copy link
Contributor

Mickeon commented Feb 23, 2024

It absolutely needs to be tackled separately (and maybe by someone that has even more understanding of the codebase) because it'd be a huge undertaking, and it would have to be applied literally everywhere. There are many filter LineEdits out there, and they all would benefit from mostly the same changes.

I'm just pointing it out because code duplication is nasty.
Plus, I do remember @Calinou often bringing up better filters. I do not know the exact wording of it, but it's something about accented characters not being filtered as one may expect.

@KoBeWi
Copy link
Member

KoBeWi commented Mar 12, 2024

This is useful for when you know multiple parts that the file path or name contains, but not the order, or what is between them.
This also means you don't have to use accurate naming conventions like snake_case or PascalCase or dash-case.

This was already supported (except the order part, but it's not often used I think?), with the caveat that you can't make spaces. From 4.3 dev4:

KXBEchqhld.mp4

editor/filesystem_dock.cpp Outdated Show resolved Hide resolved
editor/filesystem_dock.h Outdated Show resolved Hide resolved
@MajorMcDoom MajorMcDoom force-pushed the tokenized-file-search branch from c36f9eb to b8b698a Compare April 18, 2024 01:38
@MajorMcDoom MajorMcDoom force-pushed the tokenized-file-search branch from b8b698a to fbfda46 Compare April 18, 2024 02:13
@akien-mga akien-mga modified the milestones: 4.x, 4.3 Apr 18, 2024
@akien-mga akien-mga merged commit 3acd14d into godotengine:master Apr 19, 2024
16 checks passed
@akien-mga
Copy link
Member

Thanks!

@KoBeWi
Copy link
Member

KoBeWi commented May 1, 2024

This was already supported (except the order part, but it's not often used I think?), with the caveat that you can't make spaces.

Apparently this no longer works. You are forced to make spaces now. I preferred the old way tbh >_>

@brevven
Copy link

brevven commented Aug 2, 2024

Requiring spaces seems somewhat nonstandard when taking into consideration things like VScode and ctrlp.vim. Is requiring spaces something we could make optional?

@MajorMcDoom
Copy link
Contributor Author

Requiring spaces seems somewhat nonstandard when taking into consideration things like VScode and ctrlp.vim. Is requiring spaces something we could make optional?

See earlier comments in the thread. Both the old and new solutions have big caveats. There has to be a new algorithm implemented for search, in another PR, and would have to be used throughout the whole editor's search features for consistency and to limit code reuse. There were also candidates suggested.

@brevven
Copy link

brevven commented Aug 5, 2024

Thanks, understood, and can certainly understand the desire to standardize and DRY things up. Given that that's a lot of work, in the meantime, would adding an option on whether or not to require spaces be something the project would support. Not sure that I have time to contribute this, but wondering if it's the type of thing that's worth considering.

@MajorMcDoom
Copy link
Contributor Author

MajorMcDoom commented Aug 5, 2024

Thanks, understood, and can certainly understand the desire to standardize and DRY things up. Given that that's a lot of work, in the meantime, would adding an option on whether or not to require spaces be something the project would support. Not sure that I have time to contribute this, but wondering if it's the type of thing that's worth considering.

I'm not sure how far away we are from a standard, but I would like to just add an FYI, if this is something someone wants to pursue:

The option you're describing cannot be achieved as an option within the current algorithm. It would have to be a toggle between the old behaviour and the current behaviour, which would affect not only ordering of search results, but also whether certain ones show up at all, because they are fundamentally different algorithms. The current one cannot just toggle "requires spaces", because it fundamentally works by splitting the search string in the first place.

@a-johnston
Copy link
Contributor

I think these comments about space/no space handling have been effectively handled by this pr #82200. I made some changes which further improved matching on 3 personal projects (1 large, 2 small) which I used for testing. I'm not sure how much further work it would be to integrate that into the other search/filter modals though; potentially rather than have a static class we could configure class instances as matching contexts to tune each instance to the particular task, but still reuse the core matching logic.

@KoBeWi
Copy link
Member

KoBeWi commented Oct 1, 2024

There is also #56772, which seems to change the behavior too (supporting both old search and the tokenized one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make search boxes for files treat spaces the same way as search boxes for nodes
9 participants