tweak autocomplete for search features (for languages with accents...) #3979
Comments
Hey @althio, this is something that has come up before in #3236 and #3159. The current behavior in iD is to do exact matching on the preset name and fuzzy matching on the preset terms. So if the preset name is "bâtiment", your preset terms could include "batiment", and it will then sort highly in the results if the user types "bat". (It looks like you added it here recently. I'd be interested to know if this search is working better now!)
@1ec5 has been including some diacritic-folded terms in the Vietnamese preset translations, and this seems to work OK, although it adds bloat and takes time to do. From what I understand, we don't want to generate these terms automatically because the difference can sometimes be significant.
Yes, it is better! I enjoy the few keystrokes saved 👍 Thanks for the hint and the links to previous issues, quite instructive.
For Vietnamese, I've been putting the diacritic-folded terms at the end of the term lists, after synonyms. The downside is that diacritic-folded terms in shorter term lists influence the search results more strongly than diacritic-folded terms in longer term lists, sometimes even more strongly than preset names that happen to have a slightly larger edit distance. Off the top of my head, I've seen this happen with presets involving the Vietnamese word "trường" (truong, trương, truòng, etc.).
I haven't looked into whether putting the synonyms after the diacritic-folded terms would yield better results in general, but I'd expect the results to be worse for shop presets, which have many synonyms. (I've been including folded synonyms at the end of the list.)
A solution could be to allow localizers to provide synonyms and diacritic variants in separate fields, so we can weight them differently without any effect from the number of synonyms. Does Transifex allow individual messages to be marked as optional, for languages and presets that don't need folding?
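To make the "separate fields" idea concrete, here is a minimal sketch in Python (not iD's JavaScript, and not iD's actual scoring). The `Preset` class, the three weights, and the prefix-only matching are all assumptions made up for illustration. The point is only that each field contributes through its own weight, so the number of synonyms has no effect on how much a folded variant counts.

```python
import unicodedata
from dataclasses import dataclass, field

# Hypothetical weights: a hit on the name outranks a synonym,
# which outranks a diacritic-folded variant.
NAME_WEIGHT = 1.0
SYNONYM_WEIGHT = 0.8
FOLDED_WEIGHT = 0.6


@dataclass
class Preset:
    """Hypothetical preset record with synonyms and folded variants kept apart."""
    name: str
    synonyms: list[str] = field(default_factory=list)
    folded: list[str] = field(default_factory=list)


def _norm(text: str) -> str:
    # Compare in one Unicode form and without regard to case.
    return unicodedata.normalize("NFC", text).casefold()


def prefix_hit(query: str, candidates: list[str]) -> bool:
    """True if any candidate starts with the query."""
    q = _norm(query)
    return any(_norm(c).startswith(q) for c in candidates)


def score(query: str, preset: Preset) -> float:
    """Return the weight of the best-ranked field that matches, or 0.0."""
    groups = [
        (NAME_WEIGHT, [preset.name]),
        (SYNONYM_WEIGHT, preset.synonyms),
        (FOLDED_WEIGHT, preset.folded),
    ]
    return max((w for w, cands in groups if prefix_hit(query, cands)), default=0.0)
```

With this shape, a preset with twenty synonyms and a preset with none both score a folded-only hit at `FOLDED_WEIGHT`, and an exact-accent hit on the name always ranks above it.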
My itch to begin with:
When searching for Building in French [= bâtiment], the autocomplete function gets thrown off by very minor differences in accents. So if it expects "bât...", it seems to score anything like "bet...", "bot...", "bzt..." quite badly, but that also includes "bat...", where only the accent is missing on the "a".
I don't know where to adjust this? In presets, translations?
edit: Of course, I just found https://www.transifex.com/ideditor/id-editor/translate/#fr/presets
But I would like to know if a more general approach could exist...
Or can a general 'sanitize string' be applied? Something so that letters only differing by an accent are scored as more similar?
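For reference, one common way to build such a "sanitize string" step is Unicode decomposition followed by dropping combining marks. The sketch below is in Python rather than iD's JavaScript, and the function names `fold_diacritics` and `folded_prefix_match` are made up for illustration. It is not how iD matches presets, and folding discards distinctions that can be significant in some languages, so treat it as an illustration of the idea only.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks so letters differing only by an accent compare equal."""
    # NFD splits "â" into "a" plus a combining circumflex.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks; keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def folded_prefix_match(query: str, name: str) -> bool:
    """True if the name starts with the query once both are folded and case-folded."""
    return fold_diacritics(name).casefold().startswith(
        fold_diacritics(query).casefold()
    )


print(fold_diacritics("bâtiment"))
print(folded_prefix_match("bat", "Bâtiment"))
```

Letters that have no canonical decomposition, such as the Vietnamese "đ", pass through unchanged, so a real implementation would need an extra mapping for those.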