tweak autocomplete for search features (for languages with accents...) #3979

althio opened this issue Apr 22, 2017 · 3 comments
Labels
localization Adapting iD across languages, regions, and cultures question Not Actionable - just a question about something

Comments

@althio
Contributor

althio commented Apr 22, 2017

My itch to begin with:
When searching for Building in French (bâtiment), the autocomplete function gets thrown off by very minor differences in accents. If it expects bât..., it seems to score anything like bet..., bot..., bzt... quite badly, but that also includes bat..., where only the accent on the a is missing.
[Images: batiment, ba]

I don't know where to adjust this? In presets, translations?
edit: Of course, I just found https://www.transifex.com/ideditor/id-editor/translate/#fr/presets

But I would like to know if a more general approach could exist...
Or can a general 'sanitize string' be applied? Something so that letters only differing by an accent are scored as more similar?
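
For illustration, a 'sanitize string' step of this kind could look like the sketch below. It is written in Python for brevity, not in iD's own code, and the function names `fold_diacritics` and `folded_prefix_match` are made up for this example. It decomposes each letter with Unicode NFD and drops the combining marks, so letters that differ only by an accent compare as equal.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with combining marks removed, e.g. 'bâtiment' -> 'batiment'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep everything else.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def folded_prefix_match(query: str, name: str) -> bool:
    """True if name starts with query once accents and case are ignored."""
    return fold_diacritics(name).casefold().startswith(fold_diacritics(query).casefold())


print(fold_diacritics("bâtiment"))               # batiment
print(folded_prefix_match("bat", "bâtiment"))    # True
print(folded_prefix_match("bet", "bâtiment"))    # False
```

Letters with no Unicode decomposition, such as "đ" or "ø", pass through unchanged and would need an explicit mapping.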

@bhousel
Member

bhousel commented Apr 22, 2017

Hey @althio, this is something that has come up before in #3236 and #3159.

The current behavior in iD is to do exact matching on the preset name and fuzzy matching on the preset terms. So if the preset name is "bâtiment", your preset terms could include "batiment", and it will then sort highly in the results if the user types "bat".

(It looks like you added it here recently - I'd be interested to know if this search is working better now!)
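
To make the name-versus-terms distinction concrete, here is a rough Python sketch. It is not iD's actual code. The preset shape (`name`, `terms`), the `rank_presets` helper, the prefix comparison, and the `max_distance` threshold are all assumptions chosen for illustration. Names are matched exactly as a prefix, and terms are matched by Levenshtein edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_presets(query, presets, max_distance=1):
    """Exact name-prefix matches first, then fuzzy term matches by distance.

    presets is a list of dicts with 'name' and 'terms' keys (hypothetical shape).
    """
    q = query.lower()
    exact, fuzzy = [], []
    for preset in presets:
        if preset["name"].lower().startswith(q):
            exact.append(preset)
            continue
        # Compare the query against the same-length prefix of each term.
        distances = [edit_distance(q, term.lower()[:len(q)])
                     for term in preset["terms"]]
        if distances and min(distances) <= max_distance:
            fuzzy.append((min(distances), preset))
    fuzzy.sort(key=lambda pair: pair[0])
    return exact + [preset for _, preset in fuzzy]
```

With a preset named "bâtiment" and no terms, the query "bat" finds nothing under these assumptions. Adding "batiment" to its terms gives a distance of 0 and brings it back into the results.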

@1ec5 has been including some code folded terms in the Vietnamese preset translations and this seems to work ok, although it adds bloat and takes time to do. From what I understand, we don't want to generate these terms automatically because the difference can sometimes be significant.

@bhousel bhousel added localization Adapting iD across languages, regions, and cultures question Not Actionable - just a question about something labels Apr 22, 2017
@althio
Contributor Author

althio commented Apr 22, 2017

> (It looks like you added it here recently - I'd be interested to know if this search is working better now!)

Yes, it is better! I enjoy my few less keystrokes 👍

Thanks for the hint and the links to previous issues, quite instructive.
Feel free to close or keep the issue.

@bhousel bhousel closed this as completed Apr 22, 2017
@1ec5
Collaborator

1ec5 commented Apr 22, 2017

> @1ec5 has been including some code folded terms in the Vietnamese preset translations and this seems to work ok, although it adds bloat and takes time to do. From what I understand, we don't want to generate these terms automatically because the difference can sometimes be significant.

For Vietnamese, I've been putting the diacritic-folded terms at the end of the term lists, after synonyms. The downside is that diacritic-folded terms in shorter term lists influence the search results more strongly than diacritic-folded terms in longer term lists, sometimes even more strongly than preset names that happen to have a slightly larger edit distance. Off the top of my head, I've seen this happen with presets involving the Vietnamese word "trường" (truong, trương, truòng, etc.).
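
As a small illustration of how folding collapses distinct spellings, the Python sketch below groups words by their folded form. The helper names `fold_diacritics` and `group_by_folded_form` are hypothetical, and this uses plain Unicode NFD decomposition, which is only one possible way to fold.

```python
import unicodedata
from collections import defaultdict


def fold_diacritics(text: str) -> str:
    """Strip combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def group_by_folded_form(words):
    """Map each folded spelling to the original spellings that produce it."""
    groups = defaultdict(list)
    for word in words:
        groups[fold_diacritics(word)].append(word)
    return dict(groups)


groups = group_by_folded_form(["trường", "trương", "truòng", "bâtiment"])
# All three "tr..." spellings end up under the single key "truong".
print(sorted(groups))  # ['batiment', 'truong']
```

Once folded, a search can no longer tell these spellings apart, so any ranking between them has to come from somewhere else.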

I haven't looked into whether putting the synonyms after the diacritic-folded terms would yield better results in general, but I'd expect the results to be worse for shop presets, which have many synonyms. (I've been including folded synonyms at the end of the list.)

A solution could be to allow localizers to provide synonyms and diacritic variants in separate fields, so we can weight them differently without any effect from the number of synonyms. Does Transifex allow individual messages to be marked as optional, for languages and presets that don't need folding?
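
A rough Python sketch of what I mean by separate fields follows. Everything in it is hypothetical: the `PresetStrings` shape, the weight values, and the example strings are invented for illustration and are not iD's data model. Each field contributes only its best match, so the length of the synonym list does not change how much a folded variant counts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PresetStrings:
    name: str
    synonyms: List[str] = field(default_factory=list)
    # Optional: left empty for languages and presets that don't need folding.
    folded: List[str] = field(default_factory=list)


# Hypothetical weights; real values would need tuning.
NAME_WEIGHT = 1.0
FOLDED_WEIGHT = 0.8
SYNONYM_WEIGHT = 0.6


def prefix_hit(query: str, candidates: List[str]) -> float:
    """1.0 if any candidate starts with the query, else 0.0."""
    q = query.lower()
    return 1.0 if any(c.lower().startswith(q) for c in candidates) else 0.0


def score(query: str, preset: PresetStrings) -> float:
    """Best weighted hit across the three fields, independent of list lengths."""
    return max(NAME_WEIGHT * prefix_hit(query, [preset.name]),
               FOLDED_WEIGHT * prefix_hit(query, preset.folded),
               SYNONYM_WEIGHT * prefix_hit(query, preset.synonyms))


# Made-up example data.
building = PresetStrings("bâtiment", synonyms=["immeuble"], folded=["batiment"])
print(score("bat", building))  # 0.8
print(score("imm", building))  # 0.6
```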
