tweak autocomplete for search features (for languages with accents...) #3979

althio · 2017-04-22T12:16:31Z

My itch to begin with:
When searching for Building in French [=bâtiment], the autocomplete function gets thrown off by very minor differences in accents. If it expects bât..., it seems to score anything like bet..., bot..., bzt... quite badly, but that also includes bat..., where only the accent on the a is missing.

I don't know where to adjust this. In presets? In translations?
edit: Of course, I just found https://www.transifex.com/ideditor/id-editor/translate/#fr/presets

But I would like to know if a more general approach could exist...
Or can a general 'sanitize string' be applied? Something so that letters only differing by an accent are scored as more similar?
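
For illustration, a general "sanitize string" step of the kind asked about here could look roughly like the sketch below. It is only an assumption about how such folding might be done, not code from iD: the helper name `fold_diacritics` is hypothetical, and it relies on Unicode NFD decomposition from Python's standard `unicodedata` module.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("bâtiment"))  # -> batiment
print(fold_diacritics("trường"))    # -> truong
```

Applying the same folding to both the typed query and the preset text before comparing them would make "bat" and "bât" score identically. Letters that have no canonical decomposition, such as "đ" or "ø", pass through unchanged, so a real implementation would need an extra mapping for them.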


bhousel · 2017-04-22T14:29:25Z

Hey @althio, this is something that has come up before in #3236 and #3159.

The current behavior in iD is to do exact matching on the preset name and fuzzy matching on the preset terms. So if the preset name is "bâtiment", your preset terms could include "batiment", and it will then sort highly in the results if the user types "bat".
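
As a rough sketch of the behavior described above (exact matching on the name, fuzzy matching on the terms), something like the following could be imagined. This is not iD's actual scoring code: the function names, the prefix comparison, the `max_distance` threshold, and the sample presets are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def search(presets, query, max_distance=1):
    """Name prefix matches first, then fuzzy term matches by distance."""
    query = query.lower()
    by_name, by_term = [], []
    for preset in presets:
        if preset["name"].lower().startswith(query):
            by_name.append(preset["name"])
            continue
        # Compare the query with the start of each term (assumed behavior).
        distances = [edit_distance(query, term.lower()[:len(query)])
                     for term in preset["terms"]]
        if distances and min(distances) <= max_distance:
            by_term.append((min(distances), preset["name"]))
    by_term.sort()  # smallest distance first, ties broken by name
    return by_name + [name for _, name in by_term]


# Made-up sample data, not real iD presets.
presets = [
    {"name": "bâtiment", "terms": ["batiment"]},
    {"name": "route", "terms": ["chemin"]},
]
print(search(presets, "bat"))  # -> ['bâtiment']
```

With the folded term "batiment" removed from the sample data, the same query returns nothing, which is the situation the original report describes.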

(It looks like you added it here recently - I'd be interested to know if this search is working better now!)

@1ec5 has been including some code folded terms in the Vietnamese preset translations and this seems to work ok, although it adds bloat and takes time to do. From what I understand, we don't want to generate these terms automatically because the difference can sometimes be significant.

althio · 2017-04-22T15:37:16Z

> (It looks like you added it here recently - I'd be interested to know if this search is working better now!)

Yes, it is better! I enjoy saving a few keystrokes 👍

Thanks for the hint and the links to previous issues, quite instructive.
Feel free to close or keep the issue.

1ec5 · 2017-04-22T18:10:09Z

> @1ec5 has been including some code folded terms in the Vietnamese preset translations and this seems to work ok, although it adds bloat and takes time to do. From what I understand, we don't want to generate these terms automatically because the difference can sometimes be significant.

For Vietnamese, I've been putting the diacritic-folded terms at the end of the term lists, after synonyms. The downside is that diacritic-folded terms in shorter term lists influence the search results more strongly than diacritic-folded terms in longer term lists, sometimes even more strongly than preset names that happen to have a slightly larger edit distance. Off the top of my head, I've seen this happen with presets involving the Vietnamese word "trường" (truong, trương, truòng, etc.).

I haven't looked into whether putting the synonyms after the diacritic-folded terms would yield better results in general, but I'd expect the results to be worse for shop presets, which have many synonyms. (I've been including folded synonyms at the end of the list.)

A solution could be to allow localizers to provide synonyms and diacritic variants in separate fields, so we can weight them differently without any effect from the number of synonyms. Does Transifex allow individual messages to be marked as optional, for languages and presets that don't need folding?
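
To make the separate-fields idea concrete, here is a minimal sketch under stated assumptions. The `PresetEntry` structure, the three weight constants, and the prefix-based matching are all hypothetical and are not part of iD or Transifex. The point is only that each field carries its own weight, so the score does not depend on how many synonyms a preset has.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical weights: a name match outranks a synonym match,
# which outranks a match on a diacritic-folded variant.
NAME_WEIGHT = 3.0
SYNONYM_WEIGHT = 2.0
FOLDED_WEIGHT = 1.0


@dataclass
class PresetEntry:
    name: str
    synonyms: List[str] = field(default_factory=list)
    # Optional: left empty for languages or presets that need no folding.
    folded: List[str] = field(default_factory=list)


def score(entry: PresetEntry, query: str) -> float:
    """Weight of the best-matching field; list lengths have no effect."""
    q = query.lower()
    if entry.name.lower().startswith(q):
        return NAME_WEIGHT
    if any(s.lower().startswith(q) for s in entry.synonyms):
        return SYNONYM_WEIGHT
    if any(f.lower().startswith(q) for f in entry.folded):
        return FOLDED_WEIGHT
    return 0.0


def rank(entries: List[PresetEntry], query: str) -> List[str]:
    """Names of matching entries, highest score first (stable for ties)."""
    scored = [(score(e, query), e.name) for e in entries]
    scored.sort(key=lambda pair: -pair[0])
    return [name for s, name in scored if s > 0]


# Made-up sample data for illustration.
entries = [
    PresetEntry("Bâtiment", synonyms=["immeuble"], folded=["batiment"]),
    PresetEntry("Bateau"),
]
print(rank(entries, "bat"))  # -> ['Bateau', 'Bâtiment']
```

In this sketch the folded variant still lets "bat" find "Bâtiment", but it can never outrank a preset whose name matches directly, however long or short the synonym list is.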

bhousel added the "localization" (Adapting iD across languages, regions, and cultures) and "question" (Not Actionable - just a question about something) labels Apr 22, 2017

bhousel closed this as completed Apr 22, 2017

bagage mentioned this issue May 1, 2017: Preset matching failure on synonyms #4002 (Closed)

quincylvania mentioned this issue Dec 9, 2020: Diacritic-independent preset search #8242 (Closed)
