equal ss to ß in the searchfield #3159

paskalito · 2016-06-13T11:52:52Z

Hi,
and thanks for this absolutely awesome editor!

People in Germany use the "double s" (ß).
People in Switzerland, and I believe also Austria, don't.

So every time someone from these countries tries to find "fussball feld" (soccer field),
they won't find it because it's written as "Fußball feld".

So if you could make it appear while typing "fuss", like it does when I search for "soccer", it would make lots of people's lives easier :)

Thanks

bhousel · 2016-06-13T13:13:39Z

Hadn't thought of that before but it makes sense. I agree we should make our preset search ß-tolerant.

bhousel · 2016-07-01T02:14:04Z

@1ec5 Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii?

pnorman · 2016-07-01T04:17:48Z

> @1ec5 Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii?

Would this work with CJK languages, where characters can't really be converted to ASCII? I can't get the example JS working with fold-to-ascii, so I can't check myself.

bhousel · 2016-07-01T04:18:41Z

Digging into this a bit more, it looks like we'd need to compare both the original and the folded versions; otherwise it would mess up matching Unicode fragments like 中國.

hehe @pnorman beat me to it by 30 seconds
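To illustrate the "compare both original and folded versions" idea, here is a minimal sketch. It assumes a hand-rolled `fold` helper built on the standard `String.prototype.normalize`, not the API of fold-to-ascii or any other library, and the `matches` function and the sample preset names are hypothetical.

```javascript
// Hypothetical folding helper built on String.prototype.normalize;
// it is not the API of fold-to-ascii or node-diacritics.
function fold(text) {
  return text
    .normalize('NFD')                 // separate base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // strip the combining marks
    .replace(/ß/g, 'ss')              // ß has no decomposition, so map it by hand
    .normalize('NFC');                // recompose whatever is left (e.g. Hangul)
}

// Hypothetical matcher: try the original strings first, then the folded ones.
function matches(query, name) {
  const q = query.toLowerCase().normalize('NFC');
  const n = name.toLowerCase().normalize('NFC');
  if (n.includes(q)) return true;
  const foldedQuery = fold(q);
  // Guard against a folder that reduces the query to nothing,
  // since every string "includes" the empty string.
  return foldedQuery.length > 0 && fold(n).includes(foldedQuery);
}

console.log(matches('fuss', 'Fußball Feld')); // true
console.log(matches('中國', '中國餐館'));      // true
console.log(matches('fuss', 'Tennisplatz'));  // false
```

Checking the original strings first means a query that the folder cannot handle still matches on its raw form, and the length guard keeps an over-eager folder from matching everything.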

bhousel · 2016-07-01T04:49:10Z

Update: it looks like node-diacritics is less greedy. It leaves unrecognized Unicode characters untouched, whereas fold-to-ascii removes them.

bad (fold-to-ascii):

> console.log(asciiFolder.fold('Happy ßirthday'));
'Happy ssirthday'
> console.log(asciiFolder.fold('中华人民共和国'));
''

good (node-diacritics):

> console.log(removeDiacritics('Happy ßirthday'));
'Happy ssirthday'
> console.log(removeDiacritics('中华人民共和国'));
'中华人民共和国'
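For comparison, the same "fold what you recognize, leave the rest alone" behavior can be sketched without a library. This is an assumption-laden sketch using the built-in `String.prototype.normalize`, not the node-diacritics implementation; `foldDiacritics` and the `SPECIAL` table are hypothetical names, and the table covers only a few letters.

```javascript
// Hypothetical helper, not the node-diacritics implementation.
// Letters without a canonical decomposition need an explicit mapping.
const SPECIAL = { 'ß': 'ss', 'đ': 'd', 'Đ': 'D' };

function foldDiacritics(text) {
  return text
    .normalize('NFD')                         // split base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')          // drop the combining marks
    .replace(/[ßđĐ]/g, (ch) => SPECIAL[ch])   // map the letters NFD cannot split
    .normalize('NFC');                        // recompose anything left (e.g. Hangul)
}

console.log(foldDiacritics('Fußball'));        // 'Fussball'
console.log(foldDiacritics('hồ bơi'));         // 'ho boi'
console.log(foldDiacritics('中华人民共和国')); // unchanged
```

Characters outside the combining-mark range and the small table pass through as they are, which is why the CJK string survives.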

1ec5 · 2016-07-01T14:04:08Z

Currently, the Vietnamese localization is working around this issue by including case-folded terms (“ho” for “hồ”) as synonyms for every preset, roughly doubling the size of the localization. Actually, to properly handle typeahead search, I’d have to add even more synonyms for situations where the user has entered only some of the diacritics (“đuong”, “đương”, “đuòng”, “dường”, etc. for “đường”), given the way most popular Vietnamese IMEs work.

> Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii?

It would be safe for some languages. However, for Vietnamese, it would have to be a fallback strategy with less weight than the normal search. Otherwise, you’d get unexpected results for many searches, such as “hố lửa” (fire pit) over “hồ bơi” (swimming pool) for “hồ” and “cho thuê xe” (car rental) over “công viên dành cho chó” (dog park) for “chó”. Ideally, localizations would be able to provide their own case folding logic.
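The fallback-with-less-weight idea could look roughly like the sketch below. Everything here is an assumption for illustration: the `fold`, `score`, and `rank` functions, the `EXACT` and `FOLDED` weights, and the substring matching are hypothetical and are not iD's actual search code.

```javascript
// Hypothetical folding helper based on String.prototype.normalize.
function fold(text) {
  return text
    .normalize('NFD')                 // separate base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // strip the combining marks
    .replace(/đ/g, 'd')               // đ has no decomposition, so map it by hand
    .replace(/ß/g, 'ss')
    .normalize('NFC');
}

// Hypothetical weights: a match on the text as typed outranks a folded match.
const EXACT = 2;
const FOLDED = 1;

function score(query, name) {
  const q = query.toLowerCase().normalize('NFC');
  const n = name.toLowerCase().normalize('NFC');
  if (n.includes(q)) return EXACT;
  if (fold(n).includes(fold(q))) return FOLDED;
  return 0;
}

// Keep only matching names and list the higher-weighted matches first.
function rank(query, names) {
  return names
    .map((name) => ({ name, score: score(query, name) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.name);
}

console.log(rank('hồ', ['hố lửa', 'hồ bơi']));
// [ 'hồ bơi', 'hố lửa' ]
console.log(rank('chó', ['cho thuê xe', 'công viên dành cho chó']));
// [ 'công viên dành cho chó', 'cho thuê xe' ]
```

With this ordering, a partially accented query such as "đuong" still finds "đường" through the folded comparison, but only after any results that match the query exactly as typed.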


bhousel added the localization label Jun 13, 2016

bhousel added a commit that referenced this issue Jul 8, 2016

Replace diacritics when doing fuzzy searches

0b3df36

(closes #3159)

bhousel mentioned this issue Jul 8, 2016

Replace diacritics when doing fuzzy searches #3236

Merged

2 tasks

bhousel closed this as completed in #3236 Jul 8, 2016

bhousel mentioned this issue Apr 22, 2017

tweak autocomplete for search features (for languages with accents...) #3979

Closed

1ec5 mentioned this issue Dec 10, 2020

Diacritic-independent preset search #8242

Closed
