equal ss to ß in the searchfield #3159
Comments
Hadn't thought of that before but it makes sense. I agree we should make our preset search ß-tolerant. |
@1ec5 Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii? |
Would this work with CJK languages, where characters can't really be converted to ASCII? I can't get the example JS working with fold-to-ascii, so I can't check myself. |
Digging into this a bit more, it looks like we'd need to compare both the original and the folded versions; otherwise it would mess up matching Unicode fragments like 中國. hehe @pnorman beat me to it by 30 seconds |
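A minimal sketch of the "compare both original and folded versions" idea, for illustration only. `foldText` and `matchesPreset` are hypothetical names, not iD functions, and the folding here uses the built-in `String.prototype.normalize` plus a couple of hand-written replacements rather than either library discussed in this thread. It is an assumption that this small mapping is enough; a real implementation would likely need a larger table.

```javascript
// Hypothetical helper (not part of iD or of the libraries named in this thread).
// Folds ß, đ and Latin combining diacritics; characters it does not
// recognize, such as CJK ideographs, are left in place rather than removed.
function foldText(text) {
  return text
    .toLowerCase()
    .replace(/ß/g, 'ss')
    .replace(/đ/g, 'd')
    .normalize('NFD')                 // split base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .normalize('NFC');                // recompose whatever is left
}

// True if the query matches the candidate verbatim, or after folding both sides.
function matchesPreset(query, candidate) {
  const q = query.toLowerCase().normalize('NFC');
  const c = candidate.toLowerCase().normalize('NFC');
  if (c.includes(q)) return true;            // original comparison
  return foldText(c).includes(foldText(q));  // folded comparison
}
```

With this sketch, `matchesPreset('fuss', 'Fußballfeld')` is true through the folded path, while `matchesPreset('中國', '中國城')` is true through the original path, so Unicode fragments are not lost.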
Update: looks like node-diacritics is less greedy and ignores unrecognized Unicode characters, rather than fold-to-ascii which removes unrecognized characters.

bad (fold-to-ascii):
> console.log(asciiFolder.fold('Happy ßirthday'));
'Happy ssirthday'
> console.log(asciiFolder.fold('中华人民共和国'));
''

good (node-diacritics):
> console.log(removeDiacritics('Happy ßirthday'));
'Happy ssirthday'
> console.log(removeDiacritics('中华人民共和国'));
'中华人民共和国' |
Currently, the Vietnamese localization is working around this issue by including case-folded terms (“ho” for “hồ”) as synonyms for every preset, roughly doubling the size of the localization. Actually, to properly handle typeahead search, I’d have to add even more synonyms for situations where the user has entered only some of the diacritics (“đuong”, “đương”, “đuòng”, “dường”, etc. for “đường”), given the way most popular Vietnamese IMEs work.
It would be safe for some languages. However, for Vietnamese, it would have to be a fallback strategy with less weight than the normal search. Otherwise, you’d get unexpected results for many searches, such as “hố lửa” (fire pit) over “hồ bơi” (swimming pool) for “hồ” and “cho thuê xe” (car rental) over “công viên dành cho chó” (dog park) for “chó”. Ideally, localizations would be able to provide their own case folding logic. |
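One way to picture the "fallback strategy with less weight" is a ranking where verbatim matches always sort ahead of folded-only matches. The sketch below is an assumption about how that could look, not iD's actual search code. `foldText` and `rankPresets` are hypothetical names, and the scores 2 and 1 are arbitrary placeholders.

```javascript
// Hypothetical folding helper (an assumption, not a library API).
function foldText(text) {
  return text
    .toLowerCase()
    .replace(/ß/g, 'ss')
    .replace(/đ/g, 'd')
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')  // strip combining diacritics
    .normalize('NFC');
}

// Rank preset names for a query: verbatim matches score 2,
// folded-only matches score 1, everything else is dropped.
function rankPresets(query, names) {
  const q = query.toLowerCase().normalize('NFC');
  const foldedQ = foldText(q);
  return names
    .map((name) => {
      const n = name.toLowerCase().normalize('NFC');
      let score = 0;
      if (n.includes(q)) score = 2;                            // normal search
      else if (foldText(n).includes(foldedQ)) score = 1;       // folded fallback
      return { name, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.name);
}

console.log(rankPresets('hồ', ['hố lửa', 'hồ bơi']));
// [ 'hồ bơi', 'hố lửa' ]
console.log(rankPresets('đuong', ['đường']));
// [ 'đường' ]
```

Under these assumptions, "hồ" puts "hồ bơi" ahead of "hố lửa", and a partially accented typeahead query such as "đuong" still reaches "đường" through the fallback, without needing extra synonyms in the localization.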
Hi,
and thanks for this absolutely awesome editor!
People in Germany use the "double s" (ß).
People in Switzerland, and I believe also Austria, don't.
So every time someone from these countries tries to find "fussball feld" (soccer field),
they won't find it, because it's written as "Fußball feld".
So if you could make it appear while typing "fuss", like it does when I search for "soccer", that would make lots of people's lives easier :)
Thanks