
equal ss to ß in the searchfield #3159

Closed
paskalito opened this issue Jun 13, 2016 · 6 comments · Fixed by #3236
Labels
localization Adapting iD across languages, regions, and cultures

Comments

@paskalito

Hi
and thanks for this absolutely awesome editor!

People in Germany use the "double s" (ß), but
people in Switzerland, and I believe also Austria, don't.

So every time someone from these countries tries to find "fussball feld" (soccer field),
they won't find it, because it's written as "Fußball feld".

So if you could make it appear while typing "fuss", like it does when I search for "soccer", it would make lots of people's lives easier :)

Thanks

@bhousel
Member

bhousel commented Jun 13, 2016

Hadn't thought of that before but it makes sense. I agree we should make our preset search ß-tolerant.

@bhousel bhousel added the localization Adapting iD across languages, regions, and cultures label Jun 13, 2016
@bhousel
Member

bhousel commented Jul 1, 2016

@1ec5 Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii?
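A minimal sketch of "fold both sides, then compare", assuming the folding is done with built-in JavaScript (`String.prototype.normalize` plus a Unicode property escape) rather than fold-to-ascii, whose API isn't reproduced here. ß has no canonical decomposition, so it is mapped to "ss" explicitly. `foldForSearch` and `presetMatches` are hypothetical names, not iD's preset search code.

```javascript
// Hypothetical helper: fold a string for accent- and ß-insensitive matching.
// Uses only built-in JavaScript; this is NOT fold-to-ascii's API.
function foldForSearch(str) {
  return str
    .toLowerCase()
    .normalize('NFD')        // split e.g. "ü" into "u" + combining diaeresis
    .replace(/\p{M}/gu, '')  // drop all combining marks
    .replace(/ß/g, 'ss');    // ß does not decompose, so map it explicitly
}

// Hypothetical matcher: fold BOTH the typed fragment and the preset name.
function presetMatches(fragment, presetName) {
  return foldForSearch(presetName).includes(foldForSearch(fragment));
}

console.log(presetMatches('fuss', 'Fußballfeld'));  // true
console.log(presetMatches('Fußb', 'Fussballfeld')); // true
```

Because the same function runs on both strings, a Swiss user typing "fuss" and a German user typing "Fuß" end up comparing the same folded text.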

@pnorman
Contributor

pnorman commented Jul 1, 2016

> @1ec5 Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii?

Would this work with CJK languages, where characters can't really be converted to ASCII? I can't get the example JS working with fold-to-ascii, so I can't check myself.

@bhousel
Member

bhousel commented Jul 1, 2016

Digging into this a bit more, it looks like we'd need to compare both original and folded versions, otherwise it would mess up matching unicode fragments like 中國.

hehe @pnorman beat me to it by 30 seconds
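A sketch of why both versions need comparing, using a deliberately greedy, hypothetical folder (`greedyFold`) that, like the fold-to-ascii output shown later in this thread, throws away anything it can't turn into ASCII. Matching on folded strings alone breaks twice over: "中國" folds to the empty string, and the empty string is contained in every name. Checking the original (lowercased) strings as well, and ignoring an empty folded fragment, keeps CJK search working. All function names here are illustrative assumptions.

```javascript
// Hypothetical greedy folder: strips diacritics, maps ß to ss, then DROPS
// anything still outside ASCII (mimicking the failure mode discussed here).
function greedyFold(str) {
  return str
    .toLowerCase()
    .normalize('NFD')
    .replace(/\p{M}/gu, '')
    .replace(/ß/g, 'ss')
    .replace(/[^\x00-\x7F]/g, '');
}

// Folded-only matching: "中國" folds to "", and "" is found in every string.
function matchesFoldedOnly(fragment, name) {
  return greedyFold(name).includes(greedyFold(fragment));
}

// Compare both versions: the original (lowercased) strings OR the folded ones,
// ignoring a folded fragment that came out empty.
function matchesEither(fragment, name) {
  if (name.toLowerCase().includes(fragment.toLowerCase())) return true;
  const folded = greedyFold(fragment);
  return folded.length > 0 && greedyFold(name).includes(folded);
}

console.log(matchesFoldedOnly('中國', 'Fußballfeld')); // true (a false positive)
console.log(matchesEither('中國', 'Fußballfeld'));     // false
console.log(matchesEither('中國', '中國銀行'));         // true, via the original strings
console.log(matchesEither('fuss', 'Fußballfeld'));     // true, via the folded strings
```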

@bhousel
Member

bhousel commented Jul 1, 2016

Update: it looks like node-diacritics is less greedy: it leaves unrecognized Unicode characters alone, whereas fold-to-ascii removes them.

bad (fold-to-ascii):

```
> console.log(asciiFolder.fold('Happy ßirthday'));
'Happy ssirthday'
> console.log(asciiFolder.fold('中华人民共和国'));
''
```

good (node-diacritics):

```
> console.log(removeDiacritics('Happy ßirthday'));
'Happy ssirthday'
> console.log(removeDiacritics('中华人民共和国'));
'中华人民共和国'
```
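The difference between the two libraries comes down to what a folder does with a character its table doesn't cover. A minimal sketch with a tiny, hypothetical table (`FOLD_TABLE`, far smaller than what either package ships and not their actual data) and a `dropUnknown` switch that reproduces both behaviours shown above; `foldWithTable` is an illustrative name only.

```javascript
// Tiny illustrative folding table; real libraries ship far larger ones.
const FOLD_TABLE = { 'ß': 'ss', 'ä': 'a', 'ö': 'o', 'ü': 'u', 'é': 'e' };

// Fold code point by code point. Characters missing from the table are either
// kept as-is (the node-diacritics-like output above) or dropped
// (the fold-to-ascii-like output above).
function foldWithTable(str, dropUnknown) {
  let out = '';
  for (const ch of str) {
    if (Object.prototype.hasOwnProperty.call(FOLD_TABLE, ch)) {
      out += FOLD_TABLE[ch];
    } else if (ch.charCodeAt(0) < 0x80) {
      out += ch;   // plain ASCII always passes through unchanged
    } else if (!dropUnknown) {
      out += ch;   // unknown non-ASCII character: keep it
    }              // otherwise: silently drop it
  }
  return out;
}

console.log(foldWithTable('Happy ßirthday', true));  // Happy ssirthday
console.log(foldWithTable('中华人民共和国', true));   // (empty string)
console.log(foldWithTable('中华人民共和国', false));  // 中华人民共和国
```

Passing unknown characters through is the safer default for search, since an unfolded character can still match itself.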

@1ec5
Collaborator

1ec5 commented Jul 1, 2016

Currently, the Vietnamese localization is working around this issue by including case-folded terms (“ho” for “hồ”) as synonyms for every preset, roughly doubling the size of the localization. Actually, to properly handle typeahead search, I’d have to add even more synonyms for situations where the user has entered only some of the diacritics (“đuong”, “đương”, “đuòng”, “dường”, etc. for “đường”), given the way most popular Vietnamese IMEs work.
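Folding both the query and the preset terms would collapse all of these partially accented spellings into one key, instead of listing each as a synonym. A minimal sketch using only built-in JavaScript, with one Vietnamese-specific wrinkle: đ/Đ has no Unicode decomposition, so NFD leaves it intact and it needs an explicit mapping to "d". `foldVietnamese` is a hypothetical helper, not part of iD or its localizations.

```javascript
// Hypothetical Vietnamese-aware fold: strip tone marks and other diacritics,
// then map đ to d (đ has no canonical decomposition, so NFD leaves it alone).
function foldVietnamese(str) {
  return str
    .toLowerCase()
    .normalize('NFD')
    .replace(/\p{M}/gu, '')
    .replace(/đ/g, 'd');
}

// Spellings a user might have typed so far on the way to "đường".
const inputs = ['duong', 'đuong', 'đương', 'đuòng', 'dường', 'đường'];
const target = foldVietnamese('đường'); // "duong"

for (const input of inputs) {
  console.log(input, foldVietnamese(input) === target); // each line ends in true
}
```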

> Would you consider it safe to just run both the search fragment and the comparison string through a general purpose string folding library like fold-to-ascii?

It would be safe for some languages. However, for Vietnamese, it would have to be a fallback strategy with less weight than the normal search. Otherwise, you’d get unexpected results for many searches, such as “hố lửa” (fire pit) over “hồ bơi” (swimming pool) for “hồ” and “cho thuê xe” (car rental) over “công viên dành cho chó” (dog park) for “chó”. Ideally, localizations would be able to provide their own case folding logic.
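A sketch of the "fallback with less weight" idea combined with per-locale folding: a diacritic-sensitive word-prefix match scores higher than a match that only succeeds after folding, and each locale can register its own folding function. The scores, the `LOCALE_FOLDERS` registry, `scoreName`, and `rankPresets` are all hypothetical illustrations, not iD's preset search.

```javascript
// Hypothetical per-locale folding hooks; a localization could supply its own.
const LOCALE_FOLDERS = {
  vi: (s) => s.normalize('NFD').replace(/\p{M}/gu, '').replace(/đ/g, 'd'),
  de: (s) => s.replace(/ß/g, 'ss'),
};

// Score one preset name against the query: 2 = exact word-prefix match,
// 1 = match only after folding (lower weight), 0 = no match.
function scoreName(query, name, fold) {
  const q = query.normalize('NFC').toLowerCase();
  const words = name.normalize('NFC').toLowerCase().split(/\s+/);
  if (words.some((w) => w.startsWith(q))) return 2;
  const fq = fold(q);
  if (words.some((w) => fold(w).startsWith(fq))) return 1;
  return 0;
}

// Return matching names, best score first (unknown locales get no folding).
function rankPresets(query, names, locale) {
  const fold = LOCALE_FOLDERS[locale] || ((s) => s);
  return names
    .map((name) => ({ name, score: scoreName(query, name, fold) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.name);
}

console.log(rankPresets('hồ', ['hố lửa', 'hồ bơi'], 'vi'));
// [ 'hồ bơi', 'hố lửa' ]
console.log(rankPresets('chó', ['cho thuê xe', 'công viên dành cho chó'], 'vi'));
// [ 'công viên dành cho chó', 'cho thuê xe' ]
```

Folded matches still appear, so "ho" finds both presets, but an exact "hồ" no longer lets the fire pit outrank the swimming pool.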


4 participants