Unicode case functions don't handle special conventions correctly #19516

Open
helgee opened this issue Dec 6, 2016 · 4 comments
Labels
unicode Related to unicode characters and encodings

Comments

@helgee
Contributor

helgee commented Dec 6, 2016

As previously discussed in #19469

The lowercase and uppercase functions, and the not-yet-merged titlecase function, do not correctly handle the special casing conventions outlined in UTR #21.

Examples

julia> lowercase("OΔΥΣΣΕΥΣ")
"oδυσσευσ" # wrong, uses the non-final sigma
"oδυσσευς" # would be correct, uses the final sigma
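A minimal sketch of what context-sensitive lowercasing could look like, for illustration only. `lowercase_final_sigma` is a hypothetical helper, not part of Base or utf8proc. It uses a simplified rule (a capital sigma preceded by a letter and not followed by a letter becomes the final sigma), which is an assumption and coarser than the Final_Sigma condition in the Unicode Standard:

```julia
# Hypothetical helper, not part of Base or the Unicode stdlib.
# Simplified rule (an assumption): 'Σ' becomes the final sigma 'ς' when it is
# preceded by a letter and not followed by one; otherwise it becomes 'σ'.
function lowercase_final_sigma(s::AbstractString)
    chars = collect(s)
    out = Vector{Char}(undef, length(chars))
    for (i, c) in enumerate(chars)
        if c == 'Σ'
            prev_is_letter = i > 1 && isletter(chars[i - 1])
            next_is_letter = i < length(chars) && isletter(chars[i + 1])
            out[i] = (prev_is_letter && !next_is_letter) ? 'ς' : 'σ'
        else
            # All other characters use the ordinary per-character mapping.
            out[i] = lowercase(c)
        end
    end
    return join(out)
end

lowercase_final_sigma("OΔΥΣΣΕΥΣ")  # ends in the final sigma 'ς'
```

The sketch ignores case-ignorable characters (such as combining marks or apostrophes) next to the sigma, so it should not be read as a complete implementation of the Unicode rule.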

EDIT (2021/03/19): The following example has become obsolete due to a 2017 change in German orthography.

julia> uppercase("Spaß")
"SPAß" # wrong
"SPASS" # would have been correct until 2017
@stevengj stevengj changed the title from "Unicode case functions do handle special conventions correctly" to "Unicode case functions don't handle special conventions correctly" Dec 6, 2016
@stevengj stevengj added the unicode Related to unicode characters and encodings label Dec 6, 2016
@stevengj
Member

stevengj commented Dec 6, 2016

utf8proc implements case-folding, but I don't think it has the info for UTR21? Might require a patch to utf8proc?

@stevengj
Member

stevengj commented Dec 6, 2016

See also JuliaStrings/utf8proc#54

@helgee
Contributor Author

helgee commented Mar 19, 2021

The second example works nowadays because German orthography was changed in 2017 to include ẞ, the uppercase form of ß.

julia> versioninfo()
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, icelake-client)
Environment:
  JULIA_PKG_DEVDIR = /Users/helge/projects/julia

julia> uppercase("spaß")
"SPAẞ" # correct
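
For code that still needs the older "SS" convention, one possible workaround is to expand ß before uppercasing. This is only a sketch; `legacy_uppercase` is a hypothetical name, not an existing function:

```julia
# Hypothetical helper: uppercase using the pre-2017 convention, where "ß" is
# written as "SS" in capitals. The expansion happens before uppercasing, so
# the result does not depend on how uppercase itself maps 'ß'.
legacy_uppercase(s::AbstractString) = uppercase(replace(s, "ß" => "ss"))

legacy_uppercase("Spaß")  # "SPASS"
```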

@stevengj
Member

stevengj commented Jun 24, 2024

See also:

julia> Unicode.normalize("Spaß", casefold=true)
"spass"
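
Building on that call, a caseless comparison could be sketched as follows. `caseless_equal` is a hypothetical helper name; only `Unicode.normalize` with `casefold=true` comes from the comment above:

```julia
using Unicode

# Hypothetical helper: compare two strings after Unicode case folding,
# so that "ß" and "SS" are treated as equal.
caseless_equal(a::AbstractString, b::AbstractString) =
    Unicode.normalize(a, casefold=true) == Unicode.normalize(b, casefold=true)

caseless_equal("Spaß", "SPASS")  # true
caseless_equal("Spaß", "Spaz")   # false
```

Case folding is meant for comparison rather than display, so it sidesteps the question of which uppercase or lowercase form is orthographically correct.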
