Unicode-compliant islower/uppercase #38574

Merged
merged 4 commits into from
Dec 18, 2020

stevengj
Member

@stevengj stevengj commented Nov 25, 2020

Closes #36618, using the new utf8proc_islower and utf8proc_isupper functions from utf8proc 2.6 (which we upgraded to in #38551).

Technically breaking, but I'm not sure who would be relying on the slight differences between the old behavior and the Unicode standard definitions?
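
To make the "slight differences" concrete, here is a rough sketch that prints a category-based check next to the built-in predicates for a few characters. `old_islowercase` and `old_isuppercase` are hypothetical helpers written only for this comparison; they approximate a check based purely on the general category (Ll for lowercase, Lu or Lt for uppercase) and are not the actual previous implementation. `Base.Unicode.category_abbrev` is an internal, unexported function, so treat its use here as an assumption that may not hold across Julia versions. The printed results depend on the Julia and utf8proc versions in use, so none are asserted here.

```julia
# Hypothetical helpers approximating a purely category-based check.
# Base.Unicode.category_abbrev is internal (unexported) and may change.
old_islowercase(c::AbstractChar) = Base.Unicode.category_abbrev(c) == "Ll"
old_isuppercase(c::AbstractChar) = Base.Unicode.category_abbrev(c) in ("Lu", "Lt")

# Sample characters: plain ASCII letters, a titlecase digraph (category Lt),
# an ordinal indicator (category Lo), and a Roman numeral (category Nl).
for c in ('a', 'A', 'ǅ', 'ª', 'Ⅰ')
    println(repr(c), "  category=", Base.Unicode.category_abbrev(c),
            "  category-based=(lower=", old_islowercase(c),
            ", upper=", old_isuppercase(c), ")",
            "  builtin=(lower=", islowercase(c),
            ", upper=", isuppercase(c), ")")
end
```

Any row where the two columns disagree is a character affected by this kind of change.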

@stevengj added the unicode (Related to unicode characters and encodings), minor change (Marginal behavior change acceptable for a minor release), and needs news (A NEWS entry is required for this change) labels on Nov 25, 2020
@stevengj
Member Author

Along the way, I noticed that:

  1. Maybe isletter should correspond to the Unicode "Alphabetic" derived property?
  2. titlecase(::String) really needs to conform more closely to UAX #29's definition of word boundaries. Right now, it can break right in the middle of a grapheme if there are combining characters: titlecase("bôrked") == "BôRked" seems like a bug to me.
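
A small sketch for reproducing the second point, assuming the "ô" in the example is the decomposed form: `o` followed by U+0302 COMBINING CIRCUMFLEX ACCENT. The combining mark is not a letter, so a word-separator test based on `!isletter` can treat it as a boundary even though it sits inside a single grapheme. The `titlecase` output is printed rather than asserted, since it depends on the Julia version.

```julia
using Unicode  # stdlib: provides graphemes and Unicode.normalize

# Decomposed spelling: 'o' followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
s = "bo\u0302rked"

println(length(s))              # number of code points
println(length(graphemes(s)))   # number of user-perceived characters
println(isletter('\u0302'))     # the combining mark is category Mn, not a letter

# Compare titlecasing the decomposed string with its NFC (precomposed) form,
# where 'ô' is a single letter code point.
println(titlecase(s))
println(titlecase(Unicode.normalize(s, :NFC)))
```

Normalizing to NFC first only sidesteps the issue for marks that have a precomposed form; it is not a substitute for UAX #29 word-boundary handling.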

@stevengj
Member Author

Probably this is too late for 1.6, so I'll wait to add NEWS until the 1.7-dev cycle.

@musm
Contributor

musm commented Dec 16, 2020

> Probably this is too late for 1.6, so I'll wait to add NEWS until the 1.7-dev cycle.

Since we've branched, it sounds like now is a good time to add that.

@stevengj removed the needs news (A NEWS entry is required for this change) label on Dec 16, 2020
@stevengj
Member Author

Fixed the NEWS.

Member

@StefanKarpinski StefanKarpinski left a comment

Looks good to me. This can be squash-merged if you're done with it, @stevengj.

@stevengj stevengj merged commit 17de527 into master Dec 18, 2020
@stevengj stevengj deleted the sgj/islowerupper branch December 18, 2020 15:34
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
* Unicode-compliant islower/uppercase

* don't test isletter for non-L* letters

* include titlecase in alphas test

* add news
Labels
minor change (Marginal behavior change acceptable for a minor release), unicode (Related to unicode characters and encodings)
Development

Successfully merging this pull request may close these issues.

make isuppercase and islowercase agree with Unicode standard
3 participants