Upgrade to Unicode 12.0 #260

Open
eyeplum opened this issue Mar 6, 2019 · 4 comments
Labels
A: source-data Source Data C: emoji Unicode Emoji C: idna IDNA: Internationalized Domain Names in Applications C: ucd Unicode Character Database feature New features requested or planned

Comments

@eyeplum
Member

eyeplum commented Mar 6, 2019

Description

Update external data and all modules to Unicode 12.0. For changes in Unicode 12.0, see: https://www.unicode.org/versions/Unicode12.0.0/

According to Section M of the change note, migrating from Unicode 11.0 to 12.0 should be straightforward. For us, presumably it is mostly a matter of updating all data files to Unicode 12.0 and regenerating all tables.

Blocked by #259.

Unicode 12.1

It might be trivial to upgrade to Unicode 12.1 at the same time (assuming this issue is implemented after 2019 May 7), as it adds only one character, U+32FF SQUARE ERA NAME REIWA.

For details, see: https://unicode.org/versions/Unicode12.1.0/

@eyeplum eyeplum added C: ucd Unicode Character Database C: emoji Unicode Emoji A: source-data Source Data C: idna IDNA: Internationalized Domain Names in Applications labels Mar 6, 2019
@behnam behnam added the feature New features requested or planned label Apr 8, 2019
@data-man

data-man commented Jul 2, 2020

Time for Unicode 13.0 :)

@crlf0710

Time for Unicode 15.0 :(

I'm... trying to help implement rust-lang/rust#101840. Currently rustc relies on unic_emoji_char::is_emoji for diagnostics, but it seems the Unicode data here is quite outdated...

@eyeplum
Member Author

eyeplum commented Sep 17, 2022

> Time for Unicode 15.0 :(
>
> I'm... trying to help implement rust-lang/rust#101840. Currently rustc relies on unic_emoji_char::is_emoji for diagnostics, but it seems the Unicode data here is quite outdated...

Hi there!

I have been using my own fork in recent years. The fork is currently updated to Unicode 14.0 (and will be updated to Unicode 15.0 soon).

I've been meaning to eventually merge those changes here; perhaps now is a good time to give it a go :)

Cc: @behnam

@eyeplum
Member Author

eyeplum commented Sep 17, 2022

My fork currently has these changes:

  • A new module for Unihan (data updated to Unicode 14.0)
  • Unicode 11.0 data changes and segmentation algorithm changes
  • Unicode 12.0 and 12.1 data changes
  • Unicode 13.0 data changes
  • Unicode 14.0 data changes

If we want them to be merged here, I think we will need to decide on a way to release the changes. Perhaps each Unicode version could be its own point release? E.g.:

  • unic 0.10.0 for Unicode 11.0
  • unic 0.11.0 for Unicode 12.0
  • unic 0.11.1 for Unicode 12.1
  • unic 0.12.0 for Unicode 13.0
  • unic 0.13.0 for Unicode 14.0

Though I'm not sure what the best way to handle the new Unihan module is... Perhaps we can just include it in one of those releases...
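
To make the proposed scheme concrete, here is a minimal sketch of the release-to-Unicode-version mapping listed above as a small Rust lookup table. The constant `PROPOSED_RELEASES` and the helper `unicode_version_for` are hypothetical names used only for illustration; they are not part of unic, and the version numbers are only the ones suggested in this comment, not published releases.

```rust
/// Proposed (not yet published) mapping from this thread:
/// (unic release, Unicode version it would track).
const PROPOSED_RELEASES: [(&str, &str); 5] = [
    ("0.10.0", "11.0"),
    ("0.11.0", "12.0"),
    ("0.11.1", "12.1"),
    ("0.12.0", "13.0"),
    ("0.13.0", "14.0"),
];

/// Hypothetical helper: look up which Unicode version a proposed
/// unic release would correspond to. Returns `None` for any release
/// that is not in the proposal.
fn unicode_version_for(unic_release: &str) -> Option<&'static str> {
    PROPOSED_RELEASES
        .iter()
        .find(|(release, _)| *release == unic_release)
        .map(|(_, unicode)| *unicode)
}

fn main() {
    // Print the whole proposed schedule.
    for (release, unicode) in PROPOSED_RELEASES.iter() {
        println!("unic {} -> Unicode {}", release, unicode);
    }

    // Unicode 12.1 would ship as a patch release on top of 12.0.
    assert_eq!(unicode_version_for("0.11.1"), Some("12.1"));
    // Releases outside the proposal are simply unknown.
    assert_eq!(unicode_version_for("0.9.0"), None);
}
```

Under this scheme a minor bump tracks a major Unicode release and a patch bump tracks a Unicode update release such as 12.1; where the Unihan module would land is left open, as noted above.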
