Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update property enums to support CodepointTrie #1089

Merged
merged 13 commits into from
Oct 1, 2021

Conversation

iainireland
Copy link
Contributor

Putting up an initial draft for feedback. This partially resolves #1071: it handles General_Category, but not Script yet.

Points of interest:

  1. Going along with a discussion we had earlier, I split GeneralCategory into GeneralCategory and GeneralSubcategory. The former is [repr(u32)] and can represent grouped categories like Letter. It's useful for queries. The latter is [repr(u8)] and represents a single specific subcategory (eg TitlecaseLetter). It's primarily useful for output. GeneralSubcategory is what's stored in the codepoint trie.
  2. I pulled in num_enum as a dependency to automatically derive GeneralSubcategory::try_from::<u8>. It is no_std compatible, has 2.5M downloads all time, and has minimal additional dependencies. The alternative would be writing a couple of large match statements by hand, or writing our own proc_macro to do so.
  3. In retrospect it's possible that icu_uniset should instead have been icu_uprops or something like that, since the same property definitions are used for both Uniset and CodepointTrie. For now I've put the ULE code in a separate file in icu_uniset; we can sort things out more later.

@echeran @sffc @Manishearth

@iainireland iainireland self-assigned this Sep 23, 2021
Manishearth
Manishearth previously approved these changes Sep 23, 2021
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ULE code looks correct,

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • Cargo.lock is different
  • components/uniset/Cargo.toml is different
  • components/uniset/src/enum_props.rs is different
  • components/uniset/src/lib.rs is different
  • components/uniset/src/props.rs is now changed in the branch
  • components/uniset/src/ule.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GeneralSubcategoryULE looks about right! Although you need to take in the latest changes to the trait from recent PRs.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/uniset/src/ule.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@iainireland
Copy link
Contributor Author

Assuming that CI doesn't have any surprises for me, this version is ready for review. I think this resolves #1071; if there's anything that was supposed to be included that isn't here, let me know.

I've gone ahead and converted Script to a newtype-wrapped u16; it's a bit ugly, but manageably so, and the difference between an enum pretending to be a u16 and a u16 pretending to be an enum is small.

@iainireland iainireland requested a review from sffc September 28, 2021 18:26
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks great! Just a few minor things

components/uniset/src/enum_props.rs Show resolved Hide resolved
components/uniset/src/enum_props.rs Show resolved Hide resolved
components/uniset/src/props.rs Show resolved Hide resolved
components/uniset/src/ule.rs Outdated Show resolved Hide resolved
components/uniset/src/ule.rs Show resolved Hide resolved
@iainireland iainireland marked this pull request as ready for review September 30, 2021 22:41
@iainireland iainireland requested a review from a team as a code owner September 30, 2021 22:41
Manishearth
Manishearth previously approved these changes Sep 30, 2021
}
}

unsafe impl ULE for GeneralSubcategoryULE {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Please add a comment explaining why this impl satisfied the invariants of ULE

(yes, I know that the ZeroVec impls don't consistently do this, but I'm hoping to move towards always documenting this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 and see #1121 for an updated safety checklist for ULE

sffc
sffc previously approved these changes Oct 1, 2021
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo Manish's comment

@iainireland iainireland dismissed stale reviews from sffc and Manishearth via df643b0 October 1, 2021 01:44
@iainireland iainireland requested a review from a team October 1, 2021 01:46
@iainireland iainireland merged commit 4aaac96 into unicode-org:main Oct 1, 2021
@iainireland iainireland deleted the cpt-property-enums branch October 1, 2021 16:51
robertbastian pushed a commit to robertbastian/icu4x that referenced this pull request Oct 4, 2021
* Make GC exhaustive

GeneralCategory will never be extended
See https://www.unicode.org/policies/stability_policy.html

* Add GeneralSubcategory to represent raw GC data

* Implement AsULE for GeneralSubcategory

* Cargo fmt

* refactor

* Make GC repr(u8)

* Cargo fmt

* Convert Script to an identifier

* Implement AsULE for Script

* Implement validate_byte_slice instead of parse_byte_slice

* Impl From<GeneralSubcategory> for GeneralCategory

* Remove default-implemented ULE methods

* Add safety comment on GeneralSubcategoryULE impl

Co-authored-by: Iain Ireland <iain.i.ireland@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define property enums/newtypes, including AsULE impls
4 participants