Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodePointTrie data provider #1167

Merged
merged 32 commits into from
Oct 21, 2021
Merged

Conversation

echeran
Copy link
Contributor

@echeran echeran commented Oct 11, 2021

Closes #1073

This PR is rebased on top of PR #1161, which addresses the upstream dependency issue #1072 -- designing the data provider struct for CodePointTrie data. This PR will implement the actual code needed for the data provider, based on that design.

The pieces of work involved, to wit:

  • (imho) Renaming of data provider structs for consistency Refactor properties to separate crate #1153
  • Data provider code for CPT provider
  • Transformer code between source data struct (enumerated property TOML file) and data provider struct (UnicodePropertyMapV1)

@echeran echeran requested a review from iainireland October 11, 2021 22:47
@echeran
Copy link
Contributor Author

echeran commented Oct 11, 2021

cc @hsivonen

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • provider/uprops/src/enum_codepointtrie.rs is different
  • utils/codepointtrie/src/codepointtrie.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! Please update to main to remove the obsolete diffs from Iain's work.

}

impl From<&TinyStr16> for EnumeratedProperty {
fn from(prop_short_alias: &TinyStr16) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this one of two ways:

  1. Match on a &str rather than a &TinyStr16
  2. Match on a TinyStr16 by value as described in Recommendation for pattern matching zbraniecki/tinystr#22

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please make sure we are actually using this. If you can delete it, please do, because when we actually implement string-to-property parsing, it should be data-driven, not hard coded in this impl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix.

provider/uprops/src/enum_codepointtrie.rs Outdated Show resolved Hide resolved
utils/codepointtrie/src/codepointtrie.rs Outdated Show resolved Hide resolved
provider/uprops/src/enum_codepointtrie.rs Outdated Show resolved Hide resolved
provider/uprops/src/enum_codepointtrie.rs Outdated Show resolved Hide resolved
utils/codepointtrie/Cargo.toml Outdated Show resolved Hide resolved
utils/codepointtrie/Cargo.toml Outdated Show resolved Hide resolved
provider/uprops/src/enum_codepointtrie.rs Show resolved Hide resolved
utils/codepointtrie/src/codepointtrie.rs Outdated Show resolved Hide resolved
utils/codepointtrie/src/codepointtrie.rs Outdated Show resolved Hide resolved
@@ -1,51 +0,0 @@
// This file is part of ICU4X. For terms of use, please see the file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why is this file being deleted? I'm using this code in icu4x_js_regexp.

We could rename it to ~PropertiesUnicodeSetDataProvider if necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was from a few days ago when I didn't recognize how it was being used. I recognized it since then and have it un-deleted locally. Let me push up the local changes.

utils/codepointtrie/src/codepointtrie.rs Outdated Show resolved Hide resolved
@@ -5,7 +5,7 @@
[package]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use this PR to move the content from https://github.com/unicode-org/icu4x/blob/main/utils/codepointtrie/src/provider.rs into components/properties/src/provider.rs, and also remove the dependency on the data provider from utils/codepointtrie.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First part done (contents of codepointtrie provider.rs -> properties provider.rs).

CodePointTrie struct in codepointtrie.rs has default impls of Yokeable and ZeroCopyFrom implemented, so I didn't think I could remove the dependency on icu_provider within icu_codepointtrie.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the dependency to be yoke instead of icu_provider


// ResourceKey subcategory string is the short alias of the property

(GENERAL_CATEGORY_V1, "gc"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion: By using the same macro here, we are putting these in the same namespace as the binary properties. For example: "uniset/alnum@1" and "uniset/gc@1". Do we want a separate namespace like "prop_maps/gc@1" ? We could defer this decision until later (but before v1).

components/properties/src/trievalue.rs Outdated Show resolved Hide resolved
provider/uprops/src/enum_codepointtrie.rs Show resolved Hide resolved
utils/codepointtrie/src/codepointtrie.rs Outdated Show resolved Hide resolved
@echeran echeran requested a review from sffc October 20, 2021 21:47
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diffs from last time look good; I'll do another pass soon

@echeran echeran marked this pull request as ready for review October 21, 2021 00:00
@echeran echeran requested a review from a team as a code owner October 21, 2021 00:00
sffc
sffc previously approved these changes Oct 21, 2021
@echeran echeran merged commit 3ea4f80 into unicode-org:main Oct 21, 2021
@echeran echeran deleted the cpt-data-transformer branch October 21, 2021 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CodePointTrie transformer code (data source to/from provider struct)
4 participants