CodePointTrie data provider #1167
TrieType no longer exists, so we don't need an awkward name for TrieTypeEnum.
…or CodePointTrie data
cc @hsivonen
Force-push: b468891 to 888edc5
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
Mostly LGTM! Please update to main to remove the obsolete diffs from Iain's work.
components/uniset/src/enum_props.rs
Outdated
}

impl From<&TinyStr16> for EnumeratedProperty {
    fn from(prop_short_alias: &TinyStr16) -> Self {
Do this one of two ways:
- Match on a &str rather than a &TinyStr16
- Match on a TinyStr16 by value as described in "Recommendation for pattern matching" (zbraniecki/tinystr#22)
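A minimal sketch of the first option (matching on a &str), under stated assumptions: the enum below is a hypothetical stand-in for the one in enum_props.rs, and its variants, including the InvalidCode fallback, are illustrative only. The aliases "gc" and "sc" are the Unicode short aliases for General_Category and Script. A caller holding a TinyStr16 would pass its string slice.

```rust
/// Hypothetical stand-in for the enum in enum_props.rs.
/// The variant names are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumeratedProperty {
    GeneralCategory,
    Script,
    /// Fallback for unrecognized aliases (assumed, not taken from the PR).
    InvalidCode,
}

impl From<&str> for EnumeratedProperty {
    fn from(prop_short_alias: &str) -> Self {
        // String literals are valid patterns for a &str scrutinee,
        // so no TinyStr16 constants are needed in the match arms.
        match prop_short_alias {
            "gc" => EnumeratedProperty::GeneralCategory,
            "sc" => EnumeratedProperty::Script,
            _ => EnumeratedProperty::InvalidCode,
        }
    }
}
```

This keeps the match arms readable, at the cost of a string comparison per arm rather than the integer comparison that matching a TinyStr16 by value would allow.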
Also, please make sure we are actually using this. If you can delete it, please do, because when we actually implement string-to-property parsing, it should be data-driven, not hard coded in this impl.
Please fix.
provider/uprops/src/provider.rs
Outdated
@@ -1,51 +0,0 @@
// This file is part of ICU4X. For terms of use, please see the file
Wait, why is this file being deleted? I'm using this code in icu4x_js_regexp.
We could rename it to ~PropertiesUnicodeSetDataProvider if necessary.
That was from a few days ago, before I understood how it was being used. I have since restored it locally; let me push up those changes.
utils/codepointtrie/Cargo.toml
Outdated
@@ -5,7 +5,7 @@
[package]
Please use this PR to move the content from https://github.com/unicode-org/icu4x/blob/main/utils/codepointtrie/src/provider.rs into components/properties/src/provider.rs, and also remove the dependency on the data provider from utils/codepointtrie.
First part done (contents of codepointtrie provider.rs -> properties provider.rs).
The CodePointTrie struct in codepointtrie.rs has default impls of Yokeable and ZeroCopyFrom, so I didn't think I could remove the dependency on icu_provider within icu_codepointtrie.
Please change the dependency to be yoke instead of icu_provider
// ResourceKey subcategory string is the short alias of the property
(GENERAL_CATEGORY_V1, "gc"),
Discussion: By using the same macro here, we are putting these in the same namespace as the binary properties. For example: "uniset/alnum@1" and "uniset/gc@1". Do we want a separate namespace like "prop_maps/gc@1" ? We could defer this decision until later (but before v1).
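To make the two layouts concrete, here is a rough sketch, assuming the key path has the form category/sub_category@version as in the examples above. The function name resource_key_path is hypothetical and is not part of icu_provider; the real keys are produced by the macro in the diff.

```rust
/// Hypothetical helper (not an icu_provider API): formats a key path
/// as "<category>/<sub_category>@<version>".
fn resource_key_path(category: &str, sub_category: &str, version: u16) -> String {
    format!("{}/{}@{}", category, sub_category, version)
}

/// Current layout: enumerated properties share the "uniset" namespace
/// with the binary properties.
fn shared_namespace_key(short_alias: &str) -> String {
    resource_key_path("uniset", short_alias, 1)
}

/// Alternative raised in the discussion: a separate "prop_maps" namespace.
fn separate_namespace_key(short_alias: &str) -> String {
    resource_key_path("prop_maps", short_alias, 1)
}
```

With the shared layout, "gc" sits next to binary properties such as "alnum"; with the separate layout, only the category segment changes, so the choice can be deferred without touching the short aliases.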
The diffs from last time look good; I'll do another pass soon
Closes #1073
This PR is rebased on top of PR #1161, which addresses the upstream dependency issue #1072 -- designing the data provider struct for CodePointTrie data. This PR will implement the actual code needed for the data provider, based on that design.
The pieces of work involved, to wit:
UnicodePropertyMapV1
)