CodePointTrie data provider #1167
Merged
Changes from 9 commits
32 commits
d1ca58c Rename TrieTypeEnum to TrieType (iainireland)
2731299 Implement Yokeable/ZeroCopyFrom for CodePointTrie and data struct (iainireland)
3519819 Cargo fmt + minor fixes (iainireland)
9142762 Add CPT struct to icu_provider_uprops data source struct (echeran)
7ce722e Renames data providers for UnicodeSet data ahead of introducing one f… (echeran)
7784c1d Matches CPT version to project/sub-crates, adds CPT as dep to provide… (echeran)
1f6d7e7 Add WIP code for data provider for CodePointTrie data (echeran)
3a890f5 More WIP code for CodePointTrie data provider implementation (echeran)
888edc5 Fix error (Manishearth)
4b6c986 Merge branch 'main' into cpt-data-transformer (echeran)
1186512 Simplify constructing ZeroVec using ZV's new FromIterator impl (echeran)
b67d033 Merge branch 'main' into cpt-data-transformer (echeran)
4abb8a4 Merge current snapshot of PR #1153 (refactor properties to separate c… (echeran)
ee4e8e9 Update path to uniset crate in CI job for benchmarking (echeran)
69fc6e1 Merge branch 'main' into cpt-data-transformer (echeran)
c543200 Implement TrieValue for GeneralSubcategory (echeran)
40381be Implement TrieValue for Script (echeran)
d4bbcbd Rename TrieValue trait's associate type for Result errors (echeran)
997db93 Remove unneeded dependency (echeran)
28b51c0 Revert version number of icu_codepointtrie (echeran)
e10eb66 Move data structs for UnicodePropertyMap from icu_codepointtrie to ic… (echeran)
5b32a4d Error message rewording (echeran)
499b817 Finish reverting unneeded renaming/refactoring in icu_properties (echeran)
7139775 Add docstrings for the uprops data providers (echeran)
fd0f748 Add test for Script using data provider for CodePointTrie data (echeran)
4ce245c Export CPT data provider symbol publicly (echeran)
b8aada4 Merge branch 'main' into cpt-data-transformer (echeran)
8aca45a Declare no_std for icu_codepointtrie (echeran)
a39dd49 Add `extern crate...` to import alloc libs (echeran)
3689a11 Remove unused custom code for string -> enum conversion (echeran)
f24108d Replace icu_provider dep with yoke, remove std feature in icu_codepoi… (echeran)
0a32faf Add derive feature to yoke dependency in icu_codepointtrie (echeran)
@@ -0,0 +1,121 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::error::Error;
use crate::uprops_serde;
use crate::uprops_serde::enumerated::EnumeratedPropertyCodePointTrie;

use icu_codepointtrie::codepointtrie::{CodePointTrie, CodePointTrieHeader, TrieType, TrieValue};
use icu_codepointtrie::provider::{UnicodePropertyMapV1, UnicodePropertyMapV1Marker};
use icu_provider::prelude::*;
use icu_uniset::enum_props::EnumeratedProperty; // TODO(#1160) - Refactor property definitions out of UnicodeSet
use zerovec::ZeroVec;

use core::convert::TryFrom;

use std::fs;
use std::path::PathBuf;

pub struct EnumeratedPropertyCodePointTrieProvider {
    root_dir: PathBuf,
}

impl EnumeratedPropertyCodePointTrieProvider {
    pub fn new(root_dir: PathBuf) -> Self {
        EnumeratedPropertyCodePointTrieProvider { root_dir }
    }

    fn get_toml_data(&self, name: &str) -> Result<uprops_serde::enumerated::Main, Error> {
        let mut path: PathBuf = self.root_dir.clone().join(name);
        path.set_extension("toml");
        let toml_str = fs::read_to_string(&path).map_err(|e| Error::Io(e, path.clone()))?;
        toml::from_str(&toml_str).map_err(|e| Error::Toml(e, path))
    }
}

impl<T: TrieValue> TryFrom<uprops_serde::enumerated::EnumeratedPropertyCodePointTrie>
    for UnicodePropertyMapV1<'static, T>
{
    type Error = DataError;

    fn try_from(
        cpt_data: EnumeratedPropertyCodePointTrie,
    ) -> Result<UnicodePropertyMapV1<'static, T>, DataError> {
        let trie_type_enum: TrieType =
            TrieType::try_from(cpt_data.trie_type_enum_val).map_err(DataError::new_resc_error)?;
        let header = CodePointTrieHeader {
            high_start: cpt_data.high_start,
            shifted12_high_start: cpt_data.shifted12_high_start,
            index3_null_offset: cpt_data.index3_null_offset,
            data_null_offset: cpt_data.data_null_offset,
            null_value: cpt_data.null_value,
            trie_type: trie_type_enum,
        };
        let index: ZeroVec<u16> = ZeroVec::clone_from_slice(&cpt_data.index);
        // TODO: make data have type ZeroVec<T>
        //
        let data: Result<Vec<T::ULE>, String> = if let Some(data_8) = cpt_data.data_8 {
            data_8
                .iter()
                .map(|i| *i as u32)
                .map(|i| T::parse_from_u32(i).map(|i| i.as_unaligned()))
                .collect()
        } else if let Some(data_16) = cpt_data.data_16 {
            data_16
                .iter()
                .map(|i| *i as u32)
                .map(|i| T::parse_from_u32(i).map(|i| i.as_unaligned()))
                .collect()
        } else if let Some(data_32) = cpt_data.data_32 {
            data_32
                .iter()
                .map(|i| *i as u32)
                .map(|i| T::parse_from_u32(i).map(|i| i.as_unaligned()))
                .collect()
        } else {
            return Err(DataError::new_resc_error(
                icu_codepointtrie::error::Error::FromDeserialized {
                    reason: "Cannot deserialize data array for CodePointTrie in TOML",
                },
            ));
        };

        let data = ZeroVec::Owned(data.map_err(DataError::new_resc_error)?);
        let trie = CodePointTrie::<T>::try_new(header, index, data)
            .map_err(DataError::new_resc_error);
        trie.map(|t| UnicodePropertyMapV1 { codepoint_trie: t })
    }
}

impl<'data, T: TrieValue> DataProvider<'data, UnicodePropertyMapV1Marker<T>>
    for EnumeratedPropertyCodePointTrieProvider
{
    fn load_payload(
        &self,
        req: &DataRequest,
    ) -> Result<DataResponse<'data, UnicodePropertyMapV1Marker<T>>, DataError> {
        // For data resource keys that represent the CodePointTrie data for an enumerated
        // property, the ResourceKey sub-category string will just be the short alias
        // for the property.
        let prop_name = &req.resource_path.key.sub_category;

        let toml_data: uprops_serde::enumerated::Main = self
            .get_toml_data(prop_name)
            .map_err(DataError::new_resc_error)?;

        let prop_enum: EnumeratedProperty = EnumeratedProperty::from(prop_name);

        let source_cpt_data: uprops_serde::enumerated::EnumeratedPropertyCodePointTrie =
            toml_data.enum_property.data.code_point_trie;

        let data_struct = UnicodePropertyMapV1::<T>::try_from(source_cpt_data)?;

        Ok(DataResponse {
            metadata: DataResponseMetadata {
                data_langid: req.resource_path.options.langid.clone(),
            },
            payload: Some(DataPayload::from_owned(data_struct)),
        })
    }
}
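
The `try_from` conversion in this diff picks whichever of the three optional data arrays is present, widens each element to `u32`, and parses it into the trie's value type, stopping at the first element that fails to parse. The standalone sketch below isolates that pattern using only the standard library. `ParseFromU32`, `RawTrieData`, `parse_all`, and `convert` are hypothetical stand-ins invented for illustration and are not the PR's `TrieValue` trait or TOML structs; factoring the shared pipeline into one helper is likewise a design choice of this sketch, not something the PR does.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a trie value type: something that can be
/// parsed from the raw `u32` stored in a data array.
trait ParseFromU32: Sized {
    fn parse_from_u32(raw: u32) -> Result<Self, String>;
}

impl ParseFromU32 for u8 {
    fn parse_from_u32(raw: u32) -> Result<Self, String> {
        // Fails for any value that does not fit in a u8.
        u8::try_from(raw).map_err(|e| e.to_string())
    }
}

/// Hypothetical mirror of the three optional data arrays in the source data.
struct RawTrieData {
    data_8: Option<Vec<u8>>,
    data_16: Option<Vec<u16>>,
    data_32: Option<Vec<u32>>,
}

/// Parse every widened element; collecting into `Result` stops at the
/// first error and returns it.
fn parse_all<T, I>(raw: I) -> Result<Vec<T>, String>
where
    T: ParseFromU32,
    I: IntoIterator<Item = u32>,
{
    raw.into_iter().map(T::parse_from_u32).collect()
}

/// Use the first array that is present, checked in the order 8, 16, 32 bits.
fn convert<T: ParseFromU32>(src: RawTrieData) -> Result<Vec<T>, String> {
    if let Some(d) = src.data_8 {
        parse_all(d.into_iter().map(u32::from))
    } else if let Some(d) = src.data_16 {
        parse_all(d.into_iter().map(u32::from))
    } else if let Some(d) = src.data_32 {
        parse_all(d)
    } else {
        Err("no data array present".to_string())
    }
}
```

Because `Result` implements `FromIterator`, a single `collect()` is enough to turn an iterator of per-element results into one `Result<Vec<_>, _>`.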
Do this one of two ways:
- Match on a `&str` rather than a `&TinyStr16`
- Match on the `TinyStr16` by value, as described in "Recommendation for pattern matching" (zbraniecki/tinystr#22)
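
A minimal sketch of the first option, assuming the code under review maps a property's short alias to an enum variant. `PropertyKind` and `property_from_alias` are hypothetical names invented here, not the PR's definitions; `gc` and `sc` are the Unicode short aliases for General_Category and Script.

```rust
/// Hypothetical subset of an enumerated-property enum (not the PR's type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum PropertyKind {
    GeneralCategory,
    Script,
}

/// Taking `&str` lets string literals serve directly as match patterns.
fn property_from_alias(alias: &str) -> Option<PropertyKind> {
    match alias {
        "gc" => Some(PropertyKind::GeneralCategory),
        "sc" => Some(PropertyKind::Script),
        // Unknown aliases are reported to the caller instead of panicking.
        _ => None,
    }
}
```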
Also, please make sure we are actually using this. If you can delete it, please do, because when we actually implement string-to-property parsing, it should be data-driven, not hard coded in this impl.
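
One way to read "data-driven" here is that the alias-to-property mapping lives in a table supplied as data, and the code only performs a lookup. The sketch below assumes a table of `(alias, id)` rows sorted by alias; `lookup`, the row layout, and the numeric ids are all hypothetical and are not taken from ICU4X.

```rust
/// Look up `alias` in a table of (alias, id) rows.
/// Assumption: the rows are sorted by alias, which binary search requires.
/// In a data-driven design the table would be loaded from a data source
/// rather than written out in the source code.
fn lookup(table: &[(&str, u16)], alias: &str) -> Option<u16> {
    table
        .binary_search_by(|&(name, _)| name.cmp(alias))
        .ok()
        .map(|i| table[i].1)
}
```

With this shape, adding a property means adding a row to the data, and no `match` arm has to change.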
Please fix.