Initial implementation of UCD Segmentation properties #166
Generate tables for three segmentation-related enumerated properties:
* Grapheme_Cluster_Break
* Word_Break
* Sentence_Break
Uh... looks like rust 1.17 doesn't like code-blocks in doc-block metas. Need to find a way to get around this now... :|
This looks like the issue with Basically, 1.17 can't tell the difference between a
Right, @CAD97! Looks like multiple lines is the problem. I'm working on a fix now. Thanks for the reminder!
Add `unic-ucd-segment` component with initial implementation of three main segmentation-related properties:
* `Grapheme_Cluster_Break`,
* `Word_Break`, and
* `Sentence_Break`.
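For readers unfamiliar with generated property tables, here is a minimal sketch of how an enumerated property can be looked up from a sorted range table. It is not the code generated by this PR: the `TABLE` constant, the `lookup` function, and the reduced set of enum variants are all assumptions made for illustration, and the table is only a small hand-written excerpt covering the C0 and C1 control ranges.

```rust
use std::cmp::Ordering;

/// Hypothetical subset of `Grapheme_Cluster_Break` values (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GraphemeClusterBreak {
    CR,
    LF,
    Control,
    Other,
}

/// Sorted, non-overlapping `(low, high, value)` ranges.
/// A tiny hand-written excerpt; a real generated table would cover the
/// whole source data file.
const TABLE: &[(char, char, GraphemeClusterBreak)] = &[
    ('\u{0}', '\u{9}', GraphemeClusterBreak::Control),
    ('\u{A}', '\u{A}', GraphemeClusterBreak::LF),
    ('\u{B}', '\u{C}', GraphemeClusterBreak::Control),
    ('\u{D}', '\u{D}', GraphemeClusterBreak::CR),
    ('\u{E}', '\u{1F}', GraphemeClusterBreak::Control),
    ('\u{7F}', '\u{9F}', GraphemeClusterBreak::Control),
];

/// Binary-search the range table; characters not listed fall back to `Other`.
fn lookup(ch: char) -> GraphemeClusterBreak {
    TABLE
        .binary_search_by(|&(low, high, _)| {
            if high < ch {
                Ordering::Less
            } else if low > ch {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .map(|idx| TABLE[idx].2)
        .unwrap_or(GraphemeClusterBreak::Other)
}
```

Calling `lookup('\r')` walks the table by binary search and finds the single-character `CR` range, while a character outside every listed range takes the `Other` fallback. Storing ranges rather than one entry per code point keeps the table small, at the cost of a logarithmic search.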
Okay, updated this WIP with the changes suggested in #170. I was so focused on some other parts of the code that I didn't realize the Now going to fix the macro syntax issue. (Which apparently is only fixed in
Suggestion: Proof of concept https://play.rust-lang.org/?gist=af53821eae5868c2ca8284e3a6fbeeb8
Because of a bug in the Rust compiler before the `1.20.0` release, the old syntax doesn't allow multi-line doc-blocks on enum variants. Since we don't want to update our minimum-supported-rustc-version at the moment, we need to update the macro syntax to mitigate the bug. The new syntax is loosely based on the `match` syntax, plus the newly accepted RFC to allow extra `VERTICAL LINE` prefixes.
Don't forget to update the example, the
I think I came up with a more rust-y syntax based on the recent developments on the language side:
$(#[$variant_meta:meta])+
| $variant:ident {
    // ...
}
That's an extra Now, one open question for this last part is: do we want to only support one syntax for this and update all the call-sites, or should we make it an optional syntax and only use it in new places needed? I like the first (and have done that in the last update) because stability is not a big deal for the macro (being considered an What do you think, @CAD97?
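To make the proposed pattern concrete, here is a small self-contained sketch of a `macro_rules!` macro that accepts one or more doc attributes followed by a `|`-prefixed variant. This is not the actual `char_property!` macro: the macro name `break_property!`, the `abbr => ...` field, the generated `abbr()` method, and the `DemoBreak` enum are hypothetical, and the sketch targets a current compiler rather than the 1.17 toolchain discussed above.

```rust
// Hypothetical macro, loosely modeled on the pattern quoted above.
macro_rules! break_property {
    (
        $(#[$enum_meta:meta])*
        pub enum $name:ident {
            $(
                // One or more attributes (doc comments arrive as `#[doc = "..."]`),
                // then the `|` separator that marks the start of a variant.
                $(#[$variant_meta:meta])+
                | $variant:ident { abbr => $abbr:expr }
            )*
        }
    ) => {
        $(#[$enum_meta])*
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub enum $name {
            $(
                $(#[$variant_meta])+
                $variant,
            )*
        }

        impl $name {
            /// Abbreviated name of the value, as given at the call site.
            pub fn abbr(self) -> &'static str {
                match self {
                    $( $name::$variant => $abbr, )*
                }
            }
        }
    };
}

break_property! {
    /// Hypothetical two-value property, used only to exercise the macro.
    pub enum DemoBreak {
        /// Carriage return.
        ///
        /// A second paragraph, so the variant carries a multi-line doc-block.
        | CR { abbr => "CR" }

        /// Everything else.
        | Other { abbr => "XX" }
    }
}
```

With this definition, `DemoBreak::CR.abbr()` evaluates to `"CR"`. The `|` token gives the macro parser an unambiguous boundary between the attribute list and the variant name, which is the role the separator plays in the syntax proposed in this comment.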
Here's the RFC, btw: rust-lang/rfcs#1745
Just a simple search and replace to improve readability of internal implementations, especially to distinguish between names for properties themselves vs. names for property values.
Following the guideline for naming, plus not using a keyword.
Okay, 287d950 should address all the issues here. Any other last comments before we land?
Looks good to me, barring a few comment updates.
gen/src/writer/ucd/mod.rs
name::generate(&clean_dir("unic/ucd/name/tables"));
normal::generate(&clean_dir("unic/ucd/normal/tables"));
ident::generate(&clean_dir("unic/ucd/ident/tables"));
if false {
Is this left over from debugging?
Oops! Thanks for catching it. Will fix and bors.
unic/char/property/src/macros.rs
@@ -26,7 +26,7 @@
/// human => "Human-Readable Property Name";
///
/// /// Exactly one attribute
Update this to something like "any number of doc comments"; it's not accurate like this anymore.
@@ -23,21 +23,21 @@ char_property! {
human => "My Property";

/// Required
These aren't required anymore; they should be made significant or deleted.
bors: r+
166: Initial implementation of UCD Segmentation properties r=behnam a=behnam

Add UCD Segmentation source data to `/data/`, implement conversion from new files to property map data tables.

Add `unic-ucd-segment` component with initial implementation of three main segmentation-related properties:

* `Grapheme_Cluster_Break`,
* `Word_Break`, and
* `Sentence_Break`.

The current implementation uses the `char_property!()` macro for the `EnumeratedCharProperty` implementation, which only supports `TotalCharProperty`. Since the `Other` (abbr: `XX`) value in all these properties denotes the non-existence of a breaking property, we want to switch to the `PartialCharProperty` domain type and use `Option<enum>`. This is left as a separate step because it needs changes to the macro.
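The difference between the two domain types can be sketched with plain functions. The following is only an illustration of the `Option<enum>` idea and does not use the actual `TotalCharProperty` or `PartialCharProperty` traits: the enum subset and the function names `partial_gcb` and `total_gcb` are assumptions, and the match arms list only a few ranges, so a `None` result here is not authoritative for characters outside this excerpt.

```rust
/// Hypothetical subset of `Grapheme_Cluster_Break` values with no `Other` variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GraphemeClusterBreak {
    CR,
    LF,
    Control,
}

/// Partial-domain style: `None` takes the place of the `Other` (`XX`) value.
/// Only a few ranges are listed here; real data has many more entries.
fn partial_gcb(ch: char) -> Option<GraphemeClusterBreak> {
    match ch {
        '\r' => Some(GraphemeClusterBreak::CR),
        '\n' => Some(GraphemeClusterBreak::LF),
        '\u{0}'..='\u{1F}' | '\u{7F}'..='\u{9F}' => Some(GraphemeClusterBreak::Control),
        _ => None,
    }
}

/// Total-domain style for comparison: every character maps to some name,
/// with "XX" standing in for the `Other` value.
fn total_gcb(ch: char) -> &'static str {
    match partial_gcb(ch) {
        Some(GraphemeClusterBreak::CR) => "CR",
        Some(GraphemeClusterBreak::LF) => "LF",
        Some(GraphemeClusterBreak::Control) => "CN",
        None => "XX",
    }
}
```

With the partial form, callers handle the absence of a breaking property through the usual `Option` combinators instead of comparing against a sentinel variant.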
Build succeeded
Tracker: #135