-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Char property macro 2.0 #48
Conversation
As noted above, I've started to think that @behnam, I'd like to include the module definition in the macro, so it can be doc'd and it's clear that a module definition is happening; what do you think the best syntax would be? It would be interesting and possible to make it optional, but not exactly a piece of cake. I'd prefer to start with a mandatory module for Note that using this on 1.19 will also entail restricting documentation on variants to exactly 0 or 1 attributes, as 1.19 Once 1.20 comes, the exactly 0 or 1 attributes will become (0 or more) or (1 or more) attributes, depending on which we pick. At that time we can also relax it to 0 or more if we chose to use |
gah, this works on stable other than the doc comment... This is ready for use other than working around the documentation matching issue. |
@behnam this macro now works and could in theory be adopted. However, it is crippled until Rust 1.20 because of rust-lang/rust/#24189 (fixed in rust-lang/rust/#42913). Luckily, it will require no changes to start working better once 1.20 goes stable. I've noted the limitation in the documentation for the macro. This mostly is just waiting on further discussion on the I can (and probably will) in the future add support for newtype structs like |
This commit (96c0d81) adds the functionality to close #66. I couldn't come up with a good way to do this on stable, so I fell on just I am fully open to alternatives though, I just picked something that worked. This does indeed work with the newtype pattern. Playground link. Of course, this requires that the buildscript added here be used on every crate that uses the macro, which is less than desirable. Again, if there are better ideas, I'm all ears; I'd love to change this to be less awkward. |
Some finagling later and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, @CAD97! I think now we have a more clear idea of what needs to be done here and we can finally land this.
I have a few comments inline. About the general direction (to echo what I've put inline), I think we should land the traits first, then the macro. This way, we know the expected API which macro is going to satisfy, and we will be sure that the macro impl is not changing the expected value from the manual impl.
Also, because the traits are a bit more complicated, since we need a bunch of them to model all the properties correctly. The macro here is only useful to generate flat-enum properties.
What do you think?
unic/utils/src/macros.rs
Outdated
/// # Effect | ||
/// | ||
/// - Implements `CharProperty` with the `abbr` and `long` presented in the appropriate method | ||
/// - Implements `FromStr` accepting the variant name or the abbr or long forms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Implements `FromStr` accepting any of variant, abbr, or long names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. That's better. Though I like calling it the "rust name" now; it makes it clear which version (RustName
) we're talking about.
}; | ||
} | ||
|
||
#[macro_export] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need #[macro_export]
for the internal one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we do, while we are using macro_rules!
.
#[macro_export] | ||
macro_rules! char_property { | ||
( | ||
$(#[$name_meta:meta])* pub enum $name:ident { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't pub
be also part of meta
here and not explicitly present in marco in and out patterns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is pub
doesn't match #[$:meta]
. The proper answer would be $:vis
(rust-lang/rust#41022) but that's still unstable or doing it manually, which is slightly overkill.
unic/utils/src/macros.rs
Outdated
/// - Implements `FromStr` accepting the variant name or the abbr or long forms | ||
/// - Implements `Display` using the given string, falling back when not provided on | ||
/// the long name, the short name, and the variant name, in that order | ||
/// - Populates the module `abbr_names` with `pub use` bindings of variants to their short names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to do the same with long_names
, now that we are clearly making a distinction?
The rational would be that some data/test files may use the long_names
and user wants to generate tables without having to rename everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some applications, I expect FromStr
would be preferred over including more symbols. But since it is here, it does make sense to expose it for when it is useful.
unic/utils/src/traits.rs
Outdated
@@ -0,0 +1,21 @@ | |||
//! # Unic - Utils - Traits | |||
//! | |||
//! A component of [`unic`: Unicode and Internationalization Crates for Rust](/unic/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All utils is now one component. We can drop the crate-like doc and only document it as a module, which is what it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
unic/utils/src/traits.rs
Outdated
use std::{fmt, str}; | ||
|
||
/// A Property that applies to a character | ||
pub trait CharProperty : str::FromStr + fmt::Display { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm thinking that we should do is to get this part in, have the trait implemented manually, and switch to the macro in a separate step.
Also, we need to make sure we differentiate between numerical properties, complex enum properties, and flat enum properties. For example, fn all_values()
should only be defined for flat enum properties.
I'll also submit my ideas for classifying properties with traits soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CharProperty
in this PR is merely a placeholder for the proper trait(s). I fully expect to wait on your traits and then to migrate to implementing those as appropriate rather than the ad-hoc way it is currently defined.
unic/utils/tests/macro_tests.rs
Outdated
assert_eq!(Property::EmptyVariant.long_name(), None); | ||
|
||
assert_eq!(format!("{}", Property::AbbrVariant), "AV"); | ||
assert_eq!(format!("{}", Property::LongVariant), "Long_Variant"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having the display()
-fallback-on-long-name to also replace underscore with space? Doing so, we won't haev to manually set display
property for most of the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it is possible to transform the captured :ident
in that manner using macro_rules!
, otherwise I would have done it already. You could in theory do it with a procedural macro, but I've not yet familiarized myself with how to do those, stable or not.
If we do go the path of a procedural macro in order to be able to do this transformation, I would (naively) suggest making the whole thing a procedural macro for better error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of doing it this way:
Call stringify!($ident)
in fn long_name()
, and then in fn display()
, if no value is provided, it falls back to
self.long_name().to_owned().replace("_", " ")
Wouldn't that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's not very performant, but it's just display()
and not actually much core functionality used for text processing. If ever we need to optimize it, we can provide the string literals.
Okay, the @CAD97, would be great if you could actually collapse the whole history here (or create a new PR), so we can start fresh. I find long branches with multiple merges from master in the middle very hard to conclude about and not that useful, history-wise. |
@behnam is there a standard "ignore me lint-wise" attribute that I can put on the code spat out by the macro? EDIT: |
Add char_property macro Add char_property macro Add char_property macro Signed-off-by: CAD97 <cad97@cad97.com> Start of buffer muncher Signed-off-by: CAD97 <cad97@cad97.com> squished the bug Signed-off-by: CAD97 <cad97@cad97.com> Munch Display Signed-off-by: CAD97 <cad97@cad97.com> Add user-facing macro Signed-off-by: CAD97 <cad97@cad97.com> Remove feature gate openers and debug macro invocation Signed-off-by: CAD97 <cad97@cad97.com> Simple test Signed-off-by: CAD97 <cad97@cad97.com> Merge char_property macro/trait into unic-utils codepoints Change associated consts to abbr_names mod Document char_property macro Patch to work on stable Remove feature gate opening Why was that still there Actually let this work on stable now All values slice (Closes benham#66) Align all_values with behnam/#70 long_names module Long_Variant fallback is `Long Variant` Fix the merge... Use the real *CharProperty traits
@behnam we're good to go if you would go over and review the changes once more! (I hope dearly I didn't mess anything up in the squash.) Extending this macro to cover newtype properties (like EDIT: Purely additive, and only touching my new files. 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this PR a lot, @CAD97! Great job!
I have a bunch of nit, and one real feedback inline. I think we really need to do something about the name modules definitions. I can't think of anything else, and should be good to land afterwards!
unic/utils/src/lib.rs
Outdated
@@ -7,6 +7,7 @@ | |||
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your | |||
// option. This file may not be copied, modified, or distributed | |||
// except according to those terms. | |||
//! Utilities for use in UNIC development. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop this one now. (or later...)
} | ||
|
||
#[allow(bad_style)] | ||
#[allow(unreachable_patterns)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can drop this unreachable_patterns
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this one isn't necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait... the commit deleted this line locally and not on the server....? 0_o How did that happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you deleted the right one, from one block below, because there used to be three of them. The one on this block (on line 314) is actually needed. The one on line 323 was expected to be deleted, which is now gone.
Uh.... GH BUG with placement of inline comments!!!
unic/utils/tests/macro_tests.rs
Outdated
/// Required | ||
DisplayVariant { | ||
abbr => DV, | ||
display => "The one and only DISPLAY VARIANT!!!11!", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I know it's a test, but !!!11!
doesn't test anything very specific, right? :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just being silly 😛
unic/utils/src/macros.rs
Outdated
} | ||
|
||
$(#[$abbr_names_meta:meta])* pub mod $abbr_names:ident; | ||
$(#[$long_names_meta:meta])* pub mod $long_names:ident; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to do something about these two modules. The thing it that, in the pattern, there's only one enum
, which is clear what it is. The two pub mod
lines, for anyone reading the code, or even developing, it's not clear at all that the first will be used for abbr-names, and the second for long-names. IMHO, we should enforce that in the patter.
For example, something like this:
char_property! {
pub enum MyProperty {
...
}
pub mod my_property_abbr_names for abbr_names;
pub mod my_property_long_names for long_names;
}
Rational: for
is a reserved keyword and gets highlighted accordingly. abbr_names
and long_names
go well with the names being impl for the enum, abbr_name
/long_name
, so it's easy to remember and easy to read/understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, and better than my one-off idea:
abbr_names in pub mod my_property_abbr_names
Updated OP with the current syntax + output 🎉 |
bors: r+ |
48: Char property macro 2.0 r=behnam Replaces #41. See #41 for earlier discussion. An example will show better than I can tell: ```rust char_property! { /// Represents the Unicode character /// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property, /// also known as the *bidirectional character type*. /// /// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types> /// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values> pub enum BidiClass { /// Any strong left-to-right character /// /// ***General Scope*** /// /// LRM, most alphabetic, syllabic, Han ideographs, /// non-European or non-Arabic digits, ... LeftToRight { abbr => L, long => Left_To_Right, display => "Left-to-Right", } /// Any strong right-to-left (non-Arabic-type) character /// /// ***General Scope*** /// /// RLM, Hebrew alphabet, and related punctuation RightToLeft { abbr => R, long => Right_To_Left, display => "Right-to-Left", } /// Any strong right-to-left (Arabic-type) character /// /// ***General Scope*** /// /// ALM, Arabic, Thaana, and Syriac alphabets, /// most punctuation specific to those scripts, ... ArabicLetter { abbr => AL, long => Arabic_Letter, display => "Right-to-Left Arabic", } } } /// Abbreviated name bindings for the `BidiClass` property pub mod abbr_names for abbr; /// Name bindings for the `BidiClass` property as they appear in Unicode documentation pub mod long_names for long; ``` expands to: ```rust /// Represents the Unicode character /// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property, /// also known as the *bidirectional character type*. /// /// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types> /// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values> #[allow(bad_style)] #[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)] pub enum BidiClass { /// Any strong left-to-right character LeftToRight, /// Any strong right-to-left (non-Arabic-type) character RightToLeft, /// Any strong right-to-left (Arabic-type) character ArabicLetter, } /// Abbreviated name bindings for the `BidiClass` property #[allow(bad_style)] pub mod abbr_names { pub use super::BidiClass::LeftToRight as L; pub use super::BidiClass::RightToLeft as R; pub use super::BidiClass::ArabicLetter as AL; } /// Name bindings for the `BidiClass` property as they appear in Unicode documentation #[allow(bad_style)] pub mod long_names { pub use super::BidiClass::LeftToRight as Left_To_Right; pub use super::BidiClass::RightToLeft as Right_To_Left; pub use super::BidiClass::ArabicLetter as Arabic_Letter; } #[allow(bad_style)] #[allow(unreachable_patterns)] impl ::std::str::FromStr for BidiClass { type Err = (); fn from_str(s: &str) -> Result<Self, Self::Err> { match s { "LeftToRight" => Ok(BidiClass::LeftToRight), "RightToLeft" => Ok(BidiClass::RightToLeft), "ArabicLetter" => Ok(BidiClass::ArabicLetter), "L" => Ok(BidiClass::LeftToRight), "R" => Ok(BidiClass::RightToLeft), "AL" => Ok(BidiClass::ArabicLetter), "Left_To_Right" => Ok(BidiClass::LeftToRight), "Right_To_Left" => Ok(BidiClass::RightToLeft), "Arabic_Letter" => Ok(BidiClass::ArabicLetter), _ => Err(()), } } } #[allow(bad_style)] #[allow(unreachable_patterns)] impl ::std::fmt::Display for BidiClass { fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result { match *self { BidiClass::LeftToRight => write!(f, "{}", "Left-to-Right"), BidiClass::RightToLeft => write!(f, "{}", "Right-to-Left"), BidiClass::ArabicLetter => write!(f, "{}", "Right-to-Left Arabic"), BidiClass::LeftToRight => write!(f, "{}", "Left_To_Right".replace('_', " ")), BidiClass::RightToLeft => write!(f, "{}", "Right_To_Left".replace('_', " ")), BidiClass::ArabicLetter => write!(f, "{}", "Arabic_Letter".replace('_', " ")), _ => { write!( f, "{}", match *self { BidiClass::LeftToRight => "L", BidiClass::RightToLeft => "R", BidiClass::ArabicLetter => "AL", BidiClass::LeftToRight => "LeftToRight", BidiClass::RightToLeft => "RightToLeft", BidiClass::ArabicLetter => "ArabicLetter", } ) } } } } #[allow(bad_style)] impl ::char_property::EnumeratedCharProperty for BidiClass { fn abbr_name(&self) -> &'static str { match *self { BidiClass::LeftToRight => "L", BidiClass::RightToLeft => "R", BidiClass::ArabicLetter => "AL", } } fn all_values() -> &'static [BidiClass] { const VALUES: &[BidiClass] = &[ BidiClass::LeftToRight, BidiClass::RightToLeft, BidiClass::ArabicLetter, ]; VALUES } } ``` All three of the `abbr`, `long`, and `display` properties of the enum are optional, and have sane fallbacks: `abbr_name` and `long_name` return `None` if unspecified, and `fmt::Display` will check, in order, for `display`, `long_name`, `abbr_name`, and the variant name until it finds one to use (stringified, of course). `FromStr` is defined, matching against any of the provided `abbr`, `long`, and variant name. <hr /> Important notes: - <strike>The current format uses associated consts, so it works on beta but won't work on stable until 1.20 is stable.</strike> - Consts have a slightly different meaning than `pub use` -- `pub use` aliases the type where `const` is a new object and if used in pattern matching is a `==` call and not a pattern match. - For this reason I'm actually slightly leaning towards using `pub use` even once associated consts land; they're compartmentalized (so `use Property::*` doesn't pull in 3x as many symbols as there are variants). After using the const based aliasing for a little bit, I'm inclined to like the current solution of `unic::ucd::bidi::BidiClass::*` + `unic::ucd::bidi::bidi_class::abbr_names::*`. These really should be a `pub use` and not a `const`. - Note that I still think `const` are the way to go for cases like `Canonical_Combining_Class`, though. - <strike>The current syntax could easily be adapted to use modules instead of associated consts, but was written with the associated consts so we could get a feel of how it would look with them.</strike> - The zero-or-more meta match before a enum variant conflicts with the ident match before 1.20. See rust-lang/rust#42913, rust-lang/rust#24189 - There only tests of the macro are rather thin and could be expanded. - It's a macro, so the response when you stick stuff not matching the expected pattern is cryptic at best. - The `CharProperty` trait is pretty much the lowest common denominator. It's a starting point, and we can iterate from there. - How and where do we want to make `CharProperty` a externally visible trait? Currently having it in namespace is the only way to access `abbr_name` and `long_name`. - <strike>Earlier discussion suggested putting these into `unic::utils::char_property`. Moving it would be simple, but for now it's living in the root of `unic-utils`</strike> - <strike>The crate `unic-utils` is currently in the workspace by virtue of being a dependency of `unic`, but is not in any way visible a crate depending on `unic`.</strike> - <strike>Documentation doesn't exist.</strike>
Build succeeded |
Replaces #41. See #41 for earlier discussion.
An example will show better than I can tell:
expands to:
All three of the
abbr
,long
, anddisplay
properties of the enum are optional, and have sane fallbacks:abbr_name
andlong_name
returnNone
if unspecified, andfmt::Display
will check, in order, fordisplay
,long_name
,abbr_name
, and the variant name until it finds one to use (stringified, of course).FromStr
is defined, matching against any of the providedabbr
,long
, and variant name.Important notes:
The current format uses associated consts, so it works on beta but won't work on stable until 1.20 is stable.pub use
--pub use
aliases the type whereconst
is a new object and if used in pattern matching is a==
call and not a pattern match.pub use
even once associated consts land; they're compartmentalized (souse Property::*
doesn't pull in 3x as many symbols as there are variants). After using the const based aliasing for a little bit, I'm inclined to like the current solution ofunic::ucd::bidi::BidiClass::*
+unic::ucd::bidi::bidi_class::abbr_names::*
. These really should be apub use
and not aconst
.const
are the way to go for cases likeCanonical_Combining_Class
, though.The current syntax could easily be adapted to use modules instead of associated consts, but was written with the associated consts so we could get a feel of how it would look with them.CharProperty
trait is pretty much the lowest common denominator. It's a starting point, and we can iterate from there.CharProperty
a externally visible trait? Currently having it in namespace is the only way to accessabbr_name
andlong_name
.Earlier discussion suggested putting these intounic::utils::char_property
. Moving it would be simple, but for now it's living in the root ofunic-utils
The crateunic-utils
is currently in the workspace by virtue of being a dependency ofunic
, but is not in any way visible a crate depending onunic
.Documentation doesn't exist.