Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only match a fragment specifier if it starts with certain tokens. #42913

Merged
merged 1 commit into from
Jul 11, 2017

Conversation

kennytm
Copy link
Member

@kennytm kennytm commented Jun 26, 2017

When trying to match a fragment specifier, we first predict whether the current token can be matched at all. If it cannot be matched, don't bother to push the Earley item to bb_eis. This can fix a lot of issues which otherwise requires full backtracking (#42838).

In this PR the prediction treatment is not done for :item, :stmt and :tt, but it could be expanded in the future.

Fixes #24189.
Fixes #26444.
Fixes #27832.
Fixes #34030.
Fixes #35650.
Fixes #39964.
Fixes the 4th comment in #40569.
Fixes the issue blocking #40984.

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@kennytm
Copy link
Member Author

kennytm commented Jun 26, 2017

(Note that this PR will cause a diagnostic quality degradation, e.g. vec![,], previously:

error: expected expression, found `,`
  --> 1.rs:12:18
   |
12 |     let v = vec![,];
   |                  ^

After this PR:

error: no rules expected the token `,`
  --> 1.rs:12:18
   |
12 |     let v = vec![,];
   |                  ^

)

@arielb1
Copy link
Contributor

arielb1 commented Jun 27, 2017

r? @jseyfried

@arielb1 arielb1 assigned jseyfried and unassigned nikomatsakis Jun 27, 2017
@arielb1 arielb1 added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 27, 2017
@kennytm kennytm force-pushed the fix-40569-ident-without-backtrack branch 2 times, most recently from 89ba783 to 685469e Compare June 27, 2017 23:12
@nikomatsakis
Copy link
Contributor

So, this is probably a good idea, but -- @LeoTestard had actually implemented a full fullback mechanism, but we backed off because of the increased potential for problems when the grammar changes. It seems likely that this approach has some the same concerns, although I guess greatly mitigated. (I'd be curious to get @LeoTestard's thoughts.)

@durka
Copy link
Contributor

durka commented Jun 28, 2017 via email

@nikomatsakis
Copy link
Contributor

Yes, that's true.

@kennytm
Copy link
Member Author

kennytm commented Jun 29, 2017

Looks like "prefix set" has been proposed before in rust-lang/rfcs#1746. I'm not sure if this PR is exactly the same as 1746 since the RFC looks much more complicated than this PR.

RFC 1746 was closed expecting matching in macro 2.0 will decouple from the actual grammar, so e.g. $x:expr will be equivalent to AFAIK [^FOLLOW(expr)]+ (such improvement has not been implemented yet). This PR is compatible with that position by viewing an :expr still as an opaque regular expression [FIRST(expr)][^FOLLOW(expr)]*. For sure macro 2.0 likely won't use bb_eis at all, making this PR unnecessary if we restrict the improvement to macro 2.0.

Fixing FIRST(expr) would still have an impact on grammar change, but this should be a much less issue than #33840 which is sensitive to the exact grammar. Maybe we need an RFC on this (which is needed for macro 2.0 anyway).

This PR is necessary for :vis to work correctly in macro 1.0 (i.e. able to parse $(#[$attr:meta])* $vis:vis struct ...:vis needs to know it cannot match '#' outside bb_eis). Otherwise :vis will need to be coupled to macro 2.0 matcher improvement and things like supporting pub(restricted) in thread_local! (#40984) and bitflags! (bitflags/bitflags#72) will need to wait a bit longer.

Copy link
Contributor

@jseyfried jseyfried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed, LGTM modulo comment.

_ => false,
},
"meta" => match *token {
Token::Ident(_) => true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add Token::Path here for future comparability with attribute macros 2.0 invocations (#[path::to::macro arbitrary + tokens]); these are supported today but don't (yet) work with meta.

Copy link
Member Author

@kennytm kennytm Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added ModSep and NtPath merged the arms for "meta" and "path" since they are equivalent in terms of first-set.

@kennytm kennytm force-pushed the fix-40569-ident-without-backtrack branch 4 times, most recently from b619821 to f4e5030 Compare July 6, 2017 10:31
@kennytm kennytm force-pushed the fix-40569-ident-without-backtrack branch from f4e5030 to 6008004 Compare July 7, 2017 06:19
@kennytm
Copy link
Member Author

kennytm commented Jul 7, 2017

@jseyfried Fixed build break from #40939 in 6008004. Currently ignoring the LazyTokenStream part of the Token::Interpolated.

@jseyfried
Copy link
Contributor

Thanks! @bors r+

Currently ignoring the LazyTokenStream part of the Token::Interpolated

That's fine, the LazyTokenStream only adds more information that isn't relevant to these checks anyway.

@bors
Copy link
Contributor

bors commented Jul 10, 2017

📌 Commit 6008004 has been approved by jseyfried

@bors
Copy link
Contributor

bors commented Jul 11, 2017

⌛ Testing commit 6008004 with merge 1999bfa...

bors added a commit that referenced this pull request Jul 11, 2017
…seyfried

Only match a fragment specifier the if it starts with certain tokens.

When trying to match a fragment specifier, we first predict whether the current token can be matched at all. If it cannot be matched, don't bother to push the Earley item to `bb_eis`. This can fix a lot of issues which otherwise requires full backtracking (#42838).

In this PR the prediction treatment is not done for `:item`, `:stmt` and `:tt`, but it could be expanded in the future.

Fixes #24189.
Fixes #26444.
Fixes #27832.
Fixes #34030.
Fixes #35650.
Fixes #39964.
Fixes the 4th comment in #40569.
Fixes the issue blocking #40984.
@bors
Copy link
Contributor

bors commented Jul 11, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: jseyfried
Pushing 1999bfa to master...

@bors bors merged commit 6008004 into rust-lang:master Jul 11, 2017
@durka
Copy link
Contributor

durka commented Jul 11, 2017

Relnotes? This will ship in 1.20, right? I know this is the old macro system, but it seems like a pretty big win for macro writing based on the laundry list of closed issues...

@kennytm kennytm changed the title Only match a fragment specifier the if it starts with certain tokens. Only match a fragment specifier if it starts with certain tokens. Jul 11, 2017
@kennytm kennytm deleted the fix-40569-ident-without-backtrack branch July 11, 2017 10:07
@Mark-Simulacrum Mark-Simulacrum added the relnotes Marks issues that should be documented in the release notes of the next release. label Jul 11, 2017
bors bot added a commit to open-i18n/rust-unic that referenced this pull request Aug 10, 2017
48: Char property macro 2.0 r=behnam

Replaces #41. See #41 for earlier discussion.

An example will show better than I can tell:

```rust
char_property! {
    /// Represents the Unicode character
    /// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property,
    /// also known as the *bidirectional character type*.
    ///
    /// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types>
    /// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values>
    pub enum BidiClass {
        /// Any strong left-to-right character
        ///
        /// ***General Scope***
        ///
        /// LRM, most alphabetic, syllabic, Han ideographs,
        /// non-European or non-Arabic digits, ...
        LeftToRight {
            abbr => L,
            long => Left_To_Right,
            display => "Left-to-Right",
        }

        /// Any strong right-to-left (non-Arabic-type) character
        ///
        /// ***General Scope***
        ///
        /// RLM, Hebrew alphabet, and related punctuation
        RightToLeft {
            abbr => R,
            long => Right_To_Left,
            display => "Right-to-Left",
        }

        /// Any strong right-to-left (Arabic-type) character
        ///
        /// ***General Scope***
        ///
        /// ALM, Arabic, Thaana, and Syriac alphabets,
        /// most punctuation specific to those scripts, ...
        ArabicLetter {
            abbr => AL,
            long => Arabic_Letter,
            display => "Right-to-Left Arabic",
        }
    }
}

/// Abbreviated name bindings for the `BidiClass` property
pub mod abbr_names for abbr;
/// Name bindings for the `BidiClass` property as they appear in Unicode documentation
pub mod long_names for long;
```

expands to:

```rust
/// Represents the Unicode character
/// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property,
/// also known as the *bidirectional character type*.
///
/// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types>
/// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values>
#[allow(bad_style)]
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub enum BidiClass {
    /// Any strong left-to-right character
    LeftToRight,
    /// Any strong right-to-left (non-Arabic-type) character
    RightToLeft,
    /// Any strong right-to-left (Arabic-type) character
    ArabicLetter,
}
/// Abbreviated name bindings for the `BidiClass` property
#[allow(bad_style)]
pub mod abbr_names {
    pub use super::BidiClass::LeftToRight as L;
    pub use super::BidiClass::RightToLeft as R;
    pub use super::BidiClass::ArabicLetter as AL;
}
/// Name bindings for the `BidiClass` property as they appear in Unicode documentation
#[allow(bad_style)]
pub mod long_names {
    pub use super::BidiClass::LeftToRight as Left_To_Right;
    pub use super::BidiClass::RightToLeft as Right_To_Left;
    pub use super::BidiClass::ArabicLetter as Arabic_Letter;
}
#[allow(bad_style)]
#[allow(unreachable_patterns)]
impl ::std::str::FromStr for BidiClass {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "LeftToRight" => Ok(BidiClass::LeftToRight),
            "RightToLeft" => Ok(BidiClass::RightToLeft),
            "ArabicLetter" => Ok(BidiClass::ArabicLetter),
            "L" => Ok(BidiClass::LeftToRight),
            "R" => Ok(BidiClass::RightToLeft),
            "AL" => Ok(BidiClass::ArabicLetter),
            "Left_To_Right" => Ok(BidiClass::LeftToRight),
            "Right_To_Left" => Ok(BidiClass::RightToLeft),
            "Arabic_Letter" => Ok(BidiClass::ArabicLetter),
            _ => Err(()),
        }
    }
}
#[allow(bad_style)]
#[allow(unreachable_patterns)]
impl ::std::fmt::Display for BidiClass {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        match *self {
            BidiClass::LeftToRight => write!(f, "{}", "Left-to-Right"),
            BidiClass::RightToLeft => write!(f, "{}", "Right-to-Left"),
            BidiClass::ArabicLetter => write!(f, "{}", "Right-to-Left Arabic"),
            BidiClass::LeftToRight => write!(f, "{}", "Left_To_Right".replace('_', " ")),
            BidiClass::RightToLeft => write!(f, "{}", "Right_To_Left".replace('_', " ")),
            BidiClass::ArabicLetter => write!(f, "{}", "Arabic_Letter".replace('_', " ")),
            _ => {
                write!(
                    f,
                    "{}",
                    match *self {
                        BidiClass::LeftToRight => "L",
                        BidiClass::RightToLeft => "R",
                        BidiClass::ArabicLetter => "AL",
                        BidiClass::LeftToRight => "LeftToRight",
                        BidiClass::RightToLeft => "RightToLeft",
                        BidiClass::ArabicLetter => "ArabicLetter",
                    }
                )
            }
        }
    }
}
#[allow(bad_style)]
impl ::char_property::EnumeratedCharProperty for BidiClass {
    fn abbr_name(&self) -> &'static str {
        match *self {
            BidiClass::LeftToRight => "L",
            BidiClass::RightToLeft => "R",
            BidiClass::ArabicLetter => "AL",
        }
    }
    fn all_values() -> &'static [BidiClass] {
        const VALUES: &[BidiClass] = &[
            BidiClass::LeftToRight,
            BidiClass::RightToLeft,
            BidiClass::ArabicLetter,
        ];
        VALUES
    }
}
```

All three of the `abbr`, `long`, and `display` properties of the enum are optional, and have sane fallbacks: `abbr_name` and `long_name` return `None` if unspecified, and `fmt::Display` will check, in order, for `display`, `long_name`, `abbr_name`, and the variant name until it finds one to use (stringified, of course).

`FromStr` is defined, matching against any of the provided `abbr`, `long`, and variant name.

<hr />

Important notes:

- <strike>The current format uses associated consts, so it works on beta but won't work on stable until 1.20 is stable.</strike>
  - Consts have a slightly different meaning than `pub use` -- `pub use` aliases the type where `const` is a new object and if used in pattern matching is a `==` call and not a pattern match.
  - For this reason I'm actually slightly leaning towards using `pub use` even once associated consts land; they're compartmentalized (so `use Property::*` doesn't pull in 3x as many symbols as there are variants). After using the const based aliasing for a little bit, I'm inclined to like the current solution of `unic::ucd::bidi::BidiClass::*` + `unic::ucd::bidi::bidi_class::abbr_names::*`. These really should be a `pub use` and not a `const`.
  - Note that I still think `const` are the way to go for cases like `Canonical_Combining_Class`, though.
- <strike>The current syntax could easily be adapted to use modules instead of associated consts, but was written with the associated consts so we could get a feel of how it would look with them.</strike>
- The zero-or-more meta match before a enum variant conflicts with the ident match before 1.20. See rust-lang/rust#42913, rust-lang/rust#24189
- There only tests of the macro are rather thin and could be expanded.
- It's a macro, so the response when you stick stuff not matching the expected pattern is cryptic at best.
- The `CharProperty` trait is pretty much the lowest common denominator. It's a starting point, and we can iterate from there.
- How and where do we want to make `CharProperty` a externally visible trait? Currently having it in namespace is the only way to access `abbr_name` and `long_name`.
- <strike>Earlier discussion suggested putting these into `unic::utils::char_property`. Moving it would be simple, but for now it's living in the root of `unic-utils`</strike>
- <strike>The crate `unic-utils` is currently in the workspace by virtue of being a dependency of `unic`, but is not in any way visible a crate depending on `unic`.</strike>
- <strike>Documentation doesn't exist.</strike>
bors bot added a commit to gfx-rs/gfx that referenced this pull request Sep 27, 2017
1373: Support field attributes on vertex and constant structures r=kvark a=mzmonsour

Adds support for attributes on struct fields, such as doc comments. This doesn't work on stable because of a macro parser bug. Wait until rust-lang/rust#42913 is in a stable release (1.20 probably) before merging this.
hcpl added a commit to hcpl/synstructure that referenced this pull request Apr 8, 2018
I believe this is something fixed by
rust-lang/rust#42913 in Rust 1.20, but we still
need to support Rust 1.15.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
8 participants