Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macro matchers only match when they feel like it #27832

Closed
jonas-schievink opened this issue Aug 14, 2015 · 11 comments · Fixed by #42913
Closed

Macro matchers only match when they feel like it #27832

jonas-schievink opened this issue Aug 14, 2015 · 11 comments · Fixed by #42913
Assignees
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..)

Comments

@jonas-schievink
Copy link
Contributor

...or at least that's how much I currently understand, since macros are really counterintuitive sometimes.

macro_rules! m {
    ( $i:ident ) => ();
    ( $t:tt $j:tt ) => ();
}

fn main() {
    m!(c);
    m!(t 9);  // why does this work, but not the next case?

    m!(0 9);
    // ^ error: expected ident, found 0
}

I don't think this is supposed to happen. Even if it is, the exact rules used for macro matching should definitely be documented somewhere (I think there's not even an RFC).

@jdm jdm added the A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) label Aug 14, 2015
@durka
Copy link
Contributor

durka commented Aug 14, 2015

It seems like the ident parser in particular is very greedy, or something. For example, changing it from ident to expr eliminates the error (as does switching the order of the rules, but I assume this is reduced from something where you can't do that).

However, fixing this (if it can be fixed) is probably a breaking change :(

@jonas-schievink
Copy link
Contributor Author

Found the culprit: https://github.com/rust-lang/rust/blob/master/src/libsyntax/ext/tt/macro_parser.rs#L514-L522

// this could be handled like a token, since it is one

Do you really think this would break something? I actually don't think it would, since a fix for this should only accept more macro invocations.

@jonas-schievink
Copy link
Contributor Author

On second thought... I think that's not the direct cause, since all other fragments behave roughly the same.

Oh well, back to the drawing board

@jonas-schievink
Copy link
Contributor Author

Okay, so as it turns out, all NT parsers introduce this bug (except tt, which you can't really test). In this particular case, changing from ident to expr only worked because idents and integral literals are both valid expressions. For example, this doesn't work:

macro_rules! m {
    ( $b:expr ) => ();
    ( $t:tt $u:tt ) => ();
}

fn main() {
    m!(3);      // works trivially
    m!(1 2);    // works, since `1` is a valid expression
    m!(_ 1);    // doesn't work, since `_` is not an expression (but a valid TT, of course)
}

Now that this confusion is out of the way, I think I see what happens: When parsing _ as an expression, libsyntax panics because of a syntax error, which aborts compilation (the macro docs state that the parser fully commits to parsing such a nonterminal, so the fact that the third invocation doesn't work is expected behaviour).

However, when parsing 1, it works. But since the 2 token wasn't eaten, the second arm is tried (this is the bug!). It matches, of course, and the macro is accepted.

So, fixing this would indeed be a breaking change, since the intended behaviour is (at least as far as I know), to reject the invocation, but this isn't happening.

@DanielKeep
Copy link
Contributor

@jonas-schievink I disagree that attempting the second arm is a bug. Ideally, macro_rules! should attempt each rule until it finds one that matches. From my perspective, the bug is that macro_rules! has no way to recover from failed parse attempts.

Ideally, your last example should go something like this (using ^ to represent cursor positions):

  • ( 3 )
    • (^3 ) ; (^$b:expr )$b = 3
    • ( 3^) ; ( $b:expr^) → matched
  • ( 1 2 )
    • (^1 2 ) ; (^$b:expr )$b = 1
    • ( 1^2 ) ; ( $b:expr^) → input too long; next rule
    • (^1 2 ) ; (^$t:tt $u:tt )$t = 1
    • ( 1^2 ) ; ( $t:tt^$u:tt )$u = 2
    • ( 1 2^) ; ( $t:tt $u:tt^) → matched
  • ( _ 1 )
    • (^_ 1 ) ; (^$b:expr ) → syntax error; next rule.
    • (^_ 1 ) ; (^$t:tt $u:tt )$t = _
    • ( _^1 ) ; ( $t:tt^$u:tt )$u = 1
    • ( _ 1^) ; ( $t:tt $u:tt^) → matched

@jonas-schievink
Copy link
Contributor Author

Ideally, macro_rules! should attempt each rule until it finds one that matches.

That would indeed be useful! But I think this comment implies that that isn't the intended behaviour:

//! This is an Earley-like parser, without support for in-grammar nonterminals,
//! only by calling out to the main rust parser for named nonterminals (which it
//! commits to fully when it hits one in a grammar). This means that there are no
(stating that the macro parser fully commits to NTs implies to me that it doesn't backtrack to try other arms)

@DanielKeep
Copy link
Contributor

@jonas-schievink I believe that's referring to how it parses within a rule. Earley parsers can deal with ambiguities by tracking multiple potential parse forests (if I remember correctly; my understanding is a little vague). What it's saying is that it has to commit to parsing a non-terminal (i.e. higher-level productions like expressions) because the parser doesn't have any way to back out of a partial parse. So when it encounters one, it has to parse it, come hell or high water.

Having the macro system not check successive rules once a rule starts matching would be apocalyptic: it would kill damn near every useful, non-trivial macro. We're talking mass hysteria, cats and dogs living together.

@jonas-schievink
Copy link
Contributor Author

I believe that's referring to how it parses within a rule.

Fair enough. In that case the bug is just that the macro expander will panic when it can't parse an NT, so it can't backtrack.

I also managed to dig up #3232, which was closed as "not a bug", but this definitely feels like one.

@dylanede
Copy link
Contributor

@jonas-schievink Sorry to dig this up again, but isn't an ident a terminal, so your comments about NTs aren't applicable to this bug?

@jonas-schievink
Copy link
Contributor Author

@dylanede Correct! That's why I mentioned this comment:

// this could be handled like a token, since it is one

(AFAIK, token == terminal)

@jeberger
Copy link

I got bitten by this bug too. Note that I see two different issues here:

  • The parser accepts m!(1 2), which according to @jonas-schievink is a bug. Indeed, according to the docs I would have expected the parser to refuse, but I can live with it either way and I can see where fixing the issue would cause code to break.

  • The parser refuses m!(_ 1), which should be a no-brainer: _ is not an ident, so the first rule should not even begin to match and no backtracking is required to move on to the next rule. Moreover I don't see where fixing this would break anything, since it would only allow code that currently does not compile (although I have no idea how hard fixing this may turn out to be).

kennytm added a commit to kennytm/rust that referenced this issue Jul 7, 2017
bors added a commit that referenced this issue Jul 11, 2017
…seyfried

Only match a fragment specifier the if it starts with certain tokens.

When trying to match a fragment specifier, we first predict whether the current token can be matched at all. If it cannot be matched, don't bother to push the Earley item to `bb_eis`. This can fix a lot of issues which otherwise requires full backtracking (#42838).

In this PR the prediction treatment is not done for `:item`, `:stmt` and `:tt`, but it could be expanded in the future.

Fixes #24189.
Fixes #26444.
Fixes #27832.
Fixes #34030.
Fixes #35650.
Fixes #39964.
Fixes the 4th comment in #40569.
Fixes the issue blocking #40984.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants