Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type ascription syntax can subtly change the behaviour of macros #30531

Closed
jonas-schievink opened this issue Dec 22, 2015 · 29 comments
Closed

Type ascription syntax can subtly change the behaviour of macros #30531

jonas-schievink opened this issue Dec 22, 2015 · 29 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. relnotes Marks issues that should be documented in the release notes of the next release. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@jonas-schievink
Copy link
Contributor

In the wild: phsym/prettytable-rs#11

This program prints "1" on stable and beta, but "0" on nightly:

macro_rules! cell {
    ($value:expr) => { 0 };
    ($style:ident : $value:expr) => { 1 };
}

fn main() {
    println!("{}", cell!(t : T));
}

Originally reported by @hoodie

@petrochenkov
Copy link
Contributor

If I'm not mistaken, the macro future proofing RFC should make macros like this illegal?
cc @pnkfelix

@bluss bluss added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) labels Dec 23, 2015
@pnkfelix
Copy link
Member

Well honestly I'm not sure; that RFC was tightening up rules for a single rewrite rule; I'm not sure how a we handle/check interactions between multiple rules. ..

@durka
Copy link
Contributor

durka commented Dec 23, 2015

Even with the future-proofing, FOLLOW(ident) is "any token" so this would not be prevented. To fix this : would need to be removed from FOLLOW(ident) (and FOLLOW(block) as well), but that would be another breaking change :( . Of course type ascription is also a breaking change as evidenced by this issue, so it's one against the other. At least perhaps a lint could be added to catch this situation where type ascription influences macro expansion.

@pnkfelix
Copy link
Member

I think removing : from FOLLOW(ident) is a bit stronger than necessary. Or at least, maybe we should be thinking more about validating the set of rewrite rules as a whole set of composed alternatives, rather than thinking about them in isolation. (This would likely still introduce breaking change, but hopefully not as significant as throwing more things into the follow set would be.)

@durka
Copy link
Contributor

durka commented Dec 25, 2015

I guess another (distasteful) option is changing the type ascription syntax so that it would be covered by the extant future-proofing. For example allowing expr-but-not-ident, so ident: ty is an error but (ident): ty is ascripted. [edit: would still break a macro with a pattern like ( $e:expr ): $t:ty but that should be rare and is much harder to prevent]

@hoodie
Copy link
Contributor

hoodie commented Dec 25, 2015

Is there a way to know that the second part describes a type?

@durka
Copy link
Contributor

durka commented Dec 25, 2015

@hoodie what do you mean? Macros are expanded before name resolution and type checking so AFAIK they can only really look at syntax.

@hoodie
Copy link
Contributor

hoodie commented Dec 25, 2015

Yes you are right, I thought of that too right after I posted. This just means that the colon character is to be avoided in macros.

@pnkfelix
Copy link
Member

pnkfelix commented Jan 6, 2016

validating the set of rewrite rules as a whole set of composed alternatives

To be clear, what I meant by this is that we might be able to walk across all of the rules in parallel, maintaining multiple cursors into the rules, rather than separately with a single cursor, and have the FOLLOW set for each token depend upon where all the cursors are located. Right now the FOLLOW(ident) is the whole universe of tokens. But what I'm thinking is that if you have one cursor pointing at an expr and another at an ident (representing the idea that the macro expander could eventually be trying to decide which rewrite arm to use), then the FOLLOW for that ident would be much more restricted, perhaps e.g. FOLLOW(ident) = FOLLOW(expr) in that context (or maybe it would need to be even more restricted, I'm not sure).

@pnkfelix
Copy link
Member

pnkfelix commented Jan 6, 2016

In the meantime, its possible that a workaround could be to just reorder the rewrite arms in the macro definition itself so that the case with expr comes after the case with ident : ...

@DanielKeep
Copy link
Contributor

@pnkfelix I'm afraid macro_rules! is too dumb for that to work:

macro_rules! cell {
    ($style:ident : $value:expr) => { 1 };
    ($value:expr) => { 0 };
}

fn main() {
    println!("{}", cell!(t : T));
    println!("{}", cell!(42));
}

produces the following on stable:

<anon>:8:26: 8:28 error: expected ident, found 42
<anon>:8     println!("{}", cell!(42));
                                  ^~

In cases like this, expr always has to come before ident, or it just won't work.

@brson brson added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jan 6, 2016
@nikomatsakis
Copy link
Contributor

So, @brson and I were just talking about this. I think my preferred solution is to fix the "macro future proofing RFC", since it seems obvious that this is basically a bug in that RFC and it is not achieving its aims. However, the precise best way to do that -- and the fallout from doing so -- is unknown.

In the meantime, perhaps the most prudent course of action is to either disable parsing of type ascription or else to require that a type ascription expression occurs within parentheses ((<expr>: T)). The latter would be better but harder. That way, we can avoid breakage while we have this discussion.

Longer term, it seems likely but not certain that this crate will eventually break, but hopefully that is because we tighten the macro future-proofing RFC, and not as an unintentional side-effect of adding a new expression form. Alternatively, if we decide we cannot fix the macro future-proofing RFC, we will have to be very careful about adding new expression forms, and probably wind up limiting ourselves to forms that require parentheses or are otherwise unambiguous.

@pnkfelix
Copy link
Member

pnkfelix commented Jan 6, 2016

ah fun, we get to fix future proofing a second time. :)

@DanielKeep
Copy link
Contributor

I personally think the best fix would be to solve the "macro_rules! can't back out of a rule" problem, which should allow the work-around of "put the most specific form first".

@durka
Copy link
Contributor

durka commented Jan 7, 2016

@DanielKeep +1

@nikomatsakis
Copy link
Contributor

@DanielKeep that seems like a good idea but it doesn't fix the fact that we would be breaking the code until the workaround is applied. That said, it may be that if we fix macro-rules future-proofing, we also have to fix that problem, so that at least a workaround exists.

@petrochenkov
Copy link
Contributor

Operator ? from upcoming rust-lang/rfcs#243 is going to cause the same problem.

macro_rules! cell {
    ($value:expr) => { 0 };
    ($style:ident ?) => { 1 };
}

fn main() {
    println!("{}", cell!(t?));
}

Requiring ? to occur within parentheses would be largely harmful for its purpose. (The same apply to type ascription, maybe to a less degree.)
IMO, it would be more beneficial to solve this problem by restricting macros and not by damaging future syntaxes.

@durka
Copy link
Contributor

durka commented Jan 8, 2016

That should count against the possible syntax, IMO (not instantly disqualify it, but count against it), as this issue should count against the current formulation of type ascription syntax. Macro future-proofing becomes a bit of a sham if you can't future-proof your macro against future changes to future-proofing.

@nikomatsakis
Copy link
Contributor

We discussed this issue in yesterday's @rust-lang/lang meeting. The rough consensus was that we should not revert the type ascription change (nor other future extensions to the expression grammar). The bug here is that these macro definitions were accepted in the first place, basically, and we hope soon to revise the macro rules to correct that.

For some background: when we released 1.0, we set some parameters for what sorts of changes we could make to the Rust grammar. In particular, for things like expressions and types, which can be referenced from macro definitions, we established (in RFC 550) a set of legal "follow" tokens that must never be added to the grammar. For example, the follow set for an expression was defined as =>, ,, and ;. These sets were derived by looking at the existing Rust grammar and finding expression terminators we were already committed to: for example, expressions can already be followed by , in a tuple expression or function call. Under those rules, the change to implement type ascription is legal.

RFC 550 also defined a static analysis that was intended to reject macro definitions which might be affected by (legal) changes to the expression grammar. For example, that analysis would reject a macro arm such as $e:expr : $t:ty because, there, $e is followed by :, which is not in the follow set for an expression. This means that if we were to add something like type ascription, the macro would break, and we did not want that to happen. However, as this issue demonstrates, that static anlysis contains a "soundness bug" when it comes to multiple macro arms.

In the medium to long term, we need to patch RFC 550 to correctly account for multiple macro arms. In the short term, however, macro authors will have to look at their own macros to check whether they might be broken by changes to the extension grammar.

We considered reverting the addition of type ascription but decided against it. Basically, once we fix RFC 550, those crates that are affected by type ascription are going to break anyway. On balance, it seems better to regress them now and gain type ascription (and the ? operator, etc) than to wait and regress them later. That said, it does imply that we need to be careful about phasing in new syntax -- warning periods may be required to help people adapt to changing macro definitions. In this case, the fallout seems to be relatively limited, but it's something to be aware of going forward. (We considered going back and trying to add a warning period for this case, but it is fairly non-trivial to implement, and given the limited fallout doesn't seem justified, though I at least would welcome such a PR if someone did do it.)

@durka
Copy link
Contributor

durka commented Jan 8, 2016

The "soundness bug" does not apply only to multiple arms. The trouble is that ident is a subset of expr, but ident's follow set is unrestricted.

@durka
Copy link
Contributor

durka commented Jan 8, 2016

If the intention is that macro authors must pay attention to the spirit of RFC 550, rather than its actual implementation, this should be publicized widely.

@ticki
Copy link
Contributor

ticki commented Jan 8, 2016

It has broken librand too, for the record.

@durka
Copy link
Contributor

durka commented Jan 8, 2016

@ticki huh? Which macros in rand?

@nikomatsakis
Copy link
Contributor

@durka

The "soundness bug" does not apply only to multiple arms. The trouble is that ident is a subset of expr, but ident's follow set is unrestricted.

I believe that only matters if there are multiple arms in play. Do you have an example that would suggest otherwise?

@durka
Copy link
Contributor

durka commented Jan 8, 2016

@nikomatsakis hmm I see what you are saying. I can't come up with an example where #![feature(type_ascription)] changes the behavior of a one-arm macro (besides causing it to accept more invocations, of course), though there may be something I didn't think of. But I'm still not sure whether to blame the hole on arm interaction or not.

@brson
Copy link
Contributor

brson commented Jan 11, 2016

@ticki I wasn't able to reproduce rand breakage, building rand commit 8feb9df678990f91c39759f62c05f5ad5a6a78c8 using rust commit d70ab2b.

@ticki
Copy link
Contributor

ticki commented Jan 12, 2016

It was fixed a couple of days ago in the upstream rust.

@durka
Copy link
Contributor

durka commented Jan 12, 2016

I think @ticki refers to #30713, not this issue.

@nikomatsakis
Copy link
Contributor

Closing as "won't fix" per #30531 (comment)

@nikomatsakis nikomatsakis added the relnotes Marks issues that should be documented in the release notes of the next release. label Jan 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. relnotes Marks issues that should be documented in the release notes of the next release. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants