Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent tokenization of procedural macros with macro_rules #49846

Closed
alexcrichton opened this issue Apr 10, 2018 · 1 comment
Closed

Inconsistent tokenization of procedural macros with macro_rules #49846

alexcrichton opened this issue Apr 10, 2018 · 1 comment
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)

Comments

@alexcrichton
Copy link
Member

Given a procedural macro like so:

// foo.rs
#![crate_type = "proc-macro"]
#![feature(proc_macro)]

extern crate proc_macro;

use proc_macro::*;

#[proc_macro_attribute]
pub fn attr(_a: TokenStream, b: TokenStream) -> TokenStream {
    println!("============== invocation =========================");
    println!("**** ToString *****\n{}", b);
    // println!("**** Debug *****\n{:?}", b);
    let b = recollect(b);
    println!("**** recollect ToString *****\n{}", b);
    // println!("**** recollect Debug *****\n{:?}", b);
    return b
}

fn recollect(a: TokenStream) -> TokenStream {
    a.into_iter()
        .map(|tt| {
            match tt {
                TokenTree::Group(tt) => {
                    let mut g = Group::new(tt.delimiter(), recollect(tt.stream()));
                    g.set_span(tt.span());
                    TokenTree::from(g)
                }
                other => other,
            }
        })
        .collect()
}

and an invocation:

// bar.rs
#![crate_type = "rlib"]
#![feature(proc_macro)]

extern crate foo;

use foo::attr;

#[attr]
pub fn bar() {}

macro_rules! a {
    ($i:item) => ($i)
}

a! {
    #[attr]
    pub fn baz() {}
}

you get the following when compiling:

$ rustc +nightly foo.rs
$ rustc +nightly bar.rs -L .
============== invocation =========================
**** ToString *****
pub fn bar() { }
**** recollect ToString *****
pub fn bar (  ) {  }
============== invocation =========================
**** ToString *****
pub fn baz() { }
**** recollect ToString *****
  # [ attr ] pub fn baz (  ) {  }  
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 1', /checkout/src/libcore/slice/mod.rs:871:14

In addition to the panic (oh dear!) we can see here that #[attr] is showing up in the tokens by accident. The attribute should have been removed during the tokenization when passing to the procedural macro!

I believe this is another instance of #43081

@alexcrichton
Copy link
Member Author

cc @dtolnay

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 10, 2018
This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
kennytm added a commit to kennytm/rust that referenced this issue Apr 14, 2018
…r=nrc

proc_macro: Avoid cached TokenStream more often

This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)
Projects
None yet
Development

No branches or pull requests

1 participant