Ignore non-semantic tokens for 'probably_eq' streams. #55971
Conversation
Thanks for this! I left a few comments, and I'm somewhat wary of this. It's possible for the token trees to differ ever so slightly in pedantic ways where I don't think this PR is correct, but I also don't think we can actually construct such a situation today. The main use case here is handling things like previously expanded macros, #[cfg] removing items, etc. These are "major changes" to the AST rather than surgical changes, so this is likely to be correct today, but I'm curious to get your take too!
match tree {
    | TokenTree::Token(_, Token::Comma)
    | TokenTree::Token(_, Token::OpenDelim(DelimToken::NoDelim))
    | TokenTree::Token(_, Token::CloseDelim(DelimToken::NoDelim))
I'd be slightly worried about ignoring these because couldn't they affect precedence? The token streams 1 + (2 * 3) vs (1 + 2) * 3, for example (subbing parens for "no delimiter"). Does this come up in practice, though, for one of the tests below?
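To make the precedence worry concrete, here is a toy sketch under stated assumptions: `Tok`, `semantic`, and `probably_eq` are made-up stand-ins, not libsyntax's real types. It shows how a comparison that skips delimiter tokens conflates two streams with different precedence:

```rust
// Toy sketch (hypothetical types; not libsyntax's real API) of why
// skipping delimiter tokens can conflate streams with different
// precedence.
#[derive(Debug, PartialEq)]
enum Tok {
    Num(u32),
    Plus,
    Star,
    Open,  // an open delimiter the comparison ignores
    Close, // a close delimiter the comparison ignores
}

// True for tokens the comparison keeps ("semantic" tokens).
fn semantic(t: &Tok) -> bool {
    !matches!(t, Tok::Open | Tok::Close)
}

fn probably_eq(a: &[Tok], b: &[Tok]) -> bool {
    a.iter()
        .filter(|t| semantic(t))
        .eq(b.iter().filter(|t| semantic(t)))
}

fn main() {
    use Tok::*;
    // 1 + (2 * 3)
    let lhs = [Num(1), Plus, Open, Num(2), Star, Num(3), Close];
    // (1 + 2) * 3
    let rhs = [Open, Num(1), Plus, Num(2), Close, Star, Num(3)];
    // With delimiters ignored, both filter down to `1 + 2 * 3`,
    // so the streams wrongly compare equal.
    assert!(probably_eq(&lhs, &rhs));
}
```

This is exactly why the worry is only hypothetical for NoDelim: real parentheses are DelimToken::Paren and are not filtered, as the next reply notes.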
Wouldn't that be a DelimToken::Paren? I didn't filter those, as I was worried about the same thing.
src/libsyntax/tokenstream.rs
    | TokenTree::Token(_, Token::OpenDelim(DelimToken::NoDelim))
    | TokenTree::Token(_, Token::CloseDelim(DelimToken::NoDelim))
    | TokenTree::Token(_, Token::OpenDelim(DelimToken::Brace))
    | TokenTree::Token(_, Token::CloseDelim(DelimToken::Brace))
This is similar to the one above in that it seems like it may cause weird failures, but did this come up in practice for the tests below?
It has come up in practice; all of these filters were motivated by existing reports. I'm not sure, off the top of my head, where I saw this issue in practice, but I'm happy to find it if needed!
let mut t2 = other.trees();

fn semantic_tree(tree: &TokenTree) -> bool {
    match tree {
        | TokenTree::Token(_, Token::Comma)
Could you throw a comment in here for why we're ignoring all comma tokens? One case this may get tripped up by is (x) vs (x,), though :(
This appears to be the most commonly hit span-loss in practice (see rwf2/Rocket#811, my own comment in #43081 (comment), and dtolnay/syn#482). In general, the pretty printer inserts trailing commas everywhere, and users don't. So all three of the following lose span information:
struct Foo { a: usize }
let x = Foo { a: 10 };
match x { Foo(..) => 10 }
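The trailing-comma behavior described above can be sketched with hypothetical types (`Tok` and `probably_eq` below are made-up stand-ins, not the actual libsyntax API): skipping Token::Comma makes the user-written and pretty-printed streams compare equal, at the cost of also equating the streams for (x) and (x,):

```rust
// Toy sketch (hypothetical types; not the real libsyntax API) of the
// trailing-comma case: skipping commas makes the user-written and
// pretty-printed streams compare equal.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(&'static str),
    Colon,
    Comma,
    OpenBrace,
    CloseBrace,
}

fn probably_eq(a: &[Tok], b: &[Tok]) -> bool {
    // Drop every comma before comparing, mirroring the filter above.
    let semantic = |t: &&Tok| !matches!(**t, Tok::Comma);
    a.iter().filter(semantic).eq(b.iter().filter(semantic))
}

fn main() {
    use Tok::*;
    // User wrote: struct Foo { a: usize }
    let user = [
        Ident("struct"), Ident("Foo"), OpenBrace,
        Ident("a"), Colon, Ident("usize"), CloseBrace,
    ];
    // Pretty printer emits a trailing comma: struct Foo { a: usize, }
    let pretty = [
        Ident("struct"), Ident("Foo"), OpenBrace,
        Ident("a"), Colon, Ident("usize"), Comma, CloseBrace,
    ];
    assert!(probably_eq(&user, &pretty)); // spans survive

    // The caveat: `x` vs `x,` (parens elided in this toy) also match.
    assert!(probably_eq(&[Ident("x")], &[Ident("x"), Comma]));
}
```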
        | TokenTree::Token(_, Token::CloseDelim(DelimToken::NoDelim))
        | TokenTree::Token(_, Token::OpenDelim(DelimToken::Brace))
        | TokenTree::Token(_, Token::CloseDelim(DelimToken::Brace))
        | TokenTree::Token(_, Token::Semi)
Like above, could a comment be added here for why this isn't necessary to compare?

I'm also a little uneasy about this one, like the commas, because it seems like we're saying that semicolons are optional in Rust, and I feel like there's something that can parse one way or another with and without a semicolon.
I agree, but the thing to consider here is whether there's going to be a modification to the AST that adds/removes semicolons without preserving semantics. I'd be surprised if that were the case.
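The unease about semicolons can be made concrete with a small hypothetical sketch (`Tok` and `probably_eq` are made-up stand-ins, not compiler types): a block ending in a tail expression and one ending in a statement mean different things, yet pass a comparison that skips Token::Semi:

```rust
// Toy illustration (hypothetical types, not the real libsyntax API)
// of the semicolon worry: `{ 5 }` (tail expression, value 5) and
// `{ 5; }` (statement, value ()) differ semantically, yet compare
// equal once semicolon tokens are skipped.
#[derive(Debug, PartialEq)]
enum Tok {
    Num(u32),
    Semi,
}

fn probably_eq(a: &[Tok], b: &[Tok]) -> bool {
    let semantic = |t: &&Tok| !matches!(**t, Tok::Semi);
    a.iter().filter(semantic).eq(b.iter().filter(semantic))
}

fn main() {
    let tail = [Tok::Num(5)];            // ... { 5 }
    let stmt = [Tok::Num(5), Tok::Semi]; // ... { 5; }
    assert!(probably_eq(&tail, &stmt));
}
```

The bet the PR makes, per the reply above, is that no AST transformation actually adds or removes a semicolon in a way that changes semantics.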
#![feature(proc_macro_diagnostic)]
#![feature(proc_macro_hygiene)]
#![feature(proc_macro_quote)]
#![feature(proc_macro_span)]
Would it be possible to write this test with only stable proc macros, perhaps? I've found the best way to do that is to force tokens to round-trip through a procedural macro but make sure the tokens have a compiler warning in them (like an unused mut); that way, ideally, we won't have to update this over time.
I did try that (round-tripping was my first attempt), but I couldn't get the tests to work as I'd expect. I can try again, however.
I'm somewhat wary of this too, but no more so than I am of the entire probably_eq approach.

My understanding of the need for probably_eq is that the AST can be mutated after parsing without the cached token stream being invalidated. Presumably this is much harder than it sounds. I, for one, don't know how to easily find all of the instances that change the AST in a way that matters for this issue, and worse, how to enforce that any future code carries out the appropriate invalidation. (Side note: to get a better understanding, I'd love it if you could point to some example in the source where this happens today.) As such, probably_eq seems like the pragmatic option for now; the implementation here simply skips the tokens that differ in practice.

Somewhat tangentially: it seems that in an ideal world, we'd never want to return a pretty-printed token stream at all.
You're spot on with all the descriptions/comments here, and I had also forgotten this was "so bad" as to require these workarounds.
Yeah :(. Fundamentally none of this should be necessary. We should implement #43081 instead, basically never losing the original tokens in the first place. Since that's not done, the next best thing is to simply cache what we parse and reuse that. Naturally, though, mutation of the AST can happen practically anywhere, so there's no real feasible way for us to track where it's all happening (especially over time as we continue to modify the compiler).
The only one that I can remember (and the main motivation for probably_eq) is #[cfg] stripping. I can't think of any other locations where we modify the AST pre-macro-expansion, but I vaguely recall there being at least one more...

Note that as a side effect of this you're guaranteed to get bad spans for this:

#[derive(Deserialize)]
struct Foo {
    #[cfg(something)]
    a: i32,
    #[serde(malformed attribute)]
    b: u32,
}

Ok, so to land this PR, I would ideally like to see two things:

1. Comments on the skipped tokens explaining why each one is ignored.
2. A test that exercises this using only stable proc macros.
For the second part I think you should be able to get away with this:

// src/lib.rs
#![crate_type = "proc-macro"]

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn foo(_attr: TokenStream, item: TokenStream) -> TokenStream {
    item.into_iter().collect()
}

and with

extern crate foo;

#[foo::foo]
#[allow(dead_code)]
fn main() {
    let warn_about_me = 3;
    struct A { a: i32 }
}

as the test. That (for me today) produces a crazy-looking warning with no span. You should be able to just continue throwing syntax into main over time.

How's that all sound?
Perfect! Thank you for expanding on the issue. Will tackle this tonight.
Force-pushed from 20c00ca to 78eb516.
@alexcrichton Done.
@bors: r+
📌 Commit 78eb516 has been approved by alexcrichton
Any chance we could get this in a roll-up or bump the priority? This is a blocker for Rocket's 0.4 release.
@bors p=1
Ignore non-semantic tokens for 'probably_eq' streams.

Improves the situation in #43081 by skipping typically non-semantic tokens when checking for 'probably_eq'.

r? @alexcrichton
☀️ Test successful - status-appveyor, status-travis
Add unstable Literal::subspan(). Take 2 of rust-lang#55971. Still ~wrong, but now with a comment! (and less of a surface) Unblocks rust-lang#49219. r? @alexcrichton