Ignore non-semantic tokens for 'probably_eq' streams. #55971
Conversation
Thanks for this! I left a few comments, and I'm somewhat wary of this. It's possible for the token trees to differ ever so slightly in pedantic ways where I don't think this PR is correct, but I also don't think we can actually construct such a situation today. The main use case here is handling things like previously expanded macros, #[cfg] removing items, etc. These are "major changes" to the AST rather than surgical changes, so this is likely to be correct today, but I'm curious to get your take too!
match tree {
    | TokenTree::Token(_, Token::Comma)
    | TokenTree::Token(_, Token::OpenDelim(DelimToken::NoDelim))
    | TokenTree::Token(_, Token::CloseDelim(DelimToken::NoDelim))
I'd be slightly worried about ignoring these because couldn't they affect precedence? The token streams 1 + (2 * 3) vs (1 + 2) * 3, for example (subbing parens for "no delimiter"). Does this come up in practice, though, for one of the tests below?
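To make the precedence worry concrete, here is a toy sketch under stated assumptions: `Tok`, `semantic`, and `probably_eq` are made-up stand-ins, not libsyntax's real types. It shows how a comparison that skips delimiter tokens conflates two streams with different precedence:

```rust
// Toy sketch (hypothetical types; not libsyntax's real API) of why
// skipping delimiter tokens can conflate streams with different
// precedence.
#[derive(Debug, PartialEq)]
enum Tok {
    Num(u32),
    Plus,
    Star,
    Open,  // an open delimiter the comparison ignores
    Close, // a close delimiter the comparison ignores
}

// True for tokens the comparison keeps ("semantic" tokens).
fn semantic(t: &Tok) -> bool {
    !matches!(t, Tok::Open | Tok::Close)
}

fn probably_eq(a: &[Tok], b: &[Tok]) -> bool {
    a.iter()
        .filter(|t| semantic(t))
        .eq(b.iter().filter(|t| semantic(t)))
}

fn main() {
    use Tok::*;
    // 1 + (2 * 3)
    let lhs = [Num(1), Plus, Open, Num(2), Star, Num(3), Close];
    // (1 + 2) * 3
    let rhs = [Open, Num(1), Plus, Num(2), Close, Star, Num(3)];
    // With delimiters ignored, both filter down to `1 + 2 * 3`,
    // so the streams wrongly compare equal.
    assert!(probably_eq(&lhs, &rhs));
}
```

This is exactly why the worry is only hypothetical for NoDelim: real parentheses are DelimToken::Paren and are not filtered, as the next reply notes.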
Wouldn't that be a DelimToken::Paren? I didn't filter those, as I was worried about the same thing.
src/libsyntax/tokenstream.rs
    | TokenTree::Token(_, Token::OpenDelim(DelimToken::NoDelim))
    | TokenTree::Token(_, Token::CloseDelim(DelimToken::NoDelim))
    | TokenTree::Token(_, Token::OpenDelim(DelimToken::Brace))
    | TokenTree::Token(_, Token::CloseDelim(DelimToken::Brace))
This is similar to the one above in that it seems like it may cause weird failures, but did this come up in practice for the tests below?
It has come up in practice; all of these filters were motivated by existing reports. I'm not sure, off the top of my head, where I saw this issue in practice, but I'm happy to find it if needed!
let mut t2 = other.trees();

fn semantic_tree(tree: &TokenTree) -> bool {
    match tree {
        | TokenTree::Token(_, Token::Comma)
Could you throw a comment in here for why we're ignoring all comma tokens? One case this may get tripped up by is (x) vs (x,), though :(
This appears to be the most commonly hit span-loss in practice (see rwf2/Rocket#811, my own comment in #43081 (comment), and dtolnay/syn#482). In general, the pretty printer inserts trailing commas everywhere, and users don't. So all three of the following lose span information:
struct Foo { a: usize }
let x = Foo { a: 10 };
match x { Foo(..) => 10 }
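The trailing-comma behavior described above can be sketched with hypothetical types (`Tok` and `probably_eq` below are made-up stand-ins, not the actual libsyntax API): skipping Token::Comma makes the user-written and pretty-printed streams compare equal, at the cost of also equating the streams for (x) and (x,):

```rust
// Toy sketch (hypothetical types; not the real libsyntax API) of the
// trailing-comma case: skipping commas makes the user-written and
// pretty-printed streams compare equal.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(&'static str),
    Colon,
    Comma,
    OpenBrace,
    CloseBrace,
}

fn probably_eq(a: &[Tok], b: &[Tok]) -> bool {
    // Drop every comma before comparing, mirroring the filter above.
    let semantic = |t: &&Tok| !matches!(**t, Tok::Comma);
    a.iter().filter(semantic).eq(b.iter().filter(semantic))
}

fn main() {
    use Tok::*;
    // User wrote: struct Foo { a: usize }
    let user = [
        Ident("struct"), Ident("Foo"), OpenBrace,
        Ident("a"), Colon, Ident("usize"), CloseBrace,
    ];
    // Pretty printer emits a trailing comma: struct Foo { a: usize, }
    let pretty = [
        Ident("struct"), Ident("Foo"), OpenBrace,
        Ident("a"), Colon, Ident("usize"), Comma, CloseBrace,
    ];
    assert!(probably_eq(&user, &pretty)); // spans survive

    // The caveat: `x` vs `x,` (parens elided in this toy) also match.
    assert!(probably_eq(&[Ident("x")], &[Ident("x"), Comma]));
}
```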
        | TokenTree::Token(_, Token::CloseDelim(DelimToken::NoDelim))
        | TokenTree::Token(_, Token::OpenDelim(DelimToken::Brace))
        | TokenTree::Token(_, Token::CloseDelim(DelimToken::Brace))
        | TokenTree::Token(_, Token::Semi)
Like above, could a comment be added here for why this isn't necessary to compare?

I'm also a little uneasy about this one, like the commas, because it seems like we're saying that semicolons are optional in Rust, and I feel like there's something that can parse one way or another with and without a semicolon.
I agree, but the thing to consider here is whether there's going to be a modification to the AST that adds/removes semicolons without preserving semantics. I'd be surprised if that were the case.
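The unease about semicolons can be made concrete with a small hypothetical sketch (`Tok` and `probably_eq` are made-up stand-ins, not compiler types): a block ending in a tail expression and one ending in a statement mean different things, yet pass a comparison that skips Token::Semi:

```rust
// Toy illustration (hypothetical types, not the real libsyntax API)
// of the semicolon worry: `{ 5 }` (tail expression, value 5) and
// `{ 5; }` (statement, value ()) differ semantically, yet compare
// equal once semicolon tokens are skipped.
#[derive(Debug, PartialEq)]
enum Tok {
    Num(u32),
    Semi,
}

fn probably_eq(a: &[Tok], b: &[Tok]) -> bool {
    let semantic = |t: &&Tok| !matches!(**t, Tok::Semi);
    a.iter().filter(semantic).eq(b.iter().filter(semantic))
}

fn main() {
    let tail = [Tok::Num(5)];            // ... { 5 }
    let stmt = [Tok::Num(5), Tok::Semi]; // ... { 5; }
    assert!(probably_eq(&tail, &stmt));
}
```

The bet the PR makes, per the reply above, is that no AST transformation actually adds or removes a semicolon in a way that changes semantics.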
#![feature(proc_macro_diagnostic)]
#![feature(proc_macro_hygiene)]
#![feature(proc_macro_quote)]
#![feature(proc_macro_span)]
Would it be possible to write this test with only stable proc macros, perhaps? I've found the best way to do that is to force tokens to round-trip through a procedural macro but make sure the tokens have a compiler warning in them (like an unused mut); that way, ideally, we won't have to update this over time.
I did try that (round-tripping was my first attempt), but I couldn't get the tests to work as I'd expect. I can try again, however.
I'm somewhat wary of this too, but no more so than I am of the entire probably_eq approach.

My understanding of the need for probably_eq is that the AST can be mutated after parsing without the cached token stream being invalidated. Presumably this is much harder than it sounds. I, for one, don't know how to easily find all of the instances that change the AST in a way that matters for this issue, and worse, how to enforce that any future code carries out the appropriate invalidation. (Side note: to get a better understanding, I'd love it if you could point to some example in the source where this happens today.) As such, probably_eq seems like the pragmatic option for now; the implementation here simply skips the tokens that differ in practice.

Somewhat tangentially: it seems that in an ideal world, we'd never want to return a pretty-printed token stream at all.
You're spot on with all the descriptions/comments here, and I had also forgotten this was "so bad" as to require these workarounds.
Yeah :(. Fundamentally none of this should be necessary. We should implement #43081 instead, basically never losing the original tokens in the first place. Since that's not done, the next best thing is to simply cache what we parse and reuse that. Naturally, though, mutation of the AST can happen practically anywhere, so there's no real feasible way for us to track where it's all happening (especially over time as we continue to modify the compiler).
The only one that I can remember (and the main motivation for probably_eq) is #[cfg] stripping. I can't think of any other locations where we modify the AST pre-macro-expansion, but I vaguely recall there being at least one more...

Note that as a side effect of this you're guaranteed to get bad spans for this:

#[derive(Deserialize)]
struct Foo {
    #[cfg(something)]
    a: i32,
    #[serde(malformed attribute)]
    b: u32,
}

Ok, so to land this PR, I would ideally like to see two things:

1. Comments on the skipped tokens explaining why each one is ignored.
2. A test that exercises this using only stable proc macros.
For the second part I think you should be able to get away with this:

// src/lib.rs
#![crate_type = "proc-macro"]

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn foo(_attr: TokenStream, item: TokenStream) -> TokenStream {
    item.into_iter().collect()
}

and with

extern crate foo;

#[foo::foo]
#[allow(dead_code)]
fn main() {
    let warn_about_me = 3;
    struct A { a: i32 }
}

as the test. That (for me today) produces a crazy-looking warning with no span. You should be able to just continue throwing syntax into main over time.

How's that all sound?
Perfect! Thank you for expanding on the issue. Will tackle this tonight.
Force-pushed from 20c00ca to 78eb516.
@alexcrichton Done.
@bors: r+
📌 Commit 78eb516 has been approved by alexcrichton
Any chance we could get this in a roll-up or bump the priority? This is a blocker for Rocket's 0.4 release.
@bors p=1
Ignore non-semantic tokens for 'probably_eq' streams.

Improves the situation in #43081 by skipping typically non-semantic tokens when checking for 'probably_eq'.

r? @alexcrichton
☀️ Test successful - status-appveyor, status-travis
Add unstable Literal::subspan(). Take 2 of rust-lang#55971. Still ~wrong, but now with a comment! (and less of a surface) Unblocks rust-lang#49219. r? @alexcrichton