Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Token-based outer attributes handling #76130

Closed

Conversation

Aaron1011
Copy link
Member

@Aaron1011 Aaron1011 commented Aug 30, 2020

Makes progress towards #43081

We now capture tokens for attributes, and remove them from the tokenstream when applying #[cfg] attributes (in addition to modifying the AST). As a result, derive-proc-macros now receive the exact input (instead of the pretty-printed/retokenized input), even when fields/variants get #[cfg]-stripped.

Several changes are made in order to accomplish this:

  • New structs PreexpTokenStream and PreexpTokenTree are added. These are identical to TokenStream and PreexpTokenstream, with the exception of an OuterAttributes variant in PreexpTokenTree. This is used to represent captured attributes, allowing the target tokens to be removed by #[cfg]-stripping
    (and when invoking attribute macros).
  • Tokens are now attached to Attribute. This allows us to remove prepend_attributes, which required pretty-printing/retokenizing attributes in certain cases.
  • collect_tokens was rewritten. The implementation is now much simpler - instead of keeping track of nested TokenTree::Delimited at various deps, we collect all tokens into a flat buffer (e.g. we push TokenKind::OpenDelim and TokenKind::CloseDelim. After capturing, we losslessly re-package these tokens back into the normal TokenTree::Delimited structure.
  • We now store a PreexpTokenStream on AST s structs instead of a plain TokenStream. This contains both the attributes and the target itself.
  • parse_outer_attributes now passes the parsed attributes to a closure, instead of simply returning them. The closure is responsible for parsing the attribute target, and allows us to automatically construct the proper PreexpTokenStream.
  • HasAttrs now has a visit_tokens method. This is used during #[cfg]-stripping to allow us to remove attribute targets from the PreexpTokenStream.

This PR is quite large (though ~1000 lines are tests). I opened it mainly to show how all of the individual pieces fit together, since some of them (e.g. the parse_outer_attributes change) don't make sense if we're not going to land this PR. However, many pieces are mergeable on their own - I plan to split them into their own PRs.

TODO:

  • Any errors for malformed #[cfg]/#[cfg_attr] are duplicated. This is because we run the same code for both the AST-based attributes and the token-based attributes. We could possibly skip validating the token-based attributes, but I was worried about accidentally skipping gating if the pretty-print/reparse check ever fails. These are deduplicated by the diagnostic infrastructure outside of compiletest, but it would be nice to fix this.
  • Inner attributes are partially handled - we store them in the PreexpTokenStream alongside outer attributes, which allows us to handle #![cfg(FALSE)] and remove its parent. This should be good enough to handle all stable uses of inner attributes that are observable by proc-macros (currently, just cfg-stripping before #[derive] macros are invoked). Custom inner attributes are not handled.
  • Parser::parse_stmt_without_recovery does not eat a trailing semicolon, which means we will not capture it in the tokenstream. I need to either refactor statement parsing to eat the semicolon inside the parse_outer_attributes_with_tokens, or manually adjust the captured tokens.
  • I currently only check for the presence/absence of attributes when determining whether to run token-collection logic. If this causes performance problems, we could look for specific attributes (cfg, cfg_attr, derive, or any non-builtin attribute). There are many scenarios where we can skip token collection for some/all ast structs (no derive on an an item, no cfg inside an item, no custom attributes on an item, etc.)

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 30, 2020
@Aaron1011
Copy link
Member Author

r? @petrochenkov

Aaron1011 added a commit to Aaron1011/rust that referenced this pull request Aug 30, 2020
In PR rust-lang#76130, I add a fourth field, which makes using a tuple variant
somewhat unwieldy.
@bors

This comment has been minimized.

tmandry added a commit to tmandry/rust that referenced this pull request Sep 2, 2020
…nkov

Factor out StmtKind::MacCall fields into `MacCallStmt` struct

In PR rust-lang#76130, I add a fourth field, which makes using a tuple variant
somewhat unwieldy.
@Aaron1011
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 4, 2020

⌛ Trying commit 083792bcf67d1e50afc9ad50500ab011d34b0bee with merge 30ddab741f8c02392c4383445436138f5a6ec53d...

@bors
Copy link
Contributor

bors commented Sep 4, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 30ddab741f8c02392c4383445436138f5a6ec53d (30ddab741f8c02392c4383445436138f5a6ec53d)

@rust-timer
Copy link
Collaborator

Queued 30ddab741f8c02392c4383445436138f5a6ec53d with parent 80cacd7, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (30ddab741f8c02392c4383445436138f5a6ec53d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@Aaron1011
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 4, 2020

⌛ Trying commit 23fd9fbd9d07096ca254592013a42b4cae12ba29 with merge b3535fba76a3fc7202284b7699afff8f8831e461...

@bors
Copy link
Contributor

bors commented Sep 4, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: b3535fba76a3fc7202284b7699afff8f8831e461 (b3535fba76a3fc7202284b7699afff8f8831e461)

@rust-timer
Copy link
Collaborator

Queued b3535fba76a3fc7202284b7699afff8f8831e461 with parent 42d896a, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (b3535fba76a3fc7202284b7699afff8f8831e461): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@petrochenkov
Copy link
Contributor

@Aaron1011
Apologies, I spent all the weekend time on testing LLD, so I didn't have chance to look a this.
I'll try to do it during the week.

Aaron1011 added a commit to Aaron1011/rust that referenced this pull request Oct 11, 2020
Fixes rust-lang#74616
Makes progress towards rust-lang#43081
Unblocks PR rust-lang#76130

When pretty-printing an AST node, we may insert additional parenthesis
to ensure that precedence is properly preserved in code we output.
However, the proc macro implementation relies on comparing a
pretty-printed AST node to the captured `TokenStream`. Inserting extra
parenthesis changes the structure of the reparsed `TokenStream`, making
the comparison fail.

This PR refactors the AST pretty-printing code to allow skipping the
insertion of additional parenthesis. Several freestanding methods are
moved to trait methods on `PrintState`, which keep track of an internal
`insert_extra_parens` flag. This flag is normally `true`, but we expose
a public method which allows pretty-printing a nonterminal with
`insert_extra_parens = false`.

To avoid changing the public interface of `rustc_ast_pretty`, the
freestanding `_to_string` methods are changed to delegate to a
newly-crated `State`. The main pretty-printing code is moved to a new
`state` module to ensure that it does not accidentally call any of these
public helper functions (instead, the internal functions with the same
name should be used).
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 14, 2020
…ochenkov

Refactor AST pretty-printing to allow skipping insertion of extra parens

Fixes rust-lang#75734
Makes progress towards rust-lang#43081
Unblocks PR rust-lang#76130

When pretty-printing an AST node, we may insert additional parenthesis
to ensure that precedence is properly preserved in code we output.
However, the proc macro implementation relies on comparing a
pretty-printed AST node to the captured `TokenStream`. Inserting extra
parenthesis changes the structure of the reparsed `TokenStream`, making
the comparison fail.

This PR refactors the AST pretty-printing code to allow skipping the
insertion of additional parenthesis. Several freestanding methods are
moved to trait methods on `PrintState`, which keep track of an internal
`insert_extra_parens` flag. This flag is normally `true`, but we expose
a public method which allows pretty-printing a nonterminal with
`insert_extra_parens = false`.

To avoid changing the public interface of `rustc_ast_pretty`, the
freestanding `_to_string` methods are changed to delegate to a
newly-crated `State`. The main pretty-printing code is moved to a new
`state` module to ensure that it does not accidentally call any of these
public helper functions (instead, the internal functions with the same
name should be used).
@camelid camelid added the A-attributes Area: Attributes (`#[…]`, `#![…]`) label Oct 16, 2020
@Dylan-DPC-zz
Copy link

Blocked on #77250

@Dylan-DPC-zz Dylan-DPC-zz added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 16, 2020
Aaron1011 added a commit to Aaron1011/rust that referenced this pull request Oct 19, 2020
Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.

The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.

This implementation has a number of advantages over the previous one:

* It is significantly simpler, with no edge cases around capturing the
  start/end of a delimited group.

* It can be easily extended to allow replacing tokens an an arbitrary
  'depth' by just using `Vec::splice` at the proper position. This is
  important for PR rust-lang#76130, which requires us to track information about
  attributes along with tokens.

* The lazy approach to `TokenStream` construction allows us to easily
  parse an AST struct, and then decide after the fact whether we need a
  `TokenStream`. This will be useful when we start collecting tokens for
  `Attribute` - we can discard the `LazyTokenStream` if the parsed
  attribute doesn't need tokens (e.g. is a builtin attribute).

The performance impact seems to be neglibile (see
rust-lang#77250 (comment)). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 21, 2020
…on, r=petrochenkov

Rewrite `collect_tokens` implementations to use a flattened buffer

Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.

The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.

This implementation has a number of advantages over the previous one:

* It is significantly simpler, with no edge cases around capturing the
  start/end of a delimited group.

* It can be easily extended to allow replacing tokens an an arbitrary
  'depth' by just using `Vec::splice` at the proper position. This is
  important for PR rust-lang#76130, which requires us to track information about
  attributes along with tokens.

* The lazy approach to `TokenStream` construction allows us to easily
  parse an AST struct, and then decide after the fact whether we need a
  `TokenStream`. This will be useful when we start collecting tokens for
  `Attribute` - we can discard the `LazyTokenStream` if the parsed
  attribute doesn't need tokens (e.g. is a builtin attribute).

The performance impact seems to be neglibile (see
rust-lang#77250 (comment)). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels Oct 25, 2020
@crlf0710
Copy link
Member

Triage: A reminder that #77250 has been merged, so it seems maybe this can continue...

@Aaron1011
Copy link
Member Author

This is currently blocked on further work on statement attributes

@jyn514 jyn514 added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-proc-macros Area: Procedural macros T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) labels Nov 30, 2020
@Dylan-DPC-zz Dylan-DPC-zz added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 3, 2020
@Aaron1011
Copy link
Member Author

Closing in favor of #80689

@Aaron1011 Aaron1011 closed this Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-attributes Area: Attributes (`#[…]`, `#![…]`) A-proc-macros Area: Procedural macros S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.