Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue: Attribute refactor #131229

Open
6 tasks done
jdonszelmann opened this issue Oct 4, 2024 · 5 comments
Open
6 tasks done

Tracking issue: Attribute refactor #131229

jdonszelmann opened this issue Oct 4, 2024 · 5 comments
Assignees
Labels
A-ast Area: AST A-attributes Area: Attributes (`#[…]`, `#![…]`) C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jdonszelmann
Copy link
Contributor

jdonszelmann commented Oct 4, 2024

While working on #125418 with @m-ou-se, I've interacted quite a bit with attributes in the compiler. I've got some thoughts about the way they currently work. I'm posting this as a mix between an explanation of the status quo and why I think that's an issue, in addition to also serving as a kind of tracking issue for these changes if I've convinced you that this is a problem.

Quick Overview

From the ground up: There are several syntaxes for macros, one of those syntaxes is attributes which can have several forms. Attributes can be expanded, either as a user defined attribute macro, or as an "active" built in attribute like #[test]. However, some attributes are kept around for the entire compilation lifecycle.

These built-in attributes are never expanded. Instead, they are kept around and serve as markers or metadata to guide the compilation process at various stages. There are currently around 100 of these.

The problem

While most of what is parsed, is later lowered during [`rustc_ast_lowering`], attributes are not, mostly.

Many crates under compiler/ depend on rustc_ast just to use ast::Attribute. Let's see what that means:

Partial lowering and impossible states

One part of attributes actually is lowered, attributes of the form #[key = "value"] aka MetaNameValueStr. To be able to do that, the ast contains an enum AttrArgsEq that already has a variant for when eventually it is lowered:

pub enum AttrArgsEq {
Ast(P<Expr>),
Hir(MetaItemLit),
}

For one part of the compilation process, the Ast variant is always active and Hir is completely unused, while later in the compiler the reverse is true. In some places people didn't realize this and they provided implementations for both cases while only one could occur,
while in other places they are marked as unreachable, like here:

unreachable!("in literal form when walking mac args eq: {:?}", lit)

Another case of partial lowering is the tokens field:

Which is later extensively defended against, making sure this really happened:

debug_assert!(!attr.ident().is_some_and(|ident| self.is_ignored_attr(ident.name)));
debug_assert!(!attr.is_doc_comment());
let ast::Attribute { kind, id: _, style, span } = attr;
if let ast::AttrKind::Normal(normal) = kind {
normal.item.hash_stable(self, hasher);
style.hash_stable(self, hasher);
span.hash_stable(self, hasher);
assert_matches!(
normal.tokens.as_ref(),
None,
"Tokens should have been removed during lowering!"
);
} else {

Parse, don't validate.

I'm a big fan of the blog post Parse, don't validate. Generally rust's type system makes this pattern the most obvious thing to do and it's what I teach my university students every year. However, that is exactly what we aren't doing with attributes. In rustc_passes/check_attr.rs we first validate extensively, and emit various diagnostics. However, every single attribute is later parsed again where it is needed. I started making a small overview, but 100 attributes is a lot

Image

But basically, of the first 19 attributes I looked at, 5 are Word attributes and trivial, a few are parsed together, but in total I've found 11 completely distinct and custom parsing logics, not reusing any parts, spread over as many files and compiler crates.

I lied a little there, the parsing does reuse some things. For example, the attributes are turned into MetaItems using common logic. However, that doesn't change the fact that attributes are effectively re-validated scattered around the compiler, and many of these places have more diagnostics of their own, that could've happened during the earlier validation. It also means that at a very late stage in the compiler, we are still dealing with parsing TokenStreams, something that you'd think we should abstract away a little after parsing.

An example of such custom parsing logic:

let get = |name| {
let Some(attr) = self.get_attr(def_id, name) else {
return Bound::Unbounded;
};
debug!("layout_scalar_valid_range: attr={:?}", attr);
if let Some(
&[
ast::NestedMetaItem::Lit(ast::MetaItemLit {
kind: ast::LitKind::Int(a, _),
..
}),
],
) = attr.meta_item_list().as_deref()
{
Bound::Included(a.get())
} else {
self.dcx().span_delayed_bug(
attr.span,
"invalid rustc_layout_scalar_valid_range attribute",
);
Bound::Unbounded
}
};

Flexibility

Finally, though I have fewer concrete examples of this, sticking to ast::Attribute throughout the compiler removes quite some flexibility. Everything has to fit into an ast::Attribute, or if it doesn't, you'd have to create more variants like AttrArgsEq::Hir to support something in the ast that shouldn't even be part of the ast, forcing you to add a myriad of exceptions in parts of the compiler where such an extra variant isn't relevant yet. Specifically, for #125418 we noticed this because we wanted to do some limited form of name resolution for a path stored in an attribute, which proved next to impossible.

Ideas

Lower attributes during `rustc_ast_lowering`.

I've got 90% of a commit ready to do this, and it's what sparked the idea for this issue. It leads to some code duplication. I'm a little unhappy about it, because it forces a lot of changes across the entire compiler, exactly because attribute parsing now happens in so many places. However, it already means that a lot of assertions can be removed because at some part of the compiler, the fact that an Attribute can't have certain fields and values anymore becomes encoded in the type system. I'll open a PR for this soon, and we can discuss whether we think this is a good first step.

What also doesn't help is that rustc_attr currently has logic to validate attributes, but these functions are called in wildly different parts of the compiler. Some functions here validate actual ast::Attributes from before lowering, while other functions validate new hir::Attributes. Bugs here seem easy to make, since even though currently these are the same type, they don't always contain the same fields....

The "real solution": parse, don't validate

As I see it, what would make attributes so much nicer to work with, is if there was a place in the compiler (something like the rustc_attr crate, but actually good) where all attributes are turned from their ast tokeny representation into some specific attribute representation. Something like the following, based on the examples I've looked at in the table I showed a little higher up:

enum InlineKind {
    Always,
    Never,
    Normal
}

enum Attribute {
    Diagnostic {
        message: Symbol,
        name: Symbol,
        notes: Vec<Symbol>
    },
    Inline(InlineKind),
    Coverage(bool),
    // ...
}

This structure contains only the information necessary to use each attribute, and all the diagnostics happen while parsing into this structure. That has the added benefit that this datastructure itself serves as great documentation as to what values an attribute allows. It's super clear here that a #[diagnostic] attributes contains a message, name and some notes. Currently, you'd have to make sure the written documentation for this attribute is up-to-date enough.

The translation from ast::Attribute to this new parsed attribute should, I think, happen during AST to HIR lowering.

I think the advantages of this should be pretty obvious, based on the examples I've given of the problems with the current approach. However, I could think of some potential blockers people might care about:

  • some errors might now be thrown in different stages of the compiler (earlier). I'd say this can actually be an advantage, but I've not talked to enough people to know whether this is a problem anywhere
  • A file with code to parse these attributes will contain code of many different features. I personally like that, but it also means that those features themselves become slightly less self-contained.
  • Validity of attributes still needs to be checked on-site. (like track_caller not being valid on closures, given certain feature flags)
  • Affects large parts of the compiler, also unstable parts, where people are actively opening merge requests and might run into conflicts.

Part two I have not worked on personally. I might, if I find enough time, but if someone feels very inspired to pick this up or lead this (or tell my why this is a dumb idea) feel free to.


Everything above was my original issue, that changes were needed to attributes. Everything below is tracking the progress of these changes


Steps

Already completed

Future

Introduce hir attributes, making a very clear distinction between pre-lowering and post-lowering attributes

After hir attributes exist, I want to slowly start tackling attribute parsing. Most of that happens spread out throughout the compiler, but the one place that has a lot of it centralized is rustc_attr/src/builtin.rs. Introducing this in one go is a pain to review, so I'm trying to keep things reviewable and hopefully reduce the number of conflicts, though there will be some, and make every change on its own either neutral or a net-positive change.

split up builtins.rs into files for individual attributes

ready: jdonszelmann#4

rename rustc_attr to rustc_attr_parsing, and introduce also rustc_attr_data_structures

ready: jdonszelmann#3

(I might make this ande the previous one 1 pr). This one might be a little painful, but I think clear naming is also important. I think it's nice if code that has to do with attributes mostly lives in rustc_attr_* crates, and because of circular dependencies it can't be a single one.

Introduce new logic to systematically parse attributes

Sofar these changes might be nice, but neither makes attribute parsing actually better.

A lot of these changes are already implemented. I have made this pr to my own fork for me to keep track of everything, it contains a lot of the changes I want to make, but it's currently too large to review. I'm a bit further ahead, since I wanted to make 100% sure that what I'm planning works out. So, I've already experimented with converting around 15 different kinds of attributes to it, and testing the compiler on that. I didn't want to propose something that wouldn't actually work.

Below follows an explanation of some of it, but feel free to skip that if you're not interested in that right now. When I file this, I'll of course motivate it more. In the code it's already documented pretty well.

Define an enum AttributeKind that exhaustively lists all parsed attributes.

https://github.com/jdonszelmann/rust/blob/458f1d026c6dfe7345829326793db0f26fda6b77/compiler/rustc_attr_data_structures/src/attributes.rs#L129

It contains a variant for all attributes in the compiler, but when I make a PR for this I'll of course start with only one or two.

For a bit, Attributes will be an enum. An attribute is either Parsed, or Unparsed. Unparsed attributes will basically be hir attributes like introduced in #131808. At some point in the future, almost no attributes will be of the Unparsed type anymore, except custom tool attributes which we cannot parse.

Define "attribute groups", sets of syntactical attributes that are parsed together.

Here's an example of that. You can see the group for #[stable(..)], #[unstable(..)] and #[rustc_allowed_through_unstable_modules]

These are a group together, because they result in a single parsed AttributeKind. That's because these are either directly conflicting, or modify each other. Attribute groups have state. They work in two phases:

  1. Iterate over all relevant attributes. Which ones are relevant is defined by the group. This modifies the state of the group.
  2. When all relevant attributes have passed, each goup can report that the result of what it saw as either an error, or a valid AttributeKind.

The stability group accepts stable, unstable and rustc_allowed_through_unstable_modules, rejects stable if it already saw unstable and the other way round, and finally creates one AttributeKind containing either a stability, instability, and a boolean whether it also saw rustc_allowed_through_unstable_modules.

Define certain shortcuts for common kinds of attribute groups.

For example, an attribute that can only appear a single time, or are only a single word (no parameters), or where only either the first or the last one is accepted and the rest should warn. I've already crudely done this for attributes that can only appear a single time but I want it to match the same names as AttributeDuplicates, since that's effectively what these simplifications are. That way, these policies for when an attribute is duplicate becomes a property of how we parse it, making it impossible to do it wrong. I recently found a bug with duplicate #[inline] where it warns that the unused attribute is the last one, while actually it's the first one that's unused. That will become impossible.

Define a macro to match on attributes on nodes

You can see an example of that here


Note that the attribute parsers will usually run as part of ast lowering, but in select cases it is desired to run them early.
I did consider this, and this is indeed possible using the approach described here.

Create a new attribute parser

Currently, attributes are not really parsed, I think it can more accurately be described as being decomposed. rustc_ast::attr has all kinds of methods to convert between somewhat abstract types like AttrItem, Attribute, MetaItem, MetaItemKind, etc, representing parts of attributes but mostly containing tokens. When attributes are parsed, they are broken up and often still stored as smaller instances of these types, while what we really want to do is get the information out and throw away the tokens when parsing.

I've created a new parser that is better suited to do this, making it harder to make errors. Even though it's already ready, I'll introduce this separately so we can talk about its benefits separately.

introduce rustc_attr_validation

At this point, not much has changed as to validation. Next to rustc_attr_parsing and rustc_attr_data_structures, I intend to create rustc_attr_validation. This will represent all the logic after ast lowering, for when a tcx is available and we can run queries. Some of this currently happens in rustc_passes/check_attr.rs. However, even the fact that we will be able to exhaustively match on an enum of attributes will make mistakes harder. I intend to make more changes, such as forcing new attributes to list what kinds of targets they're valid on.

Document these changes

Of course, I'll already have documentation on all the previous changes in code. However, I intend to write a post on the dev guide as well to make sure that in the future, people know how to use the infrastructure for attributes

Port all attributes to this system

At this point, with all infrastructure in place, I expect a few PRs porting all attributes to be parsed in rustc_attr_parsing. I might ask others to help here, which is now possible when things are documented in the devguide.

Also introduce some parsed attributes in the AST

This is an idea of @oli-obk . It might be good to also make a smaller enum of parsed attributes in ast::Attribute. Especially for attributes that can be discarded when lowering, or which we need or need to validate earlier on. When we validate them while parsing, we can make fewer mistakes. These can then also contain fields that aren't just tokens to support for example resolving names like with the defines attribute.

Related issues

I intend to solve these systematically, as in, by rewriting how attributes are handled these should not be issues anymore.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 4, 2024
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-attributes Area: Attributes (`#[…]`, `#![…]`) labels Oct 4, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Oct 4, 2024

@jieyouxu jieyouxu added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Oct 4, 2024
@bjorn3
Copy link
Member

bjorn3 commented Oct 4, 2024

How will this work with custom tool attributes? These are registered using an attribute inside the crate that uses the attribute on it'w own code.

@jdonszelmann
Copy link
Contributor Author

jdonszelmann commented Oct 4, 2024

hmm, that's a good question. I guess those specifically we couldn't parse and they should stay around as either ast::Attribute or some kind of lowered hir::Attribute a bit like how I describe in the first part of the ideas section. That still makes attributes a lot nicer in the majority of cases.

@jdonszelmann
Copy link
Contributor Author

Related: #131801

@jdonszelmann
Copy link
Contributor Author

@rustbot claim

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 4, 2024
…r=xFrednet

Clippy: Move some attribute lints to be early pass (post expansion)

r? `@xFrednet`

As a side effect it removes a duplicated warning on line 53 of the `allow_attributes` test. I discussed this with `@xFrednet` , and it's mainly to support the attribute rework rust-lang#131229
tgross35 added a commit to tgross35/rust that referenced this issue Nov 5, 2024
…r=xFrednet

Clippy: Move some attribute lints to be early pass (post expansion)

r? ``@xFrednet``

As a side effect it removes a duplicated warning on line 53 of the `allow_attributes` test. I discussed this with ``@xFrednet`` , and it's mainly to support the attribute rework rust-lang#131229
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 5, 2024
…r=xFrednet

Clippy: Move some attribute lints to be early pass (post expansion)

r? ```@xFrednet```

As a side effect it removes a duplicated warning on line 53 of the `allow_attributes` test. I discussed this with ```@xFrednet``` , and it's mainly to support the attribute rework rust-lang#131229
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 5, 2024
Rollup merge of rust-lang#132598 - jdonszelmann:move-lints-to-early, r=xFrednet

Clippy: Move some attribute lints to be early pass (post expansion)

r? ```@xFrednet```

As a side effect it removes a duplicated warning on line 53 of the `allow_attributes` test. I discussed this with ```@xFrednet``` , and it's mainly to support the attribute rework rust-lang#131229
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Dec 6, 2024
…es, r=notriddle

rustdoc: remove eq for clean::Attributes

This change removes the `PartialEq` and `Eq` implementations from `Attributes`. This implementation was not used, and whether the implementation is useful at all is questionable. I care about removing it, because I'm working rust-lang#131229. While simplifying the representation of attributes, I intend to remove attr ids from attributes where possible. They're actually rarely useful. This piece of code uses them, but for no real reason, so I think simply removing the implementation makes most sense. Let me know if there are major objections to this.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 7, 2024
Rollup merge of rust-lang#133960 - jdonszelmann:remove-eq-on-attributes, r=notriddle

rustdoc: remove eq for clean::Attributes

This change removes the `PartialEq` and `Eq` implementations from `Attributes`. This implementation was not used, and whether the implementation is useful at all is questionable. I care about removing it, because I'm working rust-lang#131229. While simplifying the representation of attributes, I intend to remove attr ids from attributes where possible. They're actually rarely useful. This piece of code uses them, but for no real reason, so I think simply removing the implementation makes most sense. Let me know if there are major objections to this.
@jdonszelmann jdonszelmann changed the title Refactoring attributes in the compiler Tracking issue: Attribute refactor Dec 14, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 15, 2024
…bk,petrochenkov

Hir attributes

This PR needs some explanation, it's somewhat large.

- This is step one as described in rust-lang/compiler-team#796. I've added a new `hir::Attribute` which is a lowered version of `ast::Attribute`. Right now, this has few concrete effects, however every place that after this PR parses a `hir::Attribute` should later get a pre-parsed attribute as described in rust-lang/compiler-team#796 and transitively rust-lang#131229.
- an extension trait `AttributeExt` is added, which is implemented for both `ast::Attribute` and `hir::Atribute`. This makes `hir::Attributes` mostly compatible with code that used to parse `ast::Attribute`. All its methods are also added as inherent methods to avoid having to import the trait everywhere in the compiler.
  - Incremental can not not hash `ast::Attribute` at all.
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 15, 2024
…bk,petrochenkov

Hir attributes

This PR needs some explanation, it's somewhat large.

- This is step one as described in rust-lang/compiler-team#796. I've added a new `hir::Attribute` which is a lowered version of `ast::Attribute`. Right now, this has few concrete effects, however every place that after this PR parses a `hir::Attribute` should later get a pre-parsed attribute as described in rust-lang/compiler-team#796 and transitively rust-lang#131229.
- an extension trait `AttributeExt` is added, which is implemented for both `ast::Attribute` and `hir::Atribute`. This makes `hir::Attributes` mostly compatible with code that used to parse `ast::Attribute`. All its methods are also added as inherent methods to avoid having to import the trait everywhere in the compiler.
  - Incremental can not not hash `ast::Attribute` at all.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area: AST A-attributes Area: Attributes (`#[…]`, `#![…]`) C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants