Macro fragment fields #3714

Open
wants to merge 28 commits into
base: master
Changes from 17 commits
Commits
a2fc4ac
Macro fragment fields
joshtriplett Oct 15, 2024
1edb079
RFC 3714
joshtriplett Oct 21, 2024
7fc82cd
Improve spans for fields without corresponding tokens
joshtriplett Oct 21, 2024
bbb2dd0
Rephrase some future work
joshtriplett Oct 22, 2024
195f8a9
Rephrase explanation of using fragment fields
joshtriplett Oct 22, 2024
5d002f4
Define `param` using repetition, to allow users more flexibility with…
joshtriplett Oct 22, 2024
3e14949
Clarify that `:fn` is a definition, including a body
joshtriplett Oct 22, 2024
b032d5e
Future work: function declarations
joshtriplett Oct 22, 2024
df73c45
Add more future possibilities
joshtriplett Oct 22, 2024
d6a5314
Fix example
joshtriplett Oct 24, 2024
d0ba412
Future possibilities: function qualifiers like `const` and `async`
joshtriplett Oct 24, 2024
271c9c4
Hedge a future possibility further
joshtriplett Oct 24, 2024
62bf518
Expand on possible future handling of `param`
joshtriplett Oct 24, 2024
aacf8ba
Note that adding new fields to an existing matcher is forward-compatible
joshtriplett Oct 24, 2024
2da9937
Add `vis` for `:adt`
joshtriplett Oct 24, 2024
69a2c9a
Discuss synthesis of tokens for fields
joshtriplett Oct 24, 2024
37893b4
Future possibilities: add speculations about conditionally available …
joshtriplett Oct 24, 2024
39f750c
More speculative future possibilities
joshtriplett Nov 12, 2024
2c885c1
Link RFC
joshtriplett Nov 12, 2024
3980897
Word-wrap after merging suggestion
joshtriplett Nov 12, 2024
935694c
Link RFC in more places
joshtriplett Nov 12, 2024
cb7570a
Fix typo
joshtriplett Nov 20, 2024
185b841
More future possibilities
joshtriplett Nov 20, 2024
225773b
Add unresolved question about `return_type`
joshtriplett Nov 20, 2024
0ea7ea9
Future possibility: handle structs and tuples uniformly
joshtriplett Nov 20, 2024
1041307
Add unresolved question about process and delegation
joshtriplett Dec 2, 2024
bf3dca0
Wording tweak
joshtriplett Dec 2, 2024
a2f14ab
Add backquotes to clarify the type of `body`
joshtriplett Dec 2, 2024
206 changes: 206 additions & 0 deletions text/3714-macro-fragment-fields.md
@@ -0,0 +1,206 @@
- Feature Name: `macro_fragment_fields`
- Start Date: 2024-10-14
- RFC PR: [rust-lang/rfcs#3714](https://github.com/rust-lang/rfcs/pull/3714)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add a syntax and mechanism for macros to access "fields" of high-level fragment
specifiers that they've matched, to let macros use the Rust parser for
robustness and future compatibility, while still extracting pieces of the
matched syntax.

# Motivation
[motivation]: #motivation

The macros-by-example system is powerful, but sometimes difficult to work with.
In particular, parsing complex parts of Rust syntax often requires carefully
recreating large chunks of the Rust grammar, in order to parse out the desired
pieces. Missing or incorrectly handling any portion of the syntax can result in
not accepting the same syntax Rust does; this includes future extensions to
Rust syntax that the macro was not yet aware of. Higher-level fragment
specifiers are more robust for these cases, but don't allow extracting
individual pieces of the matched syntax.

This RFC introduces a mechanism to use high-level fragment specifiers while
still extracting individual pieces of the matched syntax.
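
To make the problem concrete, here is a minimal sketch of the status quo in today's stable Rust (it does not use the syntax proposed in this RFC). The `struct_name!` macro is a hypothetical helper, invented for illustration, that re-creates just enough of the struct grammar to reach the name:

```rust
// Status quo: re-create part of the struct grammar by hand to reach the name.
// `struct_name!` is a hypothetical helper for illustration only.
macro_rules! struct_name {
    (
        $(#[$attr:meta])*
        $vis:vis struct $name:ident { $($body:tt)* }
    ) => {
        stringify!($name)
    };
}

fn main() {
    let n = struct_name!(pub struct S { field: u32 });
    println!("{n}");
}
```

This matcher only accepts brace-style structs without generics. Tuple structs, unit structs, generic parameters, `where` clauses, enums, and unions would each need additional hand-written rules, which is the kind of grammar duplication described above.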

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When writing macros by example, and using certain high-level fragment
specifiers, you can use the syntax `${matched_name.field_name}` to extract
specific "fields" of the matched syntax. This allows you to use the Rust parser
for those high-level fragments, rather than having to recreate parts of the
Rust grammar in order to extract the specific pieces you want. Fields evaluate
to pieces of Rust syntax, suitable for substitution into the program or passing
to other macros for further processing.

For example, the fragment `:adt` parses any abstract data type supported by
Rust: struct, union, or enum. Given a match `$t:adt`, you can obtain the name
of the matched type with `${t.name}`:

```rust
macro_rules! get_name {
    ($t:adt) => { stringify!(${t.name}) }
}

fn main() {
    let n1 = get_name!(struct S { field: u32 });
    let n2 = get_name!(enum E { V1, V2 = 42, V3(u8) });
    let n3 = get_name!(union U { u: u32, f: f32 });
    println!("{n3}{n1}{n2}"); // prints "USE"
}
```

An attempt to access a field that doesn't exist will produce a compilation
error on the macro definition, whether or not the specific macro rule gets
invoked.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Fragment fields may be used in a macro transcriber anywhere the corresponding
fragment name could be used.

Fragment fields typically follow the same rules for repetition handling as the
corresponding fragment (e.g. being used at the same level/kind of repetition).
However, fragment fields that contain multiple items require one additional
level of repetition; see the `param` field of `:fn`, below.

This RFC introduces the following new fragment specifiers, with specified fields:

- `:fn`: A function definition (including body).
  - `name`: The name of the function, as an `ident`.
  - `param`: The parameters of the function, presented as though captured by a
    level of `*` repetition. For instance, you can write `$(${f.param}),*` to
    get a comma-separated list of parameters, or `$(other_macro!(${f.param}))*`
    to pass each parameter to another macro.
  - `return_type`: The return type of the function, as a `ty`. If the function
    has no explicitly specified return type, this will be `()`, with a span of
    the closing parenthesis for the function arguments.
  - `body`: The body of the function, as a `block` (including the
    surrounding braces).
  - `vis`: The visibility of the function, as a `vis` (may be empty).
- `:adt`: An ADT (struct, union, or enum).
  - `name`: The name of the ADT, as an `ident`.
  - `vis`: The visibility of the ADT, as a `vis` (may be empty).

Review comment (Member), on the `param` field: What is the "type" of this field?

Reply (Member Author): If we add the macro fragment `param`, then each repetition will have type `param`; until then, each repetition looks like `pat_param: ty`. (Handwaving the `...` case here.)
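
For comparison with the `param` field, the following is a rough sketch of how a macro walks function parameters in today's stable Rust, where the parameter grammar has to be spelled out in the matcher. The `param_names!` macro is a hypothetical helper, and it assumes every parameter has the simple form `ident: type` (no `self`, no patterns, no `...`):

```rust
// Status quo: spell out the parameter grammar by hand.
// `param_names!` is a hypothetical helper; it only accepts `ident: type` parameters.
macro_rules! param_names {
    (
        $vis:vis fn $name:ident ( $($p:ident : $t:ty),* $(,)? )
        $(-> $ret:ty)? $body:block
    ) => {
        // One level of `*` repetition walks the parameters, much as
        // `$(${f.param}),*` would under this RFC.
        [$(stringify!($p)),*]
    };
}

fn main() {
    let names = param_names!(fn add(a: u32, b: u32) -> u32 { a + b });
    println!("{names:?}");
}
```

Under this RFC, the matcher would presumably shrink to `$f:fn`, with the parser rather than the macro author deciding what counts as a parameter.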

The tokens within fields have the spans of the corresponding tokens from the
source. If a token has no corresponding source (e.g. the `()` in `return_type`
for a `fn` with no explicitly specified return type), the field definition
defines an appropriate span.

Using a field of a fragment counts as a use of the fragment, for the purposes
of ensuring every fragment gets used at least once at the appropriate level of
repetition.

This extends the grammar of macro metavariable expressions to allow using a dot
and identifier to access a field.

Note that future versions of Rust can add new fields to an existing matcher;
doing so is a compatible change.

# Drawbacks
[drawbacks]: #drawbacks

This adds complexity to the macro system, in order to simplify macros in the
ecosystem.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Rather than using field syntax, we could use function-like syntax in the style
of RFC 3086's macro metavariable expressions. However, field syntax seems like
a more natural fit for this concept.

Rather than synthesizing tokens for cases like `return_type`, we could make a
rule that we *never* provide tokens that aren't in the original source.
However, this would substantially limit usability of these fields in some
cases, and make macros harder to write. This RFC proposes, in general, that we
can synthesize tokens if necessary to provide useful values for fields.
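
As an illustration of the cost of never synthesizing tokens, here is a small sketch in today's stable Rust of what a macro author has to do when the return type may be absent. The `return_type_of!` macro is a hypothetical helper, not part of this RFC; it needs a separate rule that writes the `()` default by hand:

```rust
// Status quo: without synthesized tokens, a missing return type needs its own rule.
// `return_type_of!` is a hypothetical helper for illustration only.
macro_rules! return_type_of {
    // Explicit return type: use the tokens from the source.
    (fn $name:ident ( $($params:tt)* ) -> $ret:ty $body:block) => {
        stringify!($ret)
    };
    // No return type: the macro author writes the `()` default by hand.
    (fn $name:ident ( $($params:tt)* ) $body:block) => {
        "()"
    };
}

fn main() {
    let explicit = return_type_of!(fn answer() -> u32 { 42 });
    let implicit = return_type_of!(fn noop() {});
    println!("{explicit} {implicit}");
}
```

A `return_type` field that always yields a `ty` would fold both rules into one, at the price of a token that does not appear in the source.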

# Prior art
[prior-art]: #prior-art

RFC 3086, for macro metavariable expressions, introduced a similar mechanism to
add helpers for macros to more easily process the contents of fragments.

# Future possibilities
[future-possibilities]: #future-possibilities

This RFC proposes a few obvious useful fields, both for their own sake and to
serve as examples of the concept. There are many more fields we may want to
introduce in the future. This RFC intentionally proposes only a few fields, to
allow evaluating the RFC on the basis of the concept and proposed syntax rather
than every individual field proposal. If any individual proposed field proves
controversial or requires more extensive design, it should be removed and
deferred to a future RFC, rather than complicating this RFC with that more
extensive design.

Some examples of *possible* fields, to be evaluated in the future:
- For `fn`, a field for the ABI. This could be a synthesized `"Rust"` for
functions without a specified ABI.
- For `fn`, one or more fields for qualifiers such as `const` and `async`.
- For `adt` and `fn`, fields for the generics and bounds. We may want to
provide them exactly as specified, or we may want to combine the bounds from
both generics and where clauses. (This would work well together with a macro
metavariable expression to generate the appropriate `where` bounds for a
`derive`.)
- For `adt`, `fn`, and various others, a field for the doc comment, if any.
- For `block`, a field for the statements in the block.
- For `path`, a field for the segments in the path, and a field for the leading
`::` if any.
- For `lifetime`, a field for the lifetime identifier, without the `'`.

Some examples of *possible* additional fragment specifiers, to be evaluated in
the future:
- `param` for a single function parameter, with fields for the pattern and the
type. (This would also need to handle cases like `...` in variadic functions,
and cases like `self`, perhaps by acting as if it were `self: Self`.)
- `field` for a single field of a `struct`, `union`, or struct-style enum
variant.
- `variant` for a single variant of an `enum`.
- `fndecl` for a function declaration (rather than a definition), such as in a
trait or an extern block.
- `trait` for a trait definition, with fields for functions and associated
types.
- `binop` for a binary operator expression, with fields for the operator and
the two operands.
- `match` for a match expression, with fields for the scrutinee and the arms.
- `match_arm` for one arm of a match, with fields for the pattern and the body.
- `doc` for a doc comment, with `head` and `body` fields (handled the same way
rustdoc does).

Some of these have tensions between providing convenient fields and handling
variations of these fragments that can't provide those fields. We could handle
this via separate fragment specifiers for different variations, or by some
mechanism for conditionally handling fields that may not exist. The former
would be less robust against future variations, while the latter would be more
complex.

We could handle conditionally available fields by presenting them as though
they have a repetition of `?`, which would allow expansions within `$(...)?`;
that would support simple conditional cases without much complexity.
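
For reference, this is how `$(...)?` already behaves in today's stable Rust for an ordinary optional capture; a conditionally available field would presumably expand the same way. The sketch below does not use any proposed syntax, and `explicit_return_type!` is a hypothetical helper invented for illustration:

```rust
// Existing behavior of `$(...)?`: the transcriber body expands zero or one time.
// `explicit_return_type!` is a hypothetical helper for illustration only.
macro_rules! explicit_return_type {
    (fn $name:ident ( $($params:tt)* ) $(-> $ret:ty)? $body:block) => {{
        // Expands to `&[]` or `&["<type>"]` depending on whether `-> $ret` matched.
        let found: &[&'static str] = &[$(stringify!($ret))?];
        found.first().copied()
    }};
}

fn main() {
    let with_ret = explicit_return_type!(fn answer() -> u32 { 42 });
    let without_ret = explicit_return_type!(fn noop() {});
    println!("{with_ret:?} {without_ret:?}");
}
```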

We could handle some other types of conditions by presenting "boolean"-like
fields as fields that expand to no tokens but do so under a repetition of `?`,
to allow writing conditionals like `$(${x.field} ...)?`. This would fit such
conditionals within existing macro concepts, but it may suffer from an unwanted
overabundance of cleverness, and may not be as easy to read as a dedicated
conditional construct.

If, in the future, we introduce fields whose values have fragment types that
themselves have fields, we should support nested field syntax.

We may want to provide a macro metavariable function to extract syntax that has
specific attributes (e.g. derive helper attributes) attached to it. For
instance, a derive macro applied to a struct may want to get the fields that
have a specific helper attribute attached.

If, in the future, we have a robust mechanism for compilation-time execution of
Rust or some subset of Rust, without requiring separately compiled proc macro
crates, we may want to use and extend that mechanism in preference to any
further complexity in the `macro_rules` system. However, such a mechanism seems
likely to be far in the future.