Auto generate ast expression nodes #16285

Glyphack · 2025-02-20T19:13:11Z

Summary

Auto-generate AST nodes using the definitions in ast.toml. I added attributes similar to Field in ASDL to hold field information.

Test Plan

Nothing outside the ruff_python_ast package should change.

MichaReiser · 2025-02-20T19:26:36Z

Uhh, exciting.

I do feel a bit conflicted about using toml to define our grammar over e.g. something like ungrammar but maybe it's the right choice and doesn't really matter?

github-actions · 2025-02-20T19:34:14Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Glyphack · 2025-02-20T19:35:05Z

@MichaReiser I will also try ungrammar on a separate branch to see the difference. So far I only know the name and nothing more. I was also unsure whether I should extend the ast.toml file or not.

I first started by reusing Python's ASDL parser, and then realized that ASDL does not offer any extra functionality here for generating nodes and enums. I decided to continue with the TOML file, since we can add custom fields there as well. For example, the field order or other payload can be added to the TOML.

MichaReiser · 2025-02-20T20:45:48Z

RustPython used to use ASDL and, uff, that was painful. It's probably different because we actually parsed the official python grammar and derived the AST from it. The problem with that is that our AST diverged in many places.

dcreager

I do feel a bit conflicted about using toml to define our grammar over e.g. something like ungrammar but maybe it's the right choice and doesn't really matter?

To be honest I don't have a strong opinion between our hand-rolled TOML, ASDL, and ungrammar. To me, the main concerns are:

Is the AST definition easy to understand and maintain?
Is it easy to parse in our code generator?

Our hand-rolled TOML was great for the first proof-of-concept, since we could rely on Python's builtin tomllib to do the bulk of the parsing.

I thought ASDL could be nice primarily because we could reuse a lot of the parsing logic from CPython, since (at least at the moment) we are using a Python script to parse the AST definition and generate our (Rust) code. (At first I hoped it would also have the benefit of letting us reuse the grammar itself from CPython, but as @MichaReiser points out, our AST has diverged from CPython's representation.)

RustPython used to use ASDL and, uff, that was painful.

What part was painful about it? Was the syntax itself too limited? Or was it cumbersome to parse and generate code from?

crates/ruff_python_ast/ast.toml

crates/ruff_python_ast/generate.py

crates/ruff_python_ast/src/nodes.rs

MichaReiser · 2025-02-20T22:02:12Z

What part was painful about it? Was the syntax itself too limited? Or was it cumbersome to parse and generate code from?

The main pain point was that it parsed python's official AST grammar and then did some hacky overrides in code for where we wanted to diverge.

My other concern with using ASDL is that it isn't easy for us to extend if we need to, and I always found it hard to read (e.g. could we extract field documentation?).

Glyphack · 2025-02-21T22:25:54Z

Good news 🎉 I was able to complete the generation for all expression nodes. There are a few hacky things I need to fix.

We need a way to provide derive attributes for structs. Right now I just added a derives field that adds extra derives, which is used for the nodes that derive Default.

ExprNoneLiteral = { fields = [], derives = ["Default"]}
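A rough sketch of how a generator might turn that `derives` field into a Rust attribute follows. The baseline derive list and the `derive_attribute` helper are assumptions made for illustration and are not taken from generate.py:

```python
# Assumed baseline derives applied to every node (hypothetical, not from the PR).
BASE_DERIVES = ["Clone", "Debug", "PartialEq"]


def derive_attribute(node: dict) -> str:
    """Build the #[derive(...)] line for one node definition."""
    derives = BASE_DERIVES + node.get("derives", [])
    return f"#[derive({', '.join(derives)})]"


# The ExprNoneLiteral definition above, as it would look after TOML parsing.
none_literal = {"fields": [], "derives": ["Default"]}
print(derive_attribute(none_literal))
print(derive_attribute({"fields": []}))
```

Nodes without a `derives` key only get the baseline list, so the extra key stays opt-in.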

I initially thought that knowing whether a field is a sequence would be enough to decide to put it in a Vec, but I found that in ExprCompare one of the types is Box<[crate::CmpOp]>.

ExprCompare = { fields = [
    { name = "ops", type = "Box<[crate::CmpOp]>"},
    { name = "comparators", type = "Box<[Expr]>"}
]}

I added a rule to automatically box a field when its type is the group that the node belongs to, so every Expr field is boxed. This might be confusing, because Box<str> still has to be boxed explicitly, and the auto-boxing won't work for the [Expr] type. I don't want to make it more complicated, but I'm not sure which approach would be better.
It also won't work in this example:

{ name = "parameters", type = "Box<crate::Parameters>", optional = true },
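The rule and its limitations can be sketched as follows. This is a simplified illustration under stated assumptions: `rust_field_type` and its parameters are hypothetical names, and the real generator may decide differently.

```python
def rust_field_type(field_type: str, group: str, optional: bool = False) -> str:
    """Return the Rust type for a field of a node that belongs to `group`.

    Only a field whose type is exactly the group name is boxed automatically;
    every other type is passed through as written in the definition.
    """
    ty = f"Box<{field_type}>" if field_type == group else field_type
    return f"Option<{ty}>" if optional else ty


examples = [
    rust_field_type("Expr", "Expr"),        # boxed automatically
    rust_field_type("Box<str>", "Expr"),    # has to be boxed explicitly
    rust_field_type("[Expr]", "Expr"),      # not matched by the rule, left unboxed
    rust_field_type("Box<crate::Parameters>", "Expr", optional=True),
]
for example in examples:
    print(example)
```

The third case shows the gap described above: a bare `[Expr]` is not the group name itself, so the rule does not box it, and the definition has to spell out `Box<[Expr]>`.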

I'm not sure whether we should refer to Name just by its name and import it in the file, or write it like this:

{ name = "id", type = "name::Name" },

The current code also puts crate:: before every type. This is wrong, because a type such as bool should not become crate::bool, but removing the automatic prefix would require every type to spell out crate::, which is mostly redundant. We could solve this with a map of types that should not get crate::. I don't know what the easiest solution would be.

I'm reading about ungrammar, but given the extensibility we need in the generator, I feel a custom file would be easier, because we can add things without hacking them on top of something else. I have to read up on it first, though.
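One of the options mentioned here, a set of types that are exempt from prefixing, could look roughly like this. The contents of `UNPREFIXED` and the `qualify` helper are assumptions for illustration, and the sketch handles bare type names only, not wrappers such as `Box<...>`:

```python
# Hypothetical set of types that must not receive the crate:: prefix.
UNPREFIXED = {"bool", "u32", "str"}


def qualify(type_name: str) -> str:
    """Prefix a bare type name with crate:: unless it is exempt."""
    if type_name in UNPREFIXED or type_name.startswith("crate::"):
        return type_name
    return f"crate::{type_name}"


for name in ["bool", "CmpOp", "crate::Parameters"]:
    print(name, "->", qualify(name))
```

The inverse design, listing only the types that do need the prefix, trades a shorter exemption list for a longer inclusion list. Which is smaller depends on how many crate-local types the definitions mention.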

dcreager

Good news 🎉 I was able to complete the generation for all expression nodes. There are a few hacky things I need to fix.

This is starting to come together very nicely! Thank you for tackling this.

I'm reading about ungrammar, but given the extensibility we need in the generator, I feel a custom file would be easier, because we can add things without hacking them on top of something else. I have to read up on it first, though.

Given some of my style nits about cleaning up the TOML, I think it's fine to proceed with this PR without also looking for a way to migrate to ungrammar. I think that can be a separate experiment/PR — and in all honesty, a lower priority one, since this seems to be shaping up nicely as it currently is.

crates/ruff_python_ast/ast.toml

crates/ruff_python_ast/generate.py

crates/ruff_python_ast/src/nodes.rs

crates/ruff_python_ast/src/relocate.rs

crates/ruff_python_ast/ast.toml

MichaReiser

I like where this is going.

I do find the TOML somewhat hard to parse because of how verbose fields is. I'd suggest exploring regular lists with inline tables to make the format more compact. We can still use the "full-table" layout in cases where it is necessary (because inline tables need to be on a single line).

crates/ruff_python_ast/ast.toml

crates/ruff_python_ast/generate.py

MichaReiser · 2025-03-04T09:38:23Z

@dcreager this looks good to me. I'll let you have the final review as the issue author.

dcreager

This is great! My only comment is that we need to update the documentation to describe the changes.

dcreager · 2025-03-04T20:49:08Z

crates/ruff_python_ast/ast.toml

Update the comment at the top of this file:

rustdoc → doc in the "group options" section

Describe doc, derives, and fields in the "syntax node options" section

In particular, make sure to describe the mini-language for fields.type

Thanks for the reminder. I put off the documentation until the changes were final, and then completely forgot.

Co-authored-by: Douglas Creager <dcreager@dcreager.net>

dcreager

Love it. Thanks for tackling this!

## Summary

Part of #15655

Replaced statement nodes with autogenerated ones. Reused the stuff we introduced in #16285. Nothing except for copying the nodes to new format.

## Test Plan

Tests run without any changes. Also moved the test that checks size of AST nodes to `generated.rs` since all of the structs that it tests are now there.

Glyphack force-pushed the autogen-ast branch 3 times, most recently from fd8c7cd to 85c9400 Compare February 20, 2025 19:25

MichaReiser requested a review from dcreager February 20, 2025 19:25

AlexWaygood added the internal An internal refactor or improvement label Feb 20, 2025

dcreager reviewed Feb 20, 2025

MichaReiser requested a review from dcreager February 25, 2025 08:13

dcreager reviewed Feb 25, 2025

Glyphack force-pushed the autogen-ast branch from 177cfed to ce4cf6f Compare February 25, 2025 19:30

Glyphack requested a review from dcreager February 26, 2025 07:15

Glyphack marked this pull request as ready for review February 26, 2025 07:15

MichaReiser reviewed Feb 26, 2025

dcreager reviewed Feb 27, 2025

Glyphack force-pushed the autogen-ast branch from 5c6410b to 03d1891 Compare March 2, 2025 10:09

MichaReiser approved these changes Mar 3, 2025

MichaReiser requested a review from dcreager March 3, 2025 09:55

dcreager reviewed Mar 4, 2025

Glyphack and others added 6 commits March 4, 2025 22:13

Auto generate ast expression nodes

8a3de69

Auto generate all expression AST nodes

e59903c

Fix linter issues

0facf65

Update crates/ruff_python_ast/generate.py

ce1492c

Co-authored-by: Douglas Creager <dcreager@dcreager.net>

Apply review suggestions

af6fd56

Move node comments into rustdoc

ded3c53

Glyphack and others added 7 commits March 4, 2025 22:13

Update crates/ruff_python_ast/ast.toml

85e07f7

Co-authored-by: Douglas Creager <dcreager@dcreager.net>

Refactor rustdoc

e6bee43

Define the fields requiring crate:: prefix

386ca09

Fix clippy

b908b3b

Use inline arrays with DSL types

ed9b41d

Remove special logic only applicable to Parameters

6a896eb

Document AST code generation

a91dcb1

Glyphack force-pushed the autogen-ast branch from 283be3e to a91dcb1 Compare March 4, 2025 21:14

Glyphack requested a review from dcreager March 4, 2025 21:15

dcreager approved these changes Mar 5, 2025

dcreager merged commit 23fd492 into astral-sh:main Mar 5, 2025
21 checks passed

Glyphack mentioned this pull request Mar 11, 2025

[red-knot] Auto generate statement nodes #16645

Merged

Glyphack deleted the autogen-ast branch May 21, 2025 20:26

4 participants
