Auto generate ast expression nodes #16285
fd8c7cd to
85c9400
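Uhh, exciting. I do feel a bit conflicted about using `toml` to define our grammar over e.g. something like ungrammar, but maybe it's the right choice and doesn't really matter?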
@MichaReiser I will also use ungrammar on a separate branch to see the difference; I just know the name, nothing more. I was also unsure if I should extend the I first started by re-using Python's ASDL parser and then realized that ASDL does not offer more functionality here for generating Nodes and Enums. I decided to continue with
RustPython used to use ASDL and, uff, that was painful. It's probably different because we actually parsed the official python grammar and derived the AST from it. The problem with that is that our AST diverged in many places.
dcreager
left a comment
> I do feel a bit conflicted about using `toml` to define our grammar over e.g. something like ungrammar but maybe it's the right choice and doesn't really matter?
To be honest I don't have a strong opinion between our hand-rolled TOML, ASDL, and ungrammar. To me, the main concerns are:
- Is the AST definition easy to understand and maintain?
- Is it easy to parse in our code generator?
Our hand-rolled TOML was great for the first proof-of-concept, since we could rely on Python's builtin tomllib to do the bulk of the parsing.
I thought ASDL could be nice primarily because we could reuse a lot of the parsing logic from CPython, since (at least at the moment) we are using a Python script to parse the AST definition and generate our (Rust) code. (At first I hoped it would also have the benefit of letting us reuse the grammar itself from CPython, but as @MichaReiser points out, our AST has diverged from CPython's representation.)
> RustPython used to use ASDL and, uff, that was painful.
What part was painful about it? Was the syntax itself too limited? Or was it cumbersome to parse and generate code from?
The main pain point was that it parsed Python's official AST grammar and then did some hacky overrides in code for where we wanted to diverge. My other concern with using ASDL is that it isn't easy for us to extend if we need to, and I always found it hard to read (e.g. could we extract field documentation?).
Good news 🎉 I was able to complete the generation for all expression nodes. There are a few hacky things I need to fix.

We need a way to provide the derive attribute to structs. Right now I just added a

I initially thought only knowing a field is a sequence or not would be enough to insert it in a

I added a rule to automatically box if a field type is the group of the node it belongs to so we box every

I'm not sure if we refer to

The current code also uses

I'm reading about ungrammar but I feel because of the extensibility we need in the generator using a custom file would be easier because we can add stuff without hacking it on top of something else. But I have to read it first.
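The automatic boxing rule mentioned in this comment could look roughly like the sketch below. The `Field` dataclass, the `sequence` flag, and the `rust_type` helper are hypothetical names introduced for illustration, and mapping sequences to `Vec` is an assumption; the PR's generator may represent these differently.

```python
from dataclasses import dataclass


@dataclass
class Field:
    # Hypothetical field description, not the generator's real data model.
    name: str
    type: str  # name of another node or group, e.g. "Expr"
    sequence: bool = False


def rust_type(field: Field, owning_group: str) -> str:
    """Map a field to a Rust type, boxing direct references to the owning group."""
    if field.sequence:
        # Assumption: sequences become Vec, which already stores elements
        # on the heap, so no extra Box is needed.
        return f"Vec<{field.type}>"
    if field.type == owning_group:
        # A node that directly contained its own group enum would be an
        # infinitely sized type in Rust, so the reference is boxed.
        return f"Box<{field.type}>"
    return field.type


fields = [
    Field("value", "Expr"),
    Field("elts", "Expr", sequence=True),
    Field("id", "Name"),
]
rendered = {f.name: rust_type(f, owning_group="Expr") for f in fields}
print(rendered)
```

Boxing is needed only for direct self-references: without indirection, an `Expr` variant holding another `Expr` by value would have no finite size.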
dcreager
left a comment
> good news 🎉 I was able to complete the generation for all expression nodes. There are a few hacky things I need to fix.
This is starting to come together very nicely! Thank you for tackling this.
> I'm reading about ungrammar but I feel because of the extensibility we need in the generator using a custom file would be easier because we can add stuff without hacking it on top of something else. But I have to read it first.
Given some of my style nits about cleaning up the TOML, I think it's fine to proceed with this PR without also looking for a way to migrate to ungrammar. I think that can be a separate experiment/PR — and in all honesty, a lower priority one, since this seems to be shaping up nicely as it currently is.
177cfed to
ce4cf6f
MichaReiser
left a comment
I like where this is going.
I do find the TOML somewhat hard to parse because of how verbose `fields` is. I'd suggest exploring regular lists with inline tables to make the format more compact. We can still use the "full-table" layout in cases where it is necessary (because inline tables need to fit on a single line).
@dcreager this looks good to me. I'll let you have the final review as the issue author.
dcreager
left a comment
This is great! My only comment is that we need to update the documentation to describe the changes.
Update the comment at the top of this file:
- `rustdoc` → `doc` in the "group options" section
- Describe `doc`, `derives`, and `fields` in the "syntax node options" section
- In particular, make sure to describe the mini-language for `fields.type`
Thanks for the reminder. I put off documenting this until the changes were final, but then completely forgot.
Co-authored-by: Douglas Creager <dcreager@dcreager.net>
dcreager
left a comment
Love it. Thanks for tackling this!
## Summary

Part of #15655

Replaced statement nodes with autogenerated ones. Reused the stuff we introduced in #16285. Nothing except for copying the nodes to new format.

## Test Plan

Tests run without any changes. Also moved the test that checks size of AST nodes to `generated.rs` since all of the structs that it tests are now there.
Summary
Part of #15655
`ast.toml`. I added attributes similar to `Field` in ASDL to hold field information
Test Plan
Nothing outside the `ruff_python_ast` package should change.