Codegen AST related code #3819

Boshen · 2024-06-22T05:03:48Z

Background

For keeping compilation speed down for ourselves and our downstream crate users, I have avoided using macros to generate boilerplate code at all cost.

All performance issues (runtime and compilation speed) are considered as bugs in this project.

But due to the amount of AST related supporting code has expanded over the last few months, and our desire to experiment with different ideas with the AST, our current approach of manually updating everything is no longer functional.

The goal of this task is to provide code generation for all AST node related code.

Requirements

no build.rs published to the user
all generated code are checked into git
no nightly
Rust code residing in crates/oxc_ast should be the source of truth

Workflow

a user changes the oxc_ast crate
a watch change picks it up
parse all #[visited_node]
saves them into a model, e.g. struct ASTModel { name: String, attributes: Vec<Attributes> }
generate code and save file

Code to generate

ASTKind (Milestone 1)
GetSpan (Milestone 2)
Hash, PartialEq implementations (Milestone 3)
ASTBuilder (Milestone 4)
Visit (Milestone 5)
VisitMut (Milestone 5)
Traverse, remove current js code (Milestone 6)

~~typescript definition for napi and wasm~~ moving the scope to AST transfer.

Every milestone should be a stack of PRs, since they will involve changing other files as well.

The text was updated successfully, but these errors were encountered:

overlookmotel · 2024-06-22T17:37:16Z

The AST also includes some types which are in oxc_syntax e.g. AssignmentOperator, BinaryOperator etc. Should we move those types into oxc_ast to simplify which files the codegen needs to read?

(I've never been clear why these types are in oxc_syntax, since they are part of AST)

Boshen · 2024-06-22T18:00:12Z

Should we move

No, let's not scope creep.

There are other traits and functions that depend on these common operators.

Part of #3819

Boshen · 2024-07-02T13:56:25Z

Are we able to manually inline @inherit MemberExpression and generate all the other utilities for these inherit_variants! enums?

rzvxa · 2024-07-02T15:04:10Z

Are we able to manually inline @inherit MemberExpression and generate all the other utilities for these inherit_variants! enums?

Can you elaborate on it? Wouldn't we lose some information if we don't express the inheritance in the type system? While the actual code would behave the same, It might be useful for the user to see this as the definition:

inherit_variants! {
/// Argument
///
/// Inherits variants from [`Expression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Argument<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>) = 64,
    // `Expression` variants added here by `inherit_variants!` macro
    @inherit Expression
}
}

Instead of this:

/// Argument
///
/// Inherits variants from [`Expression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Argument<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>) = 64,

    BooleanLiteral(Box<'a, BooleanLiteral>) = 0,
    NullLiteral(Box<'a, NullLiteral>) = 1,
    NumericLiteral(Box<'a, NumericLiteral<'a>>) = 2,
    BigIntLiteral(Box<'a, BigIntLiteral<'a>>) = 3,
    RegExpLiteral(Box<'a, RegExpLiteral<'a>>) = 4,
    StringLiteral(Box<'a, StringLiteral<'a>>) = 5,
    TemplateLiteral(Box<'a, TemplateLiteral<'a>>) = 6,

    Identifier(Box<'a, IdentifierReference<'a>>) = 7,

    MetaProperty(Box<'a, MetaProperty<'a>>) = 8,
    Super(Box<'a, Super>) = 9,

    ArrayExpression(Box<'a, ArrayExpression<'a>>) = 10,
    ArrowFunctionExpression(Box<'a, ArrowFunctionExpression<'a>>) = 11,
    AssignmentExpression(Box<'a, AssignmentExpression<'a>>) = 12,
    AwaitExpression(Box<'a, AwaitExpression<'a>>) = 13,
    BinaryExpression(Box<'a, BinaryExpression<'a>>) = 14,
    CallExpression(Box<'a, CallExpression<'a>>) = 15,
    ChainExpression(Box<'a, ChainExpression<'a>>) = 16,
    // .
    // .
    // .
}

On another note, if we get the macro caching to work (related to this) we can have this without any compilation penalty which I think is much superior(And I guess @overlookmotel would agree).

/// Argument
///
/// Inherits variants from [`Expression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[inheritance]
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Argument<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>) = 64,
    #[inherit] // or maybe `#[inline]` since it is more rust-like?
    Expression(Expression<'a>),
}

rzvxa · 2024-07-02T15:15:01Z

If we want to have the inheritance expanded we should do the expansion with the codegen to keep it consistent and schema close to the ECMAspec. But IMHO, manually inlining the variants is going to be a maintenance nightmare. because they would easily drift apart with some time and careless reviewing(which always happens no matter how many people review the same piece of code).

rzvxa · 2024-07-08T23:08:28Z

Hash, PartialEq implementations (Milestone 3)

@Boshen @overlookmotel I think we should remove this one from our milestones and here's why.

The performance gain from pre-expanding builtin derives is going to chop off a handful of seconds from our build time and even that is an optimistic estimate. It might at first feel unintuitive but let me explain myself.

Syntax Extensions

The macro calls(including the macro_rules itself), attributes, and derives are among syntax extension kinds, A syntax extension is the last form of a macro(usually) before turning into the desired AST.

If you look at the implementation of SyntaxExtensionKind you'd find that there are 2 kinds of any macro kind, There are 2 sets Bang, Attr, Derive and LegacyBang, LegacyAttr, LegacyDerive.

What's the difference?

In a nutshell, legacy kinds have AST -> AST signature as opposed to TokenStream -> TokenStream.
As of right now, Rust registers all of its builtin macros as the former kind, These implementations are the closest to the bare metal and get executed almost as fast as any lowering process happening in the compiler.

What happens if they phase out the legacy kinds?

Well again it might seem intuitive that with this change a pre-expand can improve the build time by a good margin but the truth is that for dead simple macros such as these 2 if written as a procedural macro you are going to mostly wait on the initial build of the macro crate and then spend a good amount of time on the overhead of the proc_macro_server, So if they were to phase out the legacy expanders they still can implement the MultiItemModifier differently to short-circuit to the builtin implementation avoiding all of the shortcomings of normal procedural macros. That's exactly why declarative macros are fast to build.

Conclusion

I guess my conclusion is in the form of a question.
While pre-expanding the built-in derives can speed up the builds by N seconds is it really worth giving up on the built-in macros? It yields more performance if it is done on the derives implemented in the users' land.

Links

https://github.com/rust-lang/rust/blob/32e692681e457b06efedd05c5a4be0d718f39292/compiler/rustc_builtin_macros/src/derive.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_builtin_macros/src/deriving/default.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_builtin_macros/src/deriving/hash.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_builtin_macros/src/deriving/cmp/partial_eq.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_expand/src/proc_macro.rs#L110
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_expand/src/mbe/macro_rules.rs#L366

overlookmotel · 2024-07-09T00:00:46Z

You make a strong case for why performance wouldn't be a good reason to codegen these 2 traits. Thanks for all the background info. I was completely unaware of the legacy macro kinds, and Rust's macro machinery in general. Really interesting.

But, just to explain, my rationale for putting these 2 on the list wasn't for perf reasons. The reasons were:

`Hash`

We want Hash to skip span fields, and also symbol_id etc. Skipping Hash is currently achieved by a dummy Hash impl on Span type:

oxc/crates/oxc_span/src/span.rs

Lines 309 to 313 in 1b91d40

    
           impl Hash for Span { 
        
               fn hash<H: Hasher>(&self, _state: &mut H) { 
        
                   // hash to nothing so all ast spans can be comparable with hash 
        
               } 
        
           }

I saw this as a bit of a hack, and thought it would be more "proper" solution to implement Hash on all AST types with the span field explicitly skipped. That would be laborious and error-prone to do by hand, but much easier with codegen.

`PartialEq`

The reason why we are implementing Hash on AST types is for (if I remember right) comparing expressions in a lint rule to identify pointless if branches e.g. if (x) {} else if (x) { /* unreachable */ }. But actually Hash is not ideal for this use - PartialEq would be more efficient.

But it's hard to implement PartialEq because, again, we need to skip span fields, and the no-op hash hack isn't viable here, so we can't use #[derive(PartialEq)].

So I thought we could do that with codegen.

Conclusion

I'm not at all arguing against what you've said @rzvxa. I just wanted to explain that there was another angle.

rzvxa · 2024-07-09T00:41:37Z

Thanks for the explanation, Now I see the full picture for this one.

Boshen · 2024-07-09T03:46:41Z

We can defer the implementation of Hash and PartialEq because we don't need them right now (we didn't need them for so long so probably won't need them for a longer time.)

As for Traverse, @rzvxa can decide continue working on it, or come back after research for ast template macro is completed.

Boshen · 2024-07-11T03:00:45Z

Consider this is done, we can migrate Traverse at a later time, and raise new issues regarding ast codegen.

Boshen added the C-enhancement Category - New feature or request label Jun 22, 2024

Boshen assigned rzvxa Jun 22, 2024

Boshen mentioned this issue Jun 24, 2024

feat(tasks/ast_codegen): prototype for codegen AST related code #3815

Merged

Boshen pushed a commit that referenced this issue Jun 25, 2024

feat(tasks/ast_codegen): prototype for codegen AST related code (#3815)

f6c4ec4

Part of #3819

Boshen mentioned this issue Jul 9, 2024

chore(deps): bump oxc to 0.19.0 rolldown/rolldown#1560

Merged

rzvxa mentioned this issue Jul 9, 2024

CI infrastructre for the ast_codegen #4138

Closed

Boshen mentioned this issue Jul 10, 2024

RFC: codegen AST related codes #4134

Closed

Boshen closed this as completed Jul 11, 2024

overlookmotel mentioned this issue Aug 28, 2024

ast: Derive Hash and PartialEq on AST node types #5283

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codegen AST related code #3819

Codegen AST related code #3819

Boshen commented Jun 22, 2024 •

edited by rzvxa

Loading

overlookmotel commented Jun 22, 2024

Boshen commented Jun 22, 2024 •

edited

Loading

Boshen commented Jul 2, 2024

rzvxa commented Jul 2, 2024 •

edited

Loading

rzvxa commented Jul 2, 2024 •

edited

Loading

rzvxa commented Jul 8, 2024 •

edited

Loading

overlookmotel commented Jul 9, 2024

rzvxa commented Jul 9, 2024

Boshen commented Jul 9, 2024

Boshen commented Jul 11, 2024

Codegen AST related code #3819

Codegen AST related code #3819

Comments

Boshen commented Jun 22, 2024 • edited by rzvxa Loading

Background

Requirements

Workflow

Code to generate

overlookmotel commented Jun 22, 2024

Boshen commented Jun 22, 2024 • edited Loading

Boshen commented Jul 2, 2024

rzvxa commented Jul 2, 2024 • edited Loading

rzvxa commented Jul 2, 2024 • edited Loading

rzvxa commented Jul 8, 2024 • edited Loading

Syntax Extensions

What's the difference?

What happens if they phase out the legacy kinds?

Conclusion

overlookmotel commented Jul 9, 2024

Hash

PartialEq

Conclusion

rzvxa commented Jul 9, 2024

Boshen commented Jul 9, 2024

Boshen commented Jul 11, 2024

Boshen commented Jun 22, 2024 •

edited by rzvxa

Loading

Boshen commented Jun 22, 2024 •

edited

Loading

rzvxa commented Jul 2, 2024 •

edited

Loading

rzvxa commented Jul 2, 2024 •

edited

Loading

rzvxa commented Jul 8, 2024 •

edited

Loading

`Hash`

`PartialEq`