Add `ast_node_id` field to all ast nodes #2818

rzvxa · 2024-03-26T04:25:13Z

We should have a field similar to IdentifierReference's reference_id that would keep the AstNodeId of each AST node.

Can be written manually, using a declarative ast_node macro for definition or a procedural derive macro.
I guess we can also remove AstNode as it can be changed into SOA in the AstNodes structure.
If we remove the AstNode struct we can add a trait with the same name which would implement a function called ast_node_id that would give us the information for getting things that AstNode would've provided before. With this trait, we can use any ast node with the help of generics since with the ast_node_id alone we can infer the node's kind, parents, scope, control flow, and anything that gets added down the road.

The text was updated successfully, but these errors were encountered:

rzvxa · 2024-03-26T04:25:23Z

cc @Boshen

rzvxa · 2024-03-26T09:28:39Z

Let's say we are in the process of visiting an expression, Is there any way that can get that expression's AstNodeId? If there is not we should certainly add one.

For doing such a thing 2 things come to my mind.

Adding a reverse hash between the inner node(the AST item itself, expression in the example above) and the AstNodeId to retrieve it after semantic analysis.
Go through with adding the ast_node_id field to all those AST structs to allow access to the AstNodeId without
any extra lookup table.

Without any one of these, we have to basically intercept the visit at a much higher level than necessary to gain knowledge of the surrounding nodes.

I believe that having ast_node_id on individual structs can reduce the complexity when working with ASTs since it would eliminate the need for an additional wrapper structure(AstNode) around the node itself. Instead of being limited to AstNode fields we can fetch all sorts of data using a node's ID from various sources.
It would also work very well with SOA designs. Although it is a bit imperative.

Boshen · 2024-03-26T14:37:11Z

It's a 👍 for adding a node id because rustc has them as well https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/node_id/struct.NodeId.html, so the arguments are not theoretical but practical.

Since this requires a lot of effort, I think we need a few real world scenarios that can motivate this change, to decide whether we want to do this right now, or later when it's absolutely necessary.

rzvxa · 2024-03-26T19:19:28Z

Well, let me try my best to explain a real-world situation in which this change can help make things easier.

Let's say we want to transform a given node based on its parent, Right now this is achieved by monitoring all nodes and keeping a stack of them. I'm going to take the ArrowFunctions transformer as an example but after studying the babel transformers I find it a common thing.

impl<'a> VisitMut<'a> for ArrowFunctions<'a> {
    fn enter_node(&mut self, kind: AstType) {
        self.nodes.push(kind);
    }

    fn leave_node(&mut self, _kind: AstType) {
        self.nodes.pop();
    }

    fn visit_jsx_identifier(&mut self, ident: &mut JSXIdentifier<'a>) {
        let parent_kind = self.nodes.last().unwrap();
        let parent_parent_kind = self.nodes[self.nodes.len() - 2];
        // ...
        walk_jsx_identifier_mut(self, ident);
    }
}

Well if we go for this change it can be simplified into something more akin to this:

impl<'a> VisitMut<'a> for ArrowFunctions<'a> {
    fn visit_jsx_identifier(&mut self, ident: &mut JSXIdentifier<'a>) {
        let parent = self.ctx.semantics().nodes().parent(ident.ast_node_id());
        let parent_kind = parent.kind();
        let parent_parent_kind = self.ctx.semantics().nodes().parent_kind(parent.ast_node_id());
        // ...
        walk_jsx_identifier_mut(self, ident);
    }
}

Keep in mind that the example above is only using its parent's kind, If we want the actual parent node in VisitMut we have to do crazy things like having a reverse lookup table, searching through the nodes, or trying to achieve the same thing with only scope information. While having the ast_node_id would make it a simple array indexing.

rzvxa · 2024-03-26T19:33:58Z

Even in Visit which has the node reference passed through enter/leave_node, the need to keep a stack of parents while there is already such representation in semantics is counterproductive. It may also have minimal impact on performance and memory footprint while scaling but I don't find this part problematic. Not to mention the loss of extra information because of lack of access to the ast_node_id, basically if we want to add any complex analysis in which its result is needed further down the road would make for a good mind-boggler.

I guess it makes sense that Rust does it; since it needs complex analysis. Roc and Elm get away with this because of their language design.

If you are worried about having extra fields in the AST, We may be able to make it optional via features if we make the AstNodeId non-clonable I guess it would result in the cell needing a mutable reference to get the node_id without swapping it and it would limit its scope to the VisitMut trait which would result in it not being necessary for basic semantic analysis and tree traversal. I'm not really into having this field as a feature but it shouldn't be a problem implementing either.

rzvxa · 2024-03-26T19:38:47Z

In some cases we can intercept the parent using a visit on a higher node, for example instead of detecting a function parameter and then getting its parent function node, We can detect functions and search for a viable function parameter. I strongly believe that these cases should be written/ported from Babel in a manner that keeps the data flow local, We should go for using semantics information if it is absolutely necessary however it doesn't mean that we won't need it to implement some of these cases.

Edit

In other situations, it may result in cleaner code which could be favored over data locality.

rzvxa · 2024-03-26T21:17:09Z

Well if we go for this change it can be simplified into something more akin to this:

impl<'a> VisitMut<'a> for ArrowFunctions<'a> {
    fn visit_jsx_identifier(&mut self, ident: &mut JSXIdentifier<'a>) {
        let parent = self.ctx.semantics().nodes().parent(ident.ast_node_id());
        let parent_kind = parent.kind();
        let parent_parent_kind = self.ctx.semantics().nodes().parent_kind(parent.ast_node_id());
        // ...
        walk_jsx_identifier_mut(self, ident);
    }
}

Just want to point out an interesting approach, If we implement the ast_node_id method via the AstNode trait we can have methods and functions like parent take advantage of generics which makes a nicer interface.

impl<'a> VisitMut<'a> for ArrowFunctions<'a> {
    fn visit_jsx_identifier(&mut self, ident: &mut JSXIdentifier<'a>) {
        let parent = self.ctx.semantics().nodes().parent(ident);
        let parent_kind = parent.kind();
        let parent_parent_kind = self.ctx.semantics().nodes().parent_kind(parent);
        // ...
        walk_jsx_identifier_mut(self, ident);
    }
}

The AstNode trait can also be implemented for AstKind to expose the underlying id which would make them interchangeable in the context of semantics and syntax tree.

rzvxa · 2024-03-26T21:36:33Z

We can detect functions and search for a viable function parameter.

Just to make it clear, This approach is necessary if we want to replace the node instead of mutating it in place. So I believe that it makes sense to default to this when appropriate. At least for the minifier and transformers! Linters have more leeway for having declarative implementations.

Boshen · 2024-03-29T11:44:15Z

How do we deal with enum types? The current approach has enum types in the parent pointing tree, it's going to be a impractical to migrate it (I think).

rzvxa · 2024-03-29T14:25:47Z

The current approach has enum types in the parent pointing tree

Can you give an example of these enums? Several types are enum but I can't find why it would conflict with them.
Let's say we want to change the Declaration enum to use this approach.

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    FunctionDeclaration(Box<'a, Function<'a>>),
    ClassDeclaration(Box<'a, Class<'a>>),
    UsingDeclaration(Box<'a, UsingDeclaration<'a>>),

    TSTypeAliasDeclaration(Box<'a, TSTypeAliasDeclaration<'a>>),
    TSInterfaceDeclaration(Box<'a, TSInterfaceDeclaration<'a>>),
    TSEnumDeclaration(Box<'a, TSEnumDeclaration<'a>>),
    TSModuleDeclaration(Box<'a, TSModuleDeclaration<'a>>),
    TSImportEqualsDeclaration(Box<'a, TSImportEqualsDeclaration<'a>>),
}

As you can see all of these enum variants would contain a node at the end, So we can use the underlying node's ID, Here's how we can implement this trait for enums:

trait AstNode {
    fn ast_node_id(&self) -> AstNodeId;
}

impl<'a> AstNode for Declaration<'a> {
    fn ast_node_id(&self) -> AstNodeId {
        match self {
                VariableDeclaration(node) => node.ast_node_id(),
                FunctionDeclaration(node) => node.ast_node_id(),
                ClassDeclaration(node) => node.ast_node_id(),
                UsingDeclaration(node) => node.ast_node_id(),
                // ....
        }
    }
}

rzvxa · 2024-03-29T14:42:18Z

@Boshen @milesj @Dunqing If I can get a little more feedback here and a 2 for 3 👍I'll start working on a prototype PR so we can find edge cases if any.

rzvxa · 2024-03-29T15:25:53Z

@Boshen

/// Array Expression Element
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum ArrayExpressionElement<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>),
    Expression(Expression<'a>),
    /// Array hole for sparse arrays
    /// <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Trailing_commas#arrays>
    // Serialize as `null`. `serialize_elision` in serialize.rs.
    #[cfg_attr(
        feature = "serialize",
        serde(serialize_with = "ArrayExpressionElement::serialize_elision")
    )]
    Elision(Span),
}

The Elision variant here is the only example I found which needs refactoring; Since Span isn't an AstNode and shouldn't be we have to add an ElisionElement similar to SpreadElement and RestElement.

Edit:

We can also rewrite it as such:

    Elision(Span, AstNodeId),

Boshen · 2024-03-29T16:34:59Z

The Elision variant here is the only example I found which needs refactoring;

Feel free to create another issue for this change.

You may have missed the thread on discord https://discord.com/channels/1079625926024900739/1222938985224081528

I'll create an umbrella issue for progress report and a milestone to officially kick off new transformer implementation.

rzvxa · 2024-03-29T17:07:06Z

Feel free to create another issue for this change.

I think it is a part of this issue and just something we have to do in order to get this working. It won't have any concerning effect on the other parts, We almost always ignore elision elements. It is just there to fill a space in the array.

You may have missed the thread on discord https://discord.com/channels/1079625926024900739/1222938985224081528

Yes, I saw this after commenting here, I don't have a habit of checking my discord servers(I still have notifications for directs).

I guess It's decided then, I'll start by creating a PR for this change. As soon as it gets to a working state I'll mention you guys for feedback.

Boshen · 2024-03-29T17:44:49Z

FYI I created the AST builder back in the day envisioning that we are going to add node ids ... I think this is the day. Good luck!

rzvxa · 2024-05-11T09:26:37Z

@overlookmotel @Boshen I still think there is a use for this ID, If we have the node ID along with our nodes we can traverse through things like scopes with no issue(both in linters and transformers). It makes rules like #2637 simpler(since we can rely on scope data without having to do a lot of iterations).

Let me know what you think, Since we are doing a lot of changes on AST maybe it is a good time to also sneak this in.

overlookmotel · 2024-05-11T10:40:11Z

I do not have a clear idea at present of best way to implement scope binding lookups in transformers. You may be right that an AST node ID is the best way. On the other hand, we may be able to find a more efficient way using the kind of "pointer tricks" that Traverse is using.

So... er... don't know!

But I've made a comment on #2861 to not delete the branch in case we want to come back to this. Could you also please make sure you keep a local copy of the branch too, just in case?

rzvxa · 2024-05-11T10:53:19Z

I'll fork the branch; Graphite might cause me to delete the local branch by accident.

Boshen added this to the Transformer Milestone 1 milestone Mar 29, 2024

This was referenced Mar 30, 2024

feat(ast): add AstNode trait and provide access to AstNodeId #2861

Closed

refactor(ast): rework enum of structures #2876

Closed

rzvxa mentioned this issue Apr 19, 2024

feat(transformer/react): correct import binding names #3014

Closed

Boshen closed this as not planned Won't fix, can't repro, duplicate, stale May 1, 2024

rzvxa mentioned this issue Jun 25, 2024

feat(ast)!: generate ast_builder.rs. #3890

Merged

rzvxa mentioned this issue Jul 11, 2024

AST Node ID #4188

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `ast_node_id` field to all ast nodes #2818

Add `ast_node_id` field to all ast nodes #2818

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024 •

edited

Loading

Boshen commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 26, 2024 •

edited

Loading

Boshen commented Mar 29, 2024

rzvxa commented Mar 29, 2024 •

edited

Loading

rzvxa commented Mar 29, 2024 •

edited

Loading

rzvxa commented Mar 29, 2024 •

edited

Loading

Boshen commented Mar 29, 2024

rzvxa commented Mar 29, 2024

Boshen commented Mar 29, 2024

rzvxa commented May 11, 2024 •

edited

Loading

overlookmotel commented May 11, 2024

rzvxa commented May 11, 2024

Add ast_node_id field to all ast nodes #2818

Add ast_node_id field to all ast nodes #2818

Comments

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024 • edited Loading

Boshen commented Mar 26, 2024 • edited Loading

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024

rzvxa commented Mar 26, 2024 • edited Loading

Edit

rzvxa commented Mar 26, 2024 • edited Loading

rzvxa commented Mar 26, 2024 • edited Loading

Boshen commented Mar 29, 2024

rzvxa commented Mar 29, 2024 • edited Loading

rzvxa commented Mar 29, 2024 • edited Loading

rzvxa commented Mar 29, 2024 • edited Loading

Edit:

Boshen commented Mar 29, 2024

rzvxa commented Mar 29, 2024

Boshen commented Mar 29, 2024

rzvxa commented May 11, 2024 • edited Loading

overlookmotel commented May 11, 2024

rzvxa commented May 11, 2024

Add `ast_node_id` field to all ast nodes #2818

Add `ast_node_id` field to all ast nodes #2818

rzvxa commented Mar 26, 2024 •

edited

Loading

Boshen commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 26, 2024 •

edited

Loading

rzvxa commented Mar 29, 2024 •

edited

Loading

rzvxa commented Mar 29, 2024 •

edited

Loading

rzvxa commented Mar 29, 2024 •

edited

Loading

rzvxa commented May 11, 2024 •

edited

Loading