Faster passing ASTs from Rust to JS #2409

overlookmotel · 2024-02-14T00:51:57Z

Currently OXC's parser is extremely fast, but using it from NodeJS is not. The primary cause is the overhead of the JS/Rust boundary - specifically serializing/deserializing large AST structures, in order to pass them between the two "worlds".

Right now, it's not a problem, as OXC is mainly consumed as a Rust lib. However, I suspect that as OXC's transformer, linter, and minifier are built out and gain popularity, this may become a bottleneck, because people will be asking for a way to write transformer/linter/etc plugins in JavaScript, and the performance will not be up to their expectations.

Currently OXC uses JSON as the serialization format. There's a POC implementation using Flexbuffers, which I imagine is much faster.

However, I believe that OXC is uniquely placed to go one better, and cut the overhead of serialization/deserialization practically to zero - in a way that no other current tool that I'm aware of will be able to match.

Apologies in advance this is going to be a long one...

Background: Why I think this is important

JavaScript as we know it today is the result of a great spurt of innovation over the past decade (particularly around ES6). Babel was pivotal in that process. Many of the new language features (e.g. array destructuring) are essentially syntax sugar, and a working implementation as a Babel plugin became both a requirement of the TC39 process, and an important part of the process of developing and refining features - allowing people to test them out and suggest improvements etc.

At this point, the trend towards tooling written in native languages like Rust is irreversible. This is great for DX. However, it does have the unfortunate side effect of making those tools less accessible to JavaScript developers who only "speak" JS. And of course it's JS programmers who are most familiar with the language, most aware of what its rough edges are, and most motivated to play a role in improving the language.

I believe that to enable the continued evolution of JS, it's important to ensure that, as Babel fades into the distance, the new crop of tools replacing it also fulfil the role Babel has played up until now, allowing JS developers to prototype new language features in the language they know best - JavaScript.

Therefore I feel it's important that transformer plugins written in JS continue to be a thing.

More "selfishly", from the point of view of OXC, I think there is also a real opportunity here. Most people's needs will be mostly met by the most common plugins which OXC will offer as standard, implemented in Rust.

However, I would bet that there's a very long tail of projects/companies who rely on at least one less popular Babel/ESLint plugin, and are therefore currently blocked from migrating from Babel/ESLint to OXC/SWC/etc. This is likely a major pain point for them.

Pursuing a goal of satisfying every developer's needs by re-implementing every plugin that has any user base would be an immense maintenance burden. And many companies/developers will not have the capability to do it themselves in Rust. If OXC can offer a solution for plugins in JS, and unlock their path to much faster builds, it could be a significant driver to adoption.

How to do it?

I attempted to tackle exactly this problem on SWC a couple of years ago swc-project/swc#2175.

My first prototype using rkyv as serializer did show solid performance gains vs JSON - around 4x. I had the beginnings of a 2nd version which was way faster again, based on a much faster serializer. But performance was still in roughly same ballpark as Babel, rather than the order of magnitude improvement I was hoping for.

I came to the conclusion that only way to achieve that kind of improvement was to remove serialization from the equation entirely, and this could only be achieved by using an arena allocator. It became clear that SWC's maintainers did not feel JS plugins were a priority, and so would not consider that kind of fundamental re-architecting of the project to support it. So I abandoned the effort.

OXC, of course, already has an arena allocator at its core, so the largest problem is already solved.

How to destroy the overhead

It's really simple.

The requirements of a serialization format are that it must be reasonably space-efficient, and well-specified. Such a format already exists in OXC - the native Rust types for AST nodes.

So don't serialize at all!

OXC stores the entire AST in an arena. Rust can transfer the arena allocator's memory blocks via napi-rs to JavaScript, where it becomes NodeJS Buffer objects. This transfer is just passing pointers, involves no memory copying, and the overhead is close to zero.

On the JS side, you need a deserializer which understands the memory layout of the Rust types. This is the tricky part, but the deserializer code can be generated from a schema, or even from analysis of the type layouts within Rust itself (layout_inspect is a prototype of the latter approach).

(side note: TS type defs can also be auto-generated at same time)

From my experiments on SWC, the JS deserializer can be surprisingly performant (see graph here). Deserializing on JS side was twice as fast as Rust-side serialization with rkyv. I suspect that because the deserializer code is so simple and completely monomorphic, V8 is able to optimize it very effectively.

It's also possible to do the same in reverse. JS passes Buffers back to Rust, you reconstruct the arena, and just cast a pointer back to a &mut Program. Again, this is only possible because of the arena, and because all the AST node types are non-drop.

Complications

Enabling this would require some changes to OXC's internals, some of which are a bit annoying. So there are some trade-offs, and it might only be workable if the project feels it's appropriate to make JS plugins a "first class citizen" of OXC.

Stable type layouts

All AST node types would need to be #[repr(C)] to ensure a stable layout. That's not a big deal in itself, I think, but the annoyance would be that e.g. bool fields would need to move to the last fields of types, to avoid excess padding.
All AST enums would likely need to be #[repr(u8)] with explicit discriminators.
Maybe there'd be a problem maintaining the niche optimization for Options, as deserializer needs to know the niche value for None, which Rust does not expose (I say "maybe" as I can see potential solutions to that).

These annoyances could be largely negated by using proc macros, but at the cost of increased compile times (not sure to what degree).

Strings

2 problems here:

All the data for the AST must be in the arena, or part of the source text, so JS can access it. This imposes some constraints on what you can put in an Atom.
Decoding strings from UTF-8 is the most costly part of the JS deserializer. Each decode involves a call across the JS/native boundary, which is a major slow down. So by far the most efficient way to handle it is to ensure all strings are stored together in one buffer, decode the whole lot in one go, and then slice up the resulting JS string to get each individual string. The allocator would probably need a separate StringStore arena. NB: This does not apply to strings which are already in the source text, as JS has that as a string already.

I don't think either of these are a big problem in the parser, but maybe they are in transformer or minifier?

Pointers

Box and Vec contain 64-bit pointers. On JS side, the deserializer needs to be able to convert a pointer to an offset in a Buffer, but JS does not have a u64 type. A further complication is that the arena is composed of multiple buffers.

This is doable without any changes to OXC's allocator. But to make it really fast might require a new arena allocator implementation which e.g. aligns buffers on 4 GiB memory boundaries, so only the bottom 32 bits of memory addresses are relevant. Or to have a 2nd allocator implementation which uses a WebAssembly.Memory as the backing storage for the arena. WASM Memory in V8 already has the 4 GiB alignment property, and can be extended dynamically up to 4 GiB without memory copies, so entire arena could be a single buffer.

In my opinion, replacing bumpalo could be a gain in itself anyway, as I don't think it's quite as optimized as it could be for OXC's types. But obviously that's significant work.

Further optimizations

Lazy deserialization

The above assumes that the entire AST needs to be deserialized on JS side. But in most cases, a plugin only cares about a few AST node types, which will comprise a small subset of the entire AST. Lazy deserialization could reduce the overhead of deserialization to only the parts of the AST which are actually needed.

Updating the AST

A transformer visitor on JS side could make whatever changes it wants to the AST by directly mutating the data in the buffer. No need to convert to JS Objects and then serialize it all back to a buffer. The user-facing API would hide this behind a "facade" of AST node objects with getters/setters, or Proxys.

This would be difficult to make work without breaking Rust's aliasing rules, as JavaScript allows shared mutable references. And the JS code writing to the buffer would essentially be fiddling with bytes in Rust's memory, so would need to be absolutely bullet-proof to ensure no UB.

This would be a real challenge, but the reward would be extreme speed. JS plugins will never be as fast as native, but my guess is that this could get them at least in the same ball park.

I would not propose that this be part of the v1 implementation, but the potential is I think worth considering when weighing up whether this effort overall is worthwhile or not.

WASM traverser

The number-crunching of following pointers and traversing the AST could be performed in WASM, with WASM returning control back to JS when it's found the next node the visitor wants. WASM is faster than JS, but crossing the JS/WASM boundary can in some circumstances be very low cost.

Conclusion

In my personal opinion:

This could be a very performant solution to a common need.
This feature could be an opportunity for OXC to differentiate it from other JS tools. Because most other tools don't use arena allocators, they could not do this even if they wanted to.
I believe everything I've outlined above is technically achievable.
But there are significant challenges, and it would be a large effort.
Implementation could proceed in incremental steps. A working first version would only require a subset of the above.
In a few cases, some trade-offs with OXC's other aims might be required.

My questions are:

Do you see any potential in this?
If there are trade-offs required, would they be worth it?

Hopefully it goes without saying that if you are willing to consider something along these lines, I would be keen to work on it.

One last thing: I'm not sure if there's currently a solution for linter plugins on the table, but if not, perhaps this could be it?

The text was updated successfully, but these errors were encountered:

overlookmotel · 2024-02-14T00:54:01Z

By the way, I've opened this as an issue rather than discussion as issues have higher visibility, and I'd be really keen to get feedback from the community. But Boshen if you feel it'd be better as discussion, please feel free to move it across.

ArnaudBarre · 2024-02-14T19:49:44Z

Thanks for taking time to investigate this! I personally think that JS interop is key to the success of tooling, so that the long tail of entreprise use case can craft their custom format & lint rules by just dropping a few lines of TS into a config. (which is for me one of the biggest reason Vite is preferred over Parcel).

This weekend I tried to plug OXC's parser into Prettier. The main blocker for me is the missing support for comments in the AST but the performance was already very noticeable. I think that in a first time, having full support of the AST so that people can build custom formatter, linter, bundler or plug in into existing tools would be really great for the adoption and the community.

I know that the only way to go fast is to own the full stack, but I personally think that speeding up tools like ESLint or Prettier by 3 is already a big deal and a more manageable scope in the short term.

overlookmotel · 2024-02-14T20:13:23Z

Thanks for going through this lengthy post and giving your thoughts @ArnaudBarre.

Good point! I had not considered the use case of other tools using OXC's parser stand-alone.

Making the NodeJS interface to OXC's parser faster would certainly have to be the first step in this process (though also not without its challenges). But it's nice to know you think benefits would start to become visible from that first stage, even before the next step of implementing a JS plugin framework for OXC's linter/transformer.

Boshen · 2024-02-16T14:03:36Z

I thought I saw your name somewhere before but never recalled until you mentioned that swc PR.

I had numerous discussions with different people and we all concluded that AST transfer is in a dead end because of the conclusion of that PR.

Let me think about this for a bit before answering all the questions.

overlookmotel · 2024-02-16T22:15:22Z

Thanks for coming back Boshen, and thanks for reading through my essay. I wrote way too long!

Wow I never had any idea anyone really noticed that SWC issue, let alone discussed it.

Yes, it was a disappointing conclusion. Personally, I felt the main thing was it was a bad fit with SWC's priorities - they were firmly committed to WASM plugins - and I was disheartened that after a lot of work, it was clearly going nowhere. But, personally, it felt more like "wrong place, wrong time" than that the concept had been proved unviable in principle. So I chalked it up as "R&D".

Of course, think about it as much or as little as you like. Obviously I'm keen, but I'm also aware there are complications, and it may not be the best path forwards. I would appreciate your thoughts when you have time.

The only point I'd like to make is that there's one fundamental difference between SWC and OXC, which completely unlocks the problem: arenas. Ultimately, the idea is not particularly novel or revolutionary - sharing state by sharing memory - but it's the arena which makes that old paradigm possible in this context.

Boshen · 2024-02-18T06:00:14Z

Here are my requirements after some research and thoughts:

Change UTF-8 to UTF-16 span positions
Serialize to estree for maximum ecosystem compatibility
Target a valuable use case to maximize the cost-benefit ratio of this feature task

overlookmotel · 2024-02-19T11:50:26Z

Thanks for coming back Boshen. Some questions:

Change UTF-8 to UTF-16 span positions

Do you mean everywhere? (i.e. start and end fields of the Span type in Rust become UTF-16 positions) Or just in the JavaScript version of AST?

I'm aware the parser currently relies on start and end being UTF-8 positions to e.g. slice strings from the source code &source[start..end]. But we could find another mechanism for that using SourcePositions (which would also be a little more performant).

But I don't know if linter/transformer/minifier also rely on Span positions being UTF-8?

Serialize to estree for maximum ecosystem compatibility

As far as I'm aware, the differences between OXC's AST and estree are quite minimal, so this should be doable without translation being costly. Is there a list somewhere of the differences? (I thought I saw one, but now can't find it)

Target a valuable use case to maximize the cost-benefit ratio of this feature task

The first step would need to be speeding up the NodeJS parser interface (oxc-parser). Same should be possible for @oxc-parser/wasm without too much difficulty.

But after that, what do you think should be highest priority?

An AST visitor in JS with lazy deserialization which offers only a read-only interface to the AST would be much easier to implement than one in which the AST can be mutated. I assume that'd be sufficient for linter plugins?

ArnaudBarre · 2024-02-19T15:53:10Z

@overlookmotel I've added you to my WIP to explore OXC as a prettier parser. You can look at it to see that the number of difference with TSESTree start to be non trivial (I've not yet finished the mapping)

overlookmotel · 2024-02-19T17:10:36Z

@ArnaudBarre Thanks for sharing.

Doesn't look too bad. Some transformations are annoying (Typescript mostly), but in many cases, OXC's JSON output could be aligned with ESTree just by using #[serde(rename)] etc. And presumably can import all the tests from another ESTree implementation (e.g. Acorn), rather than writing a ton of tests from scratch.

I'd suggest best way to go about this would be to first get the current JSON output to be ESTree-compatible, and then work from there, replacing serde with "raw" transfer. Having a set of tests which already pass would make it much easier to catch any faults.

How complete is your translation implementation? Aside from the "TODO" comments, do you think there are a lot more differences still to be found?

Boshen · 2024-02-20T07:57:03Z

But after that, what do you think should be highest priority?

After some consideration, let's put a milestone on prettier-oxc-parser. I don't have the time and energy to work on oxc_prettier, so speeding up prettier will bring its own values.

Let's gather up the requirements and a todo list after tonight.

ArnaudBarre · 2024-02-20T08:48:33Z

I think there are still differences to be discovered, the one todo is where I was at when looking at AST node one by one (by manually comparing the types). I think I can finish this tonight!

ArnaudBarre · 2024-02-21T01:48:14Z

I've pushed a new commits with some new diffs. I've tried running the parser on the typescript node_module folder and it hangs /lib/lib.dom.d.ts, I will investigate tomorrow!

matthew-dean · 2024-03-06T16:58:55Z

Just a thought - I think this is very important research and feature, but I would also add that it might be extremely valuable as a general-purpose strategy for any projects communicating between Rust and JS. (e.g. creating Rust-based tooling that allows JS plugins) so there may be some additional community value in documenting an independent package that efficiently does this conversion.

overlookmotel · 2024-03-06T21:18:23Z

Thanks for your thoughts @matthew-dean. If it works, I agree it could have wider applications beyond OXC. For now, we're only at an early stage, and I think best to focus on trying to make it work within OXC. But, yes, I'd be keen to share whatever findings come up in that process further down the line.

@inherit

OK, this is a big one... I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR. ## What this PR does This PR squashes all nested AST enum types (#2685). e.g.: Previously: ```rs pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ Declaration(Declaration<'a>), } pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>), /* ...other Declaration variants... */ } ``` After this PR: ```rs #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>) = 0, /* ...other Statement variants... */ VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } #[repr(C, u8)] pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } ``` All `Declaration`'s variants are combined into `Statement`, but `Declaration` type still exists. As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost. This is the same thing as #2847, but here applied to *all* nested enums in the AST, and with improved helper methods. No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing). ## Why? 1. It is a prerequisite for Traversable AST (#2987). 2. It would help a lot with AST Transfer (#2409) - it solves the only remaining blocker for this. 3. It is a step closer to making the whole AST `#[repr(C)]`. ## Why is it a good thing for the AST to be `#[repr(C)]`? Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction. Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as *pure data* (just bytes). That data can be moved around and manipulated beyond what Rust natively allows. However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this. Once the types are `#[repr(C)]`, various features become possible: 1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer. 2. Having multiple versions of the AST (standard, read-only, traversable), and these AST representations can be converted to one other at zero cost via transmute - the property used by Traversable AST scheme. 3. Caching AST data on disk (#3079) or transferring across network. 4. Stuff we haven't thought of yet! Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind). ## The problem with `#[repr(C)]` It's not *required* to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization. So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers). ## Implementation One enum "inheriting" variants from another is implemented with `inherit_variants!` macro. ```rs inherit_variants! { #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ // `Declaration` variants added here by `inherit_variants!` macro @inherit Declaration // `ModuleDeclaration` variants added here by `inherit_variants!` macro @inherit ModuleDeclaration } } ``` The macro is *fairly* lightweight, and I think the above is quite easy to understand. No proc macros. The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes). New patterns for dealing with nested enums are introduced: Creation: ```rs // Old let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl)); // New let stmt = Statement::VariableDeclaration(var_decl); ``` Conversion: ```rs // Old let stmt = Statement::Declaration(decl); // New let stmt = Statement::from(decl); ``` Testing: ```rs // Old if matches!(stmt, Statement::Declaration(_)) { } if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { } // New if stmt.is_declaration() { } if matches!(stmt, Statement::ImportDeclaration(_)) { } ``` Branching: ```rs // Old if let Statement::Declaration(decl) = &stmt { decl.do_stuff() }; // New if let Some(decl) = stmt.as_declaration() { decl.do_stuff() }; ``` Matching: ```rs // Old match stmt { Statement::Declaration(decl) => visitor.visit(decl), } // New (exhaustive match) match stmt { match_declaration!(Statement) => visitor.visit(stmt.to_declaration()), } // New (alternative) match stmt { _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()), } ``` New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read. This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95 ```diff - let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) = - &assignment_expr.left - else { - return; - }; - let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) = - simple_assignment_target + let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left else { return; }; ```

@inherit

OK, this is a big one... I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR. ## What this PR does This PR squashes all nested AST enum types (oxc-project#2685). e.g.: Previously: ```rs pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ Declaration(Declaration<'a>), } pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>), /* ...other Declaration variants... */ } ``` After this PR: ```rs #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>) = 0, /* ...other Statement variants... */ VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } #[repr(C, u8)] pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } ``` All `Declaration`'s variants are combined into `Statement`, but `Declaration` type still exists. As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost. This is the same thing as oxc-project#2847, but here applied to *all* nested enums in the AST, and with improved helper methods. No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing). ## Why? 1. It is a prerequisite for Traversable AST (oxc-project#2987). 2. It would help a lot with AST Transfer (oxc-project#2409) - it solves the only remaining blocker for this. 3. It is a step closer to making the whole AST `#[repr(C)]`. ## Why is it a good thing for the AST to be `#[repr(C)]`? Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction. Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as *pure data* (just bytes). That data can be moved around and manipulated beyond what Rust natively allows. However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this. Once the types are `#[repr(C)]`, various features become possible: 1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer. 2. Having multiple versions of the AST (standard, read-only, traversable), and these AST representations can be converted to one other at zero cost via transmute - the property used by Traversable AST scheme. 3. Caching AST data on disk (oxc-project#3079) or transferring across network. 4. Stuff we haven't thought of yet! Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind). ## The problem with `#[repr(C)]` It's not *required* to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization. So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers). ## Implementation One enum "inheriting" variants from another is implemented with `inherit_variants!` macro. ```rs inherit_variants! { #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ // `Declaration` variants added here by `inherit_variants!` macro @inherit Declaration // `ModuleDeclaration` variants added here by `inherit_variants!` macro @inherit ModuleDeclaration } } ``` The macro is *fairly* lightweight, and I think the above is quite easy to understand. No proc macros. The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes). New patterns for dealing with nested enums are introduced: Creation: ```rs // Old let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl)); // New let stmt = Statement::VariableDeclaration(var_decl); ``` Conversion: ```rs // Old let stmt = Statement::Declaration(decl); // New let stmt = Statement::from(decl); ``` Testing: ```rs // Old if matches!(stmt, Statement::Declaration(_)) { } if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { } // New if stmt.is_declaration() { } if matches!(stmt, Statement::ImportDeclaration(_)) { } ``` Branching: ```rs // Old if let Statement::Declaration(decl) = &stmt { decl.do_stuff() }; // New if let Some(decl) = stmt.as_declaration() { decl.do_stuff() }; ``` Matching: ```rs // Old match stmt { Statement::Declaration(decl) => visitor.visit(decl), } // New (exhaustive match) match stmt { match_declaration!(Statement) => visitor.visit(stmt.to_declaration()), } // New (alternative) match stmt { _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()), } ``` New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read. This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95 ```diff - let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) = - &assignment_expr.left - else { - return; - }; - let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) = - simple_assignment_target + let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left else { return; }; ```

thernstig · 2024-06-26T20:51:12Z

I digress slightly here, but I would recon that if this turns out to be achievable, Ruff would be very interested in how it was achieved. Allowing Python devs to write custom lint rules in Python.

kylecarbs · 2025-02-05T00:14:53Z

@overlookmotel how feasible is it for me to use parse_sync_buffer from WASM? Presumably it would be much faster, just with lots of TypeScript on top?

overlookmotel · 2025-02-05T18:40:07Z

@kylecarbs parse_sync_buffer was an early experiment, and will likely be removed quite soon. For various reasons, we may also need to remove serde support, and replace it with a custom serializer designed only for JSON. serde has some restrictions which make it a difficult fit for our primary usecase - serializing to JSON (that last part is TBC, though).

So unfortunately, no, I would not recommend using parse_sync_buffer.

But... it is my intention to produce a "WASM interop" AST. If your WASM module is written in Rust, then this will make getting the AST from native Rust into Rust-based WASM almost zero cost - much faster than flexbuffers (which parse_sync_buffer uses).

If you're compiling to WASM from a language other than Rust, I'm afraid this won't help. Though we could in future enable that via some other method.

What's your use case? I'm working on this area at the moment, and it'd be useful to have an idea what needs people have in this department.

kylecarbs · 2025-02-05T19:11:15Z

I'm essentially just trying to parse an AST and replace some modifiers.

e.g. dynamic import transforms import to kylesMagicImport

I suppose I should just do this in Rust instead and pass the source file back, as it'll be much faster.

overlookmotel · 2025-02-06T03:33:56Z

What language were you compiling to WASM from? If Rust, then yes, just doing it in native Rust will be faster. Regardless of serializing and whatnot, native code generally runs a bit faster than WASM anyway.

kylecarbs · 2025-02-06T21:50:26Z

I was just using the WASM package itself with JS, but now I've opted to write it all in Rust, and it's of course much faster.

overlookmotel mentioned this issue Feb 19, 2024

chore(deps): update bumpalo crate #2417

Merged

overlookmotel mentioned this issue Feb 20, 2024

AST transfer WIP #2457

Draft

This was referenced Feb 27, 2024

Proper string type for rolldown rolldown/rolldown#427

Closed

Serialize identifiers to ESTree #2521

Merged

Boshen added the A-ast Area - AST label Mar 10, 2024

This was referenced Mar 10, 2024

Change shape of Language #2677

Closed

Combine Statement and Declaration types #2685

Closed

Ability to create Bump from pre-existing memory allocation? fitzgen/bumpalo#237

Closed

overlookmotel mentioned this issue Mar 19, 2024

feat(parser): serialize to estree #2463

Closed

overlookmotel mentioned this issue Apr 27, 2024

refactor(ast): squash nested enums #3115

Merged

Boshen mentioned this issue Jul 3, 2024

[napi] Give access to ast #4046

Closed

overlookmotel mentioned this issue Jul 16, 2024

AST transfer milestone 1 #4294

Open

7 tasks

overlookmotel mentioned this issue Oct 8, 2024

Implement serde::Serialize on AST types via #[generate_derive] #6347

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster passing ASTs from Rust to JS #2409

Faster passing ASTs from Rust to JS #2409

overlookmotel commented Feb 14, 2024 •

edited

Loading

overlookmotel commented Feb 14, 2024

ArnaudBarre commented Feb 14, 2024

overlookmotel commented Feb 14, 2024 •

edited

Loading

Boshen commented Feb 16, 2024

overlookmotel commented Feb 16, 2024 •

edited

Loading

Boshen commented Feb 18, 2024 •

edited

Loading

overlookmotel commented Feb 19, 2024 •

edited

Loading

ArnaudBarre commented Feb 19, 2024

overlookmotel commented Feb 19, 2024 •

edited

Loading

Boshen commented Feb 20, 2024

ArnaudBarre commented Feb 20, 2024

ArnaudBarre commented Feb 21, 2024

matthew-dean commented Mar 6, 2024

overlookmotel commented Mar 6, 2024

thernstig commented Jun 26, 2024

kylecarbs commented Feb 5, 2025

overlookmotel commented Feb 5, 2025 •

edited

Loading

kylecarbs commented Feb 5, 2025

overlookmotel commented Feb 6, 2025

kylecarbs commented Feb 6, 2025

Faster passing ASTs from Rust to JS #2409

Faster passing ASTs from Rust to JS #2409

Comments

overlookmotel commented Feb 14, 2024 • edited Loading

Background: Why I think this is important

How to do it?

How to destroy the overhead

Complications

Stable type layouts

Strings

Pointers

Further optimizations

Lazy deserialization

Updating the AST

WASM traverser

Conclusion

overlookmotel commented Feb 14, 2024

ArnaudBarre commented Feb 14, 2024

overlookmotel commented Feb 14, 2024 • edited Loading

Boshen commented Feb 16, 2024

overlookmotel commented Feb 16, 2024 • edited Loading

Boshen commented Feb 18, 2024 • edited Loading

overlookmotel commented Feb 19, 2024 • edited Loading

ArnaudBarre commented Feb 19, 2024

overlookmotel commented Feb 19, 2024 • edited Loading

Boshen commented Feb 20, 2024

ArnaudBarre commented Feb 20, 2024

ArnaudBarre commented Feb 21, 2024

matthew-dean commented Mar 6, 2024

overlookmotel commented Mar 6, 2024

thernstig commented Jun 26, 2024

kylecarbs commented Feb 5, 2025

overlookmotel commented Feb 5, 2025 • edited Loading

kylecarbs commented Feb 5, 2025

overlookmotel commented Feb 6, 2025

kylecarbs commented Feb 6, 2025

overlookmotel commented Feb 14, 2024 •

edited

Loading

overlookmotel commented Feb 14, 2024 •

edited

Loading

overlookmotel commented Feb 16, 2024 •

edited

Loading

Boshen commented Feb 18, 2024 •

edited

Loading

overlookmotel commented Feb 19, 2024 •

edited

Loading

overlookmotel commented Feb 19, 2024 •

edited

Loading

overlookmotel commented Feb 5, 2025 •

edited

Loading