Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of mutable syntax node usage #15710

Open
Veykril opened this issue Oct 4, 2023 · 13 comments
Open

Get rid of mutable syntax node usage #15710

Veykril opened this issue Oct 4, 2023 · 13 comments
Assignees
Labels
A-assists Broken Window Bugs / technical debt to be addressed immediately C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-hard

Comments

@Veykril
Copy link
Member

Veykril commented Oct 4, 2023

Somewhat meant as a means of discussion (though ultimately just expressing my opinion):

After having used and reviewed a bunch of PRs using the mutable syntax node API I am personally not too happy about it. It can be awkward to use at times and has some pitfalls that one has to be aware of. A major problem with that API is that it introduces iterator invalidation (you can iterate while mutating the tree), giving way to easily end up panicking if one does not delay the edits after iteration via collecting into a vec or similar. Likewise it makes it harder to replace our version of rowan to one that uses proper slots which is something I am in favor of, as the slot API is unlikely to have the same feature (unless we implement it back into rome's fork of rowan which would be the current slot based rowan solution).

The current benefit of it is that we have the ted api that allows automagically fixing up whitespace on edits, but that's really not necessary in the end game when we have our own syntax formatter which would allow us to format edited nodes on the fly. It is also a rather powerful api in that you can patch up syntax trees on the fly.

The main question then arises, how do we do edits if not via mutable syntax nodes? The current approach was using the handwritten make module to stitch nodes together (which is still somewhat used). But that approach is really annoying to use, inefficient and looks horrid as well. One nice looking approach would be having a quasi quote api as described in #2227, that could also do some static checks ideally (like checking well typedness of the tree at comp time maybe). With quoting one could just reconstruct the patched tree / construct a the biggest tree encompassing the changes of interested and then ask the assist building api to replace the node pointed to by a specific node ptr with the constructed tree. And alternative to mutation is #9649 though I haven't looked too much into it.

In general, assists are all over the place currently, some do text based editing, some do make api + stringified text edits and finally we have some that use the mutable node api. Obviously not ideal if we mix and match however we like to.

(Issue spawned of me really not enjoying reviewing assist PRs nowadays anymore, as they tend to be messy in general, though also because I really want us to switch to the trailing trivia model)

@Veykril Veykril added E-hard C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) A-assists labels Oct 4, 2023
@Veykril
Copy link
Member Author

Veykril commented Oct 4, 2023

cc @matklad feels like an issue you'd have an opinion about or two :)

@matklad
Copy link
Member

matklad commented Oct 5, 2023

Yeah, I wouldn’t say that mutable syntax API is a success :-)

That being said, I think it barks at the right tree: the fundamental capability there is “tracking nodes across edits”, and I think it would be important to have that in one way or another. But I am not too confident here: maybe we don’t need cursor API at all?

One curious solution here is Roslyn’s SyntaxAnnotations:

https://joshvarty.com/2015/09/18/learn-roslyn-now-part-13-keeping-track-of-syntax-nodes-with-syntax-annotations/

my understanding is that this mechanism powers all “mutable” editing APIs, which allow you to incrementally mutate the tree.

I looked at how that’s implemented under the hood, and I think it actually is pretty horrible —- there’s a global static weak map of annotated nodes (no link to source, so I might be misremembering)

I think another related issue is how our semantics API can panic if you feed it with a “wrong” Syntax Node. I was thinking that maybe the solution to both problems is biting the bullet and ensuring that every single bit of syntax has a globally unique id?

Definitely need to change something here, though I still am not 100% positive about the ideal end state :P

@Veykril
Copy link
Member Author

Veykril commented Dec 19, 2023

Another thing I just realized, this API makes it near impossible to map up edits back into macro input token trees

@Veykril Veykril added the Broken Window Bugs / technical debt to be addressed immediately label Dec 19, 2023
@DropDemBits
Copy link
Contributor

I feel like using both approaches of using SyntaxEditor and using SyntaxAnnotations could work as a replacement to the mutable syntax node approach. I mention SyntaxAnnotations because they could, I dunno, be used to figure out where to place LSP snippets 🙂

Since a SyntaxEditor would store all of the edit actions (insert, replace, delete, etc.) to transform the original tree into a final tree, we retain information on how nodes are transformed. This also means that a SyntaxEditor can propagate SyntaxAnnotations forward to the final tree based on this information, so long as the SyntaxEditor knows what nodes are tracked.

A caveat of letting SyntaxEditor propagate SyntaxAnnotations is that the make constructors will need to inform the SyntaxEditor of where to propagate SyntaxAnnotations to, otherwise we lose track of nodes. Possible approaches of resolving this could be:

  • Require passing in a SyntaxEditor to a make constructor (or potential quasi-quoting macro) and then let it add mappings from the old nodes to the new nodes. This would be fine for the make constructors, but would be a bit cumbersome for the quasi-quoting macro since you'd need to separate the SyntaxEditor input from the actual syntax to construct
  • Have the make constructors/quasi-quoting return a SyntaxResult which holds a mapping from old to new nodes and have an explicit into_syntax(&mut SyntaxEditor) which adds the mappings to SyntaxEditor.
  • Give up and use a static global concurrent map. It's certainly an option 🙃

One interesting situation to handle is when trying to change both a parent node and its children's nodes, e.g. PathTransform trying to transform

SomeTrait<Item = [u8; SOME_CONSTANT]>;

into

m1::SomeTrait<Item = [u8; m2::SOME_CONSTANT]>;

(based on #17321)

Roslyn's approach is to have a ReplaceNode that takes in a lambda to compute the replacement node. This has a downside of serializing transformation (see dotnet/roslyn#28378), but maybe partitioning the changes list by each computed replacement change may help?

Other idle thoughts:

  • Do we want to expose the most recent version of the tree? This would imply applying the transformations to a tree at some point (either after adding any change, or whenever we request for the current tree). Not sure if there's anywhere where we do need to inspect the current version of the tree, and the current semantic model will panic on any newly created nodes (see The Semantics API is prone to panics #17367)
  • Do we want to allow combining SyntaxEditor changes? Would be useful for e.g. PathTransform, since it could produce a SyntaxEditor with all of the changes instead of needing to take in the current SyntaxEditor.
  • Is tracking all node transformations enough to up-map changes back to the original macro inputs? I feel like this helps somewhat, but I'm not sure of the specifics of what is required for that.
  • Moving away from mutable syntax nodes will necessarily affect the edit_in_place APIs, since they assume that they'll refer to the same node. Passing in a SyntaxEditor will help with creating a new version of the node, but we'll still have a reference to the old node. Maybe just use the quasi-quote macro instead?

@Veykril
Copy link
Member Author

Veykril commented Jun 9, 2024

Require passing in a SyntaxEditor to a make constructor (or potential quasi-quoting macro) and then let it add mappings from the old nodes to the new nodes. This would be fine for the make constructors, but would be a bit cumbersome for the quasi-quoting macro since you'd need to separate the SyntaxEditor input from the actual syntax to construct

That seems fine to me, make functions could be exposed as methods on a SyntaxEditor I guess which would make it a bit more convenient. As for quasi quoting, sure it'll be a bit more annoying but at the same time it feels bit a like how the quote macro has an extra optional input for re-spanning the tokens. Feels like we can do it the same way, where having no extra input just creates a standalone node as well if that's useful. Overall this kind of API seems a lot nicer still than mutable syntax trees as with those you have to actually fight at times when iterating and mutating at the same time.

Have the make constructors/quasi-quoting return a SyntaxResult which holds a mapping from old to new nodes and have an explicit into_syntax(&mut SyntaxEditor) which adds the mappings to SyntaxEditor.

This sounds a lot worse usability wise than option 1 to me.

Give up and use a static global concurrent map. It's certainly an option 🙃

I don't think anything needs to be said here 😬

Do we want to expose the most recent version of the tree? This would imply applying the transformations to a tree at some point (either after adding any change, or whenever we request for the current tree). Not sure if there's anywhere where we do need to inspect the current version of the tree, and the current semantic model will panic on any newly created nodes (see
#17367)

I feel like we don't? At least I can't tell an immediate use case (we have one place where we kind of do this right now which is in adjustment inlay hints, but that is really just a hack we need to replace generally)

Do we want to allow combining SyntaxEditor changes? Would be useful for e.g. PathTransform, since it could produce a SyntaxEditor with all of the changes instead of needing to take in the current SyntaxEditor.

Feels like something that should be naturally supported.

Is tracking all node transformations enough to up-map changes back to the original macro inputs? I feel like this helps somewhat, but I'm not sure of the specifics of what is required for that.

I'd expect us to only touch real file trees here. That is if we are inside a macro call we ought to first upmap that out and then attempt any edits. Anything else gets hairy for no benefit (in my eyes at least).

@DropDemBits
Copy link
Contributor

That is if we are inside a macro call we ought to first upmap that out and then attempt any edits.

For assists like add_missing_match_arms, it'd be nice if there was a way to edit a synthetic version of the macro input for declarative macros (i.e. the "parsed" macro input) and re-serialize that rather than plain token trees, but that's another issue (do we even have an issue for this? I know it's something we maybe wanted to have).

@Veykril
Copy link
Member Author

Veykril commented Jun 9, 2024

Ah right, yes that should be in theory possible

@davidbarsky
Copy link
Contributor

I'll try and summarize my understanding of (the mostly agreed upon?) plan to accomplish this:

  • Finish @DropDemBits's branch that introduces a SyntaxEditor.
    • The remaining work on that branch is to implement SyntaxAnnotation propagation, which boils down to applying SyntaxMapping::upmap_child_element(annotated_element, first_mappable_ancestor, root) to each (SyntaxElement, SyntaxAnnotation).
    • Bee has informed me over Zulip (please correct me if I'm wrong!) that someone could finish and land their work, as they won't be able to work on this until September.
  • After that branch is landed with the SyntaxMapping mapping changes, all assists should be moved over the new SyntaxEditor API. I'm pretty confident that some assists can't be moved over to the SyntaxEditor API, so SyntaxEditor would need grow new APIs as those assists are migrated to the syntax editor API.
  • Once all assists are on the SyntaxEditor API, mutable syntax trees can be removed from Rowan and SyntaxEditor can be implemented in terms of immutable syntax trees.

To summarize a meeting with Lukas Wirth, Wilfred Hughes, David Richey, and me (notes here), removing mutable syntax trees from Rowan would allow the removal of doubly-linked lists in Rowan in favor of a more contiguous representation (e.g., a vector). I've determined through my (relatively unscientific!) benchmarking that this could make Rowan twice as fast.

@matklad
Copy link
Member

matklad commented Aug 17, 2024

Also, this whole refactor would make for a great post for the blog, if anyone is keen to write one!

@DropDemBits
Copy link
Contributor

Confirming that I'm okay with handing off the work to someone else, excited to see it land regardless

@ShoyuVanilla
Copy link
Member

I'll try the SyntaxEditor thing unless someone else is working on it

@DropDemBits
Copy link
Contributor

@ShoyuVanilla Have you had the chance to pick this up yet? I'm available to work on this again now that I've got time to do so.

@ShoyuVanilla
Copy link
Member

Not yet. And I think that it would be better if you could take this because you know this better than me 😅

@DropDemBits DropDemBits self-assigned this Sep 2, 2024
bors added a commit that referenced this issue Sep 10, 2024
internal: Add preliminary `SyntaxEditor` functionality

Related to #15710

Implements a `SyntaxEditor` interface to abstract over the details of modifying syntax trees, to both simplify creating new code fixes and code actions, as well as start on the path of getting rid of mutable syntax nodes.

`SyntaxEditor` relies on `SyntaxMappingBuilder`s to feed in the correct information to map AST nodes created by `make` constructors, as `make` constructors do not guarantee that node identity is preserved. This is to paper over the fact that `make` constructors simply re-parse text input instead of building AST nodes from the ground up and re-using the provided syntax nodes.

`SyntaxAnnotation`s are used to find where syntax elements have ended up after edits are applied. This is primarily useful for the `add_{placeholder,tabstop}` set of methods on `SourceChangeBuilder`, as that currently relies on the nodes provided being in the final syntax tree.

Eventually, the goal should be to move this into the `rowan` crate when we move away from mutable syntax nodes, but for now it'll stay in the `syntax` crate.

---

Closes #14921 as `SyntaxEditor` ensures that all replace changes are disjoint
Closes #9649 by implementing `SyntaxAnnotation`s
bors added a commit that referenced this issue Sep 18, 2024
internal: Extend SourceChangeBuilder to make make working with `SyntaxEditor`s easier

Part of #15710

Adds additional `SourceChangeBuilder` methods to make it easier to migrate assists to `SyntaxEditor`.

As `SyntaxEditor`s are composable before they're completed, each created `SyntaxEditor` can represent logical groups of changes (e.g. independently performing renames of uses in a file from inserting the new item). Once a group of changes is considered "done", `SourceChangeBuilder::add_file_edits` is used to submit a set of changes to be part of the source change.

`SyntaxAnnotation`s are used to indicate where snippets are attached to, and using `SyntaxAnnotation`s also means that we can attach snippets at any time, rather than being required to be after all edits.
bors added a commit that referenced this issue Sep 27, 2024
internal: Add `SyntaxFactory` to ease generating nodes with syntax mappings

Part of [#​15710](#15710)

Instead of requiring passing a `&mut SyntaxEditor` to every make constructor to generate mappings, we instead wrap that logic in `SyntaxFactory`, and afterwards add all the mappings to the `SyntaxEditor`.

Includes an example of using `SyntaxEditor` & `SyntaxFactory` in the `extract_variable` assist.
lnicola pushed a commit to lnicola/rust that referenced this issue Oct 8, 2024
…kril

internal: Add `SyntaxFactory` to ease generating nodes with syntax mappings

Part of [#​15710](rust-lang/rust-analyzer#15710)

Instead of requiring passing a `&mut SyntaxEditor` to every make constructor to generate mappings, we instead wrap that logic in `SyntaxFactory`, and afterwards add all the mappings to the `SyntaxEditor`.

Includes an example of using `SyntaxEditor` & `SyntaxFactory` in the `extract_variable` assist.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-assists Broken Window Bugs / technical debt to be addressed immediately C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-hard
Projects
None yet
Development

No branches or pull requests

5 participants