-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Comment attachment #3
Comments
This can also happen with whitespace: {
statement1;
WHOA();
statement2;
} might become: {
statement0;
statement1;
WHOA();
statement2;
} |
Another nasty forumlation: {
fix1();
fix2();
// Everything is fixed!
} And perhaps now only one fix is required, but removing the second call invalidates the part of the tokens array that holds the second comment. {
fix1();
} |
It should be noted that we already provide the best remedy possible to any of these issues: the individual writing the codemod can tell us what they meant by updating the tokens array themselves. We could trivially offer helpers for tasks like appending or prepending into an array-valued property's array while also appending or prepending a new reference into tokens to ensure that none of the off-by-ones occur. We'd probably need to fork some common helpers like |
I am most concerned about shifting comments around. People are going to want their comments to stay attached to things. They will say that the output isn't just weird when the comments shift, it's WRONG. ...and I don't want to destroy one of the best characteristics of my solution and of recast -- that it allows you to manipulate the AST without worrying about formatting except where strictly necessary. I think the only possible "magical" solution is to have the reference tokens have some way of retaining information about exactly which node they were referencing. Then you could go through and basically do a diff in order to figure out where references were added, removed, or reordered. You could then use that information to avoid entering fallback even when the tokens array doesn't quite match the current state of the AST. I'm against including the reference itself in the reference token object in such a way that it would be accessible to users. I think the best way to encode the data would be as diff(
node.tokens.filter(token => token.type === 'Reference' && value === 'specifiers').map(ref => nodesByReferenceToken.get(ref)),
node.specifiers
) |
I think this is pretty necessary to real users, ands significant value, and is an ideal kind of logic to design as part of the prototyping phase. Let's see if I can do it! |
Thinking about this some more. What if I do something different? What if I always push optional comments and whitespace downwards instead of upwards? For example when the code is BUT. We want to be careful. If all whitespace belongs to children, this will also not be good. For example when a transform swaps |
Ah, but I have been thinking specifically about prettier and its object formatting. I'd essentially never want to split up any of these patterns:
|
I think if I can do that -- push everything but those patterns into children when possible -- the result would be excellent in most common cases without much of any additional work |
A case to consider: o = {
foo, // foo
bar, // bar
} This is unpleasant to deal with because those comments can't possibly be tokens for the |
I could store the illegal structure in my CST. For reasons that should be obvious you can't work with this directly as text, but you can store it and use it as an intermediate step:
|
I don't think I can endorse that. The data structure needs to be interoperable. With luck it could even become part of the ESTree spec. Storing tokens out of order like that and demanding that everyone else handle it would seriously damage the value of the whole structure. |
We could ensure that those line comment tokens are not pushed down, which leaves us to build both pushdown and diffing. Could be better, could be worse I guess. |
We could just convert them from line to block comments on the fly... o = {
foo /* foo*/,
bar /* bar*/,
} I don't think people would want to use the tool unless we somehow knew to convert them back before printing. |
Another similar but less-common case:
I think such representing such relationships is technically out-of-scope for a syntax tree, but would support a convention in which they appear as extension data, ideally with JSON interchange compatibility (e.g., Footnotes
|
I think instead of extension data I'm more likely to offer some CST-aware APIs for common operations like The good news is this means I can make the the calls configurable, like |
OK I've given this more thought. First. A line comment on its own line as part of a statement list is ambiguous. It could be about the next line as with: // describe conditionA
if (conditionA) {
// describe conditionB
} else if (conditionB) {
} or it could be about the state of the program at some moment in time as with transitionA(state);
transitionB(state);
// state is now C So there's definitely nothing I can do with that, and the best thing is to think of it like it's its own statement, like an assert. There will be some weird results sometimes when comments that should've been attached are treated as independent, but I'm not creating that problem, I just can't solve it here. It may still be possible to provide excellent tools to identify locations in the diffs generated by codemods which could point out places where statements next to line comments were removed. |
For initialization contexts like import specifiers, parameter definitions, and object and array literals the fundamental ambiguity is whether a line comment is about one line, or a group of lines that follows. E.g. o = {
// a
a,
// b
b,
} vs o = {
// a's
a1,
a2,
// b's
b1,
b2,
} Again I think the only sane thing to do is to consider the line comment to be part of the parent. We risk leaving comments which describe something that no longer exists, but as I still think it's fine to say that that's just an issue that would have to be caught in code review. |
Also I think I was wrong to say that we can have the transforming code handle it. For there to be a useful ecosystem of transforms it will need to be possible for writers of transforms to describe changes to codebases whose formatting they will not know anything about. |
When line comments are on the same line as a statement, I think it is reasonable to consider those comments to be attached to that statement, even if there is an intervening comma. I guess I think I'm leaning towards just pushing them into the tree out of order, even though I said I thought it would be bad. The fact is, this tree is going to be messy. It's going to contain things that may be invalid or don't quite make sense. For example it will not be uncommon for an altered tree to contain line comment tokens which are not followed by any newline token. Printing the tree is never going to be as simple as just concatenating the tokens, so maybe it's best to embrace that. If I attach line comment tokens to the wrong side of the comma, I can work it out at print time. Then the primary difficulty is just in offering some APIs that help you query a "normalized" tree structure when in fact what you have is a non-normal tree (one with inconsistencies). |
I think it's time to design the traversal API, which is the only API that will be able to answer queries about lines since it will have be able to have context about parent and thus about sibling tokens. Its |
Maybe I just have control inverted. I've discussed two main strategies, diffing and pushdown. The library definitely needs to implement diffing. It's crucial to get right in order to support the most basic and common operations. Pushdown though... Comments are just pretty fundamentally ambiguous. It's pretty clear that sometimes it's important for certain comments to remain stuck to certain nodes, but it's not always clear what those cases are. The best examples I can think of are JSDoc comments. I'm pretty sure that in almost every case it makes sense for JSDoc comments to be pushed down into the nodes they describe, and yet I don't really want to design my library such that it knows about JSDoc. No, I think the thing to do is to make pushdown one of the fundamental types of operations this library supports doing on the tree. Since our intention is to treat trees as mutable, we can just return a CST, then the holder of that tree can call back in triggering the pushdown rules they wish to trigger, e.g.: import { rebuildTokens, pushTokensDown, print } from 'cst-tokens';
import JSDocPlugin from '@cst-tokens/plugin-pushdown-jsdoc';
const cst = /* ... */
rebuildTokens(cst);
pushTokensDown(cst, [JSDocPlugin]);
transform(cst); This ensures that we don't churn the library core on formatting rules, it allows each project to specify the pushdowns that actually make sense for the way their code is written, any configuration is possible, and the transforms themselves don't need to know. If a transform needs to remove a node, it just removes it. If the environment had pushed comment tokens into that node, then those tokens are gone! Everyone gets what they want, and I especially get what I want in not having all of it be my problem! |
All in all I am hoping it would be possible to write something like this: import { readFile, writeFile } from 'fs/promises';
import { rebuildTokens, pushTokensDown, print } from 'cst-tokens';
import JSDocPlugin from '@cst-tokens/plugin-pushdown-jsdoc';
import TypeCommentPlugin from '@cst-tokens/plugin-pushdown-type-comment';
import PureCommentPlugin from '@cst-tokens/plugin-pushdown-pure-comment';
import prettier from 'prettier/cst'; // wishful thinking
import eslint from 'eslint/cst'; // more wishful thinking
async function codemod(path, transform) {
const cst = someParser.parse(await readFile(path, 'utf-8'));
rebuildTokens(cst);
pushTokensDown(cst, [JSDocPlugin, TypeCommentPlugin, PureCommentPlugin]);
transform(cst);
eslint(cst, { fix: true, quiet: true });
prettier(cst);
await writeFile(path, print(cst));
} |
There's some interesting discussion of the issues here: rome/tools#2768 |
OK OK OK I can do it! Pushdown comment attachment will still be the only way to keep comments attached during arbitrary moves. In the specific instance of adding, removing, and reordering items in an array however I should be able to keep comments and whitespace attached without needing any extra configuration, which is ideal. I can do this because the current internal return from traverse is a match tree, currently represented as This is perfect since it captures the original contents of the array in the match tree so that they can be retrieved like this: matchNode.cstToken
.filter(tok => tok.type === 'Reference' && tok.value === name)
.map(matchNode => matchNode.node) This works due to our fundamental assumption that arrays of elements are in source order -- an assumption that already exists due to the reference-counting nature of Once we have the current array stored in The thing is, sometimes the rules are a bit wonky. Prettier for example changes call(
arg1, /* comment */
arg2,
) into call(
arg1 /* comment */,
arg2,
) This implies that to avoid breaking that behavior we'll want a prettier pushdown plugin which applies its logic about line breaks being more important than element separators in comment attachment. |
|
I've shifted tacks once again. The more I got into the mechanics, the clearer it became that diffing was going to be insane. There are just too many situations in which it's not clear what should happen to comments and whitespace. Since my goal is to eliminate insanity, I've eliminated diffing. Instead I am changing the core premise of the library. Instead of working with malformed CSTs, the library now always validates that CSTs are well-formed. This means that the only comment attachment mechanism I need is pushdown which, conveniently, requires relatively little core architecture. |
Right now I have problems with comment attachment caused by my interchangable references. Let's say you have this code:
And you write a codemod:
The problem is that now when you go to reprint you'll end up with:
That's essentially a design feature of our references, but here it hurts us. They are too simple!
The text was updated successfully, but these errors were encountered: