TokenWrapper to be a concrete all-in-one-class instead of the token hierarchy #3354
Comments
I've also added a CodeActionToken, which is a RangeToken specifically recognized as a range relevant for code actions. But all I really need is to know (a) that this is the CodeAction information and (b) to have a start and end position. The fact that a RangeToken contains two tokens is a nuisance from my point of view, especially since the actual range runs from the start through the end, not up to the end; or at least I need that as an option, and the easiest way is just to have a start and a length.
We could put a token property named CodeAction in TokenWrapper as well. It's part of the flat metadata of the token anyway and, as you stated, is different from RangeToken in that it could contain one (or more?) ranges. Why is it a nuisance that RangeToken contains two tokens? Would you prefer that we record StartToken and EndToken independently, and that RangeToken be just a regular token that does not contain them?
For the purpose of a simple CodeAction, all that is needed is a range (start and length), which may not correspond to a token of the AST. It may well be that more complex code actions will need a reference to a part of the AST (e.g. a renaming or restructuring), but I have not progressed far enough to know what information those will need. I suppose it might actually be better to base CodeActionToken on Token instead of RangeToken. What does RangeToken actually need both tokens for? Eventually one just extracts an LspRange anyway?
RangeToken is actually just a way to record the start position and a length, and since we need the start token and end token of an expression for other reasons, I found it was the natural way to hold these two together. RangeTokens are used to display errors not just on tokens but on entire expressions, both on the CLI with --show-snippets and on the LSP.
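The conversion being discussed, from a pair of start/end tokens down to an LSP-style range, can be sketched as follows. This is a hypothetical illustration, not Dafny's actual API; all names and types are assumptions:

```java
// Hypothetical sketch (not Dafny's actual API): flattening a RangeToken-style
// pair of start/end tokens into an LSP-style range. The range's end position
// lies *past* the end token, which is why "start plus length" is convenient.
record Pos(int line, int col) {}
record Tok(Pos pos, String val) {}
record LspRange(Pos start, Pos end) {}

class RangeDemo {
    static LspRange toLspRange(Tok startTok, Tok endTok) {
        // Extend past the end token so the range covers it entirely.
        Pos end = new Pos(endTok.pos().line(),
                          endTok.pos().col() + endTok.val().length());
        return new LspRange(startTok.pos(), end);
    }

    public static void main(String[] args) {
        Tok start = new Tok(new Pos(3, 2), "assert");
        Tok end = new Tok(new Pos(3, 14), ";");
        System.out.println(RangeDemo.toLspRange(start, end));
    }
}
```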
Context
Although tokens are normally output by the scanner and provided to the parser, Dafny developers found a loophole and replaced regular tokens with rich tokens, which can attach useful metadata to expressions. This metadata can help identify whether an expression was included from another file, was automatically generated, or was refined from another expression. It also makes sense to send such rich tokens to Boogie, because Boogie attaches them to the errors it finds. This gives a quick way to attach more information to display meaningful errors, besides ProofObligationDescription. Here are the rich tokens Dafny is currently defining and using:
These rich tokens are used for error reporting, for example to display related ranges and messages (NestedToken), to customize the error message (QuantifiedVariableDomain), or to not report an error at all (IncludeToken).
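The wrapper pattern these rich tokens follow can be sketched as below. The names mirror the Dafny classes mentioned above, but the interface and fields are assumptions for illustration, not Dafny's actual signatures:

```java
// Hypothetical sketch of the rich-token pattern: a wrapper carries extra
// metadata while delegating position data to the token it wraps, so error
// reporting still points at a real source location.
interface IToken { String filename(); }
record Token(String filename, int line, int col) implements IToken {}

// Marks a token as coming from an included file (illustrative fields).
record IncludeToken(IToken wrapped) implements IToken {
    public String filename() { return wrapped.filename(); }
}
// Attaches a related inner token plus a message explaining the relation.
record NestedToken(IToken outer, IToken inner, String message) implements IToken {
    public String filename() { return outer.filename(); }
}
```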
Besides rich tokens, ProofObligationDescriptions can be provided for Boogie assertions so that, whether an assertion fails or succeeds, one can know both its token and its ProofObligationDescription, to extract failure or success messages.
Cloners also have a method "Tok" meant for cloning tokens; it can wrap tokens as well, adding refinement information.
Problem description
Problems arise when you consider the canonical nesting of tokens, for example,
There is a bigger problem now because of #3290
If we followed the pattern above, we would just add a "format token" that contains the original token and the range. But then the canonical ordering of tokens is even harder to define.
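To illustrate the ordering problem: two metadata wrappers such as IncludeToken and RefinedToken can nest in either order, so any consumer has to unwrap every layer. The following is a hypothetical sketch with invented shapes, not Dafny's actual classes:

```java
// Hypothetical illustration of the canonical-ordering problem: the same
// metadata can be attached in two nesting orders, so a consumer asking
// "was this expression included?" must traverse arbitrarily deep.
interface IToken {}
record Token(int line, int col) implements IToken {}
record IncludeToken(IToken wrapped, String includedFile) implements IToken {}
record RefinedToken(IToken wrapped, String refiningModule) implements IToken {}

class Ordering {
    // Traversal is unavoidable because nothing forces IncludeToken outermost.
    static boolean isIncluded(IToken t) {
        while (true) {
            if (t instanceof IncludeToken) return true;
            if (t instanceof RefinedToken r) { t = r.wrapped(); continue; }
            return false;
        }
    }
}
```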
Possible solutions
Do nothing about it. This has the problems above, but we could just add cases whenever we see one that is not covered.
Define a canonical ordering of tokens so that their nesting order is checked at run-time. Could we even make such order checked at compile time using a specific hierarchy of tokens?
A big inconvenience is having to traverse tokens to get valuable information anyway.
If we have a canonical ordering, we might as well have a single type of token, as below.
Make TokenWrapper a concrete record (my preference): we remove all other classes extending TokenWrapper and incorporate their semantics in TokenWrapper itself by making it concrete, with the following properties:

- WrappedToken: the original untouched token that the TokenWrapper is "wrapping". WrappedToken cannot be a TokenWrapper itself; it has to be a regular Dafny.Token (not even a Dafny.IToken).
- InnerToken: either a Token or a TokenWrapper. Borrowed from NestedToken.
- Message: either null or a string. Borrowed from NestedToken, to describe why the InnerToken is necessary.
- IncludedIn: either null or an Include which contains the included file name. If set, the filename of the TokenWrapper is overridden by IncludedIn (the filename of the included file can still be recovered through WrappedToken.FileName).
- RefinedFrom: either a module declaration or null. Replaces the need for RefinedToken. That way, there is no canonical-order problem between IncludedIn and RefinedFrom.
- StartToken: a regular token only. If this token describes the position of an expression, StartToken describes the start position. Borrowed from RangeToken.
- EndToken: a regular token only. If this token describes the position of an expression, EndToken describes the end position. Borrowed from RangeToken.
- AutoGenerated: a boolean indicating whether the attached expression was auto-generated, meaning its wrapped token is not relevant. Such a token would have no StartToken and no EndToken either, but WrappedToken could still point to an original token, for error reporting (though not for token ownership).
- QuantifiedVariableDomain: a boolean.
- QuantifiedVariableRange: a boolean.
- ForceCheckToken: a boolean.

Advantages: extraction is easy and canonical, nothing else needs to change, and cloning would be easy.
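The property list above could be sketched as a single flat record. Dafny itself is implemented in C#, where such a record could be cloned and updated with a with-expression; the Java syntax and all field types below are assumptions for illustration only:

```java
// Hypothetical sketch of the proposed all-in-one TokenWrapper, following the
// property list above. Field types are guesses, not Dafny's actual types.
interface IToken {}
record Token(String filename, int line, int col) implements IToken {}

record TokenWrapper(
    Token wrappedToken,               // original untouched token, never a wrapper
    IToken innerToken,                // Token or TokenWrapper (from NestedToken)
    String message,                   // why innerToken matters, or null
    String includedIn,                // included-file info, or null
    String refinedFrom,               // refining module, or null (replaces RefinedToken)
    Token startToken,                 // start of the expression's range, or null
    Token endToken,                   // end of the expression's range, or null
    boolean autoGenerated,            // wrapped token irrelevant for ownership
    boolean quantifiedVariableDomain,
    boolean quantifiedVariableRange,
    boolean forceCheckToken
) implements IToken {}
```

With a flat record like this, extraction is a field read instead of a traversal over an unknown nesting of wrappers.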
Why is this important?
This would help solve the following issue in a simple way
#3290
Rather than defining a new Format, we could just define RichToken and have cloners add information to RichTokens by cloning them as records. That way, failing assertions could even report both a token and a range. The range is good for underlining, but the token helps disambiguate where the failing ranges are.
@keyboardDrummer introduced a way to access the parsed document on the language server.
#3338
However, it does so by "cloning" the AST being transformed, so it's essential that the formatting is not lost, especially the RangeToken. OwnedTokens can be computed independently after that.
When #3290 is done, I can proceed with migrating the formatting PR to the new token format.
#2399
This will make it possible to remove the need for two tree traversals (one pre-parser, the other post-parser).
We want to remove the concrete children that I introduced in my PR for the Formatter.
#3339
The only way to do that is to ensure Children computes pre-resolution trees if called before resolution. But given the architecture of the language server, that tree would be a clone, and clones currently don't clone RangeTokens, which are used for OwnedTokens, which the formatter requires...