Lexer confusion about operators #3066
I don't think there is such a document, actually.
Well, I always considered the TwigPHP sources to be the de facto specification, so that's fine. But the PHP lexer is quite confusing: while writing a standalone Node.js lexer for Twig, I first thought my own implementation was faulty. I'll stick with my mantra: the TwigPHP sources are the reference implementation of the spec. But I think something should be improved there in the future.
Let's close this, as I won't have time to actually change it (especially since it works as is).
Actually, it's not a real issue with TwigPHP, as long as the parser knows what to do with the lexed tokens, and it does. It's more of an issue for people who want to use the lexer to build other implementations, linters or code-analysis tools: they have to write a lexer that gives different results from the reference implementation. For example, @PolyPik, who is working on twig.js, has some concerns about the TypeScript lexer we wrote because it does not match your reference lexer: NightlyCommit/twig-lexer#10. Our lexer works quite well, is lossless and removes a few sources of confusion (unfortunately not the one we are talking about here), but his concerns remain valid: he'd like to write a parser that handles the official tokens instead of twig-lexer's arbitrary ones. Without a specification nothing can be guaranteed, and we had to make some choices without knowing whether they match what you would do if you rewrote your lexer in the future. I see that Twig 3 is in the works. Is there anything new about the lexer? Can we help establish a specification or something similar? (By "we" I mean the community, and mainly the Node.js one, since this is where Twig support has been very active recently.)
Consider the following template:
{{in}}
When lexed, here is what is returned:
Now consider the following one:
{{in }}
When lexed, here is what is returned:
As you can see, in the latter case `in` is recognized as an operator, while in the former it is a name. The lexer is not able to distinguish an operator from a variable name: it is confused by formatting characters (in the second template, the space before the `}}`) that are not supposed to be relevant inside blocks. `{{ foo.bar }}` is lexically identical to `{{foo.bar}}` in Twig, just as `{% foo %}` is to `{%foo%}`.

More generally, the lexer is not very robust when it comes to operators. It is not predictable whether the lexer will emit an operator token or a name token for the same word:
{% for in in in %}
is tokenized into:

Yet the first and last `in` are actually variable names.

{{ in.in }}
is tokenized into:

By contrast, the lexically identical template

{{in.in}}
is tokenized into:

I can't find the official lexical specification of the language. I assume it is an internal Symfony document, so I can't be sure that this is the expected behavior. But from an external point of view, this makes the lexer not very robust, and not quite what one expects from a syntactic analysis tool.
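To make the behavior above reproducible outside of PHP, here is a minimal TypeScript sketch of an expression lexer for a single `{{ ... }}` or `{% ... %}` tag. It assumes the operator rule used by TwigPHP's lexer at the time of writing: an operator ending in a letter (such as `in`) only matches when it is followed by whitespace or a parenthesis, and operators are tried before names. The token types, the operator subset and the `lexTag`/`show` helpers are hypothetical simplifications, not the API of TwigPHP or of twig-lexer.

```typescript
// Hypothetical, simplified token model (not TwigPHP's or twig-lexer's).
type TokenType =
  | "var_start" | "var_end" | "block_start" | "block_end"
  | "name" | "operator" | "punctuation";

interface Token {
  type: TokenType;
  value: string;
}

// Assumed subset of word operators, longest first. Assumed rule: an operator
// ending in a letter must be followed by whitespace or a parenthesis.
const WORD_OPERATORS = ["and", "not", "in", "or", "is"];
const OPERATOR_RE = new RegExp(
  "^(?:" + WORD_OPERATORS.map((op) => op + "(?=[\\s()])").join("|") + ")",
);
const NAME_RE = /^[A-Za-z_][A-Za-z0-9_]*/;
const WHITESPACE_RE = /^\s+/;
const PUNCTUATION = "()[]{}?:.,|";

// Lexes exactly one {{ ... }} or {% ... %} tag.
function lexTag(template: string): Token[] {
  const open = template.slice(0, 2);
  if (open !== "{{" && open !== "{%") {
    throw new Error("expected a single {{ ... }} or {% ... %} tag");
  }
  const isVar = open === "{{";
  const endRe = isVar ? /^\s*\}\}/ : /^\s*%\}/;
  const tokens: Token[] = [{ type: isVar ? "var_start" : "block_start", value: open }];
  let pos = 2;

  while (pos < template.length) {
    const rest = template.slice(pos);
    // The closing delimiter swallows any whitespace in front of it.
    const end = endRe.exec(rest);
    if (end) {
      tokens.push({ type: isVar ? "var_end" : "block_end", value: end[0].trim() });
      return tokens;
    }
    const ws = WHITESPACE_RE.exec(rest);
    if (ws) { pos += ws[0].length; continue; }
    // Operators are tried before names: this ordering is what makes `in`
    // ambiguous, since only the following character decides its type.
    const op = OPERATOR_RE.exec(rest);
    if (op) { tokens.push({ type: "operator", value: op[0] }); pos += op[0].length; continue; }
    const name = NAME_RE.exec(rest);
    if (name) { tokens.push({ type: "name", value: name[0] }); pos += name[0].length; continue; }
    const ch = rest.charAt(0);
    if (PUNCTUATION.indexOf(ch) !== -1) { tokens.push({ type: "punctuation", value: ch }); pos += 1; continue; }
    throw new Error(`unexpected character ${JSON.stringify(ch)} at offset ${pos}`);
  }
  throw new Error("unclosed tag");
}

// Renders tokens compactly, e.g. "name(in)".
function show(template: string): string {
  return lexTag(template).map((t) => `${t.type}(${t.value})`).join(" ");
}

for (const src of ["{{in}}", "{{in }}", "{{ in.in }}", "{{in.in}}", "{% for in in in %}"]) {
  console.log(`${src} -> ${show(src)}`);
}
// {{in}} -> var_start({{) name(in) var_end(}})
// {{in }} -> var_start({{) operator(in) var_end(}})
// {{ in.in }} -> var_start({{) name(in) punctuation(.) operator(in) var_end(}})
// {{in.in}} -> var_start({{) name(in) punctuation(.) name(in) var_end(}})
// {% for in in in %} -> block_start({%) name(for) operator(in) operator(in) operator(in) block_end(%})
```

Under these assumptions the sketch reproduces the inconsistencies reported above: `{{ foo.bar }}` and `{{foo.bar}}` yield the same tokens, but whether `in` becomes a name or an operator depends only on the character that follows it, not on its role in the expression.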