-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHP parser explorations #15818
Comments
Does that mean you plan to contribute to https://github.com/nikic/PHP-Parser? I have a couple of issues open (and many already closed) regarding error recovery and indeed every bit there makes PHP IntelliSense work in more cases. |
@felixfbecker yep, that was where I started, but it's exceedingly difficult to retrofit a parser, let alone an auto-generated one, with things like "error tolerance". Over time, you end up with a long tail of esoteric bugs. There are additional challenges, such as how the tree is represented and performance, that are a bit difficult to manage (I'll enumerate them further if you're interested). So, I began working on a handspun parser (also written in PHP). Right now we're heads down with the VS Code endgame, but I plan to share more details (and there are a lot of details 😉 ) later this month. The code currently lives in a private repo. I'll make it public after a little bit more cleanup, which is needed so that people are able to read the code, understand it, and contribute back to it. Just to set expectations - it's a proof-of-concept. It should, however, be enough to decide whether there's enough interest to invest further and bring it to production quality or whether we want to change direction. Once it's public - should be soon - we should check it against the error-tolerance issues you mentioned. |
I just don't know if the effort is worth to build a completely new parser from scratch and support it long-term with new syntax etc. when there is already a pretty good one out there that has a lot of support and is used by other big libraries. The improvements nikic did for error recovery where all very small patches in the grammar and I think doing improvements there is a lot easier than writing a completely new parser. Crane's author stated that his motiviation for not using PHP but TypeScript to parse was that the extension "should work out of the box" and have no dependencies and not PHP runtime requirement. I on the other hand chose PHP to make use of great PHP packages that help the LS. So in what language your new parser is written in will decide who of Crane/PHP IntelliSense is able to use it. But anyway, appreciate that PHP gets some love from you 👍 |
Well, as someone that has already done a lot of exploration into an IDE extension for PHP back in February/March, I actually agree with @mousetraps on needing a custom implementation. Initially, I too tried to start with a generated parser. My "little" concept extension needed to work across all systems and not have any dependencies on the installed PHP version, so I needed a JS/TS implementation. That led me to the glayzzle/php-parser project (which the Crane extension uses a fork of). Unfortunately, it didn't support PHP 7 at the time, and there were no other suitable projects to build on. So I went to the source, figuratively and literally. I looked at how PHP itself worked, and thankfully, PHP 7 used a new generated parser and abstract syntax tree (AST). This meant that it would be relatively simple to use its own grammar file and some of the other packages that the glayzzle project was built on, and then integrate it into vscode. In reality however, I ended up with a TON of shift-reduce, and worse, reduce-reduce conflict warnings from the generator. These are ambiguities that could affect which actions are run, and thereby possibly introduce very difficult bugs to track down later on. After a bit of work, I eventually ended up with a grammar where I had replaced most of the offending rules regarding PHP's expressions. More importantly it had 0 conflicts and by then several hundred tests to show that it was still working as expected (which still isn't full coverage). Now, having any lexer, parser, and abstract syntax tree is generally enough to implement relatively simple IDE features like IntelliSense on a small scale (although really, even IntelliSense has quirks, like determining the correct type of The first is performance. Neither the generated lexer, nor the generated parser supported methods to only tokenize and parse partial segments of a file. In a large project, you shouldn't parse entire files when someone does a "find and replace all" or "rename symbol". After each change, diagnostics must also be run, and there are a lot of them. Syntax errors are only the beginning. Speaking of syntax errors, the next major issue would be error tolerance. If a parser choked at the first syntax error it saw as you typed, you would lose any symbols found later in the file and potentially trigger a lot of diagnostic errors. Errors are typically solved in one of two ways. The text can simply be ignored, or an additional fake token can be added to make it valid. For example, take the following text: Now, remember how I said comments and whitespace could essentially be ignored? Well, not really. To fully take advantage of an IDE's formatting capabilities, your analyzers need contextual information. Which means that they should really be represented in the tree. Take the semicolons in a for-block. A generic formatting rule of "nothing after a semi-colon except a newline or line comment" would fail here without knowledge of the for statement. Now try making a generated parser account for all the possible combinations of whitespace in various expressions (like TL;DR: Generated parsers are very fast and excellent for compilers that only need a single pass, but bad for IDEs where files constantly change. Either way, I am also very interested in seeing how your parser works and how applicable concepts used by Roslyn are 😃 . |
My story is a lot like @mattacosta. I looked into using Jison to generate a php7 parser by modifying the official PHP zend_language_parser.y grammar file only to be met with many shift-reduce conflicts. I ironed out all those and then started experimenting with using error tokens to recover from parse errors but found this approach very limited. I ditched the generated parser and wrote a recursive descent parser and regex lexer for php7 and am making progress towards providing another PHP intellisense offering in Javascript/Typescript. It's slow going and in a private repo for now. I think a fast, error tolerant parser is key to having good code completion and static code analysis and am interested in @mousetraps efforts. |
I am thinking bigger scale. A language server should work with many IDEs, not just VS Code. Now, what is a PHP developer more likely to have installed, Node or PHP? Definitely the latter. That is also the language he/she knows and can contribute too. My language server is already used by Eclipse Che, Eclipse (LSP4E) and more in the pipeline, which are all not Node-based.
The problem with PHP's on parser is that it is only able to parse code of the version it is running.
My language server has no problem with these
I have not met any performance issues with reparsing the document on every change.
nikic/PHP-Parser has error recovery. It can meet a statement that is not terminated by a semicolon for example, and unfinished variable, etc and still give you an AST. The statement is not simply ignored, you would get a Why are you all trying to build new extensions and parsers, instead of contributing to existing open-source efforts (crane, glayzzle/php-parser, nikic/PHP-Parser, vscode-php-intellisense)? |
@felixfbecker There's nothing wrong with those projects, it's just that we may have different design goals and/or requirements to meet. Also, it's not that generated parsers can't collect errors, it's that from a user's perspective, some of those errors might show up in inappropriate locations or have confusing messages when shown within an IDE. With custom parsers, it's easier to apply custom logic in order to provide a better user experience. Now that I have read that paragraph again, I probably should have made that a lot clearer. |
Hello! Just a comment! Thanks in advance! |
@darcome Please note that my extension is fully capable of providing IntelliSense for files written in PHP 5. Just the runtime for the language server needs to be PHP 7. You should be able to install versions side-by-side if you need to and point the LS to use PHP 7. |
Hey all! As promised, the parser is now available here: As I mentioned, it's still in its early stages, and we're hungry for feedback, so please please please try it out and let us know what you think! |
Hello, Thanks in advance, |
@darcome It's not a full extension on the marketplace yet. It's just under-the-hood code that may eventually power a future extension. As of right now, you can only debug the parser using VSCode. If you want to test out its implementation, you can also play around with the Syntax Visualizer mini-extension to view the abstract syntax tree it creates as well. Also, for performance/memory reasons it does require PHP 7. |
@mattacosta Sorry to bother you again, but how can I do it? Thanks in advance |
Hey @darcome - as @mattacosta mentioned, the parser is under-the-hood code that produces a syntax tree from a source code string. This syntax tree represents the lexical and syntactic structure of source code, which enables services (like hover, navigations, code-completions, formatting, refactoring, etc.) to intelligently process it. There are lots of ways to contribute:
|
I recently looked into what ongoing PHP efforts we might want to contribute to, and was especially excited by the ongoing language service efforts from extensions like PHP IntelliSense and Crane. Eventually I zeroed into the parsers underlying these language services, and began investigating what it might look like to have a PHP parser optimized for IDE scenarios (e.g. error tolerance, performance, tree representation, etc.) I've made some good progress, and hope to reach out to the community with more details soon.
This issue tracks progress on that work.
The text was updated successfully, but these errors were encountered: