-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML API: avoid processing incomplete tokens #5725
HTML API: avoid processing incomplete tokens #5725
Conversation
603db00
to
2393b92
Compare
The HTML Tag Processor is able to know if it starts parsing a syntax element and reaches the end of the document before it reaches the end of the element. In these cases, after this patch, the processor will indicate this condition. For example, when processing `<div><input type="te` there is an incomplete INPUT element. The processor will fail to find the INPUT, it will pause right after the DIV, and `paused_at_incomplete_token()` will return `true`. This patch doesn't change any existing behaviors, but it adds the new method to report on the final failure condition. It provides a mechanism for later use to add chunked parsing to the class, wherein it will be possible to process a document without having the entire document loaded in memory, for example when processing unbuffered output. This is also a necessary change for adding the ability to scan every token in the document. Currently the Tag Processor only exposes tags as tokens, but it will need to process `#text` nodes, HTML comments, and other markup in order to enable behaviors in the HTML Processor and in refactors of existing HTML processing in Core.
8056376
to
35502ad
Compare
Rebuild of 8905faf after WordPress#5725. Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com> Co-authored-by: Bernie Reiter <bernhard-reiter@git.wordpress.org>
I ran the tests from #5794 against this branch and it skips fewer tests, but all of those tests seem to fail. This may be a limitation with #5794 and not this PR. All of the new failures seem to be with test input that add I pulled the test data and the test suite from that branch (no fixes) and ran the tests with these results: ./vendor/bin/phpunit -c tests/phpunit/tests/html-api/phpunit.xml --filter Html5lib Trunk:
This branch:
|
I think the problem is that the tag processor always assumes we're in body, but there are tests that add tags to head. So it's a limitation of the way we're using the tests now. It's not clear to me why those tags were unsupported before this PR but are supported now… was that intentional? |
@sirreal those tests need to be setup differently, because the HTML Processor currently only supports the it's not that it assumes we're IN BODY but rather the default parameter to |
Rebuild of 8905faf after WordPress#5725. Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com> Co-authored-by: Bernie Reiter <bernhard-reiter@git.wordpress.org>
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
da5f9d9
to
9e7167a
Compare
With the merge of 1270194 everything is now passing or skipped in the html5lib test suite, thanks again @sirreal. There were a couple of fixes I incorporated from your work there, and then I updated that branch with some specific test exclusions based on either a design question or a test that tests behaviors we don't currently support (but will probably do so once we support scanning all tokens). The necessary change to the html5lib tests was to check for |
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
1270194
to
8488317
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you for taking the time to walk me through these changes!
As always, the test coverage also really helps with confidence in these changes 🙂
Committed to Core in https://core.trac.wordpress.org/changeset/57211. |
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc - WordPress/wordpress-develop#5535 Adds support for H1 - H6 elements in the HTML Processor. - WordPress/wordpress-develop#5725 Pause the Tag Processor when reaching the end of the document inside an incomplete syntax element. - Incorporates linting changes from "TODO:" to "todo" This patch adds the blanket exclusion from the HTML API compatability layer after they were removed and started blocking updates from Core. The PHP files in the compatability layer are merged and maintained in the Core repo and all changes or updates need to happen first in Core and then be brought over to Gutenberg as built files. From Gutenberg's perspective they are no different than NPM packages. Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com> Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc - WordPress/wordpress-develop#5535 Adds support for H1 - H6 elements in the HTML Processor. - WordPress/wordpress-develop#5725 Pause the Tag Processor when reaching the end of the document inside an incomplete syntax element. - Incorporates linting changes from "TODO:" to "todo" This patch adds the blanket exclusion from the HTML API compatability layer after they were removed and started blocking updates from Core. The PHP files in the compatability layer are merged and maintained in the Core repo and all changes or updates need to happen first in Core and then be brought over to Gutenberg as built files. From Gutenberg's perspective they are no different than NPM packages. Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com> Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc - WordPress/wordpress-develop#5535 Adds support for H1 - H6 elements in the HTML Processor. - WordPress/wordpress-develop#5725 Pause the Tag Processor when reaching the end of the document inside an incomplete syntax element. - Incorporates linting changes from "TODO:" to "todo". - WordPress/wordpress-develop#5762 Handles the "all other tags" rules in IN BODY insertion mode. - WordPress/wordpress-develop#5539 Adds support for list elements in the HTML Processor. This patch adds the blanket exclusion from the HTML API compatability layer after they were removed and started blocking updates from Core. The PHP files in the compatability layer are merged and maintained in the Core repo and all changes or updates need to happen first in Core and then be brought over to Gutenberg as built files. From Gutenberg's perspective they are no different than NPM packages. Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com> Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc - WordPress/wordpress-develop#5535 Adds support for H1 - H6 elements in the HTML Processor. - WordPress/wordpress-develop#5725 Pause the Tag Processor when reaching the end of the document inside an incomplete syntax element. - Incorporates linting changes from "TODO:" to "todo". - WordPress/wordpress-develop#5762 Handles the "all other tags" rules in IN BODY insertion mode. - WordPress/wordpress-develop#5539 Adds support for list elements in the HTML Processor. This patch adds the blanket exclusion from the HTML API compatability layer after they were removed and started blocking updates from Core. The PHP files in the compatability layer are merged and maintained in the Core repo and all changes or updates need to happen first in Core and then be brought over to Gutenberg as built files. From Gutenberg's perspective they are no different than NPM packages. Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com> Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
Trac ticket: Core-60122
Depends on #5721
Currently the Tag Processor assumes that an input document is a full HTML document. Because of this, if there's lingering content after the last tag match it will treat that content as plaintext and skip over it. This is fine for the Tag Processor because if there is lingering content that isn't a valid tag then there's nothing for
next_tag()
to match.However, in order to support a number of feature expansions it is important to recognize that the remaining content may involve partial syntax elements, such as incomplete tags, attributes, or comments.
In this patch we're adding a mode inside the Tag Processor which will flip when we start parsing HTML syntax but the document finishes before the token does. This will provide the ability to:
In the process of building this patch a few fixes were identified and fixed in the Tag Processor, namely in the handling of incomplete syntax elements.