HTML API: Implement active format reconstruction #6982
base: trunk
a9a2f2d to 9bf1a03
Since the HTML Processor started visiting all nodes in a document, both real and virtual, the breadcrumb accounting became a bit complicated and it's not entirely clear that it is fully reliable. In this patch the breadcrumbs are rebuilt separately from the stack of open elements in order to eliminate the problem of the stateful stack interactions and the post-hoc event queue. Breadcrumbs are greatly simplified as a result, and more verifiably correct, in this construction.
The HTML Processor internally throws an exception when it reaches HTML that it knows it cannot process, but this exception is not made available to calling code. It can be useful to extract more knowledge about why it gave up, especially for debugging purposes. In this patch, more context is added to the WP_HTML_Unsupported_Exception and the last exception is made available to calling code, if it asks.
…rithm. As part of work to add more spec support to the HTML API, this patch fills out the active format reconstruction algorithm so that more HTML can be supported in situations requiring that reconstruction, for example, when a formatting element such as an A tag or a CODE tag is implicitly closed. See Core-61576
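For illustration, the reconstruction step this commit message describes can be sketched as follows. This is a simplified standalone model of the "reconstruct the active formatting elements" steps in the HTML specification, not the code in this patch: `Sketch_Node`, `SKETCH_MARKER`, and `sketch_reconstruct_active_formats()` are hypothetical names, and attributes are ignored entirely.

```php
<?php
// Hypothetical, simplified model; these are NOT the WP_HTML_Processor internals.
final class Sketch_Node {
	public $tag;

	public function __construct( string $tag ) {
		$this->tag = $tag;
	}
}

const SKETCH_MARKER = 'marker';

/**
 * Re-creates every formatting element at the tail of the active list
 * that is no longer on the stack of open elements.
 *
 * @param array         $active Entries (Sketch_Node or SKETCH_MARKER), oldest first.
 * @param Sketch_Node[] $open   Stack of open elements, oldest first.
 * @return string[] Tag names of the re-created elements, in creation order.
 */
function sketch_reconstruct_active_formats( array &$active, array &$open ): array {
	$created = array();
	$last    = count( $active ) - 1;
	if ( $last < 0 ) {
		return $created; // Nothing to reconstruct.
	}

	// An entry needs no work if it is a marker or is still open.
	$is_settled = static function ( $entry ) use ( $open ) {
		return SKETCH_MARKER === $entry || in_array( $entry, $open, true );
	};

	if ( $is_settled( $active[ $last ] ) ) {
		return $created;
	}

	// Rewind to the earliest entry of the unsettled tail.
	$i = $last;
	while ( $i > 0 && ! $is_settled( $active[ $i - 1 ] ) ) {
		$i--;
	}

	// Advance, creating a fresh element for each entry and replacing the entry.
	for ( ; $i <= $last; $i++ ) {
		$clone        = new Sketch_Node( $active[ $i ]->tag );
		$open[]       = $clone;
		$active[ $i ] = $clone;
		$created[]    = $clone->tag;
	}

	return $created;
}

// State after "<p><b><i>one<p>": the first P, B and I were closed, a new P is open.
$b       = new Sketch_Node( 'B' );
$i       = new Sketch_Node( 'I' );
$active  = array( $b, $i );
$open    = array( new Sketch_Node( 'P' ) );
$created = sketch_reconstruct_active_formats( $active, $open );
echo implode( ' ', $created ), "\n"; // prints "B I"
```

The sketch only tracks tag names, which is exactly where the open questions in this pull request start: the re-created elements would also need the attributes of the originals.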
9bf1a03 to ab1096f
Use the method implemented in WordPress#6982 to avoid duplicating the same functionality.
Extracted from WordPress#6982
I think the HTML5lib tests will help here. I'll try to make some progress on this.
FAILURES!
-Tests: 1494, Assertions: 1076, Failures: 2, Skipped: 418.
+Tests: 1494, Assertions: 1103, Failures: 11, Skipped: 391.
The algorithm seems to be implemented correctly. I've tested and reviewed its behavior against the spec 👍
As already noted, it does not correctly reproduce attributes on the created nodes. That seems like something we should figure out with virtual tokens. For now, we can likely bail when creating nodes with attributes.
Also as already noted, when pushing to the list of active formatting elements we need to handle the "Noah's Ark clause." That seems important to handle because it affects the behavior when reconstructing the list of active formatting elements. It may also give insights into how to deal with attributes on generated nodes.
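For reference, the clause could look roughly like this on push. It is a minimal sketch under assumptions, not the patch's code: entries are plain arrays with hypothetical `tag` and `attributes` keys, `ARK_MARKER`, `ark_same_element()` and `ark_push_active_format()` are made-up names, and the namespace comparison the spec also requires is left out (everything is assumed to be in the HTML namespace).

```php
<?php
// Hypothetical marker value; entries are array( 'tag' => ..., 'attributes' => array( ... ) ).
const ARK_MARKER = 'marker';

/**
 * Two entries match when the tag name and the full set of attributes agree.
 * Attribute order is irrelevant, so both sets are sorted by name first.
 */
function ark_same_element( array $a, array $b ): bool {
	$a_attributes = $a['attributes'];
	$b_attributes = $b['attributes'];
	ksort( $a_attributes );
	ksort( $b_attributes );

	return $a['tag'] === $b['tag'] && $a_attributes === $b_attributes;
}

/**
 * Pushes onto the list of active formatting elements, first removing the
 * earliest matching entry if three matching entries already follow the last marker.
 */
function ark_push_active_format( array &$active, array $element ): void {
	$matches = array();

	// Walk from newest to oldest, stopping at the nearest marker.
	for ( $i = count( $active ) - 1; $i >= 0; $i-- ) {
		if ( ARK_MARKER === $active[ $i ] ) {
			break;
		}
		if ( ark_same_element( $active[ $i ], $element ) ) {
			$matches[] = $i;
		}
	}

	if ( count( $matches ) >= 3 ) {
		// Matches were collected newest-first, so the last one is the earliest.
		array_splice( $active, $matches[ count( $matches ) - 1 ], 1 );
	}

	$active[] = $element;
}

$bold   = array( 'tag' => 'B', 'attributes' => array() );
$active = array();
for ( $n = 0; $n < 4; $n++ ) {
	ark_push_active_format( $active, $bold );
}
echo count( $active ), "\n"; // prints "3"
```

The attribute comparison is the costly part: every push has to compare the new element against each candidate since the last marker.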
/**
 * Returns the node at the given 1-offset index in the list of active formatting elements.
 *
 * Do not use this method; it is meant to be used only by the HTML Processor.
Do we need to say this? The entire class is marked @access private and considered internal.
 * @access private
 *
Same, maybe this is redundant given the entire class has this private tag.
 * @access private
 *
I'm exploring the Noah's Ark work to limit equivalent elements on the active formatting element stack in dmsnell#19.
Note from conversation: can we simply bail if we have more than three of the same kind of tag name on the list of active formatting elements (up to the nearest marker) and then defer any further Noah's Ark processing? This could potentially let us process the most common form of active format reconstruction without implementing the complicated and costly parts of the algorithm.
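A rough sketch of that bail check, assuming the list can be viewed as plain tag names with a marker value in between. `BAIL_MARKER` and `bail_needs_noahs_ark()` are hypothetical names, and the threshold of three existing entries is one reading of the note above.

```php
<?php
// Hypothetical marker value; real tag names are assumed to be upper-case, so no collision.
const BAIL_MARKER = 'marker';

/**
 * Reports whether pushing $tag_name would leave more than three entries with
 * that tag name after the nearest marker. Attributes are deliberately ignored,
 * so this over-approximates the cases the Noah's Ark clause applies to.
 *
 * @param string[] $active_tag_names Tag names or BAIL_MARKER, oldest first.
 */
function bail_needs_noahs_ark( array $active_tag_names, string $tag_name ): bool {
	$count = 0;

	for ( $i = count( $active_tag_names ) - 1; $i >= 0; $i-- ) {
		if ( BAIL_MARKER === $active_tag_names[ $i ] ) {
			break;
		}
		if ( $tag_name === $active_tag_names[ $i ] ) {
			$count++;
		}
	}

	return $count >= 3;
}

var_dump( bail_needs_noahs_ark( array( 'B', 'B', 'B' ), 'B' ) ); // bool(true)
var_dump( bail_needs_noahs_ark( array( 'B', 'I', 'B' ), 'B' ) ); // bool(false)
```

A caller would stop processing (treat the input as unsupported) when this returns true, which trades some coverage for never having to compare attributes.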
Need to report attributes even on virtual, reconstructed elements. Until now we haven't actually recreated any formatting elements, but once we start doing so, we have to ensure we don't report that attributes are missing.
Trac ticket: Core-61576
Status
I think that the only thing remaining on this one is getting the attributes right for reconstructed nodes, and also adhering to the Noah's Ark rule with a count of three.
We can probably have an inserted node refer to another one, that is, have a virtual node point to another node for reading the attributes.
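One possible shape for that idea, purely as a sketch: `Ref_Token` and its methods are hypothetical and do not correspond to any existing class in the HTML API.

```php
<?php
// Hypothetical token model: a virtual token keeps a reference to the token it was
// reconstructed from and defers attribute reads to it.
final class Ref_Token {
	public $tag_name;
	private $attributes;
	private $source;

	public function __construct( string $tag_name, array $attributes = array(), ?Ref_Token $source = null ) {
		$this->tag_name   = $tag_name;
		$this->attributes = $attributes;
		$this->source     = $source;
	}

	/** Creates a virtual token that points back at the original for its attributes. */
	public static function reconstructed_from( Ref_Token $original ): Ref_Token {
		return new self( $original->tag_name, array(), $original );
	}

	public function is_virtual(): bool {
		return null !== $this->source;
	}

	/** Returns the attribute value, or null when the attribute is absent. */
	public function get_attribute( string $name ) {
		if ( null !== $this->source ) {
			return $this->source->get_attribute( $name );
		}

		return $this->attributes[ $name ] ?? null;
	}
}

$link    = new Ref_Token( 'A', array( 'href' => '/docs' ) );
$virtual = Ref_Token::reconstructed_from( $link );
echo $virtual->get_attribute( 'href' ), "\n"; // prints "/docs"
```

Pointing at the original avoids copying attribute data, at the cost of keeping the original token alive for as long as any reconstructed copy exists.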
Description
Adds support for active format reconstruction, which occurs when crossing certain HTML boundaries, such as when entering a new P element which implicitly closes the previous one and all of the formatting elements inside it.
This raises the question of what to do when elements are implicitly created. This already appears with the unexpected </p>, which creates an empty P element. next_tag() never finds these elements even though they appear in the breadcrumbs when moving past them.
Tests