-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Html api/add html processor #6
base: trunk
Are you sure you want to change the base?
Conversation
…he HTML specification
…/add-html-processor
while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) { | ||
$tag_name = $p->get_tag(); | ||
|
||
if ( ! $p->is_tag_closer() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are void elements closers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No they're not, that's why this code branch works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you probably realized, tag closers only exist when the tag name is preceded by a /
, which is distinct from the self-closing flag which appears right before the >
closing the tag opener
$element = WP_HTML_Spec::element_info( end( $this->open_elements ) ); | ||
// @TODO: Handle self-closing HTML foreign elements: must convey self-closing flag on stack. | ||
if ( $element::is_void ) { | ||
array_pop( $this->open_elements ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it say if ( ! element::is_void )
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or not? I don't get why void elements have any interaction with the open elements stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, why are void elements landing on the open elements stack, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is something to document later and it appeared towards the end of the day. I think I've seen this in the HTML spec for tree-building.
the problem is that we have to push the void element to the open stack in order to get proper depth information and climb out of the nesting we're in. this looks funny here because we aren't currently tracking if the last element on the stack requires no closer (is void or self-closing foreign element). that means whenever we approach the next tag we have to immediately remove any void elements because they cannot enclose over other tags.
it's definitely unexpected until you reason through it, or as I did, get reminded by it with failing tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the spec, popping should happen in the same step as the insertion:
A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
Reconstruct the active formatting elements, if any.Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
Set the frameset-ok flag to "not ok".
|
||
$starting_depth = count( $this->open_elements ); | ||
|
||
while ( $this->next_tag() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should bale out if between the $starting_depth and the child we see anything with $current_depth <= $starting_depth
as that would imply the initial tag was closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the ensure_support()
function that runs I don't think it's possible for this to happen, but I think I can defer some of the support function. I've added a test to make sure, though I haven't yet added any non-HTML elements in the tests.
$current_depth = count( $this->open_elements ); | ||
|
||
if ( $this->is_tag_closer() && $current_depth < $starting_depth ) { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here for the depth check I mentioned above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is all garbage that will go away
|
||
$stack = array(); | ||
|
||
$p = new WP_HTML_Tag_Processor( $this->html ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smart! I kind of don't like that it throws away the progress we have made so far, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean? my vision for this is that it can grow with our added support.
as for running a first-pass through the document I'm not claiming this is certainly needed. I'm sure it's possible to work it in alongside the normal traversal.
for now though it helped quite a bit, and I figured the cost is meager given that it's a read-only scan.
@luisherranz just pointed out that we may not need Bookmark "end" value. We don't seem to use it anywhere, and it generates problems with outer HTML logic. I like that. What do you think @dmsnell |
|
||
$self_closes = $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ); | ||
if ( $self_closes ) { | ||
$this->open_elements[] = $tag_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this part, it's not really an open element. Even in the parsing spec these are only added to the stack for brevity of the document – they're always immediately popped in the same step and before ingesting the next token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's essential: see above. may not be able to articulate it well yet, but I think one case we might examine is finding the next_sibling()
of an <img>
. if we don't push void elements onto the stack we'll be looking for a tag that's a parent of the img
instead of a sibling.
I think this will matter also for CSS selection. we will need to know the full "tree path" (a term I'm making up here without intending to convey specific information about it) in order to match our current position against the selector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw it's not what the spec says, is it? Depending on the insertion mode and element name, we may or may not have to push it to the stack of open elements. There may be other operations to do before pushing, too, and the order matters.
return $this->fully_supported_input; | ||
} | ||
|
||
public function next_tag( $query = null ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we could make it private in the descendant class. It doesn't even make sense to have it public, it can't support the same set of queries as the parent, and it's a pain to deal with PHP's idea of polymorphism where the parent class will call the child implementation of this method anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
funny enough I added _doing_it_wrong()
here before realizing that I call it from within this tag processor.
is it the tag processor that's calling this?
the one thing I wish we could do is remove $query
but it wouldn't let me. still some thinking about this one to do.
} | ||
$this->set_bookmark( 'end' ); | ||
|
||
$start = $this->bookmarks['start']->end + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's one usage of the end
bookmark property, but we could do without it by just storing this in a variable a few lines above. (I'm referring to the idea of ditching the end
property)
Oh this is great @dmsnell! A few more notes:
|
$end = $this->bookmarks['end']->end + 1; | ||
$this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $new_html ); | ||
$this->get_updated_html(); | ||
$this->bookmarks['start']->start = $start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done the same in one of my attempts! Why is this one not getting released in get_updated_html
, though? I remember I had to create a new WP_HTML_Text_Span and re-insert it into $this->bookmarks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not quite sure why it isn't released
Oh, one other thing – be sure to test |
I would love to hear more about this, how it creates problems. it seems inherently essential for separating inside vs. outside of tag tokens, so I'm confused how it isn't necessary. |
stewing on this I came up with an idea and then realized it's probably the same as your pinned bookmarks 🙃 |
private function enter_element( $element ) { | ||
$this->depth++; | ||
|
||
parent::set_bookmark( "{$this->depth}_{$element}" ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why set a bookmark for each element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a stub function for the stack of open elements. I think I'm going to rip it out and use a proper stack, but we have to have a bookmark because if we make edits we have to know that the stack might have changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need a bookmark for the current element, though, right? Not for all the elements seen so far, nor for all parents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the stack of open elements might actually shift and some may disappear. in any case, if we seek()
we have to reset them, so we need a good way to reconstruct the stack when seeking.
in your PR I think you're doing that by storied copies of all the open elements for every bookmark, but also we have to make sure those are bookmarks because we expect changes to be applied to the document which could change things.
the main case that matters I think is wiping out a bookmark, but inserting nodes will shift them all.
<div><p>inside</p></div>
^^^ bookmark
unwrap
<p>inside</p>
^^^ stack of open elements here is no longer DIV > P but rather only P
d2791d0
to
0e3ada5
Compare
default: | ||
return self::NOT_IMPLEMENTED_YET; | ||
} | ||
} catch ( Exception $e ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching all exceptions might hide legitimate errors, it would be safer to have a HTML_Exception class or so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. at some point I wanted to go back and special-case known exceptions so we don't hide others, but I didn't do that yet because I wasn't entirely sure how I felt about the escaping mechanism as throwing Exceptions.
parent::__construct( $html ); | ||
|
||
$this->tag_openers = new WP_HTML_Element_Stack(); | ||
$this->tag_closers = new WP_HTML_Element_Stack(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would you push to this stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still working on an answer for this 😆
TODO: - abstract and optimize "if P in button scope then close P"
Trac ticket:
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.