Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: HTML API: Add set_inner_html() to HTML Processor #7326

Draft
wants to merge 5 commits into
base: trunk
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions src/wp-includes/html-api/class-wp-html-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -5024,6 +5024,66 @@ public function get_comment_type(): ?string {
return $this->is_virtual() ? null : parent::get_comment_type();
}

/**
* Replaces the inner markup of the currently-matched tag with provided HTML.
*
* This function will normalize the given input and enforce the boundaries
* within the existing HTML where it's called.
*
* @since 6.8.0
*
* @param string $new_inner_html New HTML to inject as inner HTML for the currently-matched tag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any plans to support in the future the HTML fragment as the param in addition to string? I can see how that might slightly improve developer experience when needing to construct more complex HTML and avoid serializing to HTML before passing to this method to convert back to the HTML fragment instance when insering into document.

Copy link
Member Author

@dmsnell dmsnell Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't in my plans, mostly because I'm wearywary of passing around $processor objects and think it's important to encourage keeping them local. as stateful objects, surprising things can happen when passing them to functions and getting them back. there's long been a discussion concerning locking them down or making them read only or limiting the work that can be done, but that doesn't exist yet.

On the other hand, when working earlier with templating I had a desire to wrap HTML content in some class indicating that it was safe. Something like that would allow deferred parsing and might be a worthwhile investigation here. On the other other hand I've also been a bit doubtful about the likelihood of people using that. Maybe it would eventually be good to have both as you suggest, where a string would be re-parsed, but a wrapped fragment could in theory be deferred.

In a wrapped model, to achieve performance wins, I think we have to be careful to defer until the last moment when rendering the outermost processor, which may not be easy and may require stacking complicated updates. For example, this code:

$processor = WP_HTML_Processor::create_fragment( '<li>' );
$processor->next_tag( 'li' );
$processor->set_inner_html( WP_HTML_Processor::create_fragment( 'Apples and Oranges' ) );

$outer_processor = WP_HTML_Processor::create_fragment( '<ul>' );
$outer_processor->next_tag( 'ul' );
$outer_processor->set_inner_html( $processor );

This is a challenge because the original processor isn't going to know that it shouldn't apply its update and advance. This also demonstrates some of the questions arising from non-local access of a processor: when passing it into another function it might scan to the end and "finish." What if calling code expects to continue using it after passing it in?

For the time being I see the eager string parsing as the more approachable option with computational weight as the tradeoff. This would leave room in the future for an optimization. Other potential optimizations might also arise, such as creating subtrees of the full tree for edits, but these do get complicated quickly.

Copy link
Member

@gziolo gziolo Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the spawn_fragment_parser() method hasn't been implemented yet. I assume it's going to be very similar to the create_fragment() method, but the $context would be inferred from the root document. So, the improvement for the developer could be minimal for the time being in spawn_fragment_parser() would essentially change the context if necessary on the passed instance of the created fragment. It gets serialized in the next line, so it would get evaluated anyway.

$fragment =  $this->spawn_fragment_parser( $new_inner_html );
$new_markup = $fragment->serialize();

In the scenario when using only string as param, the same example would look like this:

$processor = WP_HTML_Processor::create_fragment( '<li>' );
$processor->next_tag( 'li' );
$processor->set_inner_html( (string) WP_HTML_Processor::create_fragment( 'Apples and Oranges' ) );

$outer_processor = WP_HTML_Processor::create_fragment( '<ul>' );
$outer_processor->next_tag( 'ul' );
$outer_processor->set_inner_html( (string) $processor );

So there are two steps:

  • the dev needs to explicitly serialize to string
  • the processor need to recreate the fragment

Aside. It made me realize one important aspect for the example provided with the outlined implementation. By eagerly processing HTML when inserting inner HTML, it will happen in a bottom up fashion. In effect, the processor will have to serialize the 'Apples and Oranges' fragment multiple times as the fragments build on top of each other. At every stage, it will know more about the entire document so in certain scenarios it might even impact the produced HTML. Just to illustrate it better:

  • first fragment <p>My text</p> -> entire paragraph processed
  • second fragment <blockquote></blockquote> -> has to process <blockquote><p>My text</p></blockquote>
  • third fragment <ul><li></li></ul> -> has to process <ul><li><blockquote><p>My text</p></blockquote></li></ul>

So this isn't something I initially thought about, but after thinking a bit more about it, it's another part where things are getting interesting.


This is a challenge because the original processor isn't going to know that it shouldn't apply its update and advance. This also demonstrates some of the questions arising from non-local access of a processor: when passing it into another function it might scan to the end and "finish." What if calling code expects to continue using it after passing it in?

I didn't think about that earlier. Yes, that is indeed interesting. When passing as a string, you have to call get_updated_html() so that means the fragment will end up at the end of the document. So that part becomes predictable.

Copy link
Member

@sirreal sirreal Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sketched an implementation for the spawn_fragment_parser method in dmsnell#22. The algorithm is well described in the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #7144 some prior work on different fragment contexts.

* @return bool Whether the inner markup was modified for the currently-matched tag, or `NULL`
* if called on a node which doesn't allow changing the inner HTML.
*/
public function set_inner_html( string $new_inner_html ): ?bool {
$tag_name = $this->get_tag();

if (
WP_HTML_Tag_Processor::STATE_MATCHED_TAG !== $this->parser_state ||
$this->is_tag_closer() ||
( 'html' === $this->get_namespace() &&
(
self::is_void( $tag_name ) ||
in_array( $tag_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true )
)
)
) {
// @todo Support setting inner HTML for SCRIPT, STYLE, TEXTAREA, and TITLE.
return null;
}

$fragment = $this->spawn_fragment_parser( $new_inner_html );
$new_markup = $fragment->serialize();

$this->set_bookmark( 'start' );
$depth = $this->get_current_depth();
while ( $this->get_current_depth() >= $depth && $this->next_token() ) {
continue;
}

if (
$this->paused_at_incomplete_token() ||
null !== $this->get_last_error()
) {
return false;
}

$this->set_bookmark( 'end' );
$start = $this->bookmarks['_start'];
$end = $this->bookmarks['_end'];

$this->lexical_updates[] = new WP_HTML_Text_Replacement(
$start->start + $start->length,
$end->start - ( $start->start + $start->length ),
$new_markup
);

$this->get_updated_html();
$this->seek( 'start' );
return true;
}

/**
* Removes a bookmark that is no longer needed.
*
Expand Down
Loading