Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: HTML API: Add set_inner_html() to HTML Processor #7326

Draft
wants to merge 5 commits into
base: trunk
Choose a base branch
from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Sep 11, 2024

This method depends on other methods, but this change introduces an idea for how to accomplish setting the inner HTML, leaving supporting details unresolved.

Trac ticket:


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

This method depends on other methods, but this change introduces an idea
for how to accomplish setting the inner HTML, leaving supporting details
unresolved.
Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

*
* @since 6.8.0
*
* @param string $new_inner_html New HTML to inject as inner HTML for the currently-matched tag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any plans to support in the future the HTML fragment as the param in addition to string? I can see how that might slightly improve developer experience when needing to construct more complex HTML and avoid serializing to HTML before passing to this method to convert back to the HTML fragment instance when insering into document.

Copy link
Member Author

@dmsnell dmsnell Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't in my plans, mostly because I'm wearywary of passing around $processor objects and think it's important to encourage keeping them local. as stateful objects, surprising things can happen when passing them to functions and getting them back. there's long been a discussion concerning locking them down or making them read only or limiting the work that can be done, but that doesn't exist yet.

On the other hand, when working earlier with templating I had a desire to wrap HTML content in some class indicating that it was safe. Something like that would allow deferred parsing and might be a worthwhile investigation here. On the other other hand I've also been a bit doubtful about the likelihood of people using that. Maybe it would eventually be good to have both as you suggest, where a string would be re-parsed, but a wrapped fragment could in theory be deferred.

In a wrapped model, to achieve performance wins, I think we have to be careful to defer until the last moment when rendering the outermost processor, which may not be easy and may require stacking complicated updates. For example, this code:

$processor = WP_HTML_Processor::create_fragment( '<li>' );
$processor->next_tag( 'li' );
$processor->set_inner_html( WP_HTML_Processor::create_fragment( 'Apples and Oranges' ) );

$outer_processor = WP_HTML_Processor::create_fragment( '<ul>' );
$outer_processor->next_tag( 'ul' );
$outer_processor->set_inner_html( $processor );

This is a challenge because the original processor isn't going to know that it shouldn't apply its update and advance. This also demonstrates some of the questions arising from non-local access of a processor: when passing it into another function it might scan to the end and "finish." What if calling code expects to continue using it after passing it in?

For the time being I see the eager string parsing as the more approachable option with computational weight as the tradeoff. This would leave room in the future for an optimization. Other potential optimizations might also arise, such as creating subtrees of the full tree for edits, but these do get complicated quickly.

Copy link
Member

@gziolo gziolo Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the spawn_fragment_parser() method hasn't been implemented yet. I assume it's going to be very similar to the create_fragment() method, but the $context would be inferred from the root document. So, the improvement for the developer could be minimal for the time being in spawn_fragment_parser() would essentially change the context if necessary on the passed instance of the created fragment. It gets serialized in the next line, so it would get evaluated anyway.

$fragment =  $this->spawn_fragment_parser( $new_inner_html );
$new_markup = $fragment->serialize();

In the scenario when using only string as param, the same example would look like this:

$processor = WP_HTML_Processor::create_fragment( '<li>' );
$processor->next_tag( 'li' );
$processor->set_inner_html( (string) WP_HTML_Processor::create_fragment( 'Apples and Oranges' ) );

$outer_processor = WP_HTML_Processor::create_fragment( '<ul>' );
$outer_processor->next_tag( 'ul' );
$outer_processor->set_inner_html( (string) $processor );

So there are two steps:

  • the dev needs to explicitly serialize to string
  • the processor need to recreate the fragment

Aside. It made me realize one important aspect for the example provided with the outlined implementation. By eagerly processing HTML when inserting inner HTML, it will happen in a bottom up fashion. In effect, the processor will have to serialize the 'Apples and Oranges' fragment multiple times as the fragments build on top of each other. At every stage, it will know more about the entire document so in certain scenarios it might even impact the produced HTML. Just to illustrate it better:

  • first fragment <p>My text</p> -> entire paragraph processed
  • second fragment <blockquote></blockquote> -> has to process <blockquote><p>My text</p></blockquote>
  • third fragment <ul><li></li></ul> -> has to process <ul><li><blockquote><p>My text</p></blockquote></li></ul>

So this isn't something I initially thought about, but after thinking a bit more about it, it's another part where things are getting interesting.


This is a challenge because the original processor isn't going to know that it shouldn't apply its update and advance. This also demonstrates some of the questions arising from non-local access of a processor: when passing it into another function it might scan to the end and "finish." What if calling code expects to continue using it after passing it in?

I didn't think about that earlier. Yes, that is indeed interesting. When passing as a string, you have to call get_updated_html() so that means the fragment will end up at the end of the document. So that part becomes predictable.

Copy link
Member

@sirreal sirreal Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sketched an implementation for the spawn_fragment_parser method in dmsnell#22. The algorithm is well described in the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #7144 some prior work on different fragment contexts.

Comment on lines +5023 to +5036
/**
* Normalize an HTML string by serializing it.
*
* This removes any partial syntax at the end of the string.
*
* @since 6.7.0
*
* @param string $html Input HTML to normalize.
*
* @return string|null Normalized output, or `null` if unable to normalize.
*/
public static function normalize( string $html ): ?string {
return static::create_fragment( $html )->serialize();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in #7331.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants