Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editor: Introduce HTML Tag Processor #3920

Closed
wants to merge 36 commits into from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Jan 26, 2023

Status

❓ Is it acceptable to use the u Unicode flag in a PCRE pattern in Core?
^^^
This is the last remaining pertinent question form the author from a design perspective; it should be answered before a merge.

Draft commit message
Introduce HTML API with HTML Tag Processor

This commit pulls in the HTML Tag Processor from the Gutenbeg repository.
The Tag Processor attempts to be an HTML5-spec-compliant parser that
provides the ability in PHP to find specific HTML tags and then add,
remove, or update attributes on that tag. It provides a safe and reliable
way to modify the attribute on HTML tags.

#3920

Props: antonvlasenko, azaozz, bernhard-reiter, costdev, dmsnell, felixarntz, gziolo, helloFromTanya, zieladam

Description

This commit pulls in the HTML Tag Processor from the Gutenbeg repository (pulled from WordPress/gutenberg@98ce5c5
). The Tag Processor attempts to be an HTML5-spec-compliant parser that provides the ability in PHP to find specific HTML tags and then add, remove, or update attributes on that tag. It provides a safe and reliable way to modify the attribute on HTML tags.

// Add missing `rel` attribute to links.
$p = new WP_HTML_Tag_Processor( $block_content );
if ( $p->next_tag( 'A' ) && empty( $p->get_attribute( 'rel' ) ) ) {
    $p->set_attribute( 'noopener nofollow' );
}
return $p->get_updated_html();

Introduced originally in WordPress/gutenberg#42485 and in https://core.trac.wordpress.org/ticket/56299 and developed within the Gutenberg repository, this HTML parsing system was built in order to address a persistent need (properly modifying HTML tag attributes) and was motivated after a sequence of block editor defects which stemmed from mismatches between actual HTML code and expectations for HTML input running through existing naive string-search-based solutions.

The Tag Processor is intended to operate fast enough to avoid being an obstacle on page render while using as little memory overhead as possible. It is practically a zero-memory-overhead system, and only allocates memory as changes to the input HTML document are enqueued, releasing that memory when flushing those changes to the document, moving on to find the next tag, or flushing its entire output via get_updated_html().

Rigor has been taken to ensure that the Tag Processor will not be consfused by unexpected or non-normative HTML input, including issues arising from quoting, from different syntax rules within <title>, <textarea>, and <script> tags, from the appearance of rare but legitimate comment and XML-like regions, and from a variety of syntax abnormalities such as unbalanced tags, incomplete syntax, and overlapping tags.

The Tag Processor is constrained to parsing an HTML document as a stream of tokens. It will not build an HTML tree or generate a DOM representation of a document. It is designed to start at the beginning of an HTML document and linearly scan through it, potentially modifying that document as it scans. It has no access to the markup inside or around tags and it has no ability to determine which tag openers and tag closers belong to each other, or determine the nesting depth of a given tag.

It includes a primitive bookmarking system to remember tags it has previously visited. These bookmarks refer to specific tags, not to string offsets, and continue to point to the same place in the document as edits are applied. By asking the Tag Processor to seek to a given bookmark it's possible to back up and continue processsing again content that has already been traversed.

Attribute values are sanitized with esc_attr() and rendered as double-quoted attributes. On read they are unescaped and unquoted. Authors wishing to rely on the Tag Processor therefore are free to pass around data as normal strings.

Convenience methods for adding and removing CSS class names exist in order to remove the need to process the class attribute.

// Update heading block class names
$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag() ) {
    switch ( $p->get_tag() ) {
	case 'H1':
	case 'H2':
	case 'H3':
	case 'H4':
	case 'H5':
	case 'H6':
	    $p->remove_class( 'wp-heading' );
	    $p->add_class( 'wp-block-heading' );
	    break;
}
return $p->get_updated_html();

The Tag Processor is intended to be a reliable low-level library for traversing HTML documents and higher-level APIs are to be built upon it. Immediately, and in Core Gutenberg blocks it is meant to replace HTML modification that currently relies on RegExp patterns and simpler string replacements.

See the following for examples of such replacement:

Co-Authored-By: Adam Zielinski adam@adamziel.com
Co-Authored-By: Bernie Reiter ockham@raz.or.at
Co-Authored-By: Grzegorz Ziolkowski grzegorz@gziolo.pl

Trac ticket: https://core.trac.wordpress.org/ticket/57575

This commit pulls in the HTML Tag Processor from the Gutenbeg repository.
The Tag Processor attempts to be an HTML5-spec-compliant parser that
provides the ability in PHP to find specific HTML tags and then add,
remove, or update attributes on that tag. It provides a safe and reliable
way to modify the attribute on HTML tags.

```php
// Add missing `rel` attribute to links.
$p = new WP_HTML_Tag_Processor( $block_content );
if ( $p->next_tag( 'A' ) && empty( $p->get_attribute( 'rel' ) ) ) {
    $p->set_attribute( 'noopener nofollow' );
}
return $p->get_updated_html();
```

Introduced originally in WordPress/gutenberg#42485 and developed within
the Gutenberg repository, this HTML parsing system was built in order
to address a persistent need (properly modifying HTML tag attributes)
and was motivated after a sequence of block editor defects which stemmed
from mismatches between actual HTML code and expectectations for HTML
input running through existing naive string-search-based solutions.

The Tag Processor is intended to operate fast enough to avoid being an
obstacle on page render while using as little memory overhead as possible.
It is practically a zero-memory-overhead system, and only allocates memory
as changes to the input HTML document are enqueued, releasing that memory
when flushing those changes to the document, moving on to find the next
tag, or flushing its entire output via `get_updated_html()`.

Rigor has been taken to ensure that the Tag Processor will not be consfused
by unexpected or non-normative HTML input, including issues arising from
quoting, from different syntax rules within `<title>`, `<textarea>`, and
`<script>` tags, from the appearance of rare but legitimate comment and
XML-like regions, and from a variety of syntax abnormalities such as
unbalanced tags, incomplete syntax, and overlapping tags.

The Tag Processor is constrained to parsing an HTML document as a stream
of tokens. It will not build an HTML tree or generate a DOM representation
of a document. It is designed to start at the beginning of an HTML
document and linearly scan through it, potentially modifying that document
as it scans. It has no access to the markup inside or around tags and it
has no ability to determine which tag openers and tag closers belong to each
other, or determine the nesting depth of a given tag.

It includes a primitive bookmarking system to remember tags it has previously
visited. These bookmarks refer to specific tags, not to string offsets, and
continue to point to the same place in the document as edits are applied. By
asking the Tag Processor to seek to a given bookmark it's possible to back
up and continue processsing again content that has already been traversed.

Attribute values are sanitized with `esc_attr()` and rendered as double-quoted
attributes. On read they are unescaped and unquoted. Authors wishing to rely on
the Tag Processor therefore are free to pass around data as normal strings.

Convenience methods for adding and removing CSS class names exist in order to
remove the need to process the `class` attribute.

```php
// Update heading block class names
$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag() ) {
    switch ( $p->get_tag() ) {
	case 'H1':
	case 'H2':
	case 'H3':
	case 'H4':
	case 'H5':
	case 'H6':
	    $p->remove_class( 'wp-heading' );
	    $p->add_class( 'wp-block-heading' );
	    break;
}
return $p->get_updated_html();
```

The Tag Processor is intended to be a reliable low-level library for traversing
HTML documents and higher-level APIs are to be built upon it. Immediately, and
in Core Gutenberg blocks it is meant to replace HTML modification that currently
relies on RegExp patterns and simpler string replacements.

See the following for examples of such replacement:
    WordPress/gutenberg@1315784
    https://github.com/WordPress/gutenberg/pull/45469/files#diff-dcd9e1f9b87ca63efe9f1e834b4d3048778d3eca41aa39c636f8b16a5bb452d2L46
    WordPress/gutenberg#46625

Co-Authored-By: Adam Zielinski <adam@adamziel.com>
Co-Authored-By: Bernie Reiter <ockham@raz.or.at>
Co-Authored-By: Grzegorz Ziolkowski <grzegorz@gziolo.pl>
@azaozz
Copy link
Contributor

azaozz commented Jan 28, 2023

@azaozz
Copy link
Contributor

azaozz commented Jan 28, 2023

Wondering if it would be better to move the if ( ! class_exists( ... ) ) to wp-html.php (where all files are required). There are quite a few places in core where similar code is used. It will also stop WPCS from complaining about indentation.

@dmsnell
Copy link
Member Author

dmsnell commented Jan 28, 2023

Wondering if it would be better to move the if ( ! class_exists( ... ) ) to wp-html.php

agreed. I've already done this in the Gutenberg side to get around the linter and will move the changes here once that merges, or beforehand maybe if I can.

Copy link
Contributor

@costdev costdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @dmsnell and to everyone who worked on this!

I've left some thoughts below 🙂

src/wp-includes/class-wp-html-attribute-token.php Outdated Show resolved Hide resolved
src/wp-includes/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
src/wp-includes/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
src/wp-includes/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
src/wp-includes/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
tests/phpunit/tests/html/wpHtmlTagProcessorTest.php Outdated Show resolved Hide resolved
tests/phpunit/tests/html/wpHtmlTagProcessorTest.php Outdated Show resolved Hide resolved
tests/phpunit/tests/html/wpHtmlTagProcessorTest.php Outdated Show resolved Hide resolved
tests/phpunit/tests/html/wpHtmlTagProcessorTest.php Outdated Show resolved Hide resolved
src/wp-includes/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
@ockham
Copy link
Contributor

ockham commented Jan 30, 2023

Wondering if it would be better to move the if ( ! class_exists( ... ) ) to wp-html.php

agreed. I've already done this in the Gutenberg side to get around the linter and will move the changes here once that merges, or beforehand maybe if I can.

I don't think we should even include those guards at all here -- after all, Core is the most "canonical" source for those classes, and its versions should "prevail" over the ones provided by the Gutenberg plugin for back-compat (which shouldn't happen normally, but in case the order of imports is mixed up for whatever reason).

@hellofromtonya
Copy link
Contributor

Wondering if it would be better to move the if ( ! class_exists( ... ) ) to wp-html.php (where all files are required). There are quite a few places in core where similar code is used. It will also stop WPCS from complaining about indentation.

Hmm, IMO these guards shouldn't be in Core. Why? Core should expect to load itself. The loading of these classes into memory happens early and before a plugin or theme loads. Thus, class_exists() is unnecessary in Core as another instance of Core's classes should not already be in memory when Core loads.

That said, code extending Core does need these guards. For example, Gutenberg needs these guards to prevent loading its classes of the same name.

dmsnell and others added 4 commits January 30, 2023 13:29
* Rename data providers to match test per coding standard.
* Restructure data provider datasets into a single array form for consistency.
* Add `WP_HTML_Tag_Processor::` to @Covers methods per coding standard.
* Add empty line between set up and assertion groupings.
* Moved well-formed HTML into separate test of updating attributes.
* Replaced assertEquals() with assertSame().
@azaozz
Copy link
Contributor

azaozz commented Jan 31, 2023

I don't think we should even include those guards at all here -- after all, Core is the most "canonical"

Core does not have the responsibility of checking if its classes are already in memory.

It depends :)

What I'm getting at here is that both Gutenberg's PHP (in /lib) and core's PHP should be considered parts of the same codebase. I wouldn't be opposed to have class_exists() and function_exists() in core when they can safeguard against possible fatal errors, now or in the future.

If the Tag Processor's classes (files) are always loaded from wp-settings.php, which happens earlier than any plugin, I agree the class_exists() can only be in Gutenberg. For other cases (conditional loading, etc.) both places should have them.

@dmsnell
Copy link
Member Author

dmsnell commented Jan 31, 2023

@costdev I've marked a number of our conversations as resolved to reduce the noise when reading this PR, but if you didn't find them resolved please un-resolve them and let me know. normally I wouldn't have done that but there were so many non-controversial changes you suggested and I figured it'd be easiest if I hid them from the chat once I accepted your changes.

Copy link

@anton-vlasenko anton-vlasenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsnell Please find my notes below.
Thanks!

Comment on lines 30 to 38
public $start;

/**
* Byte offset into document where span ends.
*
* @since 6.2.0
* @var int
*/
public $end;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declaring these properties as public breaks encapsulation - one of the basic OOP principles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @anton-vlasenko - do you see a particular problem with them being public or are you sharing your value about how you prefer the code to be written?

what benefit do you see us gaining by hiding these?

of note, these are helper classes specifically for performance reasons, and are essentially stand-ins for PHP's lack of records. there is a massive gain from using these vs. using associative arrays, and if PHP coding style were more comfortable with more than one class per file they'd be in the main tag processor file.

that being said, unless you have a concrete benefit in mind or see a clear problem with the public visibility I'd very much to prefer keeping them as is.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reply, @dmsnell.
I've replied here.

src/wp-includes/html/class-wp-html-text-replacement.php Outdated Show resolved Hide resolved
src/wp-includes/html/class-wp-html-attribute-token.php Outdated Show resolved Hide resolved
}

while ( $this->parse_next_attribute() ) {
continue;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the continue operator is needed here.
It's the only and the last statement in the loop.

Copy link
Member Author

@dmsnell dmsnell Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is here explicitly as a compromise with the linter, because if the continue is gone someone will try again to come in and remove the loop body, which itself has been empirically one of the most dangerous code patterns in history, alongside null and zero-terminated strings.

// what the linter fixes want
while ( $this->parse_next_attribute() );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tried to commit this code

while ( $this->parse_next_attribute() ) {
    // No code is needed here.
 }

and I got no complaints from the linter.
In my case, removing the continue worked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appeared in Gutenberg not in Core, and I'd prefer having separate subtle forms like this in both repos. are you okay leaving the continue in? it's a lose-lose situation IMO and given that the linter's choice is dangerous and this is benign I'd rather leave the benign option.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm ok with that.
The continue operator in the loop body doesn't seem to affect performance significantly.

Copy link
Member Author

@dmsnell dmsnell Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continue operator in the loop body doesn't seem to affect performance significantly.

The continue in the loop body should not affect the performance at all.

src/wp-includes/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
src/wp-includes/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
src/wp-includes/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
src/wp-includes/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved
*
* @param WP_HTML_Text_Replacement $a First attribute update.
* @param WP_HTML_Text_Replacement $b Second attribute update.
* @return int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return int
* @return int An integer less than, equal to, or greater than zero if
the first argument is considered to be respectively less than,
equal to, or greater than the second.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a single-line annotation because I found that some IDEs truncate multi-line descriptions like this

Copy link

@anton-vlasenko anton-vlasenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found any performance related issues.
However, set_attribute()s memory usage looks suspicious.

Update: #3920 (comment)

@dmsnell
Copy link
Member Author

dmsnell commented Feb 2, 2023

Correct me if I am wrong but I think I noticed parsing of script tags accounted for the old school <script></script> but couldn't see any tests for that. Unless I've missed something, could that be tested too.

@peterwilsoncc you should find several tests for script tag processing, including the case you mentioned, in test_next_tag_ignores_script_tag_contents. The exact string you noted isn't in the tests, but it's covered by existing tests and handled properly.

@hellofromtonya hellofromtonya dismissed their stale review February 2, 2023 19:31

Changes requested have been resolved. Dismissing this review to unblock the PR's progress.

@hellofromtonya
Copy link
Contributor

Status Update:

Following up on #3920 (comment).

  • All change requested are complete ✅

What's left for it to be committed?

  • Committer approval: currently there are no committers who have approved this PR yet
  • Completion of the performance assessment I asked @felixarntz to do ⚪

My weekend starts soon (well as soon as I clear my backport review and commit queue - not including this PR). I'll be back on Monday. The decision to commit this PR before Monday is in @azaozz and @felixarntz hands.

@azaozz @felixarntz IF there's consensus to commit it, then please commit it as soon as it's approved. Why? Other work is dependent upon this PR. So IF it's going into 6.2, sooner is better than later.

Thank you everyone for contributions, patience, and expertise!

cc @ntsekouras and @Mamaduka to keep you aware.

@hellofromtonya
Copy link
Contributor

For transparency, I was asked today:
Why not just commit it and then do the performance regression testing during Beta or RC?

Here's my response:

The reason why I asked to get it done before it gets committed is this:

If there's a performance regression that warrants this not going in, then all of the other work this is dependent upon it needs to be reworked and plans changed. That needs to known ASAP. Waiting to do it later means putting those features and the 6.2 release at risk.

@felixarntz
Copy link
Member

I ran a performance test for this via #3971, a PR purely intended for testing, which is directly based on this PR here, but also includes relevant usage from #3914 and dmsnell#1 (see 8d7ebfb).

TLDR: The performance impact appears to be negligible, which is a great outcome. There is no clear tendency of increase or decline in performance. Overall the data, if anything, indicates a this PR being a tiny bit faster than trunk today. The difference is too small to be able to surely say that, but at least it should be clear now that this is not a concern from a performance perspective. 👍

The test is based on the Gutenberg demo content, tested with the last three default themes, using the median from 20 requests in each scenario (#3971 vs trunk).

See this spreadsheet for the data and also notes on how exactly the test was conducted.

So I believe this should be unblocked now. Thanks everyone for the patience, I am really glad to see this positive outcome.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for the patience and continued iterations on this. Following the previous feedback from others being addressed and performance not looking to be a problem, I am happy to approve this.

Of course that performance analysis is a lab analysis and therefore not entirely comprehensive (performance may differ depending on which / how many blocks are used), but the impact with the relatively large Gutenberg demo content is so small that I don't think there's anything to be worried about in terms of performance. 👍

We will still need to commit the actual usage of this API (see e.g. #3914), but we can definitely get this in and then go from there.

@hellofromtonya
Copy link
Contributor

What is the performance regression test I've asked for?

For blocks or features that are in Core now (<=6.1) but will change to use this API, what is the performance impact when users upgrade to 6.2?

Why?
This API will execute a lot of PHP code vs running RegEx and/or PHP native string functions. It will have an impact How much of an impact is unknown. Why? This type of testing has not happened yet.

The data collected from a regression test is needed to know what the impact will be to users upgrading to 6.2. That information will then help with decision making about if that impact is acceptable or not.

There are many benefits to this API. For example, it will make maintaining string manipulation much easier and understandable. It will help resolve the challenges of working with HTML markup in PHP (whitespaces, malformed, etc.), though these challenges have always existed. That said, it also has be performant to not cause issues for users.

@dmsnell
Copy link
Member Author

dmsnell commented Feb 2, 2023

Note to whoever commits this: I've posted a commit message in a details tag in the description at the top. A couple typos are fixed from the git commit message, and I've updated the props syntax for core instead of for GitHub

pento pushed a commit that referenced this pull request Feb 3, 2023
This commit pulls in the HTML Tag Processor from the Gutenbeg repository.

The Tag Processor attempts to be an HTML5-spec-compliant parser that provides the ability in PHP to find specific HTML tags and then add, remove, or update attributes on that tag. It provides a safe and reliable way to modify the attribute on HTML tags.

More information: #3920.

Props: antonvlasenko, bernhard-reiter, costdev, dmsnell, felixarntz, gziolo, hellofromtonya, zieladam, flixos90, ntsekouras, peterwilsoncc, swissspidy, andrewserong, onemaggie, get_dave, aristath, scruffian, justlevine, andraganescu, noisysocks, dlh, soean, cbirdsong, revgeorge, azaozz.
Fixes #57575.

git-svn-id: https://develop.svn.wordpress.org/trunk@55203 602fd350-edb4-49c9-b593-d223f7449a82
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Feb 3, 2023
This commit pulls in the HTML Tag Processor from the Gutenbeg repository.

The Tag Processor attempts to be an HTML5-spec-compliant parser that provides the ability in PHP to find specific HTML tags and then add, remove, or update attributes on that tag. It provides a safe and reliable way to modify the attribute on HTML tags.

More information: WordPress/wordpress-develop#3920.

Props: antonvlasenko, bernhard-reiter, costdev, dmsnell, felixarntz, gziolo, hellofromtonya, zieladam, flixos90, ntsekouras, peterwilsoncc, swissspidy, andrewserong, onemaggie, get_dave, aristath, scruffian, justlevine, andraganescu, noisysocks, dlh, soean, cbirdsong, revgeorge, azaozz.
Fixes #57575.
Built from https://develop.svn.wordpress.org/trunk@55203


git-svn-id: http://core.svn.wordpress.org/trunk@54736 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@azaozz
Copy link
Contributor

azaozz commented Feb 3, 2023

Committed to core: https://core.trac.wordpress.org/changeset/55203.

@azaozz azaozz closed this Feb 3, 2023
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Feb 3, 2023
This commit pulls in the HTML Tag Processor from the Gutenbeg repository.

The Tag Processor attempts to be an HTML5-spec-compliant parser that provides the ability in PHP to find specific HTML tags and then add, remove, or update attributes on that tag. It provides a safe and reliable way to modify the attribute on HTML tags.

More information: WordPress/wordpress-develop#3920.

Props: antonvlasenko, bernhard-reiter, costdev, dmsnell, felixarntz, gziolo, hellofromtonya, zieladam, flixos90, ntsekouras, peterwilsoncc, swissspidy, andrewserong, onemaggie, get_dave, aristath, scruffian, justlevine, andraganescu, noisysocks, dlh, soean, cbirdsong, revgeorge, azaozz.
Fixes #57575.
Built from https://develop.svn.wordpress.org/trunk@55203


git-svn-id: https://core.svn.wordpress.org/trunk@54736 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@adamziel
Copy link
Contributor

adamziel commented Feb 3, 2023

This is amazing, I really appreciate everyone's collaboration on this one ❤️

VenusPR added a commit to VenusPR/Wordpress_Richard that referenced this pull request Mar 9, 2023
This commit pulls in the HTML Tag Processor from the Gutenbeg repository.

The Tag Processor attempts to be an HTML5-spec-compliant parser that provides the ability in PHP to find specific HTML tags and then add, remove, or update attributes on that tag. It provides a safe and reliable way to modify the attribute on HTML tags.

More information: WordPress/wordpress-develop#3920.

Props: antonvlasenko, bernhard-reiter, costdev, dmsnell, felixarntz, gziolo, hellofromtonya, zieladam, flixos90, ntsekouras, peterwilsoncc, swissspidy, andrewserong, onemaggie, get_dave, aristath, scruffian, justlevine, andraganescu, noisysocks, dlh, soean, cbirdsong, revgeorge, azaozz.
Fixes #57575.
Built from https://develop.svn.wordpress.org/trunk@55203


git-svn-id: http://core.svn.wordpress.org/trunk@54736 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@dmsnell dmsnell deleted the add/html-tag-processor branch June 8, 2024 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.