HTML API: Fix CDATA section null and whitespace handling #7230

sirreal · 2024-08-22T19:54:47Z

CDATA sections should behave like text. This means applying the rules
for inserting text in foreign content:

- Null bytes should not change the frameset-ok flag.
- Whitespace should not change the frameset-ok flag.
- Other characters set the frameset-ok flag to false.

Fix this behavior.

The HTML input looks like this (null byte has been replaced with visual character):

<svg><![CDATA[ ␀]]></svg><frameset>

In this case, frameset-ok should not be changed. The HTML API should fail with WP_HTML_Unsupported_Exception: Cannot process non-ignored FRAMESET tags.

The current behavior (fixed in this PR) is to add the CDATA token without any inspection. This is incorrect because the frameset-ok flag should be set to false whenever the CDATA section contains anything other than null bytes or whitespace.
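The check described above can be sketched as a standalone function. This is a minimal illustration, not the code in the patch: the function name `cdata_clears_frameset_ok` and its parameters are hypothetical, and it assumes the token span covers the whole `<![CDATA[…]]>` sequence.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reports whether a CDATA
 * token should set the frameset-ok flag to false.
 *
 * Assumes $token_start and $token_length describe the full token,
 * including the "<![CDATA[" opener and the "]]>" closer.
 */
function cdata_clears_frameset_ok( string $html, int $token_start, int $token_length ): bool {
	// Skip the 9-byte "<![CDATA[" opener; exclude it and the 3-byte "]]>" closer from the length.
	$content_start  = $token_start + 9;
	$content_length = $token_length - 12;

	// strspn() counts the leading bytes that are NULL or whitespace.
	$ignorable = strspn( $html, "\0 \t\n\f\r", $content_start, $content_length );

	// Any other byte inside the section clears the flag.
	return $ignorable !== $content_length;
}

// The example input from the description, with a real NULL byte.
$html  = "<svg><![CDATA[ \0]]></svg><frameset>";
$start = strpos( $html, '<![CDATA[' );
$end   = strpos( $html, ']]>' ) + 3;

var_dump( cdata_clears_frameset_ok( $html, $start, $end - $start ) ); // bool(false)
```

Because the section holds only a space and a NULL byte, the flag is left alone and the later FRAMESET tag is still reachable.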

Trac ticket: https://core.trac.wordpress.org/ticket/61576
Follow up to: [58868]

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions · 2024-08-22T20:07:40Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

sirreal · 2024-08-22T20:30:23Z

src/wp-includes/html-api/class-wp-html-processor.php

+				$current_token        = $this->bookmarks[ $this->state->current_token->bookmark_name ];
+				$cdata_content_start  = $current_token->start + 9;
+				$cdata_content_length = $current_token->length - 12;
+				if ( $cdata_content_length !== strspn( $this->html, "\0 \t\n\f\r", $cdata_content_start, $cdata_content_length ) ) {


The Yoda condition that PHPCS flagged here seems to be a false positive. I suspect it's confusing a string literal on the line with a literal comparison, but the string literal is not being compared. It's fine to have it on the other side…

It doesn't define Yoda conditions the way most of us do, I think. Flipping this conditional makes it happy.

I suspect it was finding a literal on the RHS of the comparison and thinking that literals are supposed to go on the left (Yoda), while ignoring the fact that the literal is inside a function call. But that's just my intuition.

Switching the order is fine if it satisfies the linter 🤷

I believe it's the variable on the left hand side that it doesn't like, unless both are variables, in which case it's happy and doesn't care.

Yoda condition checks like these aren't about literals, but about accidental assignment to a variable.
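To illustrate the accidental-assignment point, here is a minimal sketch that is not taken from the PR; both function names are hypothetical. With a variable on the left, a mistyped `=` silently assigns. With the non-assignable expression on the left, the same typo cannot pass unnoticed.

```php
<?php
// Hypothetical example: a typo turns the comparison into an assignment.
function is_two_with_typo( int $n ): bool {
	// "=" instead of "===": assigns 2 to $n, and the condition is always truthy.
	if ( $n = 2 ) {
		return true;
	}
	return false;
}

// Hypothetical example: the Yoda form of the intended comparison.
function is_two_yoda( int $n ): bool {
	// With the literal on the left, the same typo ( 2 = $n ) is a parse error.
	return 2 === $n;
}

var_dump( is_two_with_typo( 5 ) ); // bool(true), although 5 is not 2
var_dump( is_two_yoda( 5 ) );      // bool(false)
```

The same reasoning applies to the flagged line: a function call such as `strspn( … )` cannot be assigned to, so placing it on the left and the variable on the right satisfies the rule.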

github-actions · 2024-08-22T20:40:41Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

src/wp-includes/html-api/class-wp-html-processor.php

dmsnell

@sirreal with the merge of #7236 we should be able to add a short-circuit to skip the additional processing when ->text_node_classification is either WP_HTML_Tag_Processor::TEXT_IS_NULL_SEQUENCE or WP_HTML_Tag_Processor::TEXT_IS_WHITESPACE.

That may not make much of a difference, but the content should already have been checked in those cases. Maybe it's good enough to leave this in, since we don't have to perform any character reference decoding.

What do you think?
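A rough sketch of the short-circuit idea follows. It is only an illustration under stated assumptions: the `TEXT_IS_*` constants are stand-ins defined locally (their values are assumed, and they are not the real `WP_HTML_Tag_Processor` class constants), and the function name and the way the classification is passed in are hypothetical.

```php
<?php
// Stand-ins for the WP_HTML_Tag_Processor::TEXT_IS_* constants (values are assumptions).
define( 'TEXT_IS_GENERIC', 'TEXT_IS_GENERIC' );
define( 'TEXT_IS_NULL_SEQUENCE', 'TEXT_IS_NULL_SEQUENCE' );
define( 'TEXT_IS_WHITESPACE', 'TEXT_IS_WHITESPACE' );

/**
 * Hypothetical helper: decides whether text content clears the frameset-ok
 * flag, skipping the byte scan when the content was already classified.
 */
function content_clears_frameset_ok( string $classification, string $content ): bool {
	// Short-circuit: already known to be only NULL bytes or only whitespace.
	if ( TEXT_IS_NULL_SEQUENCE === $classification || TEXT_IS_WHITESPACE === $classification ) {
		return false;
	}

	// Otherwise scan the raw bytes; no character reference decoding is involved.
	return strlen( $content ) !== strspn( $content, "\0 \t\n\f\r" );
}

var_dump( content_clears_frameset_ok( TEXT_IS_WHITESPACE, "  \n" ) ); // bool(false)
var_dump( content_clears_frameset_ok( TEXT_IS_GENERIC, 'text' ) );     // bool(true)
```

As the comment notes, the scan is cheap for CDATA content, so the saving from the short-circuit is likely small.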

sirreal · 2024-09-03T08:20:40Z

After reviewing this and text subdivision, my preference is to go the other direction. I've pushed a change to stop running subdivision on CDATA and let this change handle CDATA on its own.

sirreal · 2024-09-03T08:23:11Z

src/wp-includes/html-api/class-wp-html-tag-processor.php

@@ -3368,70 +3368,55 @@ public function get_comment_type(): ?string {
 	 * @return bool Whether the text node was subdivided.
 	 */
 	public function subdivide_text_appropriately(): bool {


When viewing with whitespace changes, the diff here is confusing. I've adjusted this method so that it only operates on text nodes.

dmsnell

This seems fitting. It made sense to join them together, since text nodes and CDATA sections both represent text content. But computationally, CDATA sections are simpler and don't need the abstraction.

I think I prefer classifying CDATA sections, but it also seems rather unlikely to find fully NULL byte or fully whitespace content inside of a CDATA section, so in practice it may not matter.

…orbid FRAMESET.

When CDATA sections (which can only occur inside SVG and MathML content) consist only of NULL bytes or whitespace characters they should not clear the "frameset ok" flag. Previously they have always been clearing this flag, but in this patch the logic is updated to detect these sequences properly.

Developed in #7230
Discussed in https://core.trac.wordpress.org/ticket/61576

Follow-up to [58867].

Props dmsnell, jonsurrell.
See #61576.

git-svn-id: https://develop.svn.wordpress.org/trunk@58977 602fd350-edb4-49c9-b593-d223f7449a82

dmsnell · 2024-09-03T19:56:43Z

Merged in [58977]
79c1047

Fix comment about cdata section handling

ff41a65

sirreal marked this pull request as ready for review August 22, 2024 20:40

sirreal mentioned this pull request Aug 22, 2024

HTML API: Fix a bug where the namespace was forced to 'html' #7232

Closed

A Yoda violation this is not.

47ccb34

sirreal force-pushed the html-api/fix-cdata-whitespace-frameset-handling branch from d5a528c to 47ccb34 Compare August 22, 2024 20:49

dmsnell reviewed Aug 23, 2024

Update comment and rearrange joda condition

8554f21

dmsnell mentioned this pull request Aug 24, 2024

HTML API: Plans for 6.7 WordPress/gutenberg#60396

Closed

19 tasks

Merge branch 'trunk' into html-api/fix-cdata-whitespace-frameset-handling

702d3bf

sirreal mentioned this pull request Aug 27, 2024

HTML API: Allow subdividing text nodes for appropriate parsing rules. #7236

Closed

dmsnell reviewed Sep 2, 2024

sirreal added 2 commits September 3, 2024 10:07

Merge branch 'trunk' into html-api/fix-cdata-whitespace-frameset-handling

381b04b

Remove text subdivision on CDATA nodes

7a92ed3

sirreal requested a review from dmsnell September 3, 2024 08:23

dmsnell approved these changes Sep 3, 2024

Merge branch 'trunk' into html-api/fix-cdata-whitespace-frameset-handling

a22147e

dmsnell closed this Sep 3, 2024

sirreal deleted the html-api/fix-cdata-whitespace-frameset-handling branch October 8, 2024 09:44
