Skip to content

Commit

Permalink
Tag Processor: Fix a problem backing up too far after updating HTML (#…
Browse files Browse the repository at this point in the history
…46598)

A defect introduced in #46018 led to the tag processor backing up
one index too far after flushing its queued changes on a document.

For most operations this didn't cause any harm because when immediately
moving forward after an update, the `next_tag()` returned to the same
spot: it was backing up to one position before the current tag instead
of at the start of the current tag.

Unfortunately, when the current tag was the first in the document this
would lead the processor to rewind to position `-1`, right before the
start of the document, and lead to errors with `strpos()` when it
received out-of-bounds indices.

In this fix we're correcting the adjustment for the HTML tag's `<` and
documenting the math in the file so that it's clearer why it's there
and providing guidance should another fix be necessary.

As supporting work to this patch we're making the text replacement sort
stable, inside the tag processor, for when determining the order in which
to apply text replacements. This isn't necessary for the runtime but is
a nuissance for testing because different PHP versions produce different
unstable sort orderings and this prevents that from causing the unit
tests to fail in one version but pass in another.

Props to @anton-vlasenko for finding this bug.

Enforce sort stability when flushing out text replacements
  • Loading branch information
dmsnell committed Dec 16, 2022
1 parent a10e7eb commit 5b7ef38
Show file tree
Hide file tree
Showing 2 changed files with 63 additions and 39 deletions.
34 changes: 29 additions & 5 deletions lib/experimental/html/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1331,16 +1331,33 @@ public function seek( $bookmark_name ) {
}

/**
* Sort function to arrange objects with a start property in ascending order.
* Compare two WP_HTML_Text_Replacement objects.
*
* @since 6.2.0
*
* @param object $a First attribute update.
* @param object $b Second attribute update.
* @param WP_HTML_Text_Replacement $a First attribute update.
* @param WP_HTML_Text_Replacement $b Second attribute update.
* @return integer
*/
private static function sort_start_ascending( $a, $b ) {
return $a->start - $b->start;
$by_start = $a->start - $b->start;
if ( $by_start !== 0 ) {
return $by_start;
}

$by_text = isset( $a->text, $b->text ) ? strcmp( $a->text, $b->text ) : 0;
if ( $by_text !== 0 ) {
return $by_text;
}

/*
* We shouldn't ever get here because it would imply
* that we have two identical updates, or that we're
* trying to replace the same input text twice. Still
* we'll handle this sort to preserve determinism,
* which might come in handy when debugging.
*/
return $a->end - $b->end;
}

/**
Expand Down Expand Up @@ -1657,7 +1674,14 @@ public function get_updated_html() {
$this->updated_bytes = strlen( $this->updated_html );

// 3. Point this tag processor at the original tag opener and consume it
$this->parsed_bytes = strlen( $updated_html_up_to_current_tag_name_end ) - $this->tag_name_length - 2;

/*
* When we get here we're at the end of the tag name, and we want to rewind to before it
* <p>Previous HTML<em>More HTML</em></p>
* ^ | back up by the length of the tag name plus the opening <
* \<-/ back up by strlen("em") + 1 ==> 3
*/
$this->parsed_bytes = strlen( $updated_html_up_to_current_tag_name_end ) - $this->tag_name_length - 1;
$this->next_tag();

return $this->html;
Expand Down
68 changes: 34 additions & 34 deletions phpunit/html/wp-html-tag-processor-test.php
Original file line number Diff line number Diff line change
Expand Up @@ -1132,7 +1132,7 @@ public function data_malformed_tag() {
$examples = array();
$examples['Invalid entity inside attribute value'] = array(
'<img src="https://s0.wp.com/i/atat.png" title="&; First &lt;title&gt; is &notit;" TITLE="second title" title="An Imperial &imperial; AT-AT"><span>test</span>',
'<img foo="bar" class="firstTag" src="https://s0.wp.com/i/atat.png" title="&; First &lt;title&gt; is &notit;" TITLE="second title" title="An Imperial &imperial; AT-AT"><span class="secondTag">test</span>',
'<img class="firstTag" foo="bar" src="https://s0.wp.com/i/atat.png" title="&; First &lt;title&gt; is &notit;" TITLE="second title" title="An Imperial &imperial; AT-AT"><span class="secondTag">test</span>',
);

$examples['HTML tag opening inside attribute value'] = array(
Expand All @@ -1147,152 +1147,152 @@ public function data_malformed_tag() {

$examples['Single and double quotes in attribute value'] = array(
'<p title="Demonstrating how to use single quote (\') and double quote (&quot;)"><span>test</span>',
'<p foo="bar" class="firstTag" title="Demonstrating how to use single quote (\') and double quote (&quot;)"><span class="secondTag">test</span>',
'<p class="firstTag" foo="bar" title="Demonstrating how to use single quote (\') and double quote (&quot;)"><span class="secondTag">test</span>',
);

$examples['Unquoted attribute values'] = array(
'<hr a=1 a=2 a=3 a=5 /><span>test</span>',
'<hr foo="bar" class="firstTag" a=1 a=2 a=3 a=5 /><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" a=1 a=2 a=3 a=5 /><span class="secondTag">test</span>',
);

$examples['Double-quotes escaped in double-quote attribute value'] = array(
'<hr title="This is a &quot;double-quote&quot;"><span>test</span>',
'<hr foo="bar" class="firstTag" title="This is a &quot;double-quote&quot;"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" title="This is a &quot;double-quote&quot;"><span class="secondTag">test</span>',
);

$examples['Unquoted attribute value'] = array(
'<hr id=code><span>test</span>',
'<hr foo="bar" class="firstTag" id=code><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id=code><span class="secondTag">test</span>',
);

$examples['Unquoted attribute value with tag-like value'] = array(
'<hr id= <code> ><span>test</span>',
'<hr foo="bar" class="firstTag" id= <code> ><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id= <code> ><span class="secondTag">test</span>',
);

$examples['Unquoted attribute value with tag-like value followed by tag-like data'] = array(
'<hr id=code>><span>test</span>',
'<hr foo="bar" class="firstTag" id=code>><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id=code>><span class="secondTag">test</span>',
);

$examples['1'] = array(
'<hr id=&quo;code><span>test</span>',
'<hr foo="bar" class="firstTag" id=&quo;code><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id=&quo;code><span class="secondTag">test</span>',
);

$examples['2'] = array(
'<hr id/test=5><span>test</span>',
'<hr foo="bar" class="firstTag" id/test=5><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id/test=5><span class="secondTag">test</span>',
);

$examples['4'] = array(
'<hr title="<hr>"><span>test</span>',
'<hr foo="bar" class="firstTag" title="<hr>"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" title="<hr>"><span class="secondTag">test</span>',
);

$examples['5'] = array(
'<hr id=>code><span>test</span>',
'<hr foo="bar" class="firstTag" id=>code><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id=>code><span class="secondTag">test</span>',
);

$examples['6'] = array(
'<hr id"quo="test"><span>test</span>',
'<hr foo="bar" class="firstTag" id"quo="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id"quo="test"><span class="secondTag">test</span>',
);

$examples['7'] = array(
'<hr id' . $null_byte . 'zero="test"><span>test</span>',
'<hr foo="bar" class="firstTag" id' . $null_byte . 'zero="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id' . $null_byte . 'zero="test"><span class="secondTag">test</span>',
);

$examples['8'] = array(
'<hr >id="test"><span>test</span>',
'<hr foo="bar" class="firstTag" >id="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" >id="test"><span class="secondTag">test</span>',
);

$examples['9'] = array(
'<hr =id="test"><span>test</span>',
'<hr foo="bar" class="firstTag" =id="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" =id="test"><span class="secondTag">test</span>',
);

$examples['10'] = array(
'</><span>test</span>',
'</><span foo="bar" class="firstTag">test</span>',
'</><span class="firstTag" foo="bar">test</span>',
);

$examples['11'] = array(
'The applicative operator <* works well in Haskell; <data-tag> is what?<span>test</span>',
'The applicative operator <* works well in Haskell; <data-tag foo="bar" class="firstTag"> is what?<span class="secondTag">test</span>',
'The applicative operator <* works well in Haskell; <data-tag class="firstTag" foo="bar"> is what?<span class="secondTag">test</span>',
);

$examples['12'] = array(
'<3 is a heart but <t3> is a tag.<span>test</span>',
'<3 is a heart but <t3 foo="bar" class="firstTag"> is a tag.<span class="secondTag">test</span>',
'<3 is a heart but <t3 class="firstTag" foo="bar"> is a tag.<span class="secondTag">test</span>',
);

$examples['13'] = array(
'<?comment --><span>test</span>',
'<?comment --><span foo="bar" class="firstTag">test</span>',
'<?comment --><span class="firstTag" foo="bar">test</span>',
);

$examples['14'] = array(
'<!-- this is a comment. no <strong>tags</strong> allowed --><span>test</span>',
'<!-- this is a comment. no <strong>tags</strong> allowed --><span foo="bar" class="firstTag">test</span>',
'<!-- this is a comment. no <strong>tags</strong> allowed --><span class="firstTag" foo="bar">test</span>',
);

$examples['15'] = array(
'<![CDATA[This <is> a <strong id="yes">HTML Tag</strong>]]><span>test</span>',
'<![CDATA[This <is> a <strong id="yes">HTML Tag</strong>]]><span foo="bar" class="firstTag">test</span>',
'<![CDATA[This <is> a <strong id="yes">HTML Tag</strong>]]><span class="firstTag" foo="bar">test</span>',
);

$examples['16'] = array(
'<hr ===name="value"><span>test</span>',
'<hr foo="bar" class="firstTag" ===name="value"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" ===name="value"><span class="secondTag">test</span>',
);

$examples['17'] = array(
'<hr asdf="test"><span>test</span>',
'<hr foo="bar" class="firstTag" asdf="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" asdf="test"><span class="secondTag">test</span>',
);

$examples['18'] = array(
'<hr =asdf="tes"><span>test</span>',
'<hr foo="bar" class="firstTag" =asdf="tes"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" =asdf="tes"><span class="secondTag">test</span>',
);

$examples['19'] = array(
'<hr ==="test"><span>test</span>',
'<hr foo="bar" class="firstTag" ==="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" ==="test"><span class="secondTag">test</span>',
);

$examples['20'] = array(
'<hr =><span>test</span>',
'<hr foo="bar" class="firstTag" =><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" =><span class="secondTag">test</span>',
);

$examples['21'] = array(
'<hr =5><span>test</span>',
'<hr foo="bar" class="firstTag" =5><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" =5><span class="secondTag">test</span>',
);

$examples['22'] = array(
'<hr ==><span>test</span>',
'<hr foo="bar" class="firstTag" ==><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" ==><span class="secondTag">test</span>',
);

$examples['23'] = array(
'<hr ===><span>test</span>',
'<hr foo="bar" class="firstTag" ===><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" ===><span class="secondTag">test</span>',
);

$examples['24'] = array(
'<hr disabled><span>test</span>',
'<hr foo="bar" class="firstTag" disabled><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" disabled><span class="secondTag">test</span>',
);

$examples['25'] = array(
'<hr a"sdf="test"><span>test</span>',
'<hr foo="bar" class="firstTag" a"sdf="test"><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" a"sdf="test"><span class="secondTag">test</span>',
);

$examples['Multiple unclosed tags treated as a single tag'] = array(
Expand All @@ -1306,7 +1306,7 @@ public function data_malformed_tag() {
HTML
,
<<<HTML
<hr foo="bar" class="firstTag" id=">"code
<hr class="firstTag" foo="bar" id=">"code
<hr id="value>"code
<hr id="/>"code
<hr id="value/>"code
Expand All @@ -1318,12 +1318,12 @@ public function data_malformed_tag() {

$examples['27'] = array(
'<hr id =5><span>test</span>',
'<hr foo="bar" class="firstTag" id =5><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id =5><span class="secondTag">test</span>',
);

$examples['28'] = array(
'<hr id a =5><span>test</span>',
'<hr foo="bar" class="firstTag" id a =5><span class="secondTag">test</span>',
'<hr class="firstTag" foo="bar" id a =5><span class="secondTag">test</span>',
);

return $examples;
Expand Down

1 comment on commit 5b7ef38

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flaky tests detected.
Some tests passed with failed attempts. The failures may not be related to this commit but are still reported for visibility. See the documentation for more information.

🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/3716506846
📝 Reported issues:

Please sign in to comment.