HTML API: Add select handling #5908

Closed
wants to merge 24 commits into from
Changes from 22 commits
Commits
b867b58
Implement SELECT and related tags handling
sirreal Jan 19, 2024
9245930
Remove SELECT from unsupported elements test
sirreal Jul 3, 2024
267a27b
Remove SELECT scope optgroup/option tests
sirreal Jul 3, 2024
b0475fb
Remove SELECT, OPTION, OPTGROUP from unsupported tags
sirreal Jul 3, 2024
1944b66
Remove OPTION,OPTGROUP from gen implied end tags unsupported test
sirreal Jul 3, 2024
8d4b4df
Add OPTION, OPTGROUP to implied end tags
sirreal Jul 4, 2024
1404119
Update step_in_select since tag
sirreal Jul 4, 2024
dc0bbb9
Update since tags on in_select_scope
sirreal Jul 4, 2024
9d99844
Implement in_select_scope via …in_specific_scope
sirreal Jul 4, 2024
3085405
Revert "Implement in_select_scope via …in_specific_scope"
sirreal Jul 4, 2024
e095c62
Add current_node_is method
sirreal Jul 4, 2024
4026493
Merge branch 'html-api/add-current-node-is' into html-api/handle-sele…
sirreal Jul 4, 2024
76c899a
Add text, comment, doctype handling
sirreal Jul 4, 2024
52fd8e5
Handle +HTML case
sirreal Jul 4, 2024
6a10eb0
Update several cases to use current_node_is
sirreal Jul 4, 2024
8b3a9e4
Simplify -OPTGROUP handling implementation
sirreal Jul 4, 2024
87c6596
Add input,keygen,textarea,script,template handling
sirreal Jul 4, 2024
d464330
Add "anything else" handling
sirreal Jul 4, 2024
ddbd676
Add step_in_head stub
sirreal Jul 4, 2024
f6d686d
Multi-line comments should be block comments
sirreal Jul 4, 2024
783cef6
fixup! Update several cases to use current_node_is
sirreal Jul 4, 2024
3222275
Fix step_in_head documentation block
sirreal Jul 4, 2024
eb126e8
Merge branch 'trunk' into html-api/handle-select-tag
dmsnell Jul 4, 2024
f2e1d03
Apply feedback requests.
dmsnell Jul 5, 2024
45 changes: 41 additions & 4 deletions src/wp-includes/html-api/class-wp-html-open-elements.php
@@ -144,6 +144,28 @@ public function current_node() {
 		return $current_node ? $current_node : null;
 	}

+	/**
+	 * Checks if the node at the top of the stack matches provided node name.
+	 *
+	 * @example
Member

I believe this style of example only comes over from the Gutenberg end; to play well with the WordPress documentation we should follow this pattern:

/**
 *
 * Example:
 *
 *     // Is the current node a text node?
 *     $stack->current_node_is( '#text' );
 *
 *     // Is the current node a DIV element?
 *     $stack->current_node_is( 'DIV' );
 */

That is: "Example:", a blank line, and then four spaces to indent the code. Thankfully, this works well with IDE support.

+	 * // Is the current node a text node:
+	 * $stack->current_node_is( '#text' );
+	 *
+	 * // Is the current node a DIV element:
+	 * $stack->current_node_is( 'DIV' );
+	 *
+	 * @since 6.7.0
+	 *
+	 * @param string $node_name The node name to match. Provide a tag name for tags or a
+	 *                          token name for other types of tokens.
+	 * @return bool True if there are nodes on the stack and the top node has
+	 *              a matching node_name.
+	 */
+	public function current_node_is( string $node_name ): bool {
Member Author

This is from #6968.

+		$current_node = end( $this->stack );
+		return $current_node && $current_node->node_name === $node_name;
+	}
+
 	/**
 	 * Returns whether an element is in a specific scope.
 	 *
@@ -269,19 +291,34 @@ public function has_element_in_table_scope( $tag_name ) {
 	/**
 	 * Returns whether a particular element is in select scope.
 	 *
-	 * @since 6.4.0
+	 * @since 6.4.0 Stub implementation (throws).
+	 * @since 6.7.0 Full implementation.
 	 *
 	 * @see https://html.spec.whatwg.org/#has-an-element-in-select-scope
 	 *
-	 * @throws WP_HTML_Unsupported_Exception Always until this function is implemented.
+	 * > The stack of open elements is said to have a particular element in select scope when it has
+	 * > that element in the specific scope consisting of all element types except the following:
+	 * > - optgroup in the HTML namespace
+	 * > - option in the HTML namespace
 	 *
 	 * @param string $tag_name Name of tag to check.
 	 * @return bool Whether given element is in scope.
 	 */
 	public function has_element_in_select_scope( $tag_name ) {
-		throw new WP_HTML_Unsupported_Exception( 'Cannot process elements depending on select scope.' );
+		foreach ( $this->walk_up() as $node ) {
+			if ( $node->node_name === $tag_name ) {
+				return true;
+			}

-		return false; // The linter requires this unreachable code until the function is implemented and can return.
+			if (
+				'OPTION' !== $node->node_name &&
+				'OPTGROUP' !== $node->node_name
+			) {
+				return false;
+			}
+		}
+
+		return false;
 	}

 	/**
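The select-scope walk in the hunk above can be sketched outside the class. This is only an illustration under stated assumptions: `in_select_scope()` is a hypothetical free function, and the stack of open elements is modeled as a plain array of node names ordered from bottom to top instead of being traversed with `walk_up()`.

```php
<?php
/**
 * Hypothetical stand-alone version of the select-scope check.
 *
 * @param string[] $open_elements Node names, bottom of the stack first.
 * @param string   $tag_name      Name of the tag to look for.
 */
function in_select_scope( array $open_elements, string $tag_name ): bool {
	// Walk from the top of the stack (end of the array) downward.
	for ( $i = count( $open_elements ) - 1; $i >= 0; $i-- ) {
		$node_name = $open_elements[ $i ];

		if ( $node_name === $tag_name ) {
			return true;
		}

		// Every element other than OPTION and OPTGROUP ends the scope.
		if ( 'OPTION' !== $node_name && 'OPTGROUP' !== $node_name ) {
			return false;
		}
	}

	return false;
}

var_dump( in_select_scope( array( 'HTML', 'BODY', 'SELECT', 'OPTGROUP', 'OPTION' ), 'SELECT' ) ); // bool(true)
var_dump( in_select_scope( array( 'HTML', 'BODY', 'SELECT', 'DIV', 'OPTION' ), 'SELECT' ) );      // bool(false)
```

In the second call the DIV sits between OPTION and SELECT, so the walk stops at the DIV before reaching SELECT.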
239 changes: 235 additions & 4 deletions src/wp-includes/html-api/class-wp-html-processor.php
@@ -101,7 +101,7 @@
  *
  * - Containers: ADDRESS, BLOCKQUOTE, DETAILS, DIALOG, DIV, FOOTER, HEADER, MAIN, MENU, SPAN, SUMMARY.
  * - Custom elements: All custom elements are supported. :)
- * - Form elements: BUTTON, DATALIST, FIELDSET, INPUT, LABEL, LEGEND, METER, PROGRESS, SEARCH.
+ * - Form elements: BUTTON, DATALIST, FIELDSET, INPUT, LABEL, LEGEND, METER, OPTGROUP, OPTION, PROGRESS, SEARCH, SELECT.
  * - Formatting elements: B, BIG, CODE, EM, FONT, I, PRE, SMALL, STRIKE, STRONG, TT, U, WBR.
  * - Heading elements: H1, H2, H3, H4, H5, H6, HGROUP.
  * - Links: A.
@@ -757,6 +757,12 @@ public function step( $node_to_process = self::PROCESS_NEXT_NODE ) {
 				case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
 					return $this->step_in_body();

+				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
+					return $this->step_in_head();
+
+				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
+					return $this->step_in_select();
+
 				default:
 					$this->last_error = self::ERROR_UNSUPPORTED;
 					throw new WP_HTML_Unsupported_Exception( "No support for parsing in the '{$this->state->insertion_mode}' state." );
@@ -1336,6 +1342,51 @@ private function step_in_body() {
 			case '+TRACK':
 				$this->insert_html_element( $this->state->current_token );
 				return true;
+
+			/*
+			 * > A start tag whose tag name is "select"
+			 */
+			case '+SELECT':
+				$this->reconstruct_active_formatting_elements();
+				$this->insert_html_element( $this->state->current_token );
+				$this->state->frameset_ok = false;
+
+				/*
+				 * If the insertion mode is one of
+				 * - "in table"
+				 * - "in caption"
+				 * - "in table body"
+				 * - "in row"
+				 * - "in cell"
+				 * then switch the insertion mode to "in select in table"
+				 *
+				 * Otherwise, switch the insertion mode to "in select".
+				 */
+				switch ( $this->state->insertion_mode ) {
+					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
+					case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
+					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
+					case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
+					case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
+						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
+						break;
+					default:
+						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
+						break;
+				}
+				return true;
+
+			/*
+			 * > A start tag whose tag name is one of: "optgroup", "option"
+			 */
+			case '+OPTGROUP':
+			case '+OPTION':
+				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
+					$this->state->stack_of_open_elements->pop();
+				}
+				$this->reconstruct_active_formatting_elements();
+				$this->insert_html_element( $this->state->current_token );
+				return true;
 		}

 		/*
@@ -1378,16 +1429,13 @@ private function step_in_body() {
 			case 'NOFRAMES':
 			case 'NOSCRIPT':
 			case 'OBJECT':
-			case 'OPTGROUP':
-			case 'OPTION':
 			case 'PLAINTEXT':
 			case 'RB':
 			case 'RP':
 			case 'RT':
 			case 'RTC':
 			case 'SARCASM':
 			case 'SCRIPT':
-			case 'SELECT':
 			case 'STYLE':
 			case 'SVG':
 			case 'TABLE':
@@ -1448,6 +1496,184 @@ private function step_in_body() {
 		}
 	}

+	/*
+	 * Parses next element in the 'in head' insertion mode.
+	 *
+	 * This internal function performs the 'in head' insertion mode
+	 * logic for the generalized WP_HTML_Processor::step() function.
+	 *
+	 * @since 6.7.0
Member

Can note that this is a stub.

+	 *
+	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
+	 *
+	 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhead
+	 * @see WP_HTML_Processor::step
+	 *
+	 * @return bool Whether an element was found.
+	 */
+	private function step_in_head() {
Member Author

We should stub out the rest of these step_in_* methods.

+		$this->last_error = self::ERROR_UNSUPPORTED;
+		throw new WP_HTML_Unsupported_Exception( "No support for parsing in the '{$this->state->insertion_mode}' state." );
+	}
+
+	/**
+	 * Parses next element in the 'in select' insertion mode.
+	 *
+	 * This internal function performs the 'in select' insertion mode
+	 * logic for the generalized WP_HTML_Processor::step() function.
+	 *
+	 * @since 6.7.0
+	 *
+	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
+	 *
+	 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
+	 * @see WP_HTML_Processor::step
+	 *
+	 * @return bool Whether an element was found.
+	 */
+	private function step_in_select() {
+		$token_name = $this->get_token_name();
+		$token_type = $this->get_token_type();
+		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
+		$op         = "{$op_sigil}{$token_name}";
+
+		switch ( $op ) {
+			/*
+			 * > Any other character token
+			 */
+			case '#text':
+				$this->insert_html_element( $this->state->current_token );
+				return true;
Member

We should perform the same check for null text as we do in IN BODY, since we don't really want to create a new node or pause just for null bytes. That is, if the text node comprises only null bytes, ignore the token.

Note that this rule only applies to null bytes directly. It does not apply if the null byte is encoded as a character reference. In that case, the character references &#x0; and &#0; both decode to the Unicode replacement character.
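A minimal sketch of the null-only check described in this comment, assuming the raw bytes of the text token are available as a string. The helper name `is_null_only_text()` is hypothetical and not part of the HTML API; `strspn()` and `strlen()` are standard PHP functions.

```php
<?php
/**
 * Hypothetical helper: reports whether raw text consists solely of
 * literal NUL bytes. Character references are deliberately not decoded.
 */
function is_null_only_text( string $raw_text ): bool {
	$length = strlen( $raw_text );

	// strspn() counts how many leading bytes belong to the given set.
	return $length > 0 && strspn( $raw_text, "\0" ) === $length;
}

var_dump( is_null_only_text( "\0\0\0" ) ); // bool(true)
var_dump( is_null_only_text( "a\0" ) );    // bool(false)
var_dump( is_null_only_text( '&#x0;' ) );  // bool(false)
```

The last call returns false because an encoded reference is ordinary text at the byte level, which matches the distinction drawn in the comment.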


+			/*
+			 * > A comment token
+			 */
+			case '#comment':
+			case '#funky-comment':
+			case '#presumptuous-tag':
+				$this->insert_html_element( $this->state->current_token );
+				return true;
+
+			/*
+			 * > A DOCTYPE token
+			 */
+			case 'html':
+				// Parse error. Ignore the token.
+				return $this->step();
+
+			/*
+			 * > A start tag whose tag name is "html"
+			 */
+			case '+HTML':
+				return $this->step_in_body();
+
+			/*
+			 * > A start tag whose tag name is "option"
+			 */
+			case '+OPTION':
+				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
+					$this->state->stack_of_open_elements->pop();
+				}
+				$this->insert_html_element( $this->state->current_token );
+				return true;
+
+			/*
+			 * > A start tag whose tag name is "optgroup"
+			 * > A start tag whose tag name is "hr"
+			 *
+			 * These rules are identical except for the treatment of the self-closing flag and
+			 * the subsequent pop of the HR void element, all of which is handled elsewhere in the processor.
+			 */
+			case '+OPTGROUP':
+			case '+HR':
+				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
+					$this->state->stack_of_open_elements->pop();
+				}
+
+				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
+					$this->state->stack_of_open_elements->pop();
+				}
+
+				$this->insert_html_element( $this->state->current_token );
+				return true;
+
+			/*
+			 * > An end tag whose tag name is "optgroup"
+			 */
+			case '-OPTGROUP':
+				$current_node = $this->state->stack_of_open_elements->current_node();
+				if ( $current_node && 'OPTION' === $current_node->node_name ) {
+					foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {
+						break;
+					}
+					if ( $parent && 'OPTGROUP' === $parent->node_name ) {
+						$this->state->stack_of_open_elements->pop();
+					}
+				}
+
+				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
+					$this->state->stack_of_open_elements->pop();
+					return true;
+				}
+				// Parse error: ignore the token.
+				return $this->step();
+
+			/*
+			 * > An end tag whose tag name is "option"
+			 */
+			case '-OPTION':
+				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
+					$this->state->stack_of_open_elements->pop();
+					return true;
+				}
+				// Parse error: ignore the token.
+				return $this->step();
+
+			/*
+			 * > An end tag whose tag name is "select"
+			 * > A start tag whose tag name is "select"
Member

It's worth it IMO to add the quote from the spec, the "Note"

/*
 * > It just gets treated like an end tag.
 */

+			 */
+			case '-SELECT':
+			case '+SELECT':
+				if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
+					return $this->step();
Member

Comment: Ignore the token

+				}
+				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
+				$this->state->stack_of_open_elements->pop();
Member

this is superfluous, because pop_until() pops the requested node.

> Pops nodes off of the stack of open elements until one with the given tag name has been popped.
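To illustrate why the extra pop() goes one element too far, here is a minimal sketch of the documented pop_until() behavior. `Tiny_Open_Elements` is a hypothetical stand-in that stores node names only; it is not the real `WP_HTML_Open_Elements` class.

```php
<?php
// Hypothetical stand-in for the stack of open elements.
final class Tiny_Open_Elements {
	/** @var string[] Node names, bottom of the stack first. */
	public $stack = array();

	public function push( string $node_name ): void {
		$this->stack[] = $node_name;
	}

	// Pops nodes until one with the given name has itself been popped.
	public function pop_until( string $node_name ): void {
		while ( null !== ( $popped = array_pop( $this->stack ) ) ) {
			if ( $popped === $node_name ) {
				return;
			}
		}
	}
}

$open = new Tiny_Open_Elements();
foreach ( array( 'HTML', 'BODY', 'SELECT', 'OPTGROUP', 'OPTION' ) as $name ) {
	$open->push( $name );
}

$open->pop_until( 'SELECT' );
echo implode( ' > ', $open->stack ), "\n"; // HTML > BODY
```

Under these assumptions SELECT is already gone after pop_until(), so a further pop() would remove BODY.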

+				$this->reset_insertion_mode();
+				return true;
+
+			/*
+			 * > A start tag whose tag name is one of: "input", "keygen", "textarea"
+			 */
+			case '+INPUT':
+			case '+KEYGEN':
+			case '+TEXTAREA':
+				// Parse error.
Member

This looks like it's describing the next line, the if, but it's just a general remark that this entire case is a parse error. It's probably best to move it up into the comment for the cluster of tags.

+				if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
+					return $this->step();
Member

Comment: Ignore token.

+				}
+				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
+				$this->reset_insertion_mode();
+				return $this->step( self::REPROCESS_CURRENT_NODE );
+
+			/*
+			 * > A start tag whose tag name is one of: "script", "template"
+			 * > An end tag whose tag name is "template"
+			 */
+			case '+SCRIPT':
+			case '+TEMPLATE':
+			case '-TEMPLATE':
+				return $this->step_in_head();
Member

Comment: Process the token using the rules for the "in head" insertion mode.

+		}
+
+		/*
+		 * > Anything else
+		 * > Parse error: ignore the token.
+		 */
+		return $this->step();
+	}
+
 	/*
 	 * Internal helpers
 	 */
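The `$op` strings that step_in_select() switches on combine a sigil with the token name. A minimal sketch of that derivation, assuming a hypothetical free function `build_op()` whose arguments stand in for the processor's get_token_type(), get_token_name(), and is_tag_closer() calls:

```php
<?php
// Hypothetical helper mirroring how the operation string is built.
function build_op( string $token_type, string $token_name, bool $is_closer ): string {
	// Only tags get a sigil: "-" for closers, "+" for openers.
	$sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';

	return "{$sigil}{$token_name}";
}

echo build_op( '#tag', 'OPTION', false ), "\n"; // +OPTION
echo build_op( '#tag', 'SELECT', true ), "\n";  // -SELECT
echo build_op( '#text', '#text', false ), "\n"; // #text
```

Non-tag tokens keep their bare token name, which is why cases such as '#text' and '#comment' carry no sigil.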
@@ -2036,6 +2262,7 @@ private function close_a_p_element() {
 	 * Closes elements that have implied end tags.
 	 *
 	 * @since 6.4.0
+	 * @since 6.7.0 Support "option" and "optgroup".
 	 *
 	 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
 	 *
@@ -2046,6 +2273,8 @@ private function generate_implied_end_tags( $except_for_this_element = null ) {
 			'DD',
 			'DT',
 			'LI',
+			'OPTGROUP',
+			'OPTION',
 			'P',
 		);

@@ -2074,6 +2303,8 @@ private function generate_implied_end_tags_thoroughly() {
 			'DD',
 			'DT',
 			'LI',
+			'OPTGROUP',
+			'OPTION',
 			'P',
 		);

3 changes: 0 additions & 3 deletions tests/phpunit/tests/html-api/wpHtmlProcessor.php
@@ -406,16 +406,13 @@ public static function data_unsupported_special_in_body_tags() {
 			'NOFRAMES' => array( 'NOFRAMES' ),
 			'NOSCRIPT' => array( 'NOSCRIPT' ),
 			'OBJECT' => array( 'OBJECT' ),
-			'OPTGROUP' => array( 'OPTGROUP' ),
-			'OPTION' => array( 'OPTION' ),
 			'PLAINTEXT' => array( 'PLAINTEXT' ),
 			'RB' => array( 'RB' ),
 			'RP' => array( 'RP' ),
 			'RT' => array( 'RT' ),
 			'RTC' => array( 'RTC' ),
 			'SARCASM' => array( 'SARCASM' ),
 			'SCRIPT' => array( 'SCRIPT' ),
-			'SELECT' => array( 'SELECT' ),
 			'STYLE' => array( 'STYLE' ),
 			'SVG' => array( 'SVG' ),
 			'TABLE' => array( 'TABLE' ),