HTML API: CSS class name methods should behave according to quirks mode #7169
Allow quirks mode to be set before document processing begins.
This is necessary for has_class to work properly. This could be put into a protected method or the class sensitivity could be a parameter if desired.
Subsequent changes introduced document_mode instead of compat_mode
a50e262 to 1435cf9
@@ -4645,7 +4675,7 @@ public function remove_class( $class_name ): bool {
      * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
      */
     public function has_class( $wanted_class ): ?bool {
-        return $this->is_virtual() ? null : parent::has_class( $wanted_class );
+        return $this->is_virtual() ? false : parent::has_class( $wanted_class );
This appears to be a small bug and isn't necessary in this PR. I'd be happy to make another PR if desired. `is_virtual` would suggest we're stopped on a tag, but the tag can't have any attributes. This should return `false`.
hm. I would consider this more of a bug in the type signature. with active format reconstruction it is possible for virtual nodes to have a class, but the `null` was meant to convey exactly what you inferred: "This tag can have no classes" rather than "this tag has no classes."
Bug introduced in #6753
That seems different from what the tag processor does, where "not on a tag" returns `null`, "on a tag" always returns `false` with no special handling for tags with no classes.
wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Lines 1181 to 1203 in 830d66c
    /**
     * Returns if a matched tag contains the given ASCII case-insensitive class name.
     *
     * @since 6.4.0
     *
     * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
     * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
     */
    public function has_class( $wanted_class ): ?bool {
        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
            return null;
        }
        $wanted_class = strtolower( $wanted_class );
        foreach ( $this->class_list() as $class_name ) {
            if ( $class_name === $wanted_class ) {
                return true;
            }
        }
        return false;
    }
thanks for following up on these comments. `null` has been used to mean "cannot potentially contain that attribute or any others." maybe what we need is a comment update since we created virtual nodes.
at some point, if we allow reading these this might change, as nodes created during active format reconstruction contain the attributes of their original tags. for now, though, I prefer having a distinction between "this tag does not have this class" and "it's not possible to answer if this tag has this class"
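The distinction between "this tag does not have this class" and "it's not possible to answer" can be sketched in isolation. This is a minimal illustration and not the HTML API's implementation: `sketch_has_class()` and its `$attributes` argument are hypothetical stand-ins, where passing `null` models a position (not on a tag, or on a virtual node) whose attributes cannot be read.

```php
<?php
/**
 * Hypothetical stand-in for has_class(), illustrating a tri-state return.
 *
 * @param array|null $attributes Attribute map of the current tag, or null when
 *                               attributes cannot be read at this position.
 * @param string     $wanted     Class name to look for (byte-for-byte match).
 * @return bool|null True or false when the question can be answered, null otherwise.
 */
function sketch_has_class( ?array $attributes, string $wanted ): ?bool {
	// "It's not possible to answer if this tag has this class."
	if ( null === $attributes ) {
		return null;
	}

	// Split the class attribute on HTML whitespace, dropping empty pieces.
	$class_attr = $attributes['class'] ?? '';
	$classes    = preg_split( '/[ \t\f\r\n]+/', $class_attr, -1, PREG_SPLIT_NO_EMPTY );

	// "This tag does (or does not) have this class."
	return in_array( $wanted, $classes, true );
}
```

With a return type like this, callers would need strict comparisons (`false === $result` versus `null === $result`), since a loose `! $result` check collapses the two cases the comment wants to keep apart.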
     * `QUIRKS_MODE` impacts many styling-related aspects of an HTML document, but
     * none of the other changes modifies how the HTML is parsed or selected.
The lines immediately above this about P > TABLE handling in quirks/no-quirks modes seem contradictory. That's directly related to tree-construction.
I've updated the wording. It indeed was self-contradictory
      * @return static|null The created processor if successful, otherwise null.
      */
-    public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
+    public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8', $document_mode = WP_HTML_Processor_State::NO_QUIRKS_MODE ) {
A fragment parser would typically inherit the context element's document's compatibility mode.
I suspect we'll need to change how the context element is passed, but I don't think it will include information about its document mode, so this argument will remain helpful.
See #7141.
do you think this is worth supporting right now? do we have enough warrant to include it? in what cases would we want to create a document fragment in quirks mode, and in those cases, how would we know?
I can definitely see value in having this, but also I wonder if this inclusion will help people understand better what they need to be doing or add confusion.
what do you think the consequences would be of simply not supporting a quirks-mode fragment parser? or at least of not having it in the primary function signature for creating the class?
This was helpful and important for testing while working on these changes.
> In what cases would we want to create a document fragment in quirks mode, and in those cases, how would we know?

The fragment should use the same mode as the document for the context element. This should become clear when we work on `set_{inner,outer}_html`, where we'll be creating fragment parsers that use the parent parser's document mode.
Another option would be to adjust the full parser to handle doctype declarations and set the document_mode correctly according to the full HTML document. Then we could certainly omit these changes.
I'm also reluctant to eagerly add more to this method signature.
> what do you think the consequences would be of simply not supporting a quirks-mode fragment parser?

If the full parser supports quirks mode and we have `set_{inner,outer}_html` methods, I think the fragment parser must support quirks mode.

> or at least of not having it in the primary function signature for creating the class?

The mode should be based on the context element's document's mode. There's no way to provide that information to the fragment parser right now. The context element is currently passed to the fragment parser as an HTML-like string (only `<body>` is allowed right now) which seems insufficient to pass all the information the fragment parser requires.
There are other ways to handle this, for example an instance method could create a fragment from a node and set quirks mode appropriately as well as handle things like reading attributes, namespace, etc. This is all best discussed in #7141.
I am going to explore a change to handle quirks mode in the full parser based on the doctype declaration. That would be sufficient, and we could remove this change.
I could see some value in having a method like `->set_quirks_mode()`.

I still suspect that it'll be basically universal that we don't work in the full parser mode within WordPress. almost no HTML-processing code has access to the full document, and so that's what I meant when assuming no-quirks mode. We just don't know even when doing `inner_html` operations if the parent document is in quirks mode or not. "We can only assume UTF-8, no-quirks, `<body>` context."
There may be value in allowing the compat mode to be changed on a processor. I'd like to leave those changes for consideration in their own PR. There was no way to change the compat mode before and I don't think we need to address that in this PR.
I plan to revisit this when #7195 is complete. That will allow us to test with quirks mode using the full parser and not need to change the fragment factory method signature.
Full documents can be created in quirks mode now. There's no need to introduce quirks mode to the fragment parser or change its signature in order to test the quirks mode changes.
Quirks mode changes the behavior of the CSS class functions, namely whether class names are matched in an ASCII case-insensitive way or compared byte-for-byte. It makes sense to move quirks mode into the tag processor so that it can deal with this correctly.
I think the concerns have been addressed and this is ready for another review.
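The two comparison rules mentioned above, byte-for-byte in no-quirks mode and ASCII case-insensitive in quirks mode, can be sketched as a pair of standalone functions. This is only an illustration under stated assumptions: the function names and the boolean `$quirks_mode` flag are hypothetical and are not the processor's actual interface.

```php
<?php
/**
 * Hypothetical helper: returns the form of a class name used for comparison.
 *
 * In no-quirks mode the name is returned untouched (byte-for-byte matching).
 * In quirks mode only ASCII A-Z is folded to a-z; other bytes are left alone.
 */
function sketch_comparable_class_name( string $class_name, bool $quirks_mode ): string {
	if ( ! $quirks_mode ) {
		return $class_name;
	}

	// strtr() with explicit alphabets avoids any locale-dependent case folding.
	return strtr( $class_name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' );
}

/**
 * Hypothetical helper: whether two class names match under the given mode.
 */
function sketch_class_matches( string $existing, string $wanted, bool $quirks_mode ): bool {
	return sketch_comparable_class_name( $existing, $quirks_mode )
		=== sketch_comparable_class_name( $wanted, $quirks_mode );
}
```

Under these assumptions `UPPER` and `upper` match only when `$quirks_mode` is true, and non-ASCII letters such as `É` and `é` never match each other in either mode.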
     * @todo When reconstructing active formatting elements with attributes, find a way
     *       to indicate if the virtually-reconstructed formatting elements contain the
     *       wanted class name.
     *
     * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
     * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
     */
@sirreal I've reverted this change so we can consider it separately. I know it will be important during active format reconstruction, but I think that `null` is a kind of partially-implemented escape hatch that communicates that this isn't supported rather than indicating that a class definitively doesn't exist on the tag.
        $this->assertSame( '<span class="UPPER">', $processor->get_updated_html() );

        $processor->add_class( 'ANOTHER-UPPER' );
        $this->assertSame( '<span class="UPPER another-upper">', $processor->get_updated_html() );
I was hoping that here we could respect the given casing.
@sirreal I've updated some comments and changed the behavior of it's really quite a surprise, and I hope quirks mode is almost never used, given the conflict between CSS selectors matching the
@sirreal I'm going to merge this because I anticipate you would prefer it (once the tests all pass), but if you disagree with the changes I made we can revisit.
@sirreal I've also moved the Trac ticket reference to the top of the description to make it easier to find, and used the shorthand notation instead of the full link.
The HTML API has been behaving as if CSS class name selectors matched class names in an ASCII case-insensitive manner. This is only true if the document in question is set to quirks mode. Unfortunately most documents processed will be set to no-quirks mode, meaning that some CSS behaviors have been matching incorrectly when provided with case variants of class names.

In this patch, the CSS methods have been audited and updated to adhere to the rules governing ASCII case sensitivity when matching classes. This includes `add_class()`, `remove_class()`, `has_class()`, and `class_list()`. Now, it is assumed that a document is in no-quirks mode unless a full HTML parser infers quirks mode, and these methods will treat class names in a byte-for-byte manner. Otherwise, when a document is in quirks mode, the methods will compare the provided class names against existing class names for the tag in an ASCII case insensitive way, while `class_list()` will return a lower-cased version of the existing class names.

The lower-casing in `class_list()` is performed for consistency, since it's possible that multiple case variants of the same comparable class name exists on a tag in the input HTML.

Developed in #7169
Discussed in https://core.trac.wordpress.org/ticket/61531

Props dmsnell, jonsurrell.
See #61531.

git-svn-id: https://develop.svn.wordpress.org/trunk@58985 602fd350-edb4-49c9-b593-d223f7449a82
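The quirks-mode behavior described in this commit message, where `class_list()` yields lower-cased names and `add_class()` compares case-insensitively, can be sketched with plain string functions. Everything here is hypothetical and self-contained rather than the patch's code: the `sketch_*` names, the boolean `$quirks_mode` flag, and in particular the choice to keep the caller's casing when appending a class are assumptions of this sketch.

```php
<?php
const SKETCH_UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const SKETCH_LOWER = 'abcdefghijklmnopqrstuvwxyz';

/**
 * Hypothetical class_list(): unique class names in order of first appearance.
 * In quirks mode, names are ASCII-lowercased, so case variants collapse into one.
 */
function sketch_class_list( string $class_attr, bool $quirks_mode ): array {
	$seen  = array();
	$names = preg_split( '/[ \t\f\r\n]+/', $class_attr, -1, PREG_SPLIT_NO_EMPTY );

	foreach ( $names as $name ) {
		$comparable          = $quirks_mode ? strtr( $name, SKETCH_UPPER, SKETCH_LOWER ) : $name;
		$seen[ $comparable ] = true;
	}

	// Numeric-looking names become integer keys, so cast every key back to string.
	return array_map( 'strval', array_keys( $seen ) );
}

/**
 * Hypothetical add_class(): appends the class unless a comparable one already exists.
 * Assumption of this sketch: the caller's casing is kept when appending.
 */
function sketch_add_class( string $class_attr, string $new_class, bool $quirks_mode ): string {
	$wanted = $quirks_mode ? strtr( $new_class, SKETCH_UPPER, SKETCH_LOWER ) : $new_class;

	if ( in_array( $wanted, sketch_class_list( $class_attr, $quirks_mode ), true ) ) {
		return $class_attr;
	}

	return '' === $class_attr ? $new_class : $class_attr . ' ' . $new_class;
}
```

For an input such as `aaa AAA AaA`, this sketch yields a single `aaa` in quirks mode and all three variants in no-quirks mode, which is the consistency argument the commit message gives for lower-casing.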
Trac ticket: Core-61531
Testing changes
These are for the full HTML API test suites. There are no changes to the html5lib test suite.
Description
Update the HTML Processor and Tag Processor to handle CSS classes in a case-sensitive way by default.
This aligns with "no-quirks" or "standards" mode behavior for class name handling.

- Remove forced lowercasing in `::class_list`.
- Add a `$document_mode` argument to `WP_HTML_Processor::create_fragment()` to enable the fragment parser to be created in quirks mode.

When the HTML Processor is in no-quirks mode, `add_class`, `remove_class` and `has_class` operate in a case-sensitive way.

When the HTML Processor is in quirks mode (only available via the fragment parser at the moment), `add_class`, `remove_class` and `has_class` operate in a case-insensitive way.

`add_class` and `remove_class` get similar treatment. Classes that match case-insensitively will be removed by `remove_class( $class_name )`. Case-insensitive duplicate classes will not be added when calling `add_class( $class_name )`.

The case-sensitivity is managed by adding a protected `comparable_class_name` method to the WP_HTML_Tag_Processor. This method is called internally in the CSS class related methods to allow subclasses to customize comparison behavior. This requires minimal changes to the implementation of class name handling in the Tag Processor.

In the Tag Processor and the HTML Processor in "no-quirks" mode, the `comparable_class_name` method returns the class name as-is, so case-sensitive comparison is performed. The HTML Processor in quirks mode will return the class name in ASCII lowercase so that case-insensitive comparison is performed.

`class_list` does produce case-insensitive duplicates in all cases. This deduplication would be easy to perform in quirks mode, but it's unclear what form of casing should be yielded. In the case of `<div class="aaa AAA AaA">`, which of the (equivalent in quirks mode) `aaa` class names should be yielded?

The Tag Processor and HTML Processor classes have several methods for dealing with CSS class names: `class_list`, `add_class`, `remove_class` and `has_class`.

These methods are intended to provide a CSS class selector-like interface with the class attribute.

Class name matching (CSS class selectors `.className {}` or `getElementsByClassName( "className" )`) is case-sensitive in no-quirks mode and case-insensitive in quirks mode.

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.