Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize html[xmlns] and html[xml:lang] attributes #4132

Closed
westonruter opened this issue Jan 18, 2020 · 2 comments · Fixed by #4194
Closed

Normalize html[xmlns] and html[xml:lang] attributes #4132

westonruter opened this issue Jan 18, 2020 · 2 comments · Fixed by #4194
Assignees
Milestone

Comments

@westonruter
Copy link
Member

Feature description

When doing an analysis of WordPress themes, I got validation errors being reported for these attributes:

  • html@xml:lang
  • html@xmlns

These are included in many WordPress themes because WordPress to strive to be valid XHTML. WordPress has since moved to HTML5, and HTML5 specifies that the http://www.w3.org/1999/xhtml namespace is implied.

Since the Document class normalizes HTML documents as HTML5 with the HTML5 doctype, I think it makes sense to also have Document::loadHTML() automatically normalize these attributes.

We should consider normalizing xml:lang to just lang, or removing xml:lang if the lang attribute is already there. The xmlns attribute here is redundant and unnecessary; it can simply be removed.

So given:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

Should be normalized to just:

<html lang="en">

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation brief

QA testing instructions

Demo

Changelog entry

@kienstra
Copy link
Contributor

Implementation

Hi @westonruter,
Hope you're doing great.

What do you think about:

  1. Adding a helper method to Document like:
diff --git a/src/Dom/Document.php b/src/Dom/Document.php
index 5bad55ddd..a33b6db30 100644
--- a/src/Dom/Document.php
+++ b/src/Dom/Document.php
@@ -340,6 +340,8 @@ final class Document extends DOMDocument {
                libxml_use_internal_errors( $libxml_previous_state );
 
                if ( $success ) {
+                       $this->normalize_html_attributes();
+
                        // Remove http-equiv charset again.
                        $meta = $this->head->firstChild;
                        if (
@@ -1021,6 +1023,31 @@ final class Document extends DOMDocument {
                return preg_replace( self::HTML_RESTORE_DOCTYPE_PATTERN, '\1!\3\4>', $html, 1 );
        }
  1. Like you suggested, strip html[xmlns]
  2. Convert html[xml:lang] to html[lang], if there's no html[lang] value yet

Thanks, Weston!

@kienstra kienstra self-assigned this Jan 29, 2020
@westonruter
Copy link
Member Author

Sounds good. @schlessera any other suggestions?

  1. Adding a helper method to Document like:

Yeah, let that method be private.

  1. Like you suggested, strip html[xmlns]

Let's only strip it if it is exactly equal to http://www.w3.org/1999/xhtml

  1. Convert html[xml:lang] to html[lang], if there's no html[lang] value yet

And don't forget to remove the xml:lang attribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants