Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally remove html[xmlns], convert html[xml:lang] to html[lang] #4194

Merged
merged 6 commits into from
Jan 30, 2020

Conversation

kienstra
Copy link
Contributor

@kienstra kienstra commented Jan 29, 2020

Summary

  • If http://www.w3.org/1999/xhtml === html[xmlns], this strips the attribute
  • If theres an html[xml:lang] but not html[lang], it moves the html[xml:lang] value to html[lang]
  • This strips the html[xml:lang] attribute if it exists, no matter its value

For example, with this change to Twenty Twenty:

diff --git a/header.php b/header.php
index d66d08f..f65eb8d 100644
--- a/header.php
+++ b/header.php
@@ -11,7 +11,7 @@
 
 ?><!DOCTYPE html>
 
-<html class="no-js" <?php language_attributes(); ?>>
+<html class="no-js" xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">

...the html[xmlns] is stripped, and the html[xml:lang] is moved to html[lang]:

xml-lang-converted

Fixes #4132

Checklist

  • My pull request is addressing an open issue (please create one otherwise).
  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

…lang]

As Weston mentioned,
neither of these is valid HTML5
@googlebot googlebot added the cla: yes Signed the Google CLA label Jan 29, 2020
return;
}

$xmlns = $html->attributes->getNamedItem( 'xmlns' );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd probably be better to do something like $xmlns = $html->getAttribute( 'xmlns' ).

But for some reason, $html->getAttribute( 'xmlns' ) returned "", even when $html->attributes->getNamedItem( 'xmlns' ) returned a DOMAttr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure. I can see that $html->attributes->getNamedItem( 'xmlns' ) works across all PHP versions. I don't have a better option.

Before, if the xml:lang was non-empty and the lang was "",
this would not overwrite it.
@westonruter westonruter added this to the v1.5 milestone Jan 29, 2020
src/Dom/Document.php Outdated Show resolved Hide resolved
src/Dom/Document.php Outdated Show resolved Hide resolved
$html->removeAttributeNode( $xmlns );
}

$xml_lang = $html->attributes->getNamedItem( 'xml:lang' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered hasAttributeNS or getAttributeNS?

It seems these don't work either. I would have thought at least that this would work:

$dom->documentElement->getAttributeNodeNS( 'http://www.w3.org/XML/1998/namespace', 'lang' );

But it doesn't. Perhaps it's due to loadHTML and that the document is not XML in the first place.

Using $dom->documentElement->attributes->getNamedItem( 'xml:lang' ) works in all PHP versions 5+. So seems like a good one to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for looking at this.

return;
}

$xmlns = $html->attributes->getNamedItem( 'xmlns' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure. I can see that $html->attributes->getNamedItem( 'xmlns' ) works across all PHP versions. I don't have a better option.

Co-Authored-By: Weston Ruter <westonruter@google.com>
@kienstra
Copy link
Contributor Author

I'l make the remaining changes soon, maybe in 30 minutes to an hour.

As Weston mentioned, this is only used
in one place an shouldn't be needed.
@kienstra
Copy link
Contributor Author

The outstanding feedback should be addressed now, unless I misunderstood something.

@westonruter
Copy link
Member

Changes look good. Note that it is difficult to test this because without #4190 the validation errors do not show up in the Validated URL screen.

@westonruter westonruter merged commit c4e7f12 into develop Jan 30, 2020
@westonruter westonruter deleted the add/normalize-html-tag branch January 30, 2020 01:25
@kienstra
Copy link
Contributor Author

Ah, good point. Thanks for reviewing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Signed the Google CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Normalize html[xmlns] and html[xml:lang] attributes
3 participants