Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation broken in Genesis themes by Document::normalize_document_structure() #4104

Closed
westonruter opened this issue Jan 14, 2020 · 5 comments · Fixed by #4107
Closed

Validation broken in Genesis themes by Document::normalize_document_structure() #4104

westonruter opened this issue Jan 14, 2020 · 5 comments · Fixed by #4107
Labels
Bug Something isn't working P0 High priority QA passed Has passed QA and is done
Milestone

Comments

@westonruter
Copy link
Member

Bug Description

I was testing out a Genesis theme and I saw in the admin bar that it was ✅ valid AMP. When I clicked to validate the page, however, I unexpectedly saw two validation errors:

image

This made no sense to me and I was racking my brain trying to figure it out. It turns out to be that the link[rel=canonical] element was located in the body during validation, when for non-validation requests it is in the head as expected.

The issue turns out to be \Amp\AmpWP\Dom\Document::normalize_document_structure() introduced in #3758, namely that it is not currently written to account for HTML comments occuring before the doctype.

Normally, WordPress themes hard-code the <!DOCTYPE html>, <html>, and <head> tags. Genesis, however, uses genesis_doctype() to generate them. When a validation request is being performed, source stack comments are added before and after genesis_doctype() like so:

<!--amp-source-stack {"type":"theme","name":"genesis","file":"lib\/structure\/header.php","line":28,"function":"genesis_do_doctype","hook":"genesis_doctype","priority":10}--><!DOCTYPE html>
<html lang="en-US">
<head >
<meta charset="UTF-8" />
<!--/amp-source-stack {"type":"theme","name":"genesis","file":"lib\/structure\/header.php","line":28,"function":"genesis_do_doctype","hook":"genesis_doctype","priority":10}-->

This has the effect of Document::normalize_document_structure() normalizing the document to start as:

<!DOCTYPE html><html><head></head><body><!--amp-source-stack {"type":"theme","name":"genesis","file":"lib\/structure\/header.php","line":28,"function":"genesis_do_doctype","hook":"genesis_doctype","priority":10}--><!DOCTYPE html>
<html lang="en-US">
<head >

Expected Behaviour

The DOM used for validation should be normalized to be the same as the DOM used for rendering.

Steps to reproduce

  1. Activate Genesis.
  2. Enable Transitional or Standard mode in the AMP plugin.
  3. Validate a URL.
  4. Notice unexpected validation errors for link[rel=canonical] or other tags.

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • Validating a page generated by Genesis should not result in validation errors when there are none on the frontend.
  • HTML comments that occur before and around the <!DOCTYPE>, <html>, and <head> tags must be preserved so that validation errors can be properly sourced.

Implementation brief

  • Perhaps capture the whitespace before/after the <!DOCTYPE>, <html>, and <head> and ensure that they are retained after normalization.

QA testing instructions

Demo

Changelog entry

@westonruter westonruter added Bug Something isn't working P0 High priority labels Jan 14, 2020
@westonruter westonruter added this to the v1.5 milestone Jan 14, 2020
@schlessera schlessera self-assigned this Jan 14, 2020
@schlessera
Copy link
Collaborator

Good news!

It looks like the comments are not actually stripped:
Image 2020-01-14 at 2 21 25 PM

😄

@schlessera
Copy link
Collaborator

So as it turns out, the main obstacle here is (again) PHP's DOM_Document, because when the first tag is not a <!doctype>, it prepends a new one, and then strips the duplicate one later in the document. So the comment ends up after the <!doctype> instead of in front of it.

I'll need to add additional logic to allow for this, it seems.

@westonruter
Copy link
Member Author

Isn't it the normalize method here that is prepending the doctype by prepending <!DOCTYPE html><html><head></head><body> when it doesn't find the document to start with a doctype in the first place?

@schlessera
Copy link
Collaborator

I changed the regexes now to take leading comments into account. But as soon as I do that and keep the first comment, DOM_Document strips mine and prepends a default one.

I'm experimenting with the best way of getting around this, which seems to be a combination of changing the doctype into a comment and using the LIBXML_HTML_NODEFDTD flag. Weirdly enough, LIBXML_HTML_NODEFDTD makes it so that DOM_Document doesn't add a default one if it is missing, but still strips mine and adds a default one if it is not the first element.

@csossi
Copy link

csossi commented Feb 13, 2020

verified in qa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working P0 High priority QA passed Has passed QA and is done
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants