Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a html-only gatherer #9756

Closed
benschwarz opened this issue Sep 30, 2019 · 6 comments · Fixed by #9781
Closed

Add a html-only gatherer #9756

benschwarz opened this issue Sep 30, 2019 · 6 comments · Fixed by #9781

Comments

@benschwarz
Copy link
Contributor

Feature request summary

At the moment custom plugins aren't able to create their own gatherers, and there isn't a gatherer to retrieve the final HTML that a page produced.

Ref: ampproject/amp-toolbox#509

My proposal is to add a gatherer that returns document.documentElement.outerHTML, or something like it.

What is the motivation or use case for changing this?

Plugins are unable to retrieve the document HTML from the browser.

How is this beneficial to Lighthouse?

Ergonomic plugin authoring

@brendankenny
Copy link
Member

I think the amp validator would need the initial html, wouldn't it? So this could be calling driver.getRequestContent with the main resource request.

A side benefit is that most of the time the html request would be of pretty reasonable size (at least compared to many ScriptElements), a lot better than the easily monstrous (css in js, etc etc) outerHTML.

@benschwarz
Copy link
Contributor Author

So this could be calling driver.getRequestContent with the main resource request.

Yah. That'd also account for redirections etc etc 👍

I think the amp validator would need the initial html, wouldn't it?

I'm not really qualified to say whether amp would want the initial HTML or the fully realised result, but I'm sure @alabiaga or @cramforce could add some guidance here

@ithinkihaveacat
Copy link

The AMP validator wants the equivalent of Chrome's "view page source." So driver.getRequestContent would be the right thing in this case?

@patrickhulce
Copy link
Collaborator

Ah yeah the initial HTML document is a much easier ask than the final rendered HTML content :)

This sounds doable as a default gatherer IMO!

@alabiaga
Copy link
Contributor

alabiaga commented Sep 30, 2019

I'm not really qualified to say whether amp would want the initial HTML or the fully realised result

It is only the initial HTML that is validated against. That is if I am understanding correctly how the terminology is being used. In the following example: https://playground.amp.dev/?url=https%3A%2F%2Fpreview.amp.dev%2Fdocumentation%2Fexamples%2Fcomponents%2Famp-list&format=websites&_gl=1*1na28s7*_ga*YW1wLWdtNHdvMm5XNUs3dWRqYWFKZHp3eHc.

The markup on the left is what is validated. The realized and final html is the output on the right.

When I created the gatherer, it was because I wasn't aware that there was an async version of the audit call, thus I did things in the gatherer and just passed it to the audit call via an artifact. In the gatherer, I don't recall ever using driver.getRequestContent as I wasn't sure about the ID to pass. If I remember, I went about it two ways. One was was via issuing a command to use the chrome dev tools protocol, via DOM.getDocument. but I definitely went a simpler route, looking at an existing gatherer as example. I think document was just globally available or the html document was definitely a property of some context.

@brendankenny
Copy link
Member

The AMP plugin can now use artifacts.MainDocumentContent on master/next release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants