Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new SEO audit: HTTP status code #3311

Merged
merged 12 commits into from
Sep 27, 2017

Conversation

kdzwinel
Copy link
Collaborator

@kdzwinel kdzwinel commented Sep 12, 2017

Closes #3181.

artifacts.HTTPStatusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) {
return {
rawValue: false,
displayValue: `${artifacts.HTTPStatusCode}`
Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

displayValue is appended to failureDescription:
screen shot 2017-09-12 at 22 39 57

@kdzwinel kdzwinel force-pushed the seo-http-status-code branch from 8217662 to 5c667bd Compare September 12, 2017 21:55
score: false,
}
}
},
Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK current smoke test setup doesn't allow me to easily specify a HTTP code for cases.html file, so I'm pointing smokehouse to non-existent file which ends up returning 404.

This is a hack though. IMO a proper solution for this would be to have an optional xyz.json file for each of xyz-cases.html files where you can specify what HTTP code and what headers should be returned (we will probably need headers for other audits).

Copy link
Collaborator

@patrickhulce patrickhulce Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW we use query string for some cases like this delay=2000, the server delays the response by 2 seconds which we could also use for status code, but headers would probably want to go in json direction :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have our own test server, so we can add features in there, requested via query string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. see the ?delay=200 that can be added to delay a response from the server

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've added status_code query param. We will worry about headers later.

class HTTPStatusCode extends Gatherer {

afterPass(options, traceData) {
const mainResource = traceData.networkRecords
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want this in a computed artifact. gatherers should all require protocol interaction, but looks like this does not.

Also I feel like there are a few cases where we try to sniff out the main resource's network record. If that's true, we might want to break that out into its own.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I replaced this gatherer with a computed artefact MainResource, couldn't find any other audit where it can be reused though.

PTAL

@kdzwinel kdzwinel force-pushed the seo-http-status-code branch from adfc6f6 to b16b37a Compare September 20, 2017 22:38
statusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) {
return {
rawValue: false,
displayValue: `${statusCode}`,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

displayValue is appended to failureDescription:
screen shot 2017-09-12 at 22 39 57

static audit(artifacts) {
const devtoolsLogs = artifacts.devtoolsLogs[Audit.DEFAULT_PASS];

return artifacts.requestNetworkRecords(devtoolsLogs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be absorbed into the computed artifact so that the call can just be artifacts.requestMainResource(devtoolsLogs).then(... since computed artifacts also have access to other computed artifacts. One example of this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Fixed. BTW not sure if you are aware, but PushedRequests doesn't seem to be used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but PushedRequests doesn't seem to be used?

haha, yes. I blame @samccone 🐐 :)

@@ -42,6 +45,9 @@ module.exports = [
'meta-description': {
score: false,
},
'http-status-code': {
score: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like displayValue can also be asserted on this failure?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, added!

const HTTP_REDIRECT_CODE_LOW = 300;
const HTTP_REDIRECT_CODE_HIGH = 399;

class MainResource extends ComputedArtifact {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc description would be 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added!


/**
* @param {!Array<!WebInspector.NetworkRequest>} networkRecords
* @return {?WebInspector.NetworkRequest}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when can this be null? What should a caller assume in those cases?

Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed it could happen in some edge cases (e.g. redirect loop? ssl error page?), but maybe I'm too paranoid? The audit that uses this artifact, in such case, will fail with a message 'Invalid MainResouce or its status code.'.

Copy link
Member

@brendankenny brendankenny Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed it could happen in some edge cases (e.g. redirect loop? ssl error page?), but maybe I'm too paranoid?

Didn't mean to imply it was overkill; I think it's just the right amount of kill :) I was hoping we could enumerate possible failure cases so we could put more specific docs on the computed artifact (and in the audit debugString).

In those two cases (redirect loop or ssl error page) I believe LH exit before getting to running audits, so I wonder if we should instead throw an error if no record is found. That would allow outputting the exact error case as a string without any audit that consumes it having to detect those cases and outputting their own debugString

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickhulce has strong feelings about Lighthouse errors vs site errors, but in this case I think we can say that if we don't have a main resource record it's either a pretty bad Lighthouse bug or a pretty bad Chrome bug, so I'm personally ok with throwing in this case (with specific info in error message so we can debug later)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I've changed it to throw when main resource cannot be identified and adjusted tests accordingly.

* @return {?WebInspector.NetworkRequest}
*/
compute_(networkRecords) {
return networkRecords.find(record => record.statusCode < HTTP_REDIRECT_CODE_LOW ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something obvious, but I'm not sure how this check is grabbing the main resource? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, is it that networkRecords is sorted? I guess it would have to be the first non-redirect. Would it be better to have a stronger check for this?

Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, networkRecords are sorted. We can potentially do some additional checks:

  • we can check if request initiator is 'other' and
  • if type of resource is 'Document'.

WDYT?

These data is not available via networkRecords but we can get it directly from DevtoolsLog.

Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, just found @wardpeet PR that deals with redirects too -
#3308 . We could also use CriticalRequestChains here and check for first resource that doesn't have redirected in request ID (instead of using status codes).

Copy link
Member

@brendankenny brendankenny Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to avoid using CriticalRequestChains as they're due for a refactor after more fast-mode dependency graph stuff lands.

@paulirish may want to weigh in on what he imagines as a useful artifact here, but going way back to #605 he had the idea of a redirect-chain artifact, which would work for both here and #3308 (and be much simpler to iterate over compared to the critical-request-chains data structure).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also like 80% sure @paulirish has told me the definitive way to determine the main request record, but I don't recall what that was. I defer to your research in the meantime, but relying on array order alone does make me a little nervous.

Lighthouse inserts into the networkRecords array based on when a request is finished, and it seems like it's possible (maybe? I have no evidence either way :) that the preload scanner (or something else) could trigger a resource download that finishes before the main page finishes loading, and so would be inserted before it and trick the statusCode check. The other checks would help work around this.

Walking the redirect chain (and maybe explicitly matching redirected URLs with redirectSource like driver does for monitoring redirects) also seems like it would solve this problem well

Copy link
Member

@brendankenny brendankenny Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lighthouse inserts into the networkRecords array based on when a request is finished

whoops, I'm wrong. As of #2197 they are inserted as the request begins, so this may be fine.

A redirect-chain artifact -- with the last entry always the main resource -- may still be more useful in more places in LH, though.

Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this may be fine.

so… we are back to "first non-redirect" heuristic?

redirect-chain artifact

NetworkRequest's have redirects property that exposes chain of redirects that led to that resource. So, you can do mainResoruce.redirects and get this:
screen shot 2017-09-25 at 00 00 32
this should be enough for #3308 (@wardpeet WDYT?). We can rework it to redirect-chain if needed, but IMO mainResource.redirects works well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also like 80% sure @paulirish has told me the definitive way to determine the main request record, but I don't recall what that was

looking in DT source i see some really basic approaches for finding "main resources"

      if (networkManager && request.url() === networkManager.target().inspectedURL() &&
          request.resourceType() === Common.resourceTypes.Document)
(roughly what resourcetreemodel is doing...)
let url;
// _resourcesMap populated ...
target.on(SDK.ResourceTreeModel.Events.FrameNavigated, framePayload => url = framePayload.url);
var mainResource = this._resourcesMap[url];

basically just looking at URL match.. and either excluding redirects via selecting just Resources or by only considering type==document. ok then.

so tldr there is no definitive way. what you have here lgtm

*/
'use strict';

const Audit = require('../../../audits/seo/http-status-code.js');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename as httpStatusCodeAudit? It gets confusing when reading the tests if just generic Audit. (I regret we haven't switched all the rest of the audit tests off of using Audit for this)

Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I went with HTTPStatusCodeAudit to be consistent with naming convention used in other tests.

assert.equal(output, record);
});
});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note we also have some devtoolsLogs and networkRecords in lighthouse-core/test/fixtures/, e.g. one from a wikipedia redirect that might be good for a test too

Copy link
Collaborator Author

@kdzwinel kdzwinel Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm… if we use that fixture, we will be also testing NetworkRecords artifact, so it won't be a fully independent unit test. Also, wikipedia fixture does a bunch of redirects, just like the test above - so, IMO we will be repeating the same test. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm… if we use that fixture, we will be also testing NetworkRecords artifact, so it won't be a fully independent unit test. Also, wikipedia fixture does a bunch of redirects, just like the test above - so, IMO we will be repeating the same test. WDYT?

Wellllll, it depends on if you buy that Lighthouse tests are really unit tests. Testing philosophies aside, our unit tests haven't been pure since the beginning, and are really a mixture of unit and integration/functional tests, so this won't be ruining anything. :)

This mixture is especially useful for something as complicated as talking to Chrome over the debugger protocol, where any mock we write is in danger of simplifying too much (or, conversely, making realistic mocks too much work to write).

Having the tests of the idealized network records above and the unit tests of devtoolsLogs -> networkRecords (in network-recorder-test.js) means that we should be able to isolate future issues in each of those places quickly, while using real networkRecords/devtoolsLogs here means that we can do a test of these two or three units in concert much more cheaply and quickly than firing up a smokehouse test for each case.

And if we change the heuristic for finding the main resource, the unit tests will need to change but a still working wikipedia-redirect.devtoolslog.json test (and maybe the h2 push fixture?) will be a good indicator that the change was successful.

Everyone has different feelings on where these testing borders should be, and maybe we should have set up separate unit vs functional tests from the beginning, but here we are :) WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This absolutely makes sense, I just failed to realize that LH unit tests are a mix of different types of tests. I've added a test based on the wikipedia fixture.

@kdzwinel
Copy link
Collaborator Author

@brendankenny PTAL 🔎

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 📞🚦

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants