-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new SEO audit: HTTP status code #3311
new SEO audit: HTTP status code #3311
Conversation
artifacts.HTTPStatusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) { | ||
return { | ||
rawValue: false, | ||
displayValue: `${artifacts.HTTPStatusCode}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8217662
to
5c667bd
Compare
score: false, | ||
} | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK current smoke test setup doesn't allow me to easily specify a HTTP code for cases.html
file, so I'm pointing smokehouse to non-existent file which ends up returning 404.
This is a hack though. IMO a proper solution for this would be to have an optional xyz.json
file for each of xyz-cases.html
files where you can specify what HTTP code and what headers should be returned (we will probably need headers for other audits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW we use query string for some cases like this delay=2000
, the server delays the response by 2 seconds which we could also use for status code, but headers would probably want to go in json direction :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have our own test server, so we can add features in there, requested via query string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. see the ?delay=200
that can be added to delay a response from the server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've added status_code
query param. We will worry about headers later.
class HTTPStatusCode extends Gatherer { | ||
|
||
afterPass(options, traceData) { | ||
const mainResource = traceData.networkRecords |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want this in a computed artifact. gatherers should all require protocol interaction, but looks like this does not.
Also I feel like there are a few cases where we try to sniff out the main resource's network record. If that's true, we might want to break that out into its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! I replaced this gatherer with a computed artefact MainResource
, couldn't find any other audit where it can be reused though.
PTAL
…e test accordingly.
adfc6f6
to
b16b37a
Compare
statusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) { | ||
return { | ||
rawValue: false, | ||
displayValue: `${statusCode}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static audit(artifacts) { | ||
const devtoolsLogs = artifacts.devtoolsLogs[Audit.DEFAULT_PASS]; | ||
|
||
return artifacts.requestNetworkRecords(devtoolsLogs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be absorbed into the computed artifact so that the call can just be artifacts.requestMainResource(devtoolsLogs).then(...
since computed artifacts also have access to other computed artifacts. One example of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! Fixed. BTW not sure if you are aware, but PushedRequests doesn't seem to be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but PushedRequests doesn't seem to be used?
haha, yes. I blame @samccone 🐐 :)
@@ -42,6 +45,9 @@ module.exports = [ | |||
'meta-description': { | |||
score: false, | |||
}, | |||
'http-status-code': { | |||
score: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like displayValue
can also be asserted on this failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, added!
const HTTP_REDIRECT_CODE_LOW = 300; | ||
const HTTP_REDIRECT_CODE_HIGH = 399; | ||
|
||
class MainResource extends ComputedArtifact { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jsdoc description would be 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
|
||
/** | ||
* @param {!Array<!WebInspector.NetworkRequest>} networkRecords | ||
* @return {?WebInspector.NetworkRequest} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when can this be null? What should a caller assume in those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed it could happen in some edge cases (e.g. redirect loop? ssl error page?), but maybe I'm too paranoid? The audit that uses this artifact, in such case, will fail with a message 'Invalid MainResouce or its status code.'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed it could happen in some edge cases (e.g. redirect loop? ssl error page?), but maybe I'm too paranoid?
Didn't mean to imply it was overkill; I think it's just the right amount of kill :) I was hoping we could enumerate possible failure cases so we could put more specific docs on the computed artifact (and in the audit debugString
).
In those two cases (redirect loop or ssl error page) I believe LH exit before getting to running audits, so I wonder if we should instead throw an error if no record is found. That would allow outputting the exact error case as a string without any audit that consumes it having to detect those cases and outputting their own debugString
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickhulce has strong feelings about Lighthouse errors vs site errors, but in this case I think we can say that if we don't have a main resource record it's either a pretty bad Lighthouse bug or a pretty bad Chrome bug, so I'm personally ok with throwing in this case (with specific info in error message so we can debug later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I've changed it to throw when main resource cannot be identified and adjusted tests accordingly.
* @return {?WebInspector.NetworkRequest} | ||
*/ | ||
compute_(networkRecords) { | ||
return networkRecords.find(record => record.statusCode < HTTP_REDIRECT_CODE_LOW || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something obvious, but I'm not sure how this check is grabbing the main resource? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, is it that networkRecords
is sorted? I guess it would have to be the first non-redirect. Would it be better to have a stronger check for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, networkRecords are sorted. We can potentially do some additional checks:
- we can check if request initiator is 'other' and
- if type of resource is 'Document'.
WDYT?
These data is not available via networkRecords but we can get it directly from DevtoolsLog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may want to avoid using CriticalRequestChains
as they're due for a refactor after more fast-mode dependency graph stuff lands.
@paulirish may want to weigh in on what he imagines as a useful artifact here, but going way back to #605 he had the idea of a redirect-chain artifact, which would work for both here and #3308 (and be much simpler to iterate over compared to the critical-request-chains data structure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also like 80% sure @paulirish has told me the definitive way to determine the main request record, but I don't recall what that was. I defer to your research in the meantime, but relying on array order alone does make me a little nervous.
Lighthouse inserts into the networkRecords array based on when a request is finished, and it seems like it's possible (maybe? I have no evidence either way :) that the preload scanner (or something else) could trigger a resource download that finishes before the main page finishes loading, and so would be inserted before it and trick the statusCode check. The other checks would help work around this.
Walking the redirect chain (and maybe explicitly matching redirected URLs with redirectSource
like driver does for monitoring redirects) also seems like it would solve this problem well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lighthouse inserts into the networkRecords array based on when a request is finished
whoops, I'm wrong. As of #2197 they are inserted as the request begins, so this may be fine.
A redirect-chain artifact -- with the last entry always the main resource -- may still be more useful in more places in LH, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this may be fine.
so… we are back to "first non-redirect" heuristic?
redirect-chain artifact
NetworkRequest's have redirects
property that exposes chain of redirects that led to that resource. So, you can do mainResoruce.redirects
and get this:
this should be enough for #3308 (@wardpeet WDYT?). We can rework it to redirect-chain
if needed, but IMO mainResource.redirects
works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also like 80% sure @paulirish has told me the definitive way to determine the main request record, but I don't recall what that was
looking in DT source i see some really basic approaches for finding "main resources"
if (networkManager && request.url() === networkManager.target().inspectedURL() &&
request.resourceType() === Common.resourceTypes.Document)
(roughly what resourcetreemodel is doing...)
let url;
// _resourcesMap populated ...
target.on(SDK.ResourceTreeModel.Events.FrameNavigated, framePayload => url = framePayload.url);
var mainResource = this._resourcesMap[url];
basically just looking at URL match.. and either excluding redirects via selecting just Resources or by only considering type==document. ok then.
so tldr there is no definitive way. what you have here lgtm
*/ | ||
'use strict'; | ||
|
||
const Audit = require('../../../audits/seo/http-status-code.js'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename as httpStatusCodeAudit
? It gets confusing when reading the tests if just generic Audit
. (I regret we haven't switched all the rest of the audit tests off of using Audit
for this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I went with HTTPStatusCodeAudit to be consistent with naming convention used in other tests.
assert.equal(output, record); | ||
}); | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note we also have some devtoolsLogs and networkRecords in lighthouse-core/test/fixtures/
, e.g. one from a wikipedia redirect that might be good for a test too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm… if we use that fixture, we will be also testing NetworkRecords artifact, so it won't be a fully independent unit test. Also, wikipedia fixture does a bunch of redirects, just like the test above - so, IMO we will be repeating the same test. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm… if we use that fixture, we will be also testing NetworkRecords artifact, so it won't be a fully independent unit test. Also, wikipedia fixture does a bunch of redirects, just like the test above - so, IMO we will be repeating the same test. WDYT?
Wellllll, it depends on if you buy that Lighthouse tests are really unit tests. Testing philosophies aside, our unit tests haven't been pure since the beginning, and are really a mixture of unit and integration/functional tests, so this won't be ruining anything. :)
This mixture is especially useful for something as complicated as talking to Chrome over the debugger protocol, where any mock we write is in danger of simplifying too much (or, conversely, making realistic mocks too much work to write).
Having the tests of the idealized network records above and the unit tests of devtoolsLogs -> networkRecords (in network-recorder-test.js
) means that we should be able to isolate future issues in each of those places quickly, while using real networkRecords/devtoolsLogs here means that we can do a test of these two or three units in concert much more cheaply and quickly than firing up a smokehouse test for each case.
And if we change the heuristic for finding the main resource, the unit tests will need to change but a still working wikipedia-redirect.devtoolslog.json
test (and maybe the h2 push fixture?) will be a good indicator that the change was successful.
Everyone has different feelings on where these testing borders should be, and maybe we should have set up separate unit vs functional tests from the beginning, but here we are :) WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This absolutely makes sense, I just failed to realize that LH unit tests are a mix of different types of tests. I've added a test based on the wikipedia fixture.
…ased on wikipedia fixture.
@brendankenny PTAL 🔎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 📞🚦
Closes #3181.