new SEO audit: HTTP status code #3311

kdzwinel · 2017-09-12T20:55:03Z

Closes #3181.

kdzwinel · 2017-09-12T20:57:55Z

lighthouse-core/audits/seo/http-status-code.js

+      artifacts.HTTPStatusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) {
+      return {
+        rawValue: false,
+        displayValue: `${artifacts.HTTPStatusCode}`


displayValue is appended to failureDescription:

kdzwinel · 2017-09-12T23:48:18Z

lighthouse-cli/test/smokehouse/seo/expectations.js

+        score: false,
+      }
+    }
+  },


AFAIK the current smoke test setup doesn't allow me to easily specify an HTTP code for the cases.html file, so I'm pointing smokehouse to a non-existent file, which ends up returning 404.

This is a hack though. IMO a proper solution for this would be to have an optional xyz.json file for each of the xyz-cases.html files, where you can specify which HTTP code and which headers should be returned (we will probably need headers for other audits).

FWIW we use the query string for some cases like this: with delay=2000, the server delays the response by 2 seconds. We could use the same approach for the status code, but headers would probably want to go in the json direction :)

We have our own test server, so we can add features in there, requested via query string

e.g. see the ?delay=200 that can be added to delay a response from the server

Thanks! I've added a status_code query param. We will worry about headers later.
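For illustration, here is a minimal sketch of how a test server could read the status_code and delay query parameters mentioned above. The helper name parseResponseOptions, the fallback values, and the example path are assumptions made for this sketch; the actual static-server in the PR may handle these parameters differently.

```javascript
'use strict';

// Hypothetical helper: not taken from the PR's static-server.
function parseResponseOptions(requestUrl) {
  // Request URLs are relative, so a base is needed to parse them.
  const {searchParams} = new URL(requestUrl, 'http://localhost');

  // parseInt(null) yields NaN, which fails both range checks below.
  const statusCode = Number.parseInt(searchParams.get('status_code'), 10);
  const delayMs = Number.parseInt(searchParams.get('delay'), 10);

  return {
    // Assumed fallback: respond with 200 unless a valid code was requested.
    statusCode: statusCode >= 100 && statusCode <= 599 ? statusCode : 200,
    // Assumed fallback: no delay unless a positive number was requested.
    delayMs: delayMs > 0 ? delayMs : 0,
  };
}

console.log(parseResponseOptions('/page.html?status_code=403&delay=200'));
```

A server built on this could wait delayMs milliseconds and then call response.writeHead with the parsed statusCode before sending the file body.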

paulirish · 2017-09-20T01:04:20Z

lighthouse-core/gather/gatherers/seo/http-status-code.js

+  class HTTPStatusCode extends Gatherer {
+
+    afterPass(options, traceData) {
+      const mainResource = traceData.networkRecords


I think we want this in a computed artifact. gatherers should all require protocol interaction, but looks like this does not.

Also I feel like there are a few cases where we try to sniff out the main resource's network record. If that's true, we might want to break that out into its own.

Good call! I replaced this gatherer with a computed artefact MainResource, couldn't find any other audit where it can be reused though.

PTAL

kdzwinel · 2017-09-20T22:40:00Z

lighthouse-core/audits/seo/http-status-code.js

+          statusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) {
+          return {
+            rawValue: false,
+            displayValue: `${statusCode}`,


displayValue is appended to failureDescription:

brendankenny · 2017-09-20T23:04:33Z

lighthouse-core/audits/seo/http-status-code.js

+  static audit(artifacts) {
+    const devtoolsLogs = artifacts.devtoolsLogs[Audit.DEFAULT_PASS];
+
+    return artifacts.requestNetworkRecords(devtoolsLogs)


this can be absorbed into the computed artifact so that the call can just be artifacts.requestMainResource(devtoolsLogs).then(... since computed artifacts also have access to other computed artifacts. One example of this

Yep! Fixed. BTW not sure if you are aware, but PushedRequests doesn't seem to be used?

but PushedRequests doesn't seem to be used?

haha, yes. I blame @samccone 🐐 :)

brendankenny · 2017-09-20T23:06:07Z

lighthouse-cli/test/smokehouse/seo/expectations.js

@@ -42,6 +45,9 @@ module.exports = [
      'meta-description': {
        score: false,
      },
+      'http-status-code': {
+        score: false,


looks like displayValue can also be asserted on this failure?

good point, added!
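As a rough sketch of what the expectation can then assert on: the audit fails for unsuccessful status codes and reports the code as displayValue. Only the constant name HTTP_UNSUCCESSFUL_CODE_HIGH appears in the diff above; the 400 and 599 bounds, the LOW constant, and the standalone function auditStatusCode are assumptions for this example.

```javascript
'use strict';

// Assumed bounds: the diff excerpt shows only the HIGH constant's name.
const HTTP_UNSUCCESSFUL_CODE_LOW = 400;
const HTTP_UNSUCCESSFUL_CODE_HIGH = 599;

// Hypothetical standalone version of the check inside the audit.
function auditStatusCode(statusCode) {
  if (statusCode >= HTTP_UNSUCCESSFUL_CODE_LOW &&
      statusCode <= HTTP_UNSUCCESSFUL_CODE_HIGH) {
    // displayValue carries the code as a string, as in the diff above.
    return {rawValue: false, displayValue: `${statusCode}`};
  }
  return {rawValue: true};
}

console.log(auditStatusCode(404));
console.log(auditStatusCode(200));
```

With a result of this shape, a smoke expectation for a 404 page could check both score: false and displayValue: '404' under the 'http-status-code' key.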

brendankenny · 2017-09-20T23:06:47Z

lighthouse-core/gather/computed/main-resource.js

+const HTTP_REDIRECT_CODE_LOW = 300;
+const HTTP_REDIRECT_CODE_HIGH = 399;
+
+class MainResource extends ComputedArtifact {


jsdoc description would be 👍

brendankenny · 2017-09-20T23:07:18Z

lighthouse-core/gather/computed/main-resource.js

+
+  /**
+   * @param {!Array<!WebInspector.NetworkRequest>} networkRecords
+   * @return {?WebInspector.NetworkRequest}


when can this be null? What should a caller assume in those cases?

I assumed it could happen in some edge cases (e.g. redirect loop? ssl error page?), but maybe I'm too paranoid? The audit that uses this artifact, in such case, will fail with a message 'Invalid MainResouce or its status code.'.

I assumed it could happen in some edge cases (e.g. redirect loop? ssl error page?), but maybe I'm too paranoid?

Didn't mean to imply it was overkill; I think it's just the right amount of kill :) I was hoping we could enumerate possible failure cases so we could put more specific docs on the computed artifact (and in the audit debugString).

In those two cases (redirect loop or ssl error page) I believe LH exits before it gets to running audits, so I wonder if we should instead throw an error if no record is found. That would allow outputting the exact error case as a string, without every audit that consumes it having to detect those cases and output its own debugString.

@patrickhulce has strong feelings about Lighthouse errors vs site errors, but in this case I think we can say that if we don't have a main resource record it's either a pretty bad Lighthouse bug or a pretty bad Chrome bug, so I'm personally ok with throwing in this case (with specific info in error message so we can debug later)

Makes sense! I've changed it to throw when main resource cannot be identified and adjusted tests accordingly.
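A minimal sketch of the throwing behaviour described above, assuming the "first non-redirect record" heuristic from this PR. The function name findMainResource and the error message are hypothetical, and the upper-bound half of the condition is inferred from the HTTP_REDIRECT_CODE_HIGH constant rather than copied from the diff.

```javascript
'use strict';

const HTTP_REDIRECT_CODE_LOW = 300;
const HTTP_REDIRECT_CODE_HIGH = 399;

// Assumes records are ordered by request start, as discussed in this thread.
function findMainResource(networkRecords) {
  // The first record that is not a redirect is treated as the main resource.
  const mainResource = networkRecords.find(record =>
    record.statusCode < HTTP_REDIRECT_CODE_LOW ||
    record.statusCode > HTTP_REDIRECT_CODE_HIGH);

  if (!mainResource) {
    // Hypothetical message: the real artifact may word this differently.
    throw new Error('Unable to identify the main resource');
  }
  return mainResource;
}

const exampleRecords = [
  {url: 'http://example.com/', statusCode: 301},
  {url: 'https://example.com/', statusCode: 404},
];
console.log(findMainResource(exampleRecords).statusCode);
```

Throwing here means consuming audits never see a null record, so they do not each need their own debugString for this case.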

brendankenny · 2017-09-20T23:08:10Z

lighthouse-core/gather/computed/main-resource.js

+   * @return {?WebInspector.NetworkRequest}
+   */
+  compute_(networkRecords) {
+    return networkRecords.find(record => record.statusCode < HTTP_REDIRECT_CODE_LOW ||


I may be missing something obvious, but I'm not sure how this check is grabbing the main resource? :)

oh, is it that networkRecords is sorted? I guess it would have to be the first non-redirect. Would it be better to have a stronger check for this?

Yes, networkRecords are sorted. We can potentially do some additional checks:

we can check if request initiator is 'other' and

if type of resource is 'Document'.

WDYT?

This data is not available via networkRecords, but we can get it directly from DevtoolsLog.

Hm, just found @wardpeet's PR that deals with redirects too: #3308. We could also use CriticalRequestChains here and check for the first resource that doesn't have redirected in its request ID (instead of using status codes).

we may want to avoid using CriticalRequestChains as they're due for a refactor after more fast-mode dependency graph stuff lands.

@paulirish may want to weigh in on what he imagines as a useful artifact here, but going way back to #605 he had the idea of a redirect-chain artifact, which would work for both here and #3308 (and be much simpler to iterate over compared to the critical-request-chains data structure).

I'm also like 80% sure @paulirish has told me the definitive way to determine the main request record, but I don't recall what that was. I defer to your research in the meantime, but relying on array order alone does make me a little nervous.

Lighthouse inserts into the networkRecords array based on when a request is finished, and it seems like it's possible (maybe? I have no evidence either way :) that the preload scanner (or something else) could trigger a resource download that finishes before the main page finishes loading, and so would be inserted before it and trick the statusCode check. The other checks would help work around this.

Walking the redirect chain (and maybe explicitly matching redirected URLs with redirectSource like driver does for monitoring redirects) also seems like it would solve this problem well

Lighthouse inserts into the networkRecords array based on when a request is finished

whoops, I'm wrong. As of #2197 they are inserted as the request begins, so this may be fine.

A redirect-chain artifact -- with the last entry always the main resource -- may still be more useful in more places in LH, though.

so this may be fine.

so… we are back to "first non-redirect" heuristic?

redirect-chain artifact

NetworkRequests have a redirects property that exposes the chain of redirects that led to that resource. So, you can do mainResource.redirects and get this:

this should be enough for #3308 (@wardpeet WDYT?). We can rework it to redirect-chain if needed, but IMO mainResource.redirects works well.
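To make the mainResource.redirects idea concrete, here is a small sketch that flattens a record and its redirects into an ordered list of URLs. It assumes redirects is an ordered array of earlier requests and that each record exposes a url field; both shapes are illustrative and may not match the real NetworkRequest objects exactly.

```javascript
'use strict';

// Assumption: `redirects` lists the earlier requests in order, oldest first.
function getRedirectChainUrls(mainResource) {
  // A resource that was never redirected may have no `redirects` at all.
  const redirects = mainResource.redirects || [];
  // The main resource itself is always the last entry of the chain.
  return [...redirects, mainResource].map(record => record.url);
}

// Hypothetical record shaped like the description in the comment above.
const exampleMainResource = {
  url: 'https://example.com/final',
  statusCode: 200,
  redirects: [
    {url: 'http://example.com/', statusCode: 301},
    {url: 'https://example.com/', statusCode: 302},
  ],
};

console.log(getRedirectChainUrls(exampleMainResource));
```

A list like this is roughly what a dedicated redirect-chain artifact would provide, with the last entry being the main resource.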

I'm also like 80% sure @paulirish has told me the definitive way to determine the main request record, but I don't recall what that was

looking in DT source i see some really basic approaches for finding "main resources"

if (networkManager && request.url() === networkManager.target().inspectedURL() && request.resourceType() === Common.resourceTypes.Document)

(roughly what resourcetreemodel is doing...)

let url;
// _resourcesMap populated ...
target.on(SDK.ResourceTreeModel.Events.FrameNavigated, framePayload => url = framePayload.url);
var mainResource = this._resourcesMap[url];

basically just looking at URL match.. and either excluding redirects via selecting just Resources or by only considering type==document. ok then.

so tldr there is no definitive way. what you have here lgtm
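For comparison with the status-code heuristic, the DevTools-style approach summarized above (URL match plus type == document) could look roughly like this. The function name findMainDocument and the plain url and resourceType fields are assumptions for the sketch; real network records expose this information differently.

```javascript
'use strict';

// Hypothetical helper mirroring the "URL match + Document type" approach.
function findMainDocument(networkRecords, inspectedUrl) {
  return networkRecords.find(record =>
    record.url === inspectedUrl && record.resourceType === 'Document');
}

const sampleRecords = [
  {url: 'http://example.com/', resourceType: 'Document', statusCode: 301},
  {url: 'https://example.com/app.js', resourceType: 'Script', statusCode: 200},
  {url: 'https://example.com/', resourceType: 'Document', statusCode: 200},
];

// The inspected URL is the final URL after redirects.
console.log(findMainDocument(sampleRecords, 'https://example.com/').statusCode);
```

Unlike the first-non-redirect check, this does not depend on array order, but it does need the final inspected URL and a resource type for each record.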

brendankenny · 2017-09-20T23:11:11Z

lighthouse-core/test/audits/seo/http-status-code-test.js

+ */
+'use strict';
+
+const Audit = require('../../../audits/seo/http-status-code.js');


nit: rename as httpStatusCodeAudit? It gets confusing when reading the tests if just generic Audit. (I regret we haven't switched all the rest of the audit tests off of using Audit for this)

👍 I went with HTTPStatusCodeAudit to be consistent with naming convention used in other tests.

brendankenny · 2017-09-20T23:22:00Z

lighthouse-core/test/gather/computed/main-resource-test.js

+      assert.equal(output, record);
+    });
+  });
+});


note we also have some devtoolsLogs and networkRecords in lighthouse-core/test/fixtures/, e.g. one from a wikipedia redirect that might be good for a test too

Hm… if we use that fixture, we will be also testing NetworkRecords artifact, so it won't be a fully independent unit test. Also, wikipedia fixture does a bunch of redirects, just like the test above - so, IMO we will be repeating the same test. WDYT?

Hm… if we use that fixture, we will be also testing NetworkRecords artifact, so it won't be a fully independent unit test. Also, wikipedia fixture does a bunch of redirects, just like the test above - so, IMO we will be repeating the same test. WDYT?

Wellllll, it depends on if you buy that Lighthouse tests are really unit tests. Testing philosophies aside, our unit tests haven't been pure since the beginning, and are really a mixture of unit and integration/functional tests, so this won't be ruining anything. :)

This mixture is especially useful for something as complicated as talking to Chrome over the debugger protocol, where any mock we write is in danger of simplifying too much (or, conversely, making realistic mocks too much work to write).

Having the tests of the idealized network records above and the unit tests of devtoolsLogs -> networkRecords (in network-recorder-test.js) means that we should be able to isolate future issues in each of those places quickly, while using real networkRecords/devtoolsLogs here means that we can do a test of these two or three units in concert much more cheaply and quickly than firing up a smokehouse test for each case.

And if we change the heuristic for finding the main resource, the unit tests will need to change but a still working wikipedia-redirect.devtoolslog.json test (and maybe the h2 push fixture?) will be a good indicator that the change was successful.

Everyone has different feelings on where these testing borders should be, and maybe we should have set up separate unit vs functional tests from the beginning, but here we are :) WDYT?

This absolutely makes sense, I just failed to realize that LH unit tests are a mix of different types of tests. I've added a test based on the wikipedia fixture.

kdzwinel · 2017-09-27T08:04:09Z

@brendankenny PTAL 🔎

brendankenny

LGTM 📞🚦

kdzwinel force-pushed the seo-http-status-code branch from 8217662 to 5c667bd Compare September 12, 2017 21:55

paulirish requested changes Sep 20, 2017

kdzwinel added 9 commits September 21, 2017 00:27

SEO: HTTP Status Code Audit - WIP

9e672e4

Add tests.

4c1b723

Add status code audit.

3c17dc1

Validate copy in tests, remove redundant data check.

7fa834c

Update copy

8c5943d

Add smoke tests.

23373de

Allow returning custom HTTP code from static-server. Update smokehouse test accordingly.

7535b5e

Replace status code gatherer with a main resource computed artifact.

0a2ca8c

Make linter happy.

b16b37a

kdzwinel force-pushed the seo-http-status-code branch from adfc6f6 to b16b37a Compare September 20, 2017 22:38

Remove abandoned smoke test.

dbb5f51

brendankenny requested changes Sep 20, 2017

brendankenny reviewed Sep 20, 2017

Code review fixes.

711ce65

kdzwinel mentioned this pull request Sep 21, 2017

feat(redirects-audit): Adding Redirect audit (PSI Compat) #3308

Merged

Metaway mentioned this pull request Sep 21, 2017

null #3389

Closed

Throw when main resource can't be identified. Ajust tests. Add test based on wikipedia fixture.

8fe5acd

paulirish approved these changes Sep 25, 2017

paulirish added the new_audit label Sep 27, 2017

brendankenny reviewed Sep 27, 2017

brendankenny approved these changes Sep 27, 2017

brendankenny merged commit 8e7695d into GoogleChrome:master Sep 27, 2017

brendankenny mentioned this pull request Sep 18, 2018

core(gather-runner): include error status codes in pageLoadError #6051

Merged
