new_audit(hreflang): document has a valid hreflang code #3815

kdzwinel · 2017-11-12T23:17:34Z

Failing audit:

Successful audit:

kdzwinel · 2017-11-12T23:26:35Z

lighthouse-cli/test/smokehouse/seo/expectations.js

-    initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
-    url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
+    initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
+    url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),


this is getting quite nasty. Maybe we should move headers to an external file (e.g. seo-failure-cases.html?headers=seo-failure-headers.json) or keep them here, but in a separate field (headers: ['Link: bla', 'x-robots-tag:none'])? In the second case we could pass them to the static-server e.g. in a header (which would be a very meta thing to do). @brendankenny WDYT?

url is compared to the LHR's url property (to make sure redirects or whatever were captured correctly), so if the headers will be passed in as a query param to static server they'll have to be in the url expectations string too.

@paulirish just brought up that sending the headers as request headers won't really work since smokehouse invokes lighthouse with just a URL...there's no way to add request headers (until/if #2746 :)

This is a js file, so there are a few ways we could make this better:

- save parts of the URL to variables at the top of the file so they are just concatenated here
- create some kind of headersParam([]) function so that this could be url: 'http://localhost:10200/seo/seo-failure-cases.html?' + headersParam(['Link: bla', 'x-robots-tag:none']) or whatever
- use the URL constructor to create the string and rely on implicit toString, or save href to a variable up top and use it down here
- some combination of the above
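A rough sketch of the headersParam idea from the list above. Only the helper name comes from the comment; the function body, the BASE_URL constant, and the use of URLSearchParams are assumptions, and the sketch presumes the static server decodes standard percent-encoding in extra_header values.

```javascript
// Hypothetical sketch: these names are illustrative, not existing smokehouse code.
const BASE_URL = 'http://localhost:10200/seo/seo-failure-cases.html';

/**
 * Builds a query string with one extra_header entry per header.
 * @param {Array<string>} headers e.g. ['x-robots-tag:none']
 * @return {string}
 */
function headersParam(headers) {
  const params = new URLSearchParams();
  headers.forEach(header => params.append('extra_header', header));
  return params.toString();
}

const failureHeaders = headersParam([
  'x-robots-tag:none',
  'link:<http://example.com>;rel="alternate";hreflang="xx"',
]);

// The same string can then be reused for both initialUrl and url.
const failureUrl = `${BASE_URL}?status_code=403&${failureHeaders}`;
```

URLSearchParams escapes more characters than encodeURI does (for example ':' and '/'), so the resulting string differs from the hand-built one even though it decodes to the same header values.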

kdzwinel · 2017-11-12T23:28:45Z

lighthouse-core/audits/seo/hreflang.js

+'use strict';
+
+const Audit = require('../audit');
+const LinkHeader = require('http-link-header');


new dependency - https://github.com/jhermsmeier/node-http-link-header. It's a simple (~300LOC), well tested, link header value parser with no dependencies.

kdzwinel · 2017-11-12T23:42:25Z

lighthouse-core/gather/gatherers/seo/hreflang.js

+      .then(nodes => nodes &&
+        Promise.all(
+          nodes.map(node => Promise.all([node.getAttribute('href'), node.getAttribute('hreflang')]))
+        )


driver.querySelector solution works well for simple getters but with driver.querySelectorAll and multiple getAttribute calls it's getting a bit less readable IMO. Maybe a simple injected script (driver.evaluateAsync) would be a better fit here? Something like:

const links = document.querySelectorAll('head link[rel="alternate" i][hreflang]');
return Array.from(links).map(({href, hreflang}) => ({href, hreflang}));

I don't have strong opinions here, so I'll defer to others

evaluate async seems fine to me, but this also doesn't really seem that bad either

kdzwinel · 2017-11-13T10:59:32Z

lighthouse-core/audits/seo/hreflang.js

+
+const Audit = require('../audit');
+const LinkHeader = require('http-link-header');
+const axeCore = require('axe-core');


turns out I can't import 'axe-core': it wasn't imported anywhere else before, and since it's huge, it blows our bundle budget 🤔

Since I'm using only a tiny part of 'axe-core' it seems like tree shaking should help here, but unfortunately we are not using import/export syntax ATM (which is required for tree shaking to work).

any chance we can just import the languages file?

const validLangs = require('axe-core/lib/commons/utils/valid-langs.js')

meta observation: there's only ~17k possible 3-character codes and ~8k of them are valid

how much value is the list really providing over a /^[a-z]{2,3}$/ regex? are mistakes likely to result in a different invalid 3-character code?

Didn't realize require can import single files that are not in the root folder of the package, thanks! I had to use a slight hack (global variable) to get things working though.

45% of 3-character codes and 25% of 2-character codes are valid - I agree that it's a lot, but IMO it still leaves a huge margin for people to make "catchable" mistakes. We may also extend this audit with region and script codes validation in the future. @rviscomi WDYT?

Not a fan of regex pattern matching as a substitute for whitelist checking. Too many false positives.
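To make the false-positive concern concrete, here is a small comparison of the two approaches. The five-entry list is an illustrative subset chosen for this sketch, not the list the audit uses.

```javascript
// Illustrative subset only; the real list of valid codes is far longer.
const SAMPLE_VALID_LANGS = new Set(['en', 'es', 'fr', 'de', 'pl']);
const LANG_SHAPE = /^[a-z]{2,3}$/;

const matchesShape = code => LANG_SHAPE.test(code);
const isListed = code => SAMPLE_VALID_LANGS.has(code);

// 'xx' has the right shape but is not a listed language code,
// so only the list-based check rejects it.
const candidates = ['es', 'xx', 'e5'];
const report = candidates.map(code => ({
  code,
  regex: matchesShape(code),
  list: isListed(code),
}));
```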

One option is putting it in a different data structure when browserifying. A quick test of a pretty naive trie got it down to about 1/2 the size raw, 1/5 the size after gzip (19915 -> 3751 bytes). Being more clever (char codes, exclusions vs inclusions) can get the trie down to about 1/4 the size of the current file, but the gzip size doesn't really budge after that first drop.

Probably not worth pursuing at this point, but maybe worth leaving a note for when we go back looking for the low hanging fruit of shrinking bundle size.
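For anyone picking this up later, a naive trie along these lines might look like the sketch below. It is an assumption about the general shape, not the code used for the measurement above, and the four codes are placeholders.

```javascript
// Hypothetical sketch of a naive trie keyed by character.
function buildTrie(codes) {
  const root = {};
  for (const code of codes) {
    let node = root;
    for (const char of code) {
      node = node[char] = node[char] || {};
    }
    node.$ = 1; // marks the end of a complete code
  }
  return root;
}

function trieHas(trie, code) {
  let node = trie;
  for (const char of code) {
    node = node[char];
    if (!node) return false;
  }
  return node.$ === 1;
}

// Shared prefixes ('e', 'en') are stored once, which is where the savings come from.
const trie = buildTrie(['en', 'es', 'eng', 'fr']);
const serialized = JSON.stringify(trie);
```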

patrickhulce

nice! seems pretty straightforward

patrickhulce · 2017-11-13T17:32:06Z

lighthouse-core/audits/seo/hreflang.js

+function headerHasValidHreflangs(headerValue) {
+  const linkHeader = LinkHeader.parse(headerValue);
+
+  return linkHeader.get('rel', 'alternate')


always returns an array? strange api 🤔

Single Link header can have multiple rel=alternate links (e.g. <http://es.example.com/>; rel="alternate"; hreflang="es",<http://fr.example.com/>; rel="alternate"; Hreflang="fr-be"), so IMO it makes sense to always return an array.
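The example header above can be walked through with a deliberately simplified parser. This is a stand-in written for illustration and is not the http-link-header API; it assumes no commas or semicolons appear inside URIs or quoted parameter values.

```javascript
// Simplified sketch: collects every rel="alternate" link that carries an hreflang.
function parseAlternateLinks(headerValue) {
  const results = [];
  // Each link is "<uri>" followed by its ";"-separated parameters, up to the next comma.
  const linkPattern = /<([^>]*)>([^,]*)/g;
  let match;
  while ((match = linkPattern.exec(headerValue)) !== null) {
    const params = {};
    for (const part of match[2].split(';')) {
      const eq = part.indexOf('=');
      if (eq === -1) continue;
      // Parameter names are compared case-insensitively, so "Hreflang" works too.
      const name = part.slice(0, eq).trim().toLowerCase();
      params[name] = part.slice(eq + 1).trim().replace(/^"|"$/g, '');
    }
    if (params.rel === 'alternate' && 'hreflang' in params) {
      results.push({uri: match[1], hreflang: params.hreflang});
    }
  }
  return results;
}

const header = '<http://es.example.com/>; rel="alternate"; hreflang="es",' +
  '<http://fr.example.com/>; rel="alternate"; Hreflang="fr-be"';
const alternates = parseAlternateLinks(header);
```

Because one header value can yield several entries, returning an array in every case keeps the caller's code the same for one link or many.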

patrickhulce · 2017-11-13T17:32:48Z

lighthouse-core/audits/seo/hreflang.js

+  static get meta() {
+    return {
+      name: 'hreflang',
+      description: 'Document has a valid `hreflang`',


seems to be a property of an alternate link rather than the entire document, or am I misunderstanding?

Good point! @rviscomi WDYT?

It's been a while since I wrote the text, but I think I was trying to strike a balance between the hreflang being used as both an HTTP header value and HTML attribute value. Other audits also use "document" as the owner of the thing being audited, for example:

Document has a <meta name="viewport"> tag with width or initial-scale.

Document uses legible font sizes.

Document has a <title> element.

Document has a meta description.

Document has a valid rel=canonical.

Document avoids plugins.

So I'm ok with the text as it is currently written but open to suggestions.

patrickhulce · 2017-11-13T17:36:30Z

lighthouse-core/gather/gatherers/seo/hreflang.js

+  afterPass(options) {
+    const driver = options.driver;
+
+    return driver.querySelectorAll('head link[rel="alternate" i][hreflang]')


querySelectorAll returns null instead of []!? we should fix that 😆

My bad, it always returns an array. I've updated the code (and jsdoc) accordingly.

Oh ok, no worries!

patrickhulce

lgtm!

patrickhulce · 2017-11-15T17:09:28Z

lighthouse-core/audits/seo/hreflang.js

+ * Import list of valid languages from axe core without including whole axe-core package
+ */
+function importValidLangs() {
+  const axeCache = global.axe;
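The hunk above is cut off after the first line of the function. The general save, stub, load, restore pattern it appears to start could be sketched as follows; importViaGlobal, loadScript, fakeAxe, and the utils shape are all hypothetical stand-ins so the example runs without axe-core.

```javascript
// Hypothetical sketch: `loadScript` stands in for require()-ing a file that
// registers its data on a global object instead of exporting it.
function importViaGlobal(globalName, loadScript) {
  const hadValue = globalName in global;
  const cached = global[globalName];   // remember whatever was there before
  global[globalName] = {utils: {}};    // stub that the script is expected to extend
  try {
    loadScript();
    return global[globalName].utils;
  } finally {
    // Restore the previous state so the stub never leaks to other modules.
    if (hadValue) global[globalName] = cached;
    else delete global[globalName];
  }
}

// Stand-in for the side-effecting file; the property name is made up.
const fakeLangFile = () => {
  global.fakeAxe.utils.validLangs = ['en', 'es', 'fr'];
};
const utils = importViaGlobal('fakeAxe', fakeLangFile);
```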


patrickhulce · 2017-11-15T17:13:35Z

lighthouse-core/test/audits/seo/hreflang.js

+      'XX-be',
+      'XX-be-Hans',
+      '',
+      '  es',


is this invalid because of the whitespace?

yes, AFAIK it should be invalid in this case
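A minimal sketch of why these inputs fail, assuming the check looks only at the language subtag, lowercases it, and does no trimming. KNOWN_LANGS is a placeholder subset and the function is an illustration rather than the audit's code.

```javascript
// Placeholder subset; the audit checks against a much longer list.
const KNOWN_LANGS = ['en', 'es', 'fr', 'zh'];

/**
 * @param {string} hreflang
 * @return {boolean}
 */
function isValidHreflang(hreflang) {
  // Only the language subtag (before the first '-') is checked; nothing is trimmed,
  // so leading whitespace becomes part of the subtag and fails the lookup.
  const [lang] = hreflang.split('-');
  return KNOWN_LANGS.includes(lang.toLowerCase());
}

const results = ['es', 'XX-be', 'XX-be-Hans', '', '  es']
  .map(code => [code, isValidHreflang(code)]);
```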

patrickhulce · 2017-11-15T17:17:00Z

assigning to you @brendankenny since konrad's got some good smokehouse points for ya :)

brendankenny · 2017-11-15T22:55:48Z

lighthouse-cli/test/fixtures/seo/seo-failure-cases.html

@@ -12,6 +12,8 @@
  <meta name="viewport" content="invalid-content=should_have_looked_it_up">
  <!-- no <meta name="description" content=""> -->
  <meta name="robots" content="nofollow, NOINDEX, all">
+  <link rel="alternate" hreflang="xx" href="https://xx.example.com" />


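do you mind starting to mark these with PASS(audit-name): details (or FAIL) where that's possible and succinct? e.g. like in

lighthouse/lighthouse-cli/test/fixtures/byte-efficiency/tester.html

Lines 69 to 73 in 6cd1b41

	<!-- FAIL(optimized): image is not JPEG optimized -->
	<!-- FAIL(webp): image is not WebP optimized -->
	<!-- PASS(responsive): image is used at full size -->
	<!-- FAIL(offscreen): image is offscreen -->
	<img style="position: absolute; top: -10000px;" src="lighthouse-unoptimized.jpg">

helpful for going back later and knowing what touches what test/audit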

Good idea! Added two comments here and one in the seo-tester.html

brendankenny · 2017-11-15T23:22:26Z

lighthouse-core/audits/seo/hreflang.js

+}
+
+/**
+ * @param {String} hreflang 


lowercase string/boolean for all of these (this will soon actually be checked! :)

Updated! Can't wait for your PR to land 👌

brendankenny · 2017-11-15T23:23:07Z

lighthouse-core/audits/seo/hreflang.js

+
+    return artifacts.requestMainResource(devtoolsLogs)
+      .then(mainResource => {
+        /** @type {Array<{source: String|Object}>} */


Array<{source: {type: string, snippet: string}}>?

Updated, although it's actually Array<{source: string|{type: string, snippet: string}}> because array item is an object for failing nodes, and a string for failing headers.
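For illustration, values matching that union type might look like this. The variable name and the 'node' type value are assumptions; the snippet and header strings are borrowed from the fixtures in this PR.

```javascript
/**
 * Assumed shape, taken from the type in the comment above:
 * a snippet object for failing nodes, a plain string for failing headers.
 * @type {Array<{source: string|{type: string, snippet: string}}>}
 */
const invalidHreflangs = [];

// Failing <link> element; the 'node' type value is an assumption.
invalidHreflangs.push({
  source: {
    type: 'node',
    snippet: '<link rel="alternate" hreflang="xx" href="https://xx.example.com" />',
  },
});

// Failing Link header, stored as the raw header string.
invalidHreflangs.push({
  source: 'Link: <http://example.com>;rel="alternate";hreflang="xx"',
});

// Consumers have to branch on the type of `source` before reading it.
const sourceText = item =>
  typeof item.source === 'string' ? item.source : item.source.snippet;
const descriptions = invalidHreflangs.map(sourceText);
```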

kdzwinel

@brendankenny Thank you for a review! I addressed your comments. PTAL.

paulirish

brendan's comments look addressed so i'm going to go ahead and merge.

kdzwinel added 3 commits November 12, 2017 02:16

Hreflang gatherer and audit

8b5f8d1

Add smokehouse test, fix config

fdba9e4

Remove duplicated test, improve test names

c9f1afe

kdzwinel requested review from brendankenny, patrickhulce and paulirish as code owners November 12, 2017 23:17

kdzwinel commented Nov 12, 2017

kdzwinel commented Nov 13, 2017

patrickhulce reviewed Nov 13, 2017

vinamratasingal-zz added this to the Sprint Quatro: November 13-26 milestone Nov 13, 2017

Address code review comments.

6860d84

patrickhulce approved these changes Nov 15, 2017

patrickhulce assigned brendankenny Nov 15, 2017

brendankenny requested changes Nov 15, 2017

brendankenny added the waiting4committer label Nov 17, 2017

kdzwinel added 3 commits November 18, 2017 01:57

Address code review comments

6c3ff27

move comment

95699f5

Add comment about axe-core import

a55fccd

kdzwinel commented Nov 18, 2017

devtools-bot added waiting4reviewer and removed waiting4committer labels Nov 27, 2017

paulirish modified the milestones: Sprint Quatro: November 13-26, Sprint Cinco: November 28 - Dec 9 Nov 27, 2017

paulirish approved these changes Nov 29, 2017

paulirish merged commit 6910f5d into GoogleChrome:master Nov 29, 2017

paulirish mentioned this pull request Nov 29, 2017

tests(smokehouse): adopt URLSearchParams for querystring manipulation #3941

Merged

dependencies bot mentioned this pull request Dec 17, 2017

Update lighthouse in / from 2.5.0 to 2.7.0 chauncey-garrett/dotfiles#57

Open

paulirish removed the waiting4reviewer label Mar 6, 2018

	<!-- FAIL(optimized): image is not JPEG optimized -->
	<!-- FAIL(webp): image is not WebP optimized -->
	<!-- PASS(responsive): image is used at full size -->
	<!-- FAIL(offscreen): image is offscreen -->
	<img style="position: absolute; top: -10000px;" src="lighthouse-unoptimized.jpg">
