-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new_audit(hreflang): document has a valid hreflang code #3815
Conversation
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none', | ||
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none', | ||
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'), | ||
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is getting quite nasty. Maybe we should move headers to an external file (e.g. seo-failure-cases.html?headers=seo-failure-headers.json
) or keep them here, but in a separate field (headers: ['Link: bla', 'x-robots-tag:none']
)? In the second case we could pass them to the static-server e.g. in a header (which would be a very meta thing to do). @brendankenny WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url
is compared to the LHR's url
property (to make sure redirects or whatever were captured correctly), so if the headers will be passed in as a query param to static server they'll have to be in the url
expectations string too.
@paulirish just brought up that sending the headers as request headers won't really work since smokehouse invokes lighthouse with just a URL...there's no way to add request headers (until/if #2746 :)
This is a js
file, so there's a few ways we could make this better:
- save parts of the URL to variables at the top of the file so they are just concatted here
- create some kind of
headersParam([])
function so that this could beurl: 'http://localhost:10200/seo/seo-failure-cases.html?' + headersParam(['Link: bla', 'x-robots-tag:none'])
or whatever - use the URL constructor to create the string and rely on implicit
toString
or savehref
to a variable up top and invoke down here - some combination of the above
'use strict'; | ||
|
||
const Audit = require('../audit'); | ||
const LinkHeader = require('http-link-header'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new dependency - https://github.com/jhermsmeier/node-http-link-header. It's a simple (~300LOC), well tested, link header value parser with no dependencies.
.then(nodes => nodes && | ||
Promise.all( | ||
nodes.map(node => Promise.all([node.getAttribute('href'), node.getAttribute('hreflang')])) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
driver.querySelector
solution works well for simple getters but with driver.querySelectorAll
and multiple getAttribute
calls it's getting a bit less readable IMO. Maybe a simple injected script (driver.evaluateAsync
) would be a better fit here? Something like:
const links = document.querySelectorAll('head link[rel="alternate" i][hreflang]');
return Array.from(links).map(({href, hreflang}) => ({href, hreflang}));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinions here, so I'll defer to others
evaluate async seems fine to me, but this also doesn't really seem that bad either
|
||
const Audit = require('../audit'); | ||
const LinkHeader = require('http-link-header'); | ||
const axeCore = require('axe-core'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out I can't import 'axe-core' as it wasn't imported anywhere else before and since it's huge, it blows our bundle budget 🤔
Since I'm using only a tiny part of 'axe-core' it seems like tree shaking should help here, but unfortunately we are not using import/export syntax ATM(which is required for tree shaking to work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance we can just import the languages file?
const validLangs = require('axe-core/lib/commons/utils/valid-langs.js')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta observation: there's only ~17k possible 3-character codes and ~8k of them are valid
how much value is the list really providing over a /^[a-z]{2,3}$/
regex? are mistakes likely to result in a different invalid 3-character code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize require
can import single files that are not in the root folder of the package, thanks! I had to use a slight hack (global variable) to get things working though.
45% of 3-character codes and 25% of 2-character codes are valid - I agree that it's a lot, but IMO it still leaves a huge margin for people to make "catchable" mistakes. We may also extend this audit with region and script codes validation in the future. @rviscomi WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of regex pattern matching as a substitute for whitelist checking. Too many false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option is putting it in a different data structure when browserifying. A quick test of a pretty naive trie got it down to about 1/2 the size raw, 1/5 the size after gzip (19915 -> 3751 bytes). Being more clever (char codes, exclusions vs inclusions) can get the trie down to about 1/4 the size of the current file, but the gzip size doesn't really budge after that first drop.
Probably not worth pursuing at this point, but maybe worth leaving a note for when we go back looking for the low hanging fruit of shrinking bundle size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! seems pretty straightforward
|
||
const Audit = require('../audit'); | ||
const LinkHeader = require('http-link-header'); | ||
const axeCore = require('axe-core'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance we can just import the languages file?
const validLangs = require('axe-core/lib/commons/utils/valid-langs.js')
function headerHasValidHreflangs(headerValue) { | ||
const linkHeader = LinkHeader.parse(headerValue); | ||
|
||
return linkHeader.get('rel', 'alternate') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always returns an array? strange api 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single Link
header can have multiple rel=alternate
links (e.g. <http://es.example.com/>; rel="alternate"; hreflang="es",<http://fr.example.com/>; rel="alternate"; Hreflang="fr-be"
), so IMO it makes sense to always return an array.
static get meta() { | ||
return { | ||
name: 'hreflang', | ||
description: 'Document has a valid `hreflang`', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to be a property of an alternate
link rather than the entire document, or am I misunderstanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! @rviscomi WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since I wrote the text, but I think I was trying to strike a balance between the hreflang being used as both an HTTP header value and HTML attribute value. Other audits also use "document" as the owner of the thing being audited, for example:
- Document has a
<meta name="viewport">
tag withwidth
orinitial-scale
. - Document uses legible font sizes.
- Document has a
<title>
element. - Document has a meta description.
- Document has a valid
rel=canonical
. - Document avoids plugins.
So I'm ok with the text as it is currently written but open to suggestions.
afterPass(options) { | ||
const driver = options.driver; | ||
|
||
return driver.querySelectorAll('head link[rel="alternate" i][hreflang]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
querySelectorAll
returns null
instead of []
!? we should fix that 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, it always returns an array. I've updated the code (and jsdoc) accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok, no worries!
.then(nodes => nodes && | ||
Promise.all( | ||
nodes.map(node => Promise.all([node.getAttribute('href'), node.getAttribute('hreflang')])) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinions here, so I'll defer to others
evaluate async seems fine to me, but this also doesn't really seem that bad either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
* Import list of valid languages from axe core without including whole axe-core package | ||
*/ | ||
function importValidLangs() { | ||
const axeCache = global.axe; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice :)
'XX-be', | ||
'XX-be-Hans', | ||
'', | ||
' es', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this invalid because of the whitespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, AFAIK it should be invalid in this case
assigning to you @brendankenny since konrad's got some good smokehouse points for ya :) |
@@ -12,6 +12,8 @@ | |||
<meta name="viewport" content="invalid-content=should_have_looked_it_up"> | |||
<!-- no <meta name="description" content=""> --> | |||
<meta name="robots" content="nofollow, NOINDEX, all"> | |||
<link rel="alternate" hreflang="xx" href="https://xx.example.com" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mind starting to mark these with PASS(audit-name): details if possible and succinct
(or FAIL
) like e.g. in
lighthouse/lighthouse-cli/test/fixtures/byte-efficiency/tester.html
Lines 69 to 73 in 6cd1b41
<!-- FAIL(optimized): image is not JPEG optimized --> | |
<!-- FAIL(webp): image is not WebP optimized --> | |
<!-- PASS(responsive): image is used at full size --> | |
<!-- FAIL(offscreen): image is offscreen --> | |
<img style="position: absolute; top: -10000px;" src="lighthouse-unoptimized.jpg"> |
helpful for going back later and knowing what touches what test/audit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Added two comments here and one in the seo-tester.html
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none', | ||
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none', | ||
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'), | ||
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url
is compared to the LHR's url
property (to make sure redirects or whatever were captured correctly), so if the headers will be passed in as a query param to static server they'll have to be in the url
expectations string too.
@paulirish just brought up that sending the headers as request headers won't really work since smokehouse invokes lighthouse with just a URL...there's no way to add request headers (until/if #2746 :)
This is a js
file, so there's a few ways we could make this better:
- save parts of the URL to variables at the top of the file so they are just concatted here
- create some kind of
headersParam([])
function so that this could beurl: 'http://localhost:10200/seo/seo-failure-cases.html?' + headersParam(['Link: bla', 'x-robots-tag:none'])
or whatever - use the URL constructor to create the string and rely on implicit
toString
or savehref
to a variable up top and invoke down here - some combination of the above
} | ||
|
||
/** | ||
* @param {String} hreflang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowercase string
/boolean
for all of these (this will soon actually be checked! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated! Can't wait for your PR to land 👌
|
||
return artifacts.requestMainResource(devtoolsLogs) | ||
.then(mainResource => { | ||
/** @type {Array<{source: String|Object}>} */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array<{source: {type: string, snippet: string}}>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, although it's actually Array<{source: string|{type: string, snippet: string}}>
because array item is an object for failing nodes, and a string for failing headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brendankenny Thank you for a review! I addressed your comments. PTAL.
@@ -12,6 +12,8 @@ | |||
<meta name="viewport" content="invalid-content=should_have_looked_it_up"> | |||
<!-- no <meta name="description" content=""> --> | |||
<meta name="robots" content="nofollow, NOINDEX, all"> | |||
<link rel="alternate" hreflang="xx" href="https://xx.example.com" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Added two comments here and one in the seo-tester.html
} | ||
|
||
/** | ||
* @param {String} hreflang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated! Can't wait for your PR to land 👌
|
||
return artifacts.requestMainResource(devtoolsLogs) | ||
.then(mainResource => { | ||
/** @type {Array<{source: String|Object}>} */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, although it's actually Array<{source: string|{type: string, snippet: string}}>
because array item is an object for failing nodes, and a string for failing headers.
'XX-be', | ||
'XX-be-Hans', | ||
'', | ||
' es', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, AFAIK it should be invalid in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brendan's comments look addressed so i'm going to go ahead and merge.
Closes #3177
Failing audit:
Successful audit: