Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(seo): properly handle anchors in SVG #6483

Merged
merged 7 commits into from
Nov 6, 2018
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions lighthouse-cli/test/fixtures/seo/seo-failure-cases.html
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,12 @@ <h2>Anchor text</h2>
<a href='https://example.com'>click this</a>
<a href='/test.html'> click this </a>
<a href='/test.html'>CLICK THIS</a>
<svg>
<a href='https://example.com/svg-link'>
<circle cx="50" cy="50" r="50"/>
<text x="0" y="0" >click this</text>
</a>
</svg>

<!-- FAIL(plugins): java applet on page -->
<applet code=HelloWorld.class width="200" height="200" id='applet'>
Expand Down
4 changes: 2 additions & 2 deletions lighthouse-cli/test/smokehouse/seo/expectations.js
Original file line number Diff line number Diff line change
Expand Up @@ -99,10 +99,10 @@ module.exports = [
},
'link-text': {
score: 0,
displayValue: '3 links found',
displayValue: '5 links found',
details: {
items: {
length: 3,
length: 5,
},
},
},
Expand Down
11 changes: 10 additions & 1 deletion lighthouse-core/audits/seo/link-text.js
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
const Audit = require('../audit');
const URL = require('../../lib/url-shim');
const BLOCKLIST = new Set([
'', // no description is also bad
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will have a big effect on a lot of anchors. Not sure if we want to pull this trigger without doing more research first. For now can we fix the svg > a failure without changing the audit scoring? We can open a separate FR to weigh in on auditing for empty anchor text.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll revert 👍

'click here',
'click this',
'go',
Expand Down Expand Up @@ -41,14 +42,22 @@ class LinkText extends Audit {
static audit(artifacts) {
const failingLinks = artifacts.CrawlableLinks
.filter(link => {
const href = link.href.toLowerCase();
if (
link.href.toLowerCase().startsWith('javascript:') ||
href.startsWith('javascript:') ||
href.startsWith('mailto:') ||
URL.equalWithExcludedFragments(link.href, artifacts.URL.finalUrl)
) {
return false;
}

return BLOCKLIST.has(link.text.trim().toLowerCase());
})
.map(link => {
return {
href: link.href,
text: link.text.trim().length ? link.text : '<NO TEXT>',
};
});

const headings = [
Expand Down
19 changes: 15 additions & 4 deletions lighthouse-core/gather/gatherers/seo/crawlable-links.js
Original file line number Diff line number Diff line change
Expand Up @@ -13,19 +13,30 @@ class CrawlableLinks extends Gatherer {
* @param {LH.Gatherer.PassContext} passContext
* @return {Promise<LH.Artifacts['CrawlableLinks']>}
*/
afterPass(passContext) {
async afterPass(passContext) {
const expression = `(function() {
${pageFunctions.getElementsInDocumentString}; // define function on page
const resolveURLOrNull = url => {
try { return new URL(url, window.location.href).href; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__nativeURL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do ya one better, I'll enable useIsolation :)

catch (_) { return null; }
};

const selector = 'a[href]:not([rel~="nofollow"])';
const elements = getElementsInDocument(selector);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

getElementsInDocument(selector).filter(e => __ElementMatches.call(e, 'svg a'))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 100% sure what you're requesting, like we separate them in two different arrays? or just that we use .matches to replace the instanceof check?

Copy link
Member

@paulirish paulirish Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm that was the idea but the fact that resolving these baseVal is so annoying makes it kinda not worth it.

actually how about

const elements = getElementsInDocument(selector).filter(e => __ElementMatches.call(e, 'svg a'))
return elements.map(node => ({
  href: new __URL(node.getAttribute('href'), window.location.href).toString(),  // but with try/catch
  text: // ...
        // oh but then links in SVG don't have innerText
        // node.innerText || node.textContent  ??

okay yeah this is all terrible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree it's all terrible, I started with node.innerText || node.textContent but then got worried about potential accidental fallbacks with script/style tag contents when it's empty and whatnot, wanted to keep it as narrow as possible.

return elements
.map(node => ({
href: node.href,
text: node.innerText
href: node.href instanceof SVGAnimatedString ?
resolveURLOrNull(node.href.baseVal) :
node.href,
text: node.href instanceof SVGAnimatedString ?
node.textContent :
node.innerText,
}));
})()`;

return passContext.driver.evaluateAsync(expression);
/** @type {LH.Artifacts['CrawlableLinks']} */
const links = await passContext.driver.evaluateAsync(expression, {useIsolation: true});
return links.filter(link => typeof link.href === 'string' && link.href);
}
}

Expand Down
3 changes: 2 additions & 1 deletion lighthouse-core/lib/page-functions.js
Original file line number Diff line number Diff line change
Expand Up @@ -84,13 +84,14 @@ function checkTimeSinceLastLongTask() {
*/
/* istanbul ignore next */
function getElementsInDocument(selector) {
const realMatchesFn = window.__ElementMatches || window.Element.prototype.matches;
/** @type {Array<Element>} */
const results = [];

/** @param {NodeListOf<Element>} nodes */
const _findAllElements = nodes => {
for (let i = 0, el; el = nodes[i]; ++i) {
if (!selector || window.__ElementMatches.call(el, selector)) {
if (!selector || realMatchesFn.call(el, selector)) {
results.push(el);
}
// If the element has a shadow root, dig deeper.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,21 @@ describe('SEO: link text audit', () => {
assert.equal(auditResult.rawValue, true);
});

it('ignores mailto: links', () => {
const artifacts = {
URL: {
finalUrl: 'https://example.com/page.html',
},
CrawlableLinks: [
{href: 'mailto:info@example.com', text: 'click here'},
{href: 'mailto:mailmaster@localhost', text: 'click here'},
],
};

const auditResult = LinkTextAudit.audit(artifacts);
assert.equal(auditResult.rawValue, true);
});

it('passes when all links have descriptive texts', () => {
const artifacts = {
URL: {
Expand Down