Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_audit: json-ld audit #8328

Closed
wants to merge 47 commits into from
Closed
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
47e7427
Structured-data-automatic-audit
mattzeunert Apr 16, 2019
14e5d76
Name tweak
mattzeunert Apr 16, 2019
19e5511
File extension for require
mattzeunert Apr 16, 2019
e407bba
Tweak description
patrickhulce Apr 16, 2019
5be7b74
PR feedback
mattzeunert Apr 16, 2019
7da5fa5
Merge branch 'master' into structured-data-automatic-2
mattzeunert Apr 16, 2019
a1470cf
Fix sample json
mattzeunert Apr 16, 2019
39377d0
Merge branch 'structured-data-automatic-2' of https://github.com/Goog…
mattzeunert Apr 16, 2019
f7cb682
Move type parsing into sd validation module
mattzeunert Apr 16, 2019
b7cdf44
Update sample json
mattzeunert Apr 16, 2019
c9c8672
Fix browserify build
mattzeunert Apr 17, 2019
e0563f9
Fix validTypes test
mattzeunert Apr 17, 2019
b19564c
Merge branch 'master' into structured-data-automatic-2
mattzeunert Apr 17, 2019
3d4633f
Remove rawValue
mattzeunert Apr 17, 2019
ea65f67
Not async anymore
mattzeunert Apr 21, 2019
416d60b
More i18n
mattzeunert Apr 21, 2019
cacabd2
Revert "More i18n"
mattzeunert Apr 21, 2019
0ff4070
Type instead of ||
mattzeunert Apr 21, 2019
1c15871
Explain why no snippet title i18n
mattzeunert Apr 22, 2019
82d2785
Types for JsonLD
mattzeunert Apr 26, 2019
46ad970
Tweak displaystring
mattzeunert Apr 26, 2019
2d42a4c
Merge branch 'master' into structured-data-automatic-2
mattzeunert Apr 26, 2019
534f10f
Rename structured-data-automatic to json-ld
mattzeunert Apr 26, 2019
1243e70
Basic test case for when entity name is an object
mattzeunert Apr 27, 2019
6e9a8d8
Merge branch 'master' into structured-data-automatic-2
mattzeunert May 2, 2019
41c8e28
Merge branch 'master' into structured-data-automatic-2
mattzeunert May 14, 2019
ad58799
Add JSON-LD failure to sample json
mattzeunert May 14, 2019
96cf12d
Revert "Add JSON-LD failure to sample json"
mattzeunert May 15, 2019
4d8198a
Try undo merge
mattzeunert May 15, 2019
fc3b034
Merge branch 'master' into structured-data-automatic-2
mattzeunert May 15, 2019
f65b807
Add JSON-LD failure to sample json
mattzeunert May 14, 2019
991bf3c
Description tweaks
mattzeunert May 17, 2019
87e2897
Merge branch 'master' into structured-data-automatic-2
mattzeunert May 18, 2019
aac8717
Merge branch 'master' into structured-data-automatic-2
paulirish Aug 8, 2019
2ac1ace
sample json
paulirish Aug 8, 2019
8d63bf2
Addresses comments for JSON-LD audit
Aug 13, 2019
055065a
Fixes snippet theme for dark mode
Aug 13, 2019
7a3ad59
Adjusts strings for JSON-LD in sample json
Sep 16, 2019
b578c23
Merge branch 'master' into structured-data-automatic-2
AVGP Sep 16, 2019
53a932d
Fixes SEO smoke test expectations
Sep 16, 2019
72fa128
Merge branch 'master' into structured-data-automatic-2
paulirish Jan 28, 2020
d84fb24
update smoke expectations
devtools-bot Jan 28, 2020
9e74c1f
add expectation for failing jsonld details
devtools-bot Jan 28, 2020
e51903b
eslint
paulirish Jan 28, 2020
0bad7c4
Merge branch 'master' into structured-data-automatic-2
paulirish Apr 14, 2020
88a234d
color fixups with yuin's design
paulirish Apr 14, 2020
8849076
bump bundlesize numbers
paulirish Apr 14, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion build/build-bundle.js
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,14 @@ async function browserifyFile(entryPath, distPath) {
.ignore('raven')
.ignore('mkdirp')
.ignore('rimraf')
.ignore('pako/lib/zlib/inflate.js');
.ignore('pako/lib/zlib/inflate.js')
.ignore('file') // required by jsonlint-mod
.ignore('system'); // required by jsonlint-mod

// There is no way to add './doug-json-parse' to ignored packages via public API
// w/o browserify resolving the path into an absolute path
AVGP marked this conversation as resolved.
Show resolved Hide resolved
// @ts-ignore
bundle._ignore.push('./doug-json-parse');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like a recipe for struggles on next browserify update 😬

maybe it's time to investigate moving to something better maintained/more flexible post 5.0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still a bit concerned about this but I think whatever we end up doing will be hacky in some way 🤷‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a test that the bundle output does not contain some code in douge-json-parse? so we remember to fix this when it break.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a test that the bundle output does not contain some code in douge-json-parse? so we remember to fix this when it break.

the build throws if we forget this, so it will definitely fail the tests :)

The issue is that jsonlint-mod's bundled js has a reference to this file, which isn't included in the module. Browserify throws when it tries to resolve it.


// Don't include the desktop protocol connection.
bundle.ignore(require.resolve('../lighthouse-core/gather/connections/cri.js'));
Expand Down
8 changes: 8 additions & 0 deletions lighthouse-cli/test/cli/__snapshots__/index-test.js.snap
Original file line number Diff line number Diff line change
Expand Up @@ -369,6 +369,9 @@ Object {
Object {
"path": "seo/canonical",
},
Object {
"path": "seo/structured-data-automatic",
},
Object {
"path": "seo/manual/structured-data",
},
Expand Down Expand Up @@ -972,6 +975,11 @@ Object {
"id": "canonical",
"weight": 1,
},
Object {
"group": "seo-content",
"id": "structured-data-automatic",
"weight": 1,
},
Object {
"group": "seo-mobile",
"id": "font-size",
Expand Down
8 changes: 8 additions & 0 deletions lighthouse-cli/test/fixtures/seo/seo-failure-cases.html
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,14 @@
<link rel="alternate" href="http://example.com/" hreflang=" x-default" />
<!-- FAIL(canonical): multiple canonical URLs provided (another one is in the HTTP header) -->
<link rel="canonical" href="https://example.com/other" />
<!-- FAIL(structured-data-automatic): invalid @type -->
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "CatConvention",
AVGP marked this conversation as resolved.
Show resolved Hide resolved
"name": "Cat Global"
}
</script>
</head>
<body>
<h1>SEO</h1>
Expand Down
8 changes: 8 additions & 0 deletions lighthouse-cli/test/fixtures/seo/seo-tester.html
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,14 @@
<link rel="alternate" href="http://example.com/" hreflang="x-default" />
<!-- PASS(canonical): valid canonical URL -->
<link rel="canonical" href="http://localhost:10200/seo/" />
<!-- PASS(structured-data-automatic): valid JSON-LD -->
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Event",
"name": "Cat Global"
}
</script>

<style>
.small {
Expand Down
6 changes: 6 additions & 0 deletions lighthouse-cli/test/smokehouse/seo/expectations.js
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,9 @@ module.exports = [
score: null,
scoreDisplayMode: 'notApplicable',
},
'structured-data-automatic': {
score: 1,
},
},
}},
{
Expand Down Expand Up @@ -141,6 +144,9 @@ module.exports = [
score: 0,
explanation: 'Multiple conflicting URLs (https://example.com/other, https://example.com/)',
},
'structured-data-automatic': {
score: 0,
},
},
},
},
Expand Down
165 changes: 165 additions & 0 deletions lighthouse-core/audits/seo/structured-data-automatic.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
/**
* @license Copyright 2019 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const Audit = require('../audit.js');
const validateJsonLD = require('../../lib/sd-validation/sd-validation.js');
const i18n = require('../../lib/i18n/i18n.js');

const UIStrings = {
/** Title of a Lighthouse audit that provides detail on whether JSON-LD structured data snippets are valid. This descriptive title is shown when no invalid JSON-LD snippets were found. */
title: 'JSON-LD structured data syntax is valid',
/** Title of a Lighthouse audit that provides detail on whether JSON-LD structured data snippets are valid. This descriptive title is shown when JSON-LD snippets with invalid content were found. */
failureTitle: 'JSON-LD structured data syntax is invalid',
/** Description of a Lighthouse audit that tells the user whether JSON-LD snippets on the page are invalid. This is displayed after a user expands the section to see more. No character length limits. */
/* eslint-disable-next-line max-len */
description: 'Structured data contains rich metadata about a web page. The data is used in search results and social sharing. Invalid metadata will affect how the page appears in these contexts. This audit currently validates a subset of JSON-LD rules. See also the manual audit below to learn how to validate other types of structured data.',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just include the link to the tool here rather than force them to go to the other audit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the message for the manual structured data audit:

Run the Structured Data Testing Tool and the Structured Data Linter to validate structured data. Learn more.

The Structured Data Linter doesn't look very welcoming, so maybe we can remove it and just link the SDTT from both audits.

Do we even need the manual audit? If we removed it we'd need to change the title of the automatic audit to not only mention JSON-LD. But maybe that's misleading, if we say your structured data is fine but we only checked JSON-LD?

@AVGP any thoughts on this?

(Related: should we call the audit structured-data-automatic internally or just json-ld?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @AVGP for removal of the manual one. json-ld seems like a reasonable audit id to me. Do we foresee this growing in scope to include other types of data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the audit to json-ld.

Will bring up description/manual audit with @AVGP again in a week or two, but shouldn't be merge blocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates after talking to @AVGP:

  • removed the link to the Structured Data Linter
  • link to SDTT from json-ld audit description

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think there's value in having the manual audit as people get confused with "valid JSON-LD" vs. "Valid SD for rich results" and such, plus the SDTT catches most of that, so linking directly to it is a good idea.

/** Explanatory message stating what percentage of JSON-LD structured data snippets are invalid */
displayValue: '{validSnippetProportion, number, percent} valid snippets',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not positive that percentage is what we care most about here, maybe we just report # invalid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "N invalid snippets"

};

const str_ = i18n.createMessageInstanceIdFn(__filename, UIStrings);

class StructuredDataAutomatic extends Audit {
/**
* @return {LH.Audit.Meta}
*/
static get meta() {
return {
id: 'structured-data-automatic',
title: str_(UIStrings.title),
failureTitle: str_(UIStrings.failureTitle),
description: str_(UIStrings.description),
requiredArtifacts: ['ScriptElements'],
};
}

/**
* @param {LH.Artifacts} artifacts
* @return {Promise<LH.Audit.Product>}
*/
static async audit(artifacts) {
const jsonLDElements = artifacts.ScriptElements.filter(
script => script.type === 'application/ld+json' && !!script.content);

if (jsonLDElements.length === 0) {
return {
notApplicable: true,
score: 1,
};
}

const validatedSnippets = await Promise.all(jsonLDElements.map(async (element) => {
// We don't want to show empty lines around the snippet
const content = /** @type string */ (element.content).trim();

return {
devtoolsNodePath: element.devtoolsNodePath,
content,
errors: await validateJsonLD(content),
};
}));
// Show invalid snippets at the top
validatedSnippets.sort((a, b) => {
return b.errors.length - a.errors.length;
});

const renderedSnippets = validatedSnippets.map(
snippetWithErrors => renderValidatedSnippet(snippetWithErrors)
);
const details = Audit.makeListDetails(renderedSnippets);

const invalidSnippets = validatedSnippets.filter(vs => vs.errors.length > 0);
const validSnippets = validatedSnippets.filter(vs => vs.errors.length === 0);
const displayValue = str_(UIStrings.displayValue, {
validSnippetProportion: validSnippets.length / jsonLDElements.length,
});

return {
score: invalidSnippets.length === 0 ? 1 : 0,
details,
displayValue,
};
}
}

/**
* @param {{content: string, devtoolsNodePath: string, errors: LH.StructuredData.ValidationError[]}} validatedSnippet
*/
function renderValidatedSnippet(validatedSnippet) {
const {content, devtoolsNodePath, errors} = validatedSnippet;

let parsedContent;
let topLevelType;
let topLevelName;
try {
parsedContent = JSON.parse(content);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse a type for this somewhere from sd-validation types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kinda like ExpandedSchemaRepresentationItem – is there actually anything special about that type being Expanded?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't totally remember :) buuuuuuut I think it's just that there are more compact forms/shortcuts/references that get resolved to their actual values?

https://json-ld.org/spec/latest/json-ld-api/#expansion

You've been much deeper in this more recently than me though, so I'll leave it to your judgement. We are trying to access the @type and name properties though so have some type would be nice

Copy link
Contributor Author

@mattzeunert mattzeunert Apr 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a type for it now. Turns out we weren't actually returning a valid ExpandedSchemaRepresentation from expand, so fixed that type too.

I don't totally remember :) buuuuuuut I think it's just that there are more compact forms/shortcuts/references that get resolved to their actual values?

That's a good point, ideally we should use the expanded representation and look for "http://schema.org/name" on there. What do you think of having the the validator return an object with errors and expandedSchema properties, instead of just the errors array?

Fwiw, I've never actually seen anyone do anything fancy with JSON-LD in the wild.


For reference, here are some examples of fancy JSON-LD we can use as test cases. (SDTT doesn't think the language map one is valid, but I think it's wrong.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice catch!! Yeah that sounds reasonable to me, though maybe if it's not particularly common we punt for now. Just file an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed: #8668

topLevelType = parsedContent['@type'];
topLevelName = parsedContent['name'];
} catch (err) {}

let title = '';
if (topLevelName && topLevelType) {
title = `${topLevelType}: ${topLevelName}`;
} else if (topLevelType) {
title = `@type ${topLevelType}`;
} else {
title = 'Invalid JSON-LD element';
}
// No 18n here, because it's tricky to do because of the if statement above. The
// entity type and error messages are in English anyway.
title += ` (${errors.length} Error${errors.length !== 1 ? 's' : ''})`;
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved

/** @type LH.Audit.Details.NodeValue */
const node = {
type: 'node',
path: devtoolsNodePath,
snippet: `<script type="application/ld+json">`,
};

const {lineMessages, generalMessages} = getErrorMessages(errors);

return Audit.makeSnippetDetails({
content: parsedContent ? JSON.stringify(parsedContent, null, 2) : content,
title,
lineMessages,
generalMessages,
node,
});
}

/**
* @param {Array<LH.StructuredData.ValidationError>} errors
*/
function getErrorMessages(errors) {
/** @type {LH.Audit.Details.SnippetValue['lineMessages']} */
const lineMessages = [];
/** @type {LH.Audit.Details.SnippetValue['generalMessages']} */
const generalMessages = [];
errors.forEach(({
message, lineNumber, validTypes, validator,
}) => {
if (validTypes && validator === 'schema-org') {
const typeStrings = validTypes.map(type => {
return `[${type.name}](${type.uri})`;
});
message = `Invalid ${typeStrings.join('/')}: ${message}`;
}

if (lineNumber) {
lineMessages.push({lineNumber, message});
} else {
generalMessages.push({
message,
});
}
});

return {lineMessages, generalMessages};
}

module.exports = StructuredDataAutomatic;
module.exports.UIStrings = UIStrings;
2 changes: 2 additions & 0 deletions lighthouse-core/config/default-config.js
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,7 @@ const defaultConfig = {
'seo/hreflang',
'seo/plugins',
'seo/canonical',
'seo/structured-data-automatic',
'seo/manual/structured-data',
],

Expand Down Expand Up @@ -465,6 +466,7 @@ const defaultConfig = {
{id: 'robots-txt', weight: 1, group: 'seo-crawl'},
{id: 'hreflang', weight: 1, group: 'seo-content'},
{id: 'canonical', weight: 1, group: 'seo-content'},
{id: 'structured-data-automatic', weight: 1, group: 'seo-content'},
{id: 'font-size', weight: 1, group: 'seo-mobile'},
{id: 'plugins', weight: 1, group: 'seo-content'},
{id: 'tap-targets', weight: 1, group: 'seo-mobile'},
Expand Down
16 changes: 16 additions & 0 deletions lighthouse-core/lib/i18n/en-US.json
Original file line number Diff line number Diff line change
Expand Up @@ -927,6 +927,22 @@
"message": "robots.txt is valid",
"description": "Title of a Lighthouse audit that provides detail on the site's robots.txt file. Note: \"robots.txt\" is a canonical filename and should not be translated. This descriptive title is shown when the robots.txt file is present and configured correctly."
},
"lighthouse-core/audits/seo/structured-data-automatic.js | description": {
"message": "Structured data contains rich metadata about a web page. The data is used in search results and social sharing. Invalid metadata will affect how the page appears in these contexts. This audit currently validates a subset of JSON-LD rules. See also the manual audit below to learn how to validate other types of structured data.",
"description": "Description of a Lighthouse audit that tells the user whether JSON-LD snippets on the page are invalid. This is displayed after a user expands the section to see more. No character length limits."
},
"lighthouse-core/audits/seo/structured-data-automatic.js | displayValue": {
"message": "{validSnippetProportion, number, percent} valid snippets",
"description": "Explanatory message stating what percentage of JSON-LD structured data snippets are invalid"
},
"lighthouse-core/audits/seo/structured-data-automatic.js | failureTitle": {
"message": "JSON-LD structured data syntax is invalid",
"description": "Title of a Lighthouse audit that provides detail on whether JSON-LD structured data snippets are valid. This descriptive title is shown when JSON-LD snippets with invalid content were found."
},
"lighthouse-core/audits/seo/structured-data-automatic.js | title": {
"message": "JSON-LD structured data syntax is valid",
"description": "Title of a Lighthouse audit that provides detail on whether JSON-LD structured data snippets are valid. This descriptive title is shown when no invalid JSON-LD snippets were found."
},
"lighthouse-core/audits/seo/tap-targets.js | description": {
"message": "Interactive elements like buttons and links should be large enough (48x48px), and have enough space around them, to be easy enough to tap without overlapping onto other elements. [Learn more](https://developers.google.com/web/fundamentals/accessibility/accessible-styles#multi-device_responsive_design).",
"description": "Description of a Lighthouse audit that tells the user why buttons and links need to be big enough and what 'big enough' means. This is displayed after a user expands the section to see more. No character length limits. 'Learn More' becomes link text to additional documentation."
Expand Down
17 changes: 12 additions & 5 deletions lighthouse-core/lib/sd-validation/schema-validator.js
Original file line number Diff line number Diff line change
Expand Up @@ -96,11 +96,18 @@ function validateObjectKeys(typeOrTypes, keys) {
// remove Schema.org input/output constraints http://schema.org/docs/actions.html#part-4
.map(key => key.replace(/-(input|output)$/, ''))
.filter(key => !allKnownProps.has(key))
.map(key => ({
message: `Unexpected property "${key}"`,
key,
validTypes: types,
}));
.map(key => {
return ({
message: `Unexpected property "${key}"`,
key,
validTypes: types.map(typeUri => {
const typeNameMatch = typeUri.match(/[\w]+$/);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

const name = typeNameMatch ? typeNameMatch[0] : typeUri;
const uri = typeUri.replace('http://', 'https://');
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
return {name, uri};
}),
});
});
}

/**
Expand Down
Loading