-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core(jsonld): add structured data validation #6750
Conversation
Good news: jsonld published a version that uses the new optional native dependency! 🎉 Bad news: npm and yarn both install optional deps by default, which means everyone depending on LH today would still need to change their build configs to continue installing. -- What do we do?
|
package.json
Outdated
@@ -32,8 +32,9 @@ | |||
"unit-cli": "jest --runInBand \"lighthouse-cli/\"", | |||
"unit-cli:ci": "jest --runInBand --coverage --ci \"lighthouse-cli/\"", | |||
"unit-viewer": "mocha --reporter dot \"lighthouse-viewer/test/**/*-test.js\"", | |||
"unit": "yarn unit-core && yarn unit-cli && yarn unit-viewer", | |||
"unit:ci": "yarn unit-core:ci && yarn unit-cli:ci && yarn unit-viewer", | |||
"unit-sd-validation": "mocha --reporter dot \"sd-validation/test/**/*-test.js\"", |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
sd-validation/expand.js
Outdated
|
||
jsonld.expand(inputObject, { | ||
documentLoader, | ||
}, (/** @type {string} */e, /** @type {Object} **/expanded) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we need a few additions to tsconfig?
diff --git a/tsconfig.json b/tsconfig.json
index 6d57b103..dc5e1b14 100644
--- a/tsconfig.json
+++ b/tsconfig.json
@@ -21,11 +21,14 @@
"lighthouse-core/**/*.js",
"clients/**/*.js",
"build/**/*.js",
+ "sd-validation/**/*.js",
"./types/*.d.ts",
],
"exclude": [
"lighthouse-cli/test/**/*.js",
"lighthouse-core/test/**/*.js",
+ "sd-validation/test/**/*.js",
+ "sd-validation/scripts/**/*.js",
"clients/test/**/*.js",
]
}
sd-validation/expand.js
Outdated
/** @type {function(string):void} */ | ||
let reject; | ||
const promise = new Promise((res, rej) => { | ||
resolve = res; reject = rej; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont really follow this pattern much so i tried it using our more typical approach:
module.exports = function expand(inputObject) {
return new Promise((resolve, reject) => {
function documentLoader( /** @type {string} **/ schemaUrl, /** @type {function(null, Object):void} **/ callback) {
try {
return loadDocument(schemaUrl, callback);
} catch (e) {
reject(e.message);
}
}
jsonld.expand(inputObject, {documentLoader}, (/** @type {string} */e, /** @type {Object} **/expanded) => {
if (e) reject('Expansion error: ' + e.toString());
else resolve(expanded);
});
});
};
but I can't say its a clear win so w/e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I'm inclined to leave as-is, I actually prefer removing the layer of nesting :)
sd-validation/expand.js
Outdated
try { | ||
return loadDocument(schemaUrl, callback); | ||
} catch (e) { | ||
reject(e.message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return reject IMO
sd-validation/expand.js
Outdated
documentLoader, | ||
}, (/** @type {string} */e, /** @type {Object} **/expanded) => { | ||
if (e) { | ||
reject('Expansion error: ' + e.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return reject
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out they have a promise-api, so I'm just switching to that instead
sd-validation/expand.js
Outdated
if (e) { | ||
reject('Expansion error: ' + e.toString()); | ||
} else { | ||
resolve(expanded); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
We discussed offline that peerDep doesn't totally make semantic sense and will spam warnings when installing. The unpkg solution scares me a bit. I'm favoring this one:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document this new dep requirement and consumers have to live with --ignore-optional if they want to use LH and don't have a C++11 compiler
The problem is that almost no one is going to read this or do it. A lot of hard work has gone into this, but it still seems backwards to make lighthouse harder to use (and slower by default) to add this audit.
Maybe we should have a manual SEO audit pointing to a new @lighthouse/structured-data
plugin?
😍 😍 😍 😍 😍 😍 😍 I love this for so many reasons. |
@brendankenny is your suggestion to omit the SD audit from running in the default LH SEO category and include a link under "Manual audits" for users to install a new extension which will run the SD audit? |
Yeah. I realize the issues with that, but in the past we haven't pursued audits that would introduce a new reload pass, or wouldn't work in all clients, or would require calls out to third-party APIs. Landing this as-is is going to break downstream users who won't be reading the install instructions when they do a simple update and are deploying from a container or whatever, and going to slow down everyone else's install as we compile a native library. This isn't a make-or-break audit (there are already existing tools for structured data), and the fact that there's a lot of hard work here and that it would be useful for developers doesn't seem sufficient to justify that downside at this point. That said, plugins aren't really ready yet and we haven't figured out how any will be bundled (if they will be), so maybe there are other solutions we can look at in the meantime. This is the kind of thing |
Alright now that the dependency issue has been resolved, I think this is ready for another go around! 🎉 |
Waiting for this to get merged to move forward with the structured data audit. @brendankenny @paulirish Could you take another look? ❤️ |
sd-validation/json.js
Outdated
try { | ||
// jsonlint-mod always calls console.error when there's an error | ||
// We don't want this behavior, so we stash console.error while it's executing | ||
console.error = () => undefined; // eslint-disable-line no-console |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should no longer be needed 🙂 circlecell/jsonlint-mod#2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hurray that's awesome! :D
⛰ |
I see some snapshotted schema.org data in https://github.com/GoogleChrome/lighthouse/tree/75b7a51a0eb19fc289338c799bae9074341f466c/sd-validation/assets ... what is the plan for keeping this up to date? (we've got into trouble before when people have generated artifacts from now-obsolete snapshots, e.g. see https://github.com/google/schemaorg-java ) |
(to be clear, this is all awesome, any nitpicking is in the spirit of wanting to help make it better, sustainable, and to fit in with our development plans for Schema.org) |
🏏 🏏 round 2? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, this is looking good!
I see the value in doing this, and so would like to avoid coupling it too much to LH code, but I also recognize that we don't need to be overly optimizing for this hypothetical future case now that there's no clear permanent owner.
yeah, it feels a bit like that's not going to happen at this point :/
I think it could do fine in lib/sd-validation/
. It's confusing to have in a top level directory (I think), and really for now it's no more an independent library than manifest-parser
, which is just a little longer in loc and was actually also intended to "someday" be an independent module :)
We will have to be vigilant about keeping it componentized, but it has the benefit of being pretty naturally its own thing, so ideally it'll mostly take care of itself.
sd-validation/helpers/walk-object.js
Outdated
* @param {Array<string>} path | ||
*/ | ||
module.exports = function walkObject(obj, callback, path = []) { | ||
if (obj === null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we care about guarding against non-objects passed in? we may not, but the type won't help prevent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything being passed in here is already *
or has been filtered out for null
already, so I don't think it matters
const path = require('path'); | ||
const fs = require('fs'); | ||
|
||
const SCHEMA_ORG_URL = 'https://schema.org/docs/jsonldcontext.json'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you fetch with "Accept: application/ld+json" headers, you don't need to use this magic URL. It has been documented before that this URL is for convenience only and should not be treated as an API. Tools will generally fetch the @context
value exactly as specified.
curl -H "Accept: application/ld+json" https://schema.org/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah neat thanks!! done 👍
Folks, I'm also interested in seeing LH have support for structured data validation. This patch is really great! I left a few small comments, and I'm happy to help review more in-depth. Question: this patch is very hard-coded to support only core schema.org vocab. If I was willing to put in the effort to update, would you accept changes to also support hosted/external schema.org extensions? I think it would be better to do this as a follow-up patch. |
Thanks @mmocny! I'd love to have someone that knows what they're doing invested in this :D (core team currently inherited this without complete context, so expertise would be very much appreciated) I'd definitely like to land this before further increasing the scope since it's been outstanding for quite some time, but supporting beyond schema.org if it's being actively consumed by search engines/crawlers/others sounds good to me! (so long as the tradeoff between usage and cost to Lighthouse users is sufficient of course 😄 ) |
I'm not sure I qualify as "knows whats they are doing", but I can try. How can I help get this finished? (Also, it looks like this patch has some issues with original authors not signing CLA. I don't know the history here...) |
That terminal CLA state happens when @brendankenny or @paulirish's suggested changes get applied to a non-googler's patches for some reason. We haven't yet been able to figure out why 🤷♂️ |
As soon as @brendankenny signs off IMO it's landable. If you'd like to review it for anything that you think would be a showstopper to landing please feel free to help him out :) |
That googlebot CLA warning happened immediately after my trivial doc fix suggestion got applied [1]. So perhaps my fault? I think I've done some Google CLA thing in the past but if I need to do it again here for a one-liner just to clear things up, let me know how. [1] #6750 (comment) |
Oh thanks @davidlehn just a simple "I consent to those edits being released under the Apache 2.0 license in this repo" is fine :) |
@davidlehn May you please give the OK to unblock this merge? |
I consent to my edits being released under the Apache 2.0 license in this repo. |
Thanks! I also think that @brendankenny comments from last review are addressed. I think this patch has addressed issues that have been brought up. I'm new to this feature, but anxious to see it land. Questions:
|
Yeah this was split out from #5446 and addressed anything outstanding at the time.
No it won't be, that will be followup work by @mattzeunert.
Probably, but we don't have one proposed. If you have any insight to share on how frequently these are likely to become out-of-date we'd love to hear it :)
Sounds great! :) |
can confirm that @davidlehn does indeed have a Google CLA |
sweet thanks @brendankenny! would you mind confirming your intention to review or not review? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the index.js
filename change, otherwise let's do this!
@@ -0,0 +1,75 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename sd-validator.js
(or whatever) so we don't add another index.js
file :)
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 |
itshappening.gif Huge thank you to @patrickhulce @brendankenny and everyone else who helped get this out! |
Summary
Split out the structured validation library portion from #5446.
Key changes from Konrad's implementation
Ignoring native dependency instead of modifying our build environmentUpdated to 1.5.0 so we don't need it anymore! 🎉Generated files are now a single-line, I'm not 100% sure these even need to be checked in if we had a prepublish/postinstall hook to generate them? I suppose it's safer though considering they're pulled from URLsReverted through reviewAdded a monkeypatch forUpdated to 1.7.4 so also not needed! 🎉console.error
becausejsonlint-mod
is annoyingNative Dependency Notes
jsonld
depends onrdf-canonize
which in versions<0.2.0
had a native C++ dependency. Personally I consider the existence of this native dependency a major PR blocker. I don't think all users that install Lighthouse should now be blocked on having C++11 on their system. We should try and getjsonld
published withrdf-canonize@0.3.0
which was the first version that made the native portion optional. Alternatives include just pulling in their UMD library from a CDN instead of using the NPM version.This has been resolved and they've removed their native dependency!! Hurray!!
Related Issues/PRs
#5446