-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] and [node] badges; also affects [apm appveyor cdnjs clojars gem] #1743
Conversation
Generated by 🚫 dangerJS |
I've not attempted to do a full review as its still WIP, but here's a few comments based on your original post and a read-through of the code: I think the If the schema stops validating because the upstream API has changed (or because we've made a poor choice of regex and need to tweak it), does this make the error any less visible, or more difficult to debug? Do you think its worth thinking about logging or sending something to sentry when a response fails to validate? That seems like something we want to be aware of in most cases. Is this kind of validation going to be flexible enough to deal with situations where we have to accept multiple different patterns (I'm thinking of stuff like this and this ). Its really annoying, but it happens and we need to deal with it.. Do you think there's any danger that adding the concept of design by contact here introduces additional friction for new contributors? As you rightly point out in #1272 one of the strengths of this project is the wide potential contributor base. I think I'm probably on the side that this is OK because funamentally we're not asking anyone to learn anything they don't need to know in order to write tests. Maybe the perspective of another maintainer would be useful here. (slight side issue: One of the things I've started to think about is updating the tutorial to reference the new structure as I think at the moment we're adding new badges to
The most common case is where we support one badge, or a sufficiently small number that defining them in one file isn't an issue, but it would be nice if we could also easily split large services across multiple files and split the test suites up more easily. It may also be necessary to change the danger.js rule I've defined. It might be a bit naive and need tweaking to account for the changes? You've probably already considered these, but I'll mention them for completeness:
|
services/base.js
Outdated
async _requestJson(url, options = {}, notFoundMessage) { | ||
return this._sendAndCacheRequest(url, | ||
{...{ 'headers': { 'Accept': 'application/json' } }, ...options} | ||
).then( | ||
checkErrorResponse.asPromise( | ||
notFoundMessage ? { notFoundMessage: notFoundMessage } : undefined | ||
) | ||
).then(asJson); | ||
).then(asJson | ||
).then(json => this.constructor.validateResponse(json)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any tradeoff in calling this.constructor.staticMethod()
vs ClassName.staticMethod()
, or is it just a stylistic consideration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it allows subclasses to redefine the class method being invoked, and have the expected thing happen.
# Conflicts: # package.json
Yes, agreed!
That seems reasonable. While I suspect some failures will be transient, probably others will be permanent. Given strict schema validation could cause some new failures, Sentry has tools to basically snooze errors. seems like erring on the side of logging is fine. I wonder if there’s a way we can pipe the useful context, like the end user’s request parameters, to Sentry.
Yea, agreed with all of this. I’m concerned about introducing additional friction, too. I like the argument you make: that we’re not asking people to learn anything they don’t already need to learn in order to write tests. Another reason I think it’s justified is that there isn't a more accessible alternative. Trying to write validation code from scratch may seem easier, but it's harder to get right, and the code reviews would frustrate most contributors. We’re committed to providing useful error messages, and one of the things we need to do is distinguish programmer errors from response errors. In other words, it’s a necessary hurdle, basically endemic to the problem Shields is solving. Would appreciate other perspectives. Definitely think it’s a good idea to start a tutorial for the new-style services. Since they’re not yet feature-complete I’m unsure whether to recommend new contributors try to write them. Maybe we could start by adding a new section in the readme. |
At the moment I think we're still at the stage where each time one the core team does a chunk of refactoring we end up evolving the structure a little bit as we hit more edge cases. I'd like to be at a stage where we've refactored a couple more services without needing to tinker with the base classes first. I think it will also be useful to have a decent number of examples people can use as a base to crib code from. We're close though.. |
Would anyone else like to weigh in about the requirement to write Joi contracts for JSON responses? @platan @PyvesB @RedSparr0w |
services/npm/npm-base.js
Outdated
const registryTag = tag || 'latest'; | ||
const latestVersion = json['dist-tags'][registryTag]; | ||
packageData = json.versions[latestVersion]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about this could be restructured to use fetch, validate, transform, render. I'm realizing also that there's a problem with the way this is written now. When the response comes back invalid, it's possible to crash above during plucking, before we hit the validateResponse
call.
I'm thinking about how to restructure this.
- It seems right that
handle
should invokefetch
andrender
. - Either of these seem okay: that
handle
invokevalidateResponse
, or that it's handled byfetch
. - It seems good for
handle
be be responsible to invoketransform
, and not so good for that to be handled byfetch
. - It would be good to avoid writing boilerplate in
handle
. However the more things happen magically, the more difficult services are to read. This is the basic argument for "code over configuration": everything is more direct.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After mulling this over a while, and playing with a couple different versions, I'm leaning toward this approach:
- Require validation and handle it in
_requestJson()
. - Rather than make fetch–transform–render into a framework, make it a design pattern. That gives us the benefits of a clear structure, including testability, with less magic.
- Invoke
render()
with a props object that is already cleaned and transformed. Do the transform work directly inhandle()
or delegate to a statictransform()
method if there's a testability benefit. - Add
render()
to BaseService for generate static examples without api call [apm appveyor cdnjs clojars gem npm uptimerobot] #1740.
A few things I'd like to improve:
- Since
NpmBase
has a method for creating the URL pattern, also give it a method for unpacking the URL too. This should reduce errors when the signature changes. - As discussed, send invalidated responses to Sentry.
- Easier debugging, especially of the contracts, by adding some kind of logging and debugging instrumentation.
Curious to see how this version reads!
I find that having contracts for API responses is a good idea. Nevertheless, should we try to "hide" all Joi-based checks in a separate class, a bit like we do with our test validators?
Those are my thoughts on this matter. 😉 |
Yea, this makes sense, and is pretty much what this is trying to do. Service implementers have to write the contracts, but the framework will run them.
Most of the shared Joi contracts we have now are "leaf nodes" – a string that could be this value or that value, a positive integer, as you say, etc. I'm cool with continuing to use those for leaf nodes that get repeated. It saves typing and improves readability. Many of these are also for shared aspects of the Shields interface (i.e. the output of our For higher level contracts – structural things like arrays of certain length, choice of two options, and possibly even relationships between objects within the structure, if we re-wrap all of the functionality we need, that's a lot of interface to design and documentation to write. If the problem is that we don't like the Joi documentation or primitives, maybe it would be better to shop around for another library. Or, we could use shared contracts for "leaf nodes" in common. I don't think it's practical to abstract away Joi without creating a lot of extra work for ourselves. Another option would be to write a tutorial for writing simple contracts with Joi. That could ease the learning curve for contract newbies, and make for a nicer transition into the Joi API docs. |
# Conflicts: # lib/all-badge-examples.js
# Conflicts: # lib/all-badge-examples.js # server.js
These are now covered by the test on the base service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it has taken a while to give this another read-though. Its quite a big refactor so it requires carving out a bit of time to look over it properly. I've left a few more comments but this is heading in a great direction.
services/npm/npm.js
Outdated
NpmVersion, | ||
NpmLicense, | ||
NpmTypeDefinitions, | ||
].concat(NpmDownloads); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can now be removed in favour of renaming the files as per #1797.
Similarly /node/node.js
wants to be /node/node.service.js
.
} | ||
} | ||
|
||
async _requestJson({ schema, url, options = {}, notFoundMessage }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth explicitly checking schema
has been passed here? If we pass undefined
to Joi.validate()
, it will throw AssertionError [ERR_ASSERTION]: Invalid schema content:
but I wonder if it is worth throwing a 'friendlier' error in that case? I can see that being a common pitfall for new contributors. Would another way to 'enforce' this be for schema
to be a class property instead of a function param. e.g:
static schema() {
throw new Error('must define schema');
}
Also, it seems like there will be a certain amount of cases where we do actually want to pass Joi.any()
(it looks like that is what you are doing for NpmLicense
and NpmTypeDefinitions
in this PR). We definitely don't want that to be the default case but is it worth providing some kind of 'shorthand' for that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, yea, I had the schema in as a class property before (static get responseSchema
) but ended up making it a parameter. The schema compilation step is slow, so we don't want to compile a new schema every time we have a new request. Since we don't have class properties yet, that means we need to memoize it somehow, or else define it outside the class. Passing it in the call to _requestJson
seems like less typing, is just as clear, and is possible to use conditionally.
Joi.any()
seems like a good shorthand to me. Were you thinking something like shouldValidate: false
? I'd be okay with that. However I don't want contributors to think that validating is optional. 😉 Maybe not having a shortcut is okay.
I'm good with adding a check with a better error message. Probably we should check that it's a Joi schema. I think Joi supports compiling from object literals, though constructing them all literally is self-documenting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema compilation step is slow, so we don't want to compile a new schema every time we have a new request.
We've got enough performance problems at the moment. Lets not make that any worse ;)
Maybe not having a shortcut is okay.
OK. Lets ignore that for now. Once we've refactored some more stuff we'll have an idea of how common this case is going to be.
I'm good with adding a check with a better error message. Probably we should check that it's a Joi schema.
Yeah, lets ensure that schema
exists and is an instance of Joi
and throw an error which provides a hint if not.
services/base.js
Outdated
@@ -23,6 +25,12 @@ class BaseService { | |||
this._handleInternalErrors = handleInternalErrors; | |||
} | |||
|
|||
static render(props, namedParams, queryParams) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of when we need to pass namedParams, queryParams
to render()
? I don't think they get used anywhere in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number formatting, or work like in #1321 is an example. Would it be better to pass that stuff through props
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how if we went with the 'framework' approach, it would be useful to make these available here, but given we have to manually implement render()
, construct all the stuff we want to pass it and call it ourselves, I think we should probably just explicitly pass what we need into props
.
I think this is a good way to look at it. One of the things I've found helpful as part of the review is to also look also look at what happens if we refactor a very simple service. I think this latest refactor strikes a good balance between giving us the structure to handle complex cases like this but without over-complicating very simple cases. Also it means if we are just doing something non-standard it is easier to deal with.
I think its fine for this to 'evolve' in the same way service test validators have. I wouldn't be in favour of building a complete wrapper or inner platform over |
Thanks for sharing that link! |
Should we drop empty contracts into the remaining services and address those in follow-on PRs? |
Yep - lets not increase the scope of this PR too much. If we do the minimum required to get the other new-style services (APM, Appveyor, Cdnjs, Clojars, Ruby Gems) working with the new base class structure here then we can look at improving those as another job. |
Sounds great. I'll address two remaining comments (schema instanceof Joi; pass only props to render), shim the new-style services, and aim to get this in quickly. |
Addressed remaining items. Should be good for a last review! @chris48s |
Hey @chris48s, would you like me to wait until you have a chance to review again, or shall I go ahead and merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a quick look over the latest commits and I reckon this is good to go 👍 . Lets get this merged. As next steps, I will have a look at fleshing out the validation on the other services, revisit #1740 and continue to refactor some more services in this style.
Great. Thanks for reviewing! I’ll tackle some service refactoring too. Let’s coordinate in #1358 so we don’t duplicate work! |
When JSON responses come back, they are sometimes not in the format expected by the API. As a result we have a lot of defensive coding (expressions like
(data || {}).someProperty
) to avoid exceptions being thrown in badge processing. Often we rely on thetry
blocks that wrap so much of the badge-processing code, which catch all JavaScript exceptions and return some error result, usually invalid. The problem with this is that thesetry
blocks catch all sorts of programmer errors too, so when we see invalid we don't know whether the API returned something unexpected, or we've made a mistake. We also spend a lot of time writing defensive tests around malformed responses, and creating and maintaining the defensive coding.A better solution is to validate the API responses using declarative contracts. Here the programmer says exactly what they expect from the API. That way, if the response isn't what we expect we can just say it's an invalid json response. And if our code then throws an exception, well that's our mistake; when we catch that we can call it a shields internal error. It's also less code and less error-prone. Over time we may be confident enough in the contracts that we won't need so many tests of malformed responses. The contract doesn't need to describe the entire response, only the part that's needed. Unknown keys can simply be dropped, preventing unvalidated parts of the response from creeping into the code. Checking what's in our response before calling values on it also makes our code more secure.
I used Joi here, since we're already using it for testing. There may be another contracts library that's a better fit, though I think we could look at that later.
Those changes are in base.js.
The rest is a rewrite of the remaining NPM badges, including the extraction of an NpmBase class. Inspired by @chris48s's work in #1740, this class splits the service concerns into fetching and plucking, validation, and rendering. There are two URL patterns, one which allows specifying a tag (used by e.g. the version badge
https://img.shields.io/npm/v/npm/next.svg
), and the other which does not accept a tag (e.g. the license badgehttps://img.shields.io/npm/l/express.svg
). Subclasses like NpmLicense and NpmTypeDefinitions can specify the URL fragment, examples, the validation schema for the chunk of the package data they use, and a render function. The NpmVersion subclass uses a different endpoint, so it overrides thehandle
implementation from NpmBase.Rewriting these was curiously difficult. There were a few factors, I think: a large paradigm shift, lots of code duplication to sift through, red herring abstractions (abstractions that already exist but are actually unhelpful in the new paradigm), and new problems to solve (namely validation). I leaned heavily on the thorough tests. Thank goodness for those!
Sort of aside: seeing a service like NPM with a lot of services inside it, I wonder if we should change the file layout of our services a bit. It would be nice not to have to write
npm/npm.js
which just loads up all the other test classes. Maybe instead of e.g.appveyor/appveyor.js
we should call itappveyor/appveyor.service.js
, and when a service has more than one file with service classes, they could be callednpm/npm-license.service.js
,npm/npm-version.service.js
, etc. Also I often wanted to run just the license tests, or just the version tests. Shorter test files – maybe separatenpm/npm-version.tester.js
– would make that easier to manage.Todo before merging: