Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] and [node] badges; also affects [apm appveyor cdnjs clojars gem] #1743

Merged
merged 36 commits into from
Aug 8, 2018

Conversation

paulmelnikow
Copy link
Member

@paulmelnikow paulmelnikow commented Jun 19, 2018

When JSON responses come back, they are sometimes not in the format expected by the API. As a result we have a lot of defensive coding (expressions like (data || {}).someProperty) to avoid exceptions being thrown in badge processing. Often we rely on the try blocks that wrap so much of the badge-processing code, which catch all JavaScript exceptions and return some error result, usually invalid. The problem with this is that these try blocks catch all sorts of programmer errors too, so when we see invalid we don't know whether the API returned something unexpected, or we've made a mistake. We also spend a lot of time writing defensive tests around malformed responses, and creating and maintaining the defensive coding.

A better solution is to validate the API responses using declarative contracts. Here the programmer says exactly what they expect from the API. That way, if the response isn't what we expect we can just say it's an invalid json response. And if our code then throws an exception, well that's our mistake; when we catch that we can call it a shields internal error. It's also less code and less error-prone. Over time we may be confident enough in the contracts that we won't need so many tests of malformed responses. The contract doesn't need to describe the entire response, only the part that's needed. Unknown keys can simply be dropped, preventing unvalidated parts of the response from creeping into the code. Checking what's in our response before calling values on it also makes our code more secure.

I used Joi here, since we're already using it for testing. There may be another contracts library that's a better fit, though I think we could look at that later.

Those changes are in base.js.

The rest is a rewrite of the remaining NPM badges, including the extraction of an NpmBase class. Inspired by @chris48s's work in #1740, this class splits the service concerns into fetching and plucking, validation, and rendering. There are two URL patterns, one which allows specifying a tag (used by e.g. the version badge https://img.shields.io/npm/v/npm/next.svg), and the other which does not accept a tag (e.g. the license badge https://img.shields.io/npm/l/express.svg). Subclasses like NpmLicense and NpmTypeDefinitions can specify the URL fragment, examples, the validation schema for the chunk of the package data they use, and a render function. The NpmVersion subclass uses a different endpoint, so it overrides the handle implementation from NpmBase.

Rewriting these was curiously difficult. There were a few factors, I think: a large paradigm shift, lots of code duplication to sift through, red herring abstractions (abstractions that already exist but are actually unhelpful in the new paradigm), and new problems to solve (namely validation). I leaned heavily on the thorough tests. Thank goodness for those!

Sort of aside: seeing a service like NPM with a lot of services inside it, I wonder if we should change the file layout of our services a bit. It would be nice not to have to write npm/npm.js which just loads up all the other test classes. Maybe instead of e.g. appveyor/appveyor.js we should call it appveyor/appveyor.service.js, and when a service has more than one file with service classes, they could be called npm/npm-license.service.js, npm/npm-version.service.js, etc. Also I often wanted to run just the license tests, or just the version tests. Shorter test files – maybe separate npm/npm-version.tester.js – would make that easier to manage.

Todo before merging:

  • Add support for query string values in rendering of example badges
  • Update other affected services
    • Appveyor
    • Cdnjs
    • Clojars
    • Ruby Gems

@paulmelnikow paulmelnikow added the core Server, BaseService, GitHub auth, Shared helpers label Jun 19, 2018
@shields-ci
Copy link

shields-ci commented Jun 19, 2018

Warnings
⚠️

This PR modified service code for apm but not its test code. That's okay so long as it's refactoring existing code.

⚠️

This PR modified service code for appveyor but not its test code. That's okay so long as it's refactoring existing code.

⚠️

This PR modified service code for cdnjs but not its test code. That's okay so long as it's refactoring existing code.

⚠️

This PR modified service code for clojars but not its test code. That's okay so long as it's refactoring existing code.

⚠️

This PR modified service code for gem but not its test code. That's okay so long as it's refactoring existing code.

Messages
📖

✨ Thanks for your contribution to Shields, @paulmelnikow!

📖

Thanks for contributing to our documentation. We ❤️ our documentarians!

Generated by 🚫 dangerJS

@chris48s
Copy link
Member

I've not attempted to do a full review as its still WIP, but here's a few comments based on your original post and a read-through of the code:

I think the fetch, validate, render pattern here is good in most cases. In some cases where we're doing a lot of processing, it might also be useful for us to think about the steps as fetch, validate, transform, render.

If the schema stops validating because the upstream API has changed (or because we've made a poor choice of regex and need to tweak it), does this make the error any less visible, or more difficult to debug? Do you think its worth thinking about logging or sending something to sentry when a response fails to validate? That seems like something we want to be aware of in most cases.

Is this kind of validation going to be flexible enough to deal with situations where we have to accept multiple different patterns (I'm thinking of stuff like this and this ). Its really annoying, but it happens and we need to deal with it..

Do you think there's any danger that adding the concept of design by contact here introduces additional friction for new contributors? As you rightly point out in #1272 one of the strengths of this project is the wide potential contributor base. I think I'm probably on the side that this is OK because funamentally we're not asking anyone to learn anything they don't need to know in order to write tests. Maybe the perspective of another maintainer would be useful here. (slight side issue: One of the things I've started to think about is updating the tutorial to reference the new structure as I think at the moment we're adding new badges to server.js faster than we're refactoring but I'm also conscious that we're evolving the structure as we go and we need to have the API nailed down a bit more first). Its worth thinking about how to document this in an accessible way, anyway.

seeing a service like NPM with a lot of services inside it, I wonder if we should change the file layout of our services a bit

The most common case is where we support one badge, or a sufficiently small number that defining them in one file isn't an issue, but it would be nice if we could also easily split large services across multiple files and split the test suites up more easily. It may also be necessary to change the danger.js rule I've defined. It might be a bit naive and need tweaking to account for the changes?

You've probably already considered these, but I'll mention them for completeness:

  • Since you're refactoring npm code, [npm] should be in the PR title.
  • In order to merge this, you'll need to update the other service code which extend BaseJsonService, otherwise all the badges we've already refactored will throw Error: Subclasses should define a response schema. For reference, the other services that will need to be updated are:
    • Appveyor
    • Cdnjs
    • Clojars
    • Ruby Gems

services/base.js Outdated
async _requestJson(url, options = {}, notFoundMessage) {
return this._sendAndCacheRequest(url,
{...{ 'headers': { 'Accept': 'application/json' } }, ...options}
).then(
checkErrorResponse.asPromise(
notFoundMessage ? { notFoundMessage: notFoundMessage } : undefined
)
).then(asJson);
).then(asJson
).then(json => this.constructor.validateResponse(json));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any tradeoff in calling this.constructor.staticMethod() vs ClassName.staticMethod(), or is it just a stylistic consideration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it allows subclasses to redefine the class method being invoked, and have the expected thing happen.

@paulmelnikow paulmelnikow changed the title [RFC] Add Joi-based request validation to BaseJsonService and rewrite NPM badges [RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] badges Jul 19, 2018
# Conflicts:
#	package.json
@paulmelnikow
Copy link
Member Author

I think the fetch, validate, render pattern here is good in most cases. In some cases where we're doing a lot of processing, it might also be useful for us to think about the steps as fetch, validate, transform, render.

Yes, agreed!

If the schema stops validating because the upstream API has changed (or because we've made a poor choice of regex and need to tweak it), does this make the error any less visible, or more difficult to debug? Do you think its worth thinking about logging or sending something to sentry when a response fails to validate? That seems like something we want to be aware of in most cases.

That seems reasonable. While I suspect some failures will be transient, probably others will be permanent. Given strict schema validation could cause some new failures, Sentry has tools to basically snooze errors. seems like erring on the side of logging is fine. I wonder if there’s a way we can pipe the useful context, like the end user’s request parameters, to Sentry.

Do you think there's any danger that adding the concept of design by contact here introduces additional friction for new contributors? As you rightly point out in #1272 one of the strengths of this project is the wide potential contributor base. I think I'm probably on the side that this is OK because funamentally we're not asking anyone to learn anything they don't need to know in order to write tests. Maybe the perspective of another maintainer would be useful here. (slight side issue: One of the things I've started to think about is updating the tutorial to reference the new structure as I think at the moment we're adding new badges to server.js faster than we're refactoring but I'm also conscious that we're evolving the structure as we go and we need to have the API nailed down a bit more first). Its worth thinking about how to document this in an accessible way, anyway.

Yea, agreed with all of this. I’m concerned about introducing additional friction, too. I like the argument you make: that we’re not asking people to learn anything they don’t already need to learn in order to write tests. Another reason I think it’s justified is that there isn't a more accessible alternative. Trying to write validation code from scratch may seem easier, but it's harder to get right, and the code reviews would frustrate most contributors. We’re committed to providing useful error messages, and one of the things we need to do is distinguish programmer errors from response errors. In other words, it’s a necessary hurdle, basically endemic to the problem Shields is solving.

Would appreciate other perspectives. Definitely think it’s a good idea to start a tutorial for the new-style services. Since they’re not yet feature-complete I’m unsure whether to recommend new contributors try to write them. Maybe we could start by adding a new section in the readme.

@chris48s
Copy link
Member

Definitely think it’s a good idea to start a tutorial for the new-style services. Since they’re not yet feature-complete I’m unsure whether to recommend new contributors try to write them.

At the moment I think we're still at the stage where each time one the core team does a chunk of refactoring we end up evolving the structure a little bit as we hit more edge cases. I'd like to be at a stage where we've refactored a couple more services without needing to tinker with the base classes first. I think it will also be useful to have a decent number of examples people can use as a base to crib code from. We're close though..

@paulmelnikow
Copy link
Member Author

Would anyone else like to weigh in about the requirement to write Joi contracts for JSON responses? @platan @PyvesB @RedSparr0w

const registryTag = tag || 'latest';
const latestVersion = json['dist-tags'][registryTag];
packageData = json.versions[latestVersion];
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about this could be restructured to use fetch, validate, transform, render. I'm realizing also that there's a problem with the way this is written now. When the response comes back invalid, it's possible to crash above during plucking, before we hit the validateResponse call.

I'm thinking about how to restructure this.

  • It seems right that handle should invoke fetch and render.
  • Either of these seem okay: that handle invoke validateResponse, or that it's handled by fetch.
  • It seems good for handle be be responsible to invoke transform, and not so good for that to be handled by fetch.
  • It would be good to avoid writing boilerplate in handle. However the more things happen magically, the more difficult services are to read. This is the basic argument for "code over configuration": everything is more direct.

🤔

Copy link
Member Author

@paulmelnikow paulmelnikow Jul 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After mulling this over a while, and playing with a couple different versions, I'm leaning toward this approach:

  • Require validation and handle it in _requestJson().
  • Rather than make fetch–transform–render into a framework, make it a design pattern. That gives us the benefits of a clear structure, including testability, with less magic.
  • Invoke render() with a props object that is already cleaned and transformed. Do the transform work directly in handle() or delegate to a static transform() method if there's a testability benefit.
  • Add render() to BaseService for generate static examples without api call [apm appveyor cdnjs clojars gem npm uptimerobot] #1740.

A few things I'd like to improve:

  • Since NpmBase has a method for creating the URL pattern, also give it a method for unpacking the URL too. This should reduce errors when the signature changes.
  • As discussed, send invalidated responses to Sentry.
  • Easier debugging, especially of the contracts, by adding some kind of logging and debugging instrumentation.

Curious to see how this version reads!

@PyvesB
Copy link
Member

PyvesB commented Jul 22, 2018

I find that having contracts for API responses is a good idea.

Nevertheless, should we try to "hide" all Joi-based checks in a separate class, a bit like we do with our test validators?

  • this will help reusability and prevent code duplication.
  • this will allow us to more easily swap out Joi for another library, rather than having to revisit each single service file (related to @paulmelnikow comment "There may be another contracts library that's a better fit, though I think we could look at that later.").
  • this will help out out new contributors and hopefully minimise friction. They will be able to use easily understandable functions (isPositiveInteger, isLicenseList, etc.) without having to necessarily dig into the Joi documentation. If they do have to implement a new validator for an API, it will be much easier for them to have all our existing contracts in one place rather than having to run around the code base to get inspiration.

Those are my thoughts on this matter. 😉

@paulmelnikow
Copy link
Member Author

Nevertheless, should we try to "hide" all Joi-based checks in a separate class, a bit like we do with our test validators?

Yea, this makes sense, and is pretty much what this is trying to do. Service implementers have to write the contracts, but the framework will run them.

  • this will help out out new contributors and hopefully minimise friction. They will be able to use easily understandable functions (isPositiveInteger, isLicenseList, etc.) without having to necessarily dig into the Joi documentation. If they do have to implement a new validator for an API, it will be much easier for them to have all our existing contracts in one place rather than having to run around the code base to get inspiration.

Most of the shared Joi contracts we have now are "leaf nodes" – a string that could be this value or that value, a positive integer, as you say, etc. I'm cool with continuing to use those for leaf nodes that get repeated. It saves typing and improves readability. Many of these are also for shared aspects of the Shields interface (i.e. the output of our metric() or file size functions).

For higher level contracts – structural things like arrays of certain length, choice of two options, and possibly even relationships between objects within the structure, if we re-wrap all of the functionality we need, that's a lot of interface to design and documentation to write.

If the problem is that we don't like the Joi documentation or primitives, maybe it would be better to shop around for another library. Or, we could use shared contracts for "leaf nodes" in common. I don't think it's practical to abstract away Joi without creating a lot of extra work for ourselves.

Another option would be to write a tutorial for writing simple contracts with Joi. That could ease the learning curve for contract newbies, and make for a nicer transition into the Joi API docs.

# Conflicts:
#	lib/all-badge-examples.js
# Conflicts:
#	lib/all-badge-examples.js
#	server.js
These are now covered by the test on the base service.
Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it has taken a while to give this another read-though. Its quite a big refactor so it requires carving out a bit of time to look over it properly. I've left a few more comments but this is heading in a great direction.

NpmVersion,
NpmLicense,
NpmTypeDefinitions,
].concat(NpmDownloads);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can now be removed in favour of renaming the files as per #1797.
Similarly /node/node.js wants to be /node/node.service.js.

}
}

async _requestJson({ schema, url, options = {}, notFoundMessage }) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth explicitly checking schema has been passed here? If we pass undefined to Joi.validate(), it will throw AssertionError [ERR_ASSERTION]: Invalid schema content: but I wonder if it is worth throwing a 'friendlier' error in that case? I can see that being a common pitfall for new contributors. Would another way to 'enforce' this be for schema to be a class property instead of a function param. e.g:

static schema() {
  throw new Error('must define schema');
}

Also, it seems like there will be a certain amount of cases where we do actually want to pass Joi.any() (it looks like that is what you are doing for NpmLicense and NpmTypeDefinitions in this PR). We definitely don't want that to be the default case but is it worth providing some kind of 'shorthand' for that case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, yea, I had the schema in as a class property before (static get responseSchema) but ended up making it a parameter. The schema compilation step is slow, so we don't want to compile a new schema every time we have a new request. Since we don't have class properties yet, that means we need to memoize it somehow, or else define it outside the class. Passing it in the call to _requestJson seems like less typing, is just as clear, and is possible to use conditionally.

Joi.any() seems like a good shorthand to me. Were you thinking something like shouldValidate: false? I'd be okay with that. However I don't want contributors to think that validating is optional. 😉 Maybe not having a shortcut is okay.

I'm good with adding a check with a better error message. Probably we should check that it's a Joi schema. I think Joi supports compiling from object literals, though constructing them all literally is self-documenting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema compilation step is slow, so we don't want to compile a new schema every time we have a new request.

We've got enough performance problems at the moment. Lets not make that any worse ;)

Maybe not having a shortcut is okay.

OK. Lets ignore that for now. Once we've refactored some more stuff we'll have an idea of how common this case is going to be.

I'm good with adding a check with a better error message. Probably we should check that it's a Joi schema.

Yeah, lets ensure that schema exists and is an instance of Joi and throw an error which provides a hint if not.

services/base.js Outdated
@@ -23,6 +25,12 @@ class BaseService {
this._handleInternalErrors = handleInternalErrors;
}

static render(props, namedParams, queryParams) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of when we need to pass namedParams, queryParams to render()? I don't think they get used anywhere in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number formatting, or work like in #1321 is an example. Would it be better to pass that stuff through props?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how if we went with the 'framework' approach, it would be useful to make these available here, but given we have to manually implement render(), construct all the stuff we want to pass it and call it ourselves, I think we should probably just explicitly pass what we need into props.

@chris48s
Copy link
Member

chris48s commented Aug 4, 2018

Rather than make fetch–transform–render into a framework, make it a design pattern. That gives us the benefits of a clear structure, including testability, with less magic.

I think this is a good way to look at it. One of the things I've found helpful as part of the review is to also look also look at what happens if we refactor a very simple service. I think this latest refactor strikes a good balance between giving us the structure to handle complex cases like this but without over-complicating very simple cases. Also it means if we are just doing something non-standard it is easier to deal with.

For higher level contracts – structural things like arrays of certain length, choice of two options, and possibly even relationships between objects within the structure, if we re-wrap all of the functionality we need, that's a lot of interface to design and documentation to write.

I think its fine for this to 'evolve' in the same way service test validators have. I wouldn't be in favour of building a complete wrapper or inner platform over Joi, but if we find we are looking for the same patterns across multiple badges or services, we can extract that pattern and wrap it. I think it is inevitable that there will be certain checks that do fit this pattern so probably worth thinking about where those should live.

@paulmelnikow
Copy link
Member Author

Thanks for sharing that link!

@paulmelnikow
Copy link
Member Author

Should we drop empty contracts into the remaining services and address those in follow-on PRs?

@chris48s
Copy link
Member

chris48s commented Aug 4, 2018

Should we drop empty contracts into the remaining services and address those in follow-on PRs?

Yep - lets not increase the scope of this PR too much. If we do the minimum required to get the other new-style services (APM, Appveyor, Cdnjs, Clojars, Ruby Gems) working with the new base class structure here then we can look at improving those as another job.

@paulmelnikow
Copy link
Member Author

Sounds great. I'll address two remaining comments (schema instanceof Joi; pass only props to render), shim the new-style services, and aim to get this in quickly.

@paulmelnikow paulmelnikow changed the title [RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] and [node] badges [RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] and [node] badges [apm appveyor cdnjs clojars gem] Aug 7, 2018
@paulmelnikow paulmelnikow changed the title [RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] and [node] badges [apm appveyor cdnjs clojars gem] [RFC] Add Joi-based request validation to BaseJsonService and rewrite [NPM] and [node] badges; also affects [apm appveyor cdnjs clojars gem] Aug 7, 2018
@paulmelnikow
Copy link
Member Author

Addressed remaining items. Should be good for a last review! @chris48s

@paulmelnikow
Copy link
Member Author

Hey @chris48s, would you like me to wait until you have a chance to review again, or shall I go ahead and merge this?

Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a quick look over the latest commits and I reckon this is good to go 👍 . Lets get this merged. As next steps, I will have a look at fleshing out the validation on the other services, revisit #1740 and continue to refactor some more services in this style.

@paulmelnikow
Copy link
Member Author

Great. Thanks for reviewing!

I’ll tackle some service refactoring too. Let’s coordinate in #1358 so we don’t duplicate work!

@paulmelnikow paulmelnikow merged commit e3b1005 into badges:master Aug 8, 2018
@shields-deployment
Copy link

This pull request was merged to master branch. Now this change is waiting for deployment.
Deploys usually happen every few weeks. After deployment changes are copied to gh-pages branch.

This badge displays deployment status:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Server, BaseService, GitHub auth, Shared helpers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants