Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add built-in funtion to support JSON Schema Validation #1449

Closed
rbuckland opened this issue May 23, 2019 · 24 comments
Closed

Add built-in funtion to support JSON Schema Validation #1449

rbuckland opened this issue May 23, 2019 · 24 comments

Comments

@rbuckland
Copy link

Desired Feature

Validate input JSON document against a set of pre-configured schemas.

  • json-schema draft 4 and 7 support
  • json-schema URI to be "local" or remote

The interface I propose is simply a new built in function.

result := schema.validate(<uri to schema>, <data-to-validate>)

the result should be

{ 
   valid: true|false,
   errors: [ 
      ..errors... if it was not valid
    ]
}
@srenatus
Copy link
Contributor

😄 s/Ass/Add/, I guess? (PR title)

@rbuckland rbuckland changed the title Ass support JSON Schema Validation Add support JSON Schema Validation May 23, 2019
@BenderScript
Copy link
Contributor

There is an interesting library for it: https://github.com/xeipuuv/gojsonschema

@rbuckland
Copy link
Author

@repenno yes - this is the library i modelled the "result" above off of.
Three elements for the schema library to support (use) are

  • License compatibility
  • JSON Schema Spec support (eg: draft-04, draft-06 draft-07) etc
  • Active library development (community, supporting entities etc)

@BenderScript
Copy link
Contributor

The feature itself needs some discussion but ok. The issue would be the vendoring of the library. If @tsandall is okay with the dependency it should be ok to implement.

@tsandall
Copy link
Member

We've talked about adding support for input (and contextual data) schema validation for a while. It's probably time to start putting a plan in place.

Before we commit to adding a JSON Schema validation built-in function, we need to figure out (a) how it would relate to the existing type inferencing & checking we do for virtual documents and (b) what syntactic changes we could introduce to elevate schema to a first-class feature.

/cc @timothyhinrichs

@timothyhinrichs
Copy link
Member

timothyhinrichs commented Jun 22, 2019

Some thoughts on schema checking, as a first-class citizen. (Potentially complementary to the builtin approach.)

Declaring schema for input

Schemas can be defined as simply objects under /data, so if you want to use Rego to define that schema (e.g. sharing parts across multiple schemas) you can. (Not sure if JSON-schema already allows this.) Then annotate rego with the proper schema. Here we're considering schema for just the input object.

Examples from k8s use-case, but applicable to all use cases. There are different levels of granularity at which you might apply schema. The most important seems to be Rule-level schema.

  1. Rule-level schema
    This is important for authz cases because there are often multiple rules with allow/deny.
@ruleschema = data.schemas.io.k8s.v1.pod
deny[msg] {
	input.request.kind.kind == “Pod”
	input.request.object.spec.containers[_].image == “nginx”
}  
  1. Document-level schema
@docschema = data.schemas.bar
foo {
	input.a.b
}
foo {
	input.a.b
}

  1. Module-level schema and package level schema are floating in the file. Could have keywords like moduleschema and packageschema; or make them look like annotations for consistency and apply them possibly to the 'package' directive.
@moduleschema = data.schema.bar
@packageschema = data.schema.qux

Multiple Schemas
People often introduce abstractions where input may have multiple different schemas. We could consider allowing multiple schema declarations.

@docschema=data.k8s.workload.daemonset
@docschema=data.k8s.workload.deployment
container[x] {
	x := input.request.spec.template.spec.containers[_]
}

On the other hand, this will happen most often at the helper-level, and type/schema-inference from upper-level decisions may handle this anyway. Typically every top-level decision (or at least every top-level rule) has one kind of input it handles.

Functionality

  • Schema failures could be compile-time failures. The downside would be that you couldn't load/run policies on their own without also having the schemas, which makes sharing difficult. Could have a strict/relaxed mode.
  • Schema failures could be compile-time warnings with an evaluation flag indicating they should turned into errors.
  • Schema failures could be checked at eval time when an input is provided; if none of the decision rules match that schema, the system could return undefined immediately.

Roadmap

  • I'd think about just supporting rule-level schema to start. We could even just use the keyword @Schema and introduce @docschema and @packageschema later if required.
  • Suggestions for alternative syntax other than annotation are worth considering. We don't have annotations today. Perhaps
schema data.foo.bar 
deny[msg] { ... }

@anderseknert
Copy link
Member

With JSON schema support now in OPA, adding a built-in to do this on arbitrary data seems a lot more viable, and might be an alternative approach to do on-the-fly validation of input or data objects as well 🤔

@stale
Copy link

stale bot commented Nov 22, 2021

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@jkulvich
Copy link
Contributor

jkulvich commented Nov 28, 2022

Ok, let me copy my message with explanation from #5417 :)


Hi! Let me explain our case.
We would like to validate some JSON object with JSON schema, and have an error if occurred.
Both these JSONs (schema and doc) passes with input or data.

So, next points are important for us:

  1. Validation flag.
  2. Validation error.
  3. Performance (single function call for both values above).

And, I can image and implement one of next contracts:

  1. jsonschema.match(schema, doc): string | nil - Just like regexp.match. In this case we can use a nil string as a success validation flag.
  2. jsonschema.validate(schema, doc): { valid: bool, error: string | nil } - More comfortable way to get only what we need with single built-in call. But I couldn't find any implemented built-in with similar return signature, and I'm not sure that this is acceptable by OPA conventions.

Of course, we can follow classical way with 2 built-ins .is_valid(): bool, and .validate(): string. But performance.

Can I have any advice? Thanks!

@anderseknert
Copy link
Member

As for performance, I can imagine this would be another built-in that would benefit from caching, in order to avoid having to re-parse the schema with each request. @philipaconrad was that done for some other similar built-in recently, or was there just talk about it? I can't remember 😅

@srenatus
Copy link
Contributor

@anderseknert the issue you're thinking about is probably #5377.

@anderseknert
Copy link
Member

Indeed, thanks @srenatus 👍 Let's keep this one in mind for that enhancement as well.

@olegroom
Copy link

So the function we ok with is jsonschema.match(schema, doc): string | nil
Now, can we start develop it and make a pull request?

@anderseknert
Copy link
Member

Returning null or undefined to signal sucess doesn't align well with the rest of the built-in functions, IMO. I haven't thought too much about it, but I'd probably rather return a set of errors, and have an empty set signal success. I guess we can think more about it as you work on it, as that should be a minor detail to update later, or so I'd assume.

Also, I'd prefer the jsonschema.validate name over jsonschema.match. A jsonschema.is_valid that took only the schema itself and checked it for validity sounds like it could be useful too.

@anderseknert
Copy link
Member

As for where to start, see the docs on contributing, contributing code, development and adding built-in functions.

@jkulvich
Copy link
Contributor

Well, what's our decision? It's ok if I implement next built-ins?

  • jsonschema.validate(schema, doc): []string // List of errors
  • jsonschema.is_valid(schema, doc): bool // OK or Not OK

Next, I will be grateful for examples with OPA built-ins shared cache. Thanks!

@srenatus
Copy link
Contributor

jsonschema.validate(schema, doc): []string // List of errors

Strings have no structure. So, the callers of this method would have to use the error strings as-is. I could imagine having a set of objects instead, pointing to a part of the schema or doc...? But that depends on what the underlying json schema library gives us, and on what a caller actually needs.

You've clearly got a use case in mind here -- is it really sufficient for you to have an array of error messages?

💭 Array or set? WDYT?

I will be grateful for examples with OPA built-ins shared cache.

There are two relevant caches in OPA:

  • intra-query cache: two calls in a single policy evaluation should not need two validation runs
  • inter-query cache: two calls in two subsequent policy evaluations should not need two validation runs
    both of these would only apply to the same schema input of course. Concretely, they would cache the parsed schema, and re-use that to speed up schema checking on subsequent calls.

However, using the inter-query cache for this has no precedent, and is still something we're trying to figure out. (That was in the context of the graphql discussion.)

We should also check if the underlying jsonschema lib already does some sort of caching on its own.

@jkulvich
Copy link
Contributor

Ok, list of objects instead of strings looks like a better decision.

Above proposal with gojsonschema looks like fine lib for me.
So, I made a bit example with possible fields which can be returned for validation built-in.
Thus, this is a built-in proposal usage example:

allow {
  jsonschema.is_valid(data.schema)
  # ...

  errors := jsonschema.validate(data.schema, input.document)
  # [
  #   {
  #     "error": "vegetables.0: veggieName is required",
  #     "type": "required",
  #     "field": "vegetables.0",
  #     "desc": "veggieName is required"
  #   },
  #   ...
  # ]

  count(errors) == 0
  # ...
}

What do you think?

@srenatus
Copy link
Contributor

What do you think?

This looks good to me -- but ultimately, you're the ones who'd like to use it. 😎 I assume it would fit to your requirements?

@olegroom
Copy link

olegroom commented Nov 29, 2022

I assume it would fit to your requirements?

Exactly. We use OPA to check different user's json files and in case of errors, return understandable error message for the user.

@srenatus
Copy link
Contributor

@anderseknert any objections from your end? ☝️

@anderseknert
Copy link
Member

LGTM 👍

@rbuckland
Copy link
Author

This would be awesome. Spot on

@anderseknert
Copy link
Member

4 years later 😅 But I'm happy to see this fixed in #5486

Will be included in the next OPA release (v0.50.0) 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

8 participants