Add built-in function to support JSON Schema Validation #1449
There is an interesting library for it: https://github.com/xeipuuv/gojsonschema
@repenno yes, this is the library I modelled the "result" above on.
The feature itself needs some discussion, but OK. The main issue would be vendoring the library. If @tsandall is okay with the dependency, it should be fine to implement.
We've talked about adding support for input (and contextual data) schema validation for a while. It's probably time to start putting a plan in place. Before we commit to adding a JSON Schema validation built-in function, we need to figure out (a) how it would relate to the existing type inferencing and checking we do for virtual documents and (b) what syntactic changes we could introduce to elevate schema to a first-class feature. /cc @timothyhinrichs
Some thoughts on schema checking as a first-class citizen (potentially complementary to the built-in approach).
Declaring schema for input
Schemas can be defined simply as objects under /data, so if you want to use Rego to define a schema (e.g. sharing parts across multiple schemas), you can. (Not sure if JSON Schema already allows this.) Then annotate the Rego with the proper schema. Here we're considering schema for just the input.
Examples from the k8s use case, but applicable to all use cases
There are different levels of granularity at which you might apply schema. The most important seems to be rule-level schema.
Multiple Schemas
On the other hand, this will happen most often at the helper level, and type/schema inference from upper-level decisions may handle this anyway. Typically every top-level decision (or at least every top-level rule) has one kind of input it handles.
With JSON Schema support now in OPA, adding a built-in to do this on arbitrary data seems a lot more viable, and might be an alternative approach to on-the-fly validation of …
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
OK, let me copy my message with the explanation from #5417 :) Hi! Let me explain our case. The following points are important for us:
I can imagine and implement one of the following contracts:
Of course, we can follow the classical way with two built-ins. Any advice? Thanks!
As for performance, I can imagine this would be another built-in that would benefit from caching, in order to avoid having to re-parse the schema with each request. @philipaconrad was that done for some other similar built-in recently, or was there just talk about it? I can't remember 😅
@anderseknert the issue you're thinking about is probably #5377.
Indeed, thanks @srenatus 👍 Let's keep this one in mind for that enhancement as well.
So the function we're OK with is …
Returning null or undefined to signal success doesn't align well with the rest of the built-in functions, IMO. I haven't thought too much about it, but I'd probably rather return a set of errors and have an empty set signal success. I guess we can think more about it as you work on it, as that should be a minor detail to update later, or so I'd assume. Also, I'd prefer the …
As for where to start, see the docs on contributing, contributing code, development, and adding built-in functions.
Well, what's the decision? Is it OK if I implement the following built-ins?
Also, I would be grateful for examples of OPA built-ins that use a shared cache. Thanks!
Strings have no structure, so callers of this method would have to use the error strings as-is. I could imagine having a set of objects instead, pointing to a part of the schema or document...? But that depends on what the underlying JSON Schema library gives us, and on what a caller actually needs. You've clearly got a use case in mind here -- is it really sufficient for you to have an array of error messages? 💭 Array or set? WDYT?
There are two relevant caches in OPA:
However, using the inter-query cache for this has no precedent, and is still something we're trying to figure out. (That was in the context of the GraphQL discussion.) We should also check if the underlying JSON Schema library already does some sort of caching on its own.
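The caching idea from this part of the thread can be sketched in Go as parsing each distinct schema once and reusing the parsed form on later requests. This is a minimal illustration under stated assumptions. `schemaCache`, `newSchemaCache` and `Get` are hypothetical names and not OPA's intra- or inter-query cache API. The "parse" step here is only `encoding/json` decoding, standing in for whatever compiled form a JSON Schema library would produce.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// schemaCache memoizes parsed schemas keyed by their raw JSON text.
// Hypothetical type: a stand-in for a built-in-level cache, not OPA's API.
type schemaCache struct {
	mu     sync.Mutex
	parsed map[string]map[string]any
	parses int // how many times a schema was actually decoded
}

func newSchemaCache() *schemaCache {
	return &schemaCache{parsed: make(map[string]map[string]any)}
}

// Get returns the parsed schema for raw, decoding it only on first use.
// Invalid schemas are not cached, so they are re-parsed (and rejected) each time.
func (c *schemaCache) Get(raw string) (map[string]any, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.parsed[raw]; ok {
		return s, nil // cache hit: no re-parse
	}
	var s map[string]any
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return nil, fmt.Errorf("invalid schema: %w", err)
	}
	c.parsed[raw] = s
	c.parses++
	return s, nil
}

func main() {
	cache := newSchemaCache()
	schema := `{"type": "object", "required": ["veggieName"]}`
	// Simulate three requests that all use the same schema.
	for i := 0; i < 3; i++ {
		if _, err := cache.Get(schema); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("parses:", cache.parses) // parses: 1
}
```

The sketch holds the lock while decoding and never evicts entries. A real cache shared across queries would also need a size bound and an eviction policy, which is part of what the thread says is still undecided for the inter-query cache.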
OK, a list of objects instead of strings looks like the better decision. The proposal above with gojsonschema looks like a fine library to me.
```rego
allow {
    jsonschema.is_valid(data.schema)
    # ...
    errors := jsonschema.validate(data.schema, input.document)
    # [
    #   {
    #     "error": "vegetables.0: veggieName is required",
    #     "type": "required",
    #     "field": "vegetables.0",
    #     "desc": "veggieName is required"
    #   },
    #   ...
    # ]
    count(errors) == 0
    # ...
}
```
What do you think?
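The following Go sketch makes the proposed return shape concrete. It checks only the JSON Schema `required` keyword and reports each failure as an object with the `error`, `type`, `field` and `desc` keys from the proposal. An empty list signals success, as suggested earlier in the thread. `SchemaError` and `checkRequired` are hypothetical names. A real built-in would take these values from the underlying validation library (e.g. gojsonschema) rather than compute them itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SchemaError mirrors the error object shape proposed in the thread.
// Hypothetical type, not an OPA or gojsonschema API.
type SchemaError struct {
	Error string `json:"error"`
	Type  string `json:"type"`
	Field string `json:"field"`
	Desc  string `json:"desc"`
}

// checkRequired reports every name in required that is missing from obj.
// field is the path of obj within the document, e.g. "vegetables.0".
// It returns an empty, non-nil slice when nothing is missing.
func checkRequired(field string, obj map[string]any, required []string) []SchemaError {
	errs := []SchemaError{}
	for _, name := range required {
		if _, ok := obj[name]; ok {
			continue
		}
		desc := name + " is required"
		errs = append(errs, SchemaError{
			Error: field + ": " + desc,
			Type:  "required",
			Field: field,
			Desc:  desc,
		})
	}
	return errs
}

func main() {
	doc := `{"vegetables": [{"veggieLike": true}]}`
	var parsed struct {
		Vegetables []map[string]any `json:"vegetables"`
	}
	if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
		panic(err)
	}
	all := []SchemaError{}
	for i, v := range parsed.Vegetables {
		field := fmt.Sprintf("vegetables.%d", i)
		all = append(all, checkRequired(field, v, []string{"veggieName", "veggieLike"})...)
	}
	out, err := json.MarshalIndent(all, "", "  ")
	if err != nil {
		panic(err)
	}
	// Prints a one-element array matching the example error object above.
	fmt.Println(string(out))
}
```

Returning structured objects lets a caller build user-facing messages from `field` and `desc` separately. The `error` string keeps the combined message available as-is.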
This looks good to me -- but ultimately, you're the ones who'd like to use it. 😎 I assume it would fit your requirements?
Exactly. We use OPA to check various users' JSON files and, in case of errors, return an understandable error message to the user.
@anderseknert any objections from your end? ☝️
LGTM 👍
This would be awesome. Spot on!
4 years later 😅 but I'm happy to see this fixed in #5486. It will be included in the next OPA release (v0.50.0) 🎉
Desired Feature
Validate input JSON document against a set of pre-configured schemas.
The interface I propose is simply a new built-in function:
```
result := schema.validate(<uri to schema>, <data-to-validate>)
```
The result should be …