Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fastjsonschema for json schema validation #64

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

kiendang
Copy link
Member

@kiendang kiendang commented May 27, 2021

Add default option to use fastjsonschema to validate events. fastjsonschema uses code gen and thus should speed up validation since we don't need to traverse the schema again every time we validate an event, and the event json traversal/validation would be a lot faster too. Will add benchmarks soon.

This is relevant for #59 where there is performance concern about running the validation twice.

Inspired by
https://github.com/jupyter/nbformat/blob/dd4725fb41515c69bc76e1c7f32f3ede0075725f/nbformat/json_compat.py

@kiendang kiendang marked this pull request as draft May 27, 2021 05:29
@kiendang kiendang added this to the 0.2 milestone May 27, 2021
@Zsailer Zsailer marked this pull request as ready for review June 1, 2021 16:37
@Zsailer Zsailer marked this pull request as draft June 1, 2021 16:38
@Zsailer
Copy link
Member

Zsailer commented Jun 1, 2021

@kiendang, are you still working on this PR? This looks good to me. Great work here!

Comment on lines +70 to +78
json_validator = CaselessStrEnum(
JSON_SCHEMA_VALIDATORS.keys(),
'fastjsonschema',
help="""
JSON schema validation implementation.

Valid choices are `jsonschema` and `fastjsonschema`.
"""
).tag(allowed_none=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zsailer Should we allow users to subclass here? They might want to customize the default validator or use their own. Or we can compromise and allow them to past a class name but restricted to only JSONSchemaValidator and FastJSONSchemaValidator and later can consider passing any class if someone asks about that use case. Is there any downside to allowing subclassing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don’t see any downside to allowing custom validators by subclassing EventSchemaValidator. We can make this trait a Type(default_value='FastJSONSchemaValidator', klass='EventSchemaValidator').

@kiendang
Copy link
Member Author

kiendang commented Jun 2, 2021

@kiendang, are you still working on this PR? This looks good to me. Great work here!

Thanks, I'm done. Just that one comment.

I'm done for now except for that one comment. But I converted to draft again since I want to think a bit about how this integrates with the category filtering, especially in the future where draft-2019/draft-2020 is supported.

@kiendang kiendang marked this pull request as ready for review June 2, 2021 03:28
@kiendang kiendang marked this pull request as draft June 2, 2021 08:06
@Zsailer
Copy link
Member

Zsailer commented Jun 2, 2021

Great. Ping me here when you’re ready for a final review.

@kiendang
Copy link
Member Author

kiendang commented Sep 8, 2021

Sorry for not getting back on this sooner.

jsonschema just added support for draft 2020-12 python-jsonschema/jsonschema#817. However I'm not sure if they have error formatting/annotation collection which is what we need.
I'm less enthusiastic about fastjsonschema now since it looks like the library is not being regularly maintained now, and if jsonschema has what we need then it's not necessary.
As mentioned in #66 jschon seems to have exactly what we need (draft 2020-12 and annotation collection), but it is a new library.

My proposal is we defer making decision on this for a bit longer since this is "only" a performance thing. Let me look into jsonschema more. Meanwhile I'll figure out #61, after that we can cut a new release and merge jupyter-server/jupyter_server#364.

@Zsailer
Copy link
Member

Zsailer commented Oct 5, 2021

Thanks, @kiendang. How is it going with #61? I'm diving back into this and would love to see telemetry land in jupyter-server soon. I'm happy to help with anything.

@kiendang
Copy link
Member Author

I do have a WIP on that. Will find time to finish it somewhere next week. I'd love to see telemetry integrated in too.

@yuvipanda
Copy link
Collaborator

I wanted to chime in and say that another reason to be interested in fastjsonschema is that it removes a C dependency! jsonschema now depends on https://pypi.org/project/pyrsistent/ which is not pure python, and since nbformat and friends use it the jupyter stack is not easily installable on pyiodide now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants