-
-
Notifications
You must be signed in to change notification settings - Fork 581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harmonize annotations with python/typeshed#8608 #1027
base: main
Are you sure you want to change the base?
Conversation
python/typeshed#8608 introduced annotations for `create` which are not fully reflected here. In order to reflect that state into `jsonschema`, a new module, `jsonschema._typing` is introduced. The purpose of `_typing` is to be a singular place for the library to define type aliases and any typing-related utilities which may be needed. This will let us use aliases like `_typing.JsonValue` in many locations where any valid JSON datatype is accepted. The definitions can be refined over time as necessary. Initial definitions in `_typing` are: - Schema (any JSON object) - JsonObject (any JSON object) - JsonValue (any JSON value, including objects or sequences) `Schema` is just another name for `JsonObject`. Perhaps it is not needed, but the name may help clarify things to a reader. It is not obvious at present whether or not it is a good or bad idea to notate it as such, but a similar Schema alias is defined in typeshed and seems to be working there to annotate things accurately. These types are using `Mapping` and `Sequence` rather than `dict` or `list`. The rationale is that `jsonschema`'s logic does not dictate that the objects used *must* be defined in stdlib types or subtypes thereof. For example, a `collections.UserDict` could be used and should be accepted by the library (`UserDict` wraps but does not inherit from `dict`.)
Thanks! No worries on the time shortage, I hear you! Very much appreciate the PR, this is a great plan overall. Will leave some minor comments. (There's also #1022 from @DanielNoord outstanding to look at, which is driven by typing-related concerns, just in case you haven't seen it). |
@@ -0,0 +1,14 @@ | |||
""" | |||
Type aliases and utilities to help manage `typing` usage and type annotations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Definitely support adding this, thanks!)
from collections.abc import Mapping, Sequence | ||
import typing | ||
|
||
Schema = Mapping[str, typing.Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema is tricky to nail down for multiple reasons. Its precise type depends on the specific dialect of JSON Schema involved, and (speaking off the top of my head without double checking, which counts for lots of the comments I'm about to make --)
applicable_validators
means that technically its type is literallyAny
(or some dependent type), because someone can decide they're inventing a JSON Schema dialect where the number17
is a schema and means something, andapplicable_validators
will give the library what it needs to deal with that as a schema. I'm honestly not sure what to do about that.- Ignoring the above, the "internal" type to all JSON Schema "official" drafts is -- for later drafts,
bool | Mapping[str, typing.Any]
, and for earlier onesMapping[str, typing.Any]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely going to have to circle back to try to better grasp what's sensible here.
In line with my other comment about maybe accepting some technical inaccuracy in the annotations as a pragmatic approach, I'm instinctively inclined towards bool | Mapping[...]
.
|
||
JsonObject = Mapping[str, typing.Any] | ||
|
||
JsonValue = typing.Union[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incomplete -- someone can come along and invent validator callbacks which only work if someone deserialized into Decimal
, and as long as they ensure they've done so in their loading of JSON, their code works. In other words, I'm somewhat afraid this is really just Any
in "real life".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might have a tension here between annotations which are useful vs ones which are completely accurate.
If we annotate things as Any
, we ensure that nothing which is correct at runtime would be flagged by a type checker. But I'm concerned that making this accurate with Any
will also mean that we can't catch common errors.
My example would be some flask-like framework, and checking validate(request)
vs validate(request.json)
.
Is it okay to suggest that anyone using type checking needs to use typing.cast
or type: ignore
in these cases?
The type really is Any
for a lot of methods like json.load
, but I hope my desire to provide a more tightly scoped type is understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to think about what I think honestly.
To me, all I am "convinced" is useful of at the minute (and I'm quite convinced it's very useful indeed) is that typing forces documentation to have correctly documented types. That's the reason I'm quite picky about making sure that things which work "accidentally" because we abuse some type internally do not in fact get represented as the type -- because later down the line I will not support those internal abuses if we need to change them for whatever reason. They're not part of the public API, they don't have stability guarantees.
Here, the opposite is true -- not documenting the type here actually means "regressing" something that is indeed public API, and which I very much do intend folks to rely on (by which I don't mean I have no reservations that it will bite me in the ass, but I do mean it's something I think people should consider public API at this point, type annotation or not, and I don't intend to break it).
And I know we don't actually regress, since I'm pretty sure there are tests covering the Any
behavior at least in part -- but if the goal (at least the goal in my head) is "be able to say every object in jsonschema
has its type documented explicitly", it complicates that goal if the documentation again needs tweaking to broaden types into their real public API.
(So... "I don't know yet" :D)
@@ -114,13 +120,15 @@ def _store_schema_list(): | |||
|
|||
|
|||
def create( | |||
meta_schema, | |||
validators=(), | |||
meta_schema: _typing.Schema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to double check (which I will be lazy and not do here) but Unlike _typing.Schema
, meta-schemas (in the library, but possibly also according to the spec, I forget) must be objects I think, not bool | thing
, so you'll need two types I suspect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense. I need to think harder about what to do with schemas more broadly, but we now have a clean place where we can define _typing.MetaSchema
. I'll make that tweak.
meta_schema, | ||
validators=(), | ||
meta_schema: _typing.Schema, | ||
validators: Mapping[str, _ValidatorCallback] | tuple[()] = (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The public API is just Mapping[str, _ValidatorCallback]
, not tuple. I know mypy or whatever is unhappy with that, so if that means just adding type: ignore
to get it to shut up about the default arg let's do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'm cool with that way of handling these!
I had some other ideas which I like less, but another option just occured to me.
With _typing
we could do this:
# in the definition
validators: Mapping[str, _ValidatorCallback] = _typing.EmptySequence
# in _typing
EmptySequence = typing.cast(typing.Any, ())
Because Any
is a subtype of everything, the rules get wonky, and mypy
and other checkers should be okay with this.
That way, _typing.EmptySequence
can be used in any location where the empty tuple is used today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems also fine with me (although I think the fact that ()
works is more a quirk of dict
than a generic empty sequence type, but I'll leave pedantry out until/unless it comes back to rear its head I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it help that I also considered
EmptyMap = pyrsistent.freeze({})
?
😁
I just know that this pattern occurs several times in the code, with ()
as an empty default. So it seems like it might be nice to sweep any messiness associated with it into _typing
.
format_checker: _format.FormatChecker = _format.draft202012_format_checker, | ||
id_of: typing.Callable[[_typing.Schema], str] = _id_of, | ||
applicable_validators: typing.Callable[ | ||
[_typing.Schema], typing.Iterable[tuple[str, _ValidatorCallback]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This has to do with the above, the type for applicable_validators
is really Any -> tuple(...)
)
Nice to see you have found some more time this! One question though, why did you choose to make the module private? A lot of the typing ends up in the public API as well, so users would then need to import from a private module to make their own annotations be correct. |
I appreciate the question -- and it's a bit of a balancing act, IMO. For example, we have in this PR
being used in Is that particular type alias a public property of the library? I would say no -- and I think my inclination not to expose that particular type is agreeable to most developers. It's an internal convenience for authors here to be able to write things more clearly and succinctly, just like any other What about validate(typing.cast(jsonschema.typing.JsonValue, my_object)) But that grows the scope of this changeset! Suddenly, I'm not just trying to make some annotations accurate, but I'm also proposing a new public API which I hope that makes sense -- and I do intend to get you a good reply on some other threads (notably #1017)! But until I do, thanks for the interest in and contributions towards a well-typed |
I haven't had time to focus on
jsonschema
typing much recently, but I want to get things back on track. The first thing to do is to correct any drift betweenjsonschema
andtypeshed
.I'm going to try to produce a steady stream of very small PRs which get things back in sync.
This is therefore hopefully the first of a few changes.
python/typeshed#8608 introduced annotations for
create
which are not fully reflected here.In order to reflect that state into
jsonschema
, a new module,jsonschema._typing
is introduced. The purpose of_typing
is to be a singular place for the library to define type aliases and any typing-related utilities which may be needed. This will let us use aliases like_typing.JsonValue
in many locations where any valid JSON datatype is accepted.The definitions can be refined over time as necessary.
Initial definitions in
_typing
are:Schema
is just another name forJsonObject
. Perhaps it is not needed, but the name may help clarify things to a reader. It is not obvious at present whether or not it is a good or bad idea to notate it as such, but a similar Schema alias is defined in typeshed and seems to be working there to annotate things accurately.These types are using
Mapping
andSequence
rather thandict
orlist
. The rationale is thatjsonschema
's logic does not dictate that the objects used must be defined in stdlib types or subtypes thereof. For example, acollections.UserDict
could be used and should be accepted by the library (UserDict
wraps but does not inherit fromdict
.)📚 Documentation preview 📚: https://python-jsonschema--1027.org.readthedocs.build/en/1027/