-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kernelspec JSON schema #105
Conversation
86ef612
to
1765886
Compare
kernelspec-spec/kernelspec-spec.md
Outdated
to [the current description of the kernelspec](https://jupyter-client.readthedocs.io/en/stable/kernels.html#kernel-specs), | ||
and adds an optional `protocol_version` field. | ||
|
||
[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that a repo-per-spec might be overkill. However, I would like to have some separation of concerns. The repo for the kernel protocol will also hold the roadmap and additional documentation about the "release" process of the protocol. Also the kernel protocol and the widget protocol are somehow orthogonal, I would find it confusing to have them in the same repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.
I would agree with this. I think it would be more sustainable and easier for folks to follow if we had a single "Jupyter specs" repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we gather specs by topics or "thematics"? Specifications for the kernel protocol, the kernelspec and the connection file could live together in the kernel_protocol
repo. Everything related to the widgets specification (for instance) could live in another one. This way we avoid the multiplication of specs repos, while keeping some separation of concerns. Also this would be consistent with the current situation: a kernel has to implement the kernel protocol, provide a kernel spec and handle a conection file, while supporting the widget protocol is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, widgets are maybe an exception... but the knowledge of how comm
works is not. At the end of the chain, something like the Jupyter Server API references almost everything each other... kernelspec just happens to be very low in the order, with only perhaps a known-mimetypes.json
below it, as that is not a schema-level construct like uri
.
When changes occur, often these things will reference each other, and documentation and generated type packages should reflect the compatibility of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How comm works is part of the Kernel protocol, thus I agree the Widgets protocol references the Kernel protocol, but not vice versa. And I took the Widget protocol as an example, but we can consider other things, like the LSP, which is totally unrelated. I see this as a software distribution, we have low layers upon which are built upper layers, and there should not be circular dependencies. The Jupyter Server API you mentioned is a super high layer that references everything, however lower layers should not reference it.
"title": "Kernelspec", | ||
"description": "Specification of the kernelspec file", | ||
"type": "object", | ||
"properties": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend some top-level definitions
to make individual pieces easier to reference in other schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find a way of gathering pieces, they all appear independent to me. Having a definition per field would just add complexity and not ease their reference in other schemaa, right? Sorry for the naive question, but I'm not used to JSON schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kernelspec, and specific fields within it, need to be referenced by other schema.
Have a look at, e.g. nbformat. If the kernelspec is formalized, then conventionally specs would like to point to specific meanings without pointing at a concrete locations:
$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/definitions/display-name"
vs
$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/properties/display_name"
This becomes more relevant, once more complex patterns are used such as additionalProperties
or patternProperties
(useful for e.g. mimetypes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation, I will try to make the appropriate changes.
}, | ||
"metadata": { | ||
"description": "A dictionary of additional attributes about this kernel; used by clients to aid in kernel selection. Metadata added here should be namespaced for the tool reading and writing that metadata.", | ||
"type": "object" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is not widely-used yet, could this already be further constrained to enforce namespacing:
[definitions.metadata]
type = "object"
[definitions.metadata.additionalProperties]
type = "object"
kernelspec-spec/kernelspec-spec.md
Outdated
and adds an optional `protocol_version` field. | ||
|
||
[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of | ||
the kernelspec has been created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Notebook format, precisely nbformat v4, in metadata there is a kernelspec
property, with schema defined here. Previously (in nbformat v3) this was named as kernel_info
(side note: format description is outdated as it still mentions kernel_info
).
Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)? In nbformat
v4 language
is not required.
Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)?
I don't know actually, would it make sense to store the arguments used to launch the kernel in the nbformat for instance? If all the mandatory fields of the kernel spec make sense for the notebook format, then we should probably update it so that it refers this new kernelspec. Otherwise, maybe the name could be reverted to kernel_info
or something else to avoid confusion? But that would be out of the scope of this JEP.
Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?
The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the ntebook format spec to know how to implement a kernelspec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "kernelspec" entry in nbformat.v4 is a misnomer. It is not a kernelspec—sure, it shares some of the same fields, but there is a subtle difference here.
The kernelspec file does not have a "name" field. The (locally) unique name for a kernelspec comes from the directory on disk where the kernel.json file is located—it is not stored in the kernelspec file.
In the nbformat, this "name" field is necessary to locate the kernelspec on disk (this has always been brittle, because it means this information becomes useless the moment the notebook leaves the current context).
It really should be called something like "kernelspec_info".
Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)? In nbformat v4 language is not required.
This would be great for notebooks that always run in the same Python/conda/virtual environment (this field isn't helpful for notebooks that move around, but that's a separate issue). I would be in favor of changing this field, but I think this is out-of-scope for this JEP. We should make a follow-up JEP to update nbformat if we'd like to pursue that.
Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?
The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the ntebook format spec to know how to implement a kernelspec.
I believe this is out-of-scope for the current JEP.
That said, they aren't entirely orthogonal. Yes, the kernelspec shouldn't depend on the nbformat, but the nbformat loosely depends on the kernelspec today. I think that's what @krassowski is alluding to... if we can drop nbformat's current conflicting definition of the kernelspec and point at this definition, we gain some standardization.
}, | ||
"kernel_protocol_version": { | ||
"description": "The version of protocol this kernel implements. If not specified, the client will assume the version is <5.5 until it can get it via the kernel_info request.", | ||
"type": "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this stays, should have some pattern. but as mentioned, hoisting this to and $id
might be more robust over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be included in the first version of the kernelspec.
I believe the first published version should only include the spec currently documented in the protocol documentation.
Then, we can quickly publish a second version with kernel_protocol_version added as a key. That way, we don't immediately invalidate everyone kernelspecs.
Also, creates a circular dependency on JEP #66.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way, we don't immediately invalidate everyone kernelspecs.
I realize this actually isn't true, since "kernel_protocol_version" wouldn't be a required field. That said, we still would need to settle the naming in #66 to merge this. I think "kernel_protocol_version" is a fine name for the field, so maybe this isn't any issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually let's say that #66 depends on this one and will be updated accordingly (if needed) when this one is accepted.
Thank you for doing this @JohanMabille! You beat me to it. I had spent some time yesterday writing the exact same JEP as an action item from the SSC meeting. 😅 |
@Zsailer arf sorry for that! This came out when discussing the Handshake pattern, we need an additional field in the kernelspec and I thought it could be the opportunity to add a schema for it. I'll be more careful next time to avoid doing the work twice! |
No no no—this is fantastic! I really appreciate you taking this over! No need to apologize 😃 |
I think it's worth saying for voting purposes—I'm 👍 for this JEP getting accepted. We can continue to copy-edit, but wanted to explicitly submit my support. 😃 |
More changes to come, but I need to educate myself more to JSON schemas 😅 |
4ae0f81
to
707e492
Compare
- [ ] BLD,SCH: CI task to run e.g. js2shacl to generate W3C SHACL from the
kernelspec JSON schema
https://github.com/ThiagoCComelli/JS2SHACL-JSON-Schema-to-SHACL-conversor/blob/master/conversor.js
IDK why SHACL also yet, but there's a converter
…On Mon, Nov 20, 2023, 10:03 AM Frédéric Collonval ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In kernelspec-spec/kernelspec.schema.json
<#105 (comment)>
:
> + "required": ["language"],
+ "properties": {
+ "language": {
+ "description": "The name of the language of the kernel. When loading notebooks, if no matching kernelspec key (may differ across machines) is found, a kernel with a matching language will be used. This allows a notebook written on any Python or Julia kernel to be properly associated with the user’s Python or Julia kernel, even if they aren’t listed under the same name as the author’s.",
+ "type": "string"
+ }
+ }
+ },
+ "kernel_protocol_version": {
+ "type": "object",
+ "required": ["kernel_protocol_version"],
+ "properties": {
+ "kernel_protocol_version": {
+ "description": "The version of protocol this kernel implements. If not specified, the client will assume the version is <5.5 until it can get it via the kernel_info request. The kernel protocol uses semantic versioning (SemVer).",
+ "type": "string",
+ "pattern": "^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
⬇️ Suggested change
- "pattern": "^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
+ "pattern": "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$"
—
Reply to this email directly, view it on GitHub
<#105 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAMNS7LNV3B4W35OUUIJZDYFNWNVAVCNFSM6AAAAAAXEDPL7CVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTONBQGAZTINJWGE>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Co-authored-by: Jason Grout <jasongrout@users.noreply.github.com>
Co-authored-by: Frédéric Collonval <fcollonval@gmail.com>
Kind ping on this, I think this is ready for a final review:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made one small suggestion on minItems, but 👍
The JEP has been approved. Yes: 9 Merging |
Kernelspec JSON schema
This proposes to specify the kernelspec through a JSON schema, and have the specification and documentation live in a dedicated repo.
The JSON schema is joined as a standalone file.
Voting from @jupyter/software-steering-council
Like #106 , let's move to the voting phase. As said in this comment, future schema should not be JEPs, but PR open to the Jupyter Schema repo