[PROPOSAL] Kernel Provisioning #608
Comments
Thx a lot @kevin-bates. This is a great proposal that takes into account backwards compatibility, which I think was missing from the previous attempts. I am planning some time to review, comment and contribute on this. cc/ @goanpeca @Carreau
@MSeal @captainsafia FYI
@minrk @ellisonbg @rgbkrk, @takluyver too!
cc/ @trallard FYI re: your new role at Quansight Labs
Thanks for the ping, I will have a look.
Great write-up @kevin-bates ! Thanks for putting that all together and incorporating feedback from prior kernel management change efforts here. I'll drop a few comments here if it's helpful. Overall I think this is a solid proposal to move things forward and unite some efforts in a backwards compatible manner.
Yeah I agree and should probably be abstract.
Perhaps with a warning echo'd? I'd also be ok with erroring here unless an override is passed in to ignore the kernel defined provisioner.
That's a really good way to introduce the change imo. Thanks for putting thought into this aspect of the change.
I think most private Notebook products subclass KernelManager if they use jupyter_client. So while there's no public visibility into that impact, it might dictate that the change be associated with a major version bump, so consumers in those corners know they should look at what changed / is new and how to adapt custom extensions to the new class pattern here.
Maybe also the gitter -- some folks still only ask / monitor there. I think moving most of this PR description into the docs as a new-to-version-x section / page would be a good idea as well.
Naming is hard ... how would |
Thank you for your response and questions @MSeal - they are much appreciated. Here are some responses...
I was planning on logging an info message on each startup. Should the default provisioner be used because the kernelspec didn't specify a provisioner, I suppose we could add an indication of that. However, since that's the 90% use case because virtually every kernelspec will not specify a provisioner and there's nothing that should require folks to update their kernelspecs, I'm inclined to remain silent. I think it's important we maintain the status quo as much as we can. Even when functionalities like parameterized kernels and kernel metric gathering are in place, folks could still get the benefits - albeit things like rich parameterization would likely require subclassing of the default provisioner.
Good idea about gitter. I'll fire off a query or two shortly.
I like this name over both I hope to have a minimal DRAFT PR soon so things are a bit more tangible. Thanks again. |
👋🏻 I was notified about this via your message to the jupyter listserv, so great job on the wider communication :) This is a wonderful proposal. I'm one of the people who did some work on custom subclasses of KernelManager for my own project -- what I was/am trying to build is what I refer to as a "hydra" kernel, a very wacky thing, which essentially is a proxy Kernel that can spawn additional kernels on-demand. My use case is I am writing/attempting to write some capability whereby each code cell can execute on a different remote host (our users are CS experimenters who are orchestrating research experiments across multiple provisioned hosts.) From what I read here, this would be a welcome improvement as I'd have a more clear integration point when it comes to provisioning the kernel (I am targeting Ansible for this right now.) 👍🏻 |
Thank you Jason! This is exactly the kind of information that is extremely helpful. I see you're overriding |
Hey, thanks for opening this. I have reservations about extending the kernelspec with Python-specific information such as
By Python-specific, I mean that it is specific to For example, @JohanMabille is actively working on a new C++ client and Jupyter server, with the motivation of faster handling of websocket and zeromq messages by the server. It is very unlikely that his implementation will be able to make sense of I think that it makes a lot of sense for |
Hi @SylvainCorlay - thanks for raising your concern. It will be good to try to hash this out.
I recall this coming up in the handshaking proposal but never found this documented anywhere - although I completely understand your point. What is documented is this regarding the
So, although applications are free to ignore anything in the metadata stanza, applications are also free to add application-specific items. As a result, one solution would be to simply rename the stanza Another approach that can get the python-specific class name out of the Proposal Update: Unless strong objection, I would like to adopt this change for this proposal - replacing Please remember that the vast majority of kernelspecs will continue to not have an Looking forward, the kernel parameter schema will also be specific to the provisioner, not the kernel, since parameters need to span both, and provisioners are kernel-aware anyway. As such, I believe this same stanza should contain the parameter schema or a reference to it. |
Hey @kevin-bates sorry for the late reply, I thought I had done this already. Regarding your remark that parameterized kernels cannot be sustainably implemented, I would disagree with it.
Hi @SylvainCorlay - no worries.
I completely agree. We'd definitely want a schema that describes the available parameters pertaining to that kernel specification and the kernel start-request body would contain appropriate name/value pairs. This schema would identify required values, specify the enumerated types (where applicable) and include default values - in particular, for any required properties.
I think this is too restrictive. Not all parameters can be plugged nicely into a jinja template. In fact, I think we'll find that a majority of the parameters involved apply to the kernel's environment. Consider parameters like which docker/singularity image to use, what Kubernetes namespace to create, or how many workers to specify for your Spark-based kernel (driver). All of these are examples in which the desired parameter is not associated with the kernel but, instead, with the environment in which it will run. Without kernel environment provisioning, support for these kinds of parameters (which I would argue will be the majority of parameters) cannot be implemented in a sustainable manner. When we get to parameterization, I suspect we may need separate schemas: one that pertains directly to the kernel itself, in which command-line jinja and env values make sense because they are consumed directly by the kernel, and another that pertains to the provisioned environment, where the provisioner is responsible for interpreting the parameters and provisioning the environment in which the kernel will run.
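To illustrate the two-schema idea, here is a minimal sketch, not a proposed format. Everything in it is an assumption made for illustration: the schema layout, the parameter names (cache_size, image, namespace, workers), and the helper functions. It only shows how required values, enumerated values and defaults could be applied separately to kernel-level and environment-level parameters taken from a single start request.

```python
# Hypothetical schemas; the layout and parameter names are illustrative only.
KERNEL_SCHEMA = {
    "cache_size": {"type": int, "default": 128},
}
ENVIRONMENT_SCHEMA = {
    "image": {"type": str, "required": True},
    "namespace": {"type": str, "default": "default"},
    "workers": {"type": int, "enum": [1, 2, 4, 8], "default": 2},
}


def resolve_parameters(schema, supplied):
    """Apply defaults and check required, type and enum rules for one schema."""
    resolved = {}
    for name, rule in schema.items():
        if name in supplied:
            value = supplied[name]
        elif "default" in rule:
            value = rule["default"]
        elif rule.get("required"):
            raise ValueError(f"missing required parameter: {name}")
        else:
            continue
        if not isinstance(value, rule["type"]):
            raise TypeError(f"parameter {name} must be {rule['type'].__name__}")
        if "enum" in rule and value not in rule["enum"]:
            raise ValueError(f"parameter {name} must be one of {rule['enum']}")
        resolved[name] = value
    return resolved


def split_request(request):
    """Route each name/value pair in a start request to the schema that owns it."""
    unknown = set(request) - set(KERNEL_SCHEMA) - set(ENVIRONMENT_SCHEMA)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    kernel_args = {k: v for k, v in request.items() if k in KERNEL_SCHEMA}
    env_args = {k: v for k, v in request.items() if k in ENVIRONMENT_SCHEMA}
    return (
        resolve_parameters(KERNEL_SCHEMA, kernel_args),
        resolve_parameters(ENVIRONMENT_SCHEMA, env_args),
    )


# Example start-request body carrying both kinds of parameters.
kernel_params, env_params = split_request(
    {"image": "example/kernel:latest", "workers": 4, "cache_size": 256}
)
```

In this sketch the kernel-level values would feed the command line or env, while the environment-level values would be handed to whatever provisions the environment.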
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/subclasses-of-jupyter-client-manager-kernelmanager/7793/1 |
Hi, I'm encountering difficulties wrt b/c support for sub-classed Kernel Managers (although I suspect few are implemented). Given that, I think that Kernel Environment Provisioners should target a major release boundary (preferably 7.0), rather than a minor release. The release would include the following:
As far as backward compatibility support for the kernelspecs themselves, we could add a check in (Note: We'd also want to add the same kind of check in the non-async I would like to proceed on #612 with these changes in mind unless there are objections (both relative to the above text and the proposal in general). |
Updates
Rather than update the original description with changes, I thought I'd list them here to preserve the original proposal. (If others feel the issue's description should also hold the truth, I'm happy to apply these edits there.)
Naming:
Release and implementation changes discussed in the previous comment still hold. |
This issue introduces a proposal named Kernel Provisioning. Its intent is to enable third parties to provision the kernel's runtime environment within the current framework of jupyter_client's kernel discovery and lifecycle management.
Problem
The jupyter_client package currently provides a kernel manager class (KernelManager) to control the lifecycle of the kernel process. Lifecycle-action methods supported by a kernel manager include start_kernel, shutdown_kernel, interrupt_kernel, restart_kernel, and is_alive. All of these methods interact with the kernel process - which is a Popen subprocess - to monitor and control its lifecycle. For example, start_kernel creates the Popen instance and stores that instance in the kernel manager's kernel attribute. shutdown_kernel is implemented to leverage Popen's kill() and terminate() methods (depending on urgency). interrupt_kernel calls Popen's send_signal() method (or sends a message if message-based interrupts are configured). is_alive is based on Popen's poll() method. restart_kernel is a combination of shutdown_kernel and start_kernel.
Today, applications that wish to launch kernels beyond those of a local Popen process (for example, into resource-managed clusters or container-based environments) must instead implement their own KernelManager subclass. This introduces a number of issues:
KernelManager is an application-level class. That is, functionality related to the application - across all kernels - is implemented via the kernel manager. Applications such as Notebook extend this class to allow for activity monitoring functionality, for example.
Because KernelManager is an application-level class, such kernel manager implementations must be a subclass of KernelManager and are kernel-specification agnostic. That is, the same kernel manager instance must manage the lifecycles of Python, R, and C++ kernels, as well as kernels launched into resource-managed clusters - which is not possible via a Popen subprocess instance. However, support for the latter types of kernels requires interactions with more than just the kernel process. For example, kernel locations must be discovered within the resource-managed cluster using the resource manager's API and terminated in a similar manner - allowing the resource manager to release resources, update scheduling, etc. (examples of such resource managers are Hadoop YARN or Kubernetes). As a result, a single kernel manager cannot address the needs of the various configurations in which users want their kernels to operate.
a) a given kernel manager instance cannot know about what parameters apply to all kernels and
b) a majority of kernel parameters affect the kernel's runtime environment and, therefore, must be applied prior to the kernel's actual launch.
In essence, what is needed is the ability to associate a kernel's lifecycle management to the kernel's specification, where its environment and parameters are defined, while leaving kernel manager implementations to be the responsibility of the application.
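To make the coupling described at the top of this section concrete, here is a minimal sketch of how lifecycle methods map onto Popen calls. It is not jupyter_client's actual code: the class name SketchKernelManager, its simplified method signatures, and the sleeping child process standing in for a kernel are all assumptions for illustration.

```python
import signal
import subprocess
import sys


class SketchKernelManager:
    """Hypothetical, heavily simplified stand-in for a Popen-based kernel manager."""

    def __init__(self, argv):
        self.argv = argv    # command line used to launch the "kernel"
        self.kernel = None  # holds the Popen instance once started

    def start_kernel(self):
        # Create the subprocess and keep it on the manager's kernel attribute.
        self.kernel = subprocess.Popen(self.argv)

    def is_alive(self):
        # poll() returns None while the child process is still running.
        return self.kernel is not None and self.kernel.poll() is None

    def interrupt_kernel(self):
        # Signal-based interrupt; message-based interrupts are not modelled here.
        self.kernel.send_signal(signal.SIGINT)

    def shutdown_kernel(self, now=False):
        # Urgent shutdowns kill the process; otherwise it is asked to terminate.
        if self.kernel is None:
            return
        if now:
            self.kernel.kill()
        else:
            self.kernel.terminate()
        self.kernel.wait()
        self.kernel = None

    def restart_kernel(self, now=False):
        # A restart is a shutdown followed by a fresh start.
        self.shutdown_kernel(now=now)
        self.start_kernel()


# A sleeping Python process stands in for a real kernel.
manager = SketchKernelManager([sys.executable, "-c", "import time; time.sleep(60)"])
manager.start_kernel()
alive_after_start = manager.is_alive()
manager.restart_kernel(now=True)
alive_after_restart = manager.is_alive()
manager.shutdown_kernel(now=True)
alive_after_shutdown = manager.is_alive()
```

Every method above reaches straight into the Popen object, which is why anything that is not a local subprocess currently forces a new manager subclass.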
Proposed Enhancement
This proposal abstracts the kernel process layer within the existing KernelManager implementation, thereby providing the ability to create custom kernel environments across all Jupyter applications that use jupyter_client today.
In today's implementation, the Popen instance is returned by the KernelManager's _launch_kernel() method. Upon return, the method sets the manager's kernel attribute to the Popen instance, after which all lifecycle-related methods will call through to interact with the kernel process.
Instead, this proposal will introduce a layer or wrapper around the Popen instantiation such that this class instance (let's call it PopenProvisioner for now) will contain the Popen instance and return itself from the _launch_kernel() method. Because the method signatures of the PopenProvisioner will be identical to those of Popen, the kernel's process management will operate just like today. (Note that Jupyter Enterprise Gateway takes this approach with its process proxies, but this solution is limited to the EG application and is not generally available to the ecosystem.)
Of course, PopenProvisioner will derive from a base class that defines the various methods. These methods will look similar to the following:
The class will also define other methods for its initialization, launch, cleanup, etc. In addition, these methods will be created with planned support for parameterized kernel launches - since, realistically speaking, a majority of parameters affect the kernel process's environment.
We can decide whether the base class should be abstract (probably) or not along with which methods are abstract themselves as we near implementation.
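As a rough sketch of the wrapper idea, assuming an abstract base class: the base class name KernelProvisionerBase and the launch() method are hypothetical, chosen only for illustration, while the Popen-style methods (poll, wait, send_signal, kill, terminate) are the ones this proposal says the provisioner would mirror.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class KernelProvisionerBase(ABC):
    """Hypothetical base class exposing the Popen-style process methods."""

    @abstractmethod
    def poll(self):
        """Return None while the kernel process is running, else its exit code."""

    @abstractmethod
    def wait(self, timeout=None):
        """Block until the kernel process exits and return its exit code."""

    @abstractmethod
    def send_signal(self, signum):
        """Deliver a signal to the kernel process."""

    @abstractmethod
    def kill(self):
        """Forcefully stop the kernel process."""

    @abstractmethod
    def terminate(self):
        """Ask the kernel process to stop."""


class PopenProvisioner(KernelProvisionerBase):
    """Nearly pass-through wrapper that contains the Popen instance."""

    def __init__(self):
        self.process = None

    def launch(self, cmd, **kwargs):
        # Hypothetical launch hook: start the subprocess and return the wrapper,
        # so the caller can treat it the way it treats a Popen instance today.
        self.process = subprocess.Popen(cmd, **kwargs)
        return self

    def poll(self):
        return self.process.poll()

    def wait(self, timeout=None):
        return self.process.wait(timeout=timeout)

    def send_signal(self, signum):
        self.process.send_signal(signum)

    def kill(self):
        self.process.kill()

    def terminate(self):
        self.process.terminate()


# A sleeping Python process stands in for a real kernel.
provisioner = PopenProvisioner().launch(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)
running = provisioner.poll() is None
provisioner.kill()
exit_code = provisioner.wait()
```

A provisioner for a resource-managed cluster would implement the same methods against the resource manager's API instead of a local process.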
jupyter_client will provide the default KernelProvisioner implementation (e.g., PopenProvisioner) such that all existing kernels that do not specify a kernel provisioner will utilize an instance of the default class. In addition, this default will be configurable in case a given installation wishes to use a different provisioner for all kernels in which one is not currently specified.
Discovery
As noted in the problem statement, we need the ability to associate a kernel's lifecycle management (i.e., its process abstraction instance) to the kernel's specification. It is not sufficient to rely on a single abstraction instance across all configured specifications. However, because this proposal should not affect existing installations using standard kernel specifications, this only becomes an issue when explicit abstractions (i.e., those not based on the default) are necessary.
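To make that association concrete, here is a minimal sketch of the lookup this section goes on to describe. The environment_provisioner stanza with its class_name and config entries comes from this proposal; everything else is an assumption for illustration, including the ClusterProvisioner class, the example_pkg name, the config contents, and the use of a plain dict in place of importing a class by its name.

```python
import json


class PopenProvisioner:
    """Hypothetical default provisioner; only records its config here."""

    def __init__(self, config):
        self.config = config


class ClusterProvisioner:
    """Hypothetical third-party provisioner; only records its config here."""

    def __init__(self, config):
        self.config = config


# Stand-in for importing a class from its configured name (assumption).
KNOWN_PROVISIONERS = {
    "PopenProvisioner": PopenProvisioner,
    "example_pkg.ClusterProvisioner": ClusterProvisioner,
}
DEFAULT_CLASS_NAME = "PopenProvisioner"


def resolve_provisioner(kernel_spec, default_class_name=DEFAULT_CLASS_NAME):
    """Pick and instantiate the provisioner for one kernel specification."""
    stanza = kernel_spec.get("metadata", {}).get("environment_provisioner")
    if stanza is None:
        # No stanza: fall back to the (configurable) default provisioner.
        return KNOWN_PROVISIONERS[default_class_name](config={})
    class_name = stanza["class_name"]
    if class_name not in KNOWN_PROVISIONERS:
        # Unavailable class: fail the kernel start rather than use the default.
        raise RuntimeError(f"kernel provisioner {class_name!r} is not available")
    return KNOWN_PROVISIONERS[class_name](config=stanza.get("config", {}))


# Kernel specification carrying the stanza inside its metadata.
KERNEL_JSON = """
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3 (cluster)",
  "language": "python",
  "metadata": {
    "environment_provisioner": {
      "class_name": "example_pkg.ClusterProvisioner",
      "config": {"namespace": "research"}
    }
  }
}
"""
configured_spec = json.loads(KERNEL_JSON)
plain_spec = {"argv": ["python"], "display_name": "Python 3", "language": "python"}

configured = resolve_provisioner(configured_spec)
fallback = resolve_provisioner(plain_spec)
```

Existing kernel specifications, like plain_spec above, need no changes and simply resolve to the default.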
To explicitly indicate a kernel environment provisioner, one would configure the corresponding kernel specification to include an environment_provisioner stanza within the metadata stanza, similar to the following...
The KernelManager instance, with access to the KernelSpecManager, will check for the existence of such a stanza and instantiate the class associated with that stanza's class_name entry. Should the stanza not exist, the default provisioner will be instantiated and used. Should the configured class name not be available, an exception will be raised, thereby failing the startup of the kernel. (I view this as better than deferring to the configured default provisioner since the specification's configuration stanza probably won't apply to that provisioner, etc.)
The config stanza will be passed to the provisioner's initializer and consist of configuration settings pertaining to the provisioner and its subclasses. We should also leverage whatever config-related functionality traitlets provide (assuming provisioners are subclasses of LoggingConfigurable).
Provisioner Responsibilities
Once launched, the kernel process's lifecycle-management will then be the responsibility of the instantiated provisioner. The provisioner will also be responsible for:
Impact on existing implementations
If no environment provisioners are configured, there is no impact on existing implementations. They will continue to work, just like today. The difference will be that when the appropriate version of jupyter_client is installed, interaction with the kernel's process will go through an additional (nearly pass-thru) layer.
In addition, existing implementations will be able to leverage parameterized kernel launches, once available and, if kernel provisioners are configured, be able to leverage their offerings immediately.
When environment provisioners are configured, any kernel specifications they provide will be immediately available to applications.
No additional packages will be necessary - all functionality is baked into jupyter_client - and the previously installed KEP provisioning package.
Existing KernelManager subclasses
By embracing jupyter_client and its KernelManager class, this proposal doesn't introduce any migration issues and most subclasses of KernelManager should continue to work. Note that some KernelManager subclasses that completely override lifecycle-action methods will not be able to leverage this functionality - but that's their intent in the first place.
What applications subclass KernelManager today? I know that Enterprise Gateway already provides its own process abstraction via a subclass of KernelManager, and will need to coordinate with appropriate jupyter_client releases once implemented (but I have an inside scoop on that repo 😄).
Should I post this question to the Jupyter Google Group, Discourse, anywhere else? I know that nb_conda_kernels subclasses KernelSpecManager - as well as others - but they still leverage jupyter_client's KernelManager directly - so they should not be an issue.
Naming
Here are a few naming suggestions, some of which are more appropriate as a topic (e.g., provisioning) than an implementation (e.g., provider or provisioner).
Because this abstraction is contained within the existing KernelManager implementation, the Kernel in the name could be dropped as it's inferred.
I prefer Environment Provisioning as a topic and Environment Provisioner as an implementation name but really have no strong affinity to either and am open to suggestions. The acronym KEP could be used for abbreviations where necessary (where the 'K' for Kernel makes the inference explicit).
Alternate names for PopenProvisioner could be: JupyterClientProvisioner or GenericProvisioner. I suspect many custom provisioners will derive from this implementation.
I've gone ahead and cc'd folks with whom I've shared these ideas. Please feel free to add anyone else you think might be interested.
cc: @blink1073, @echarles, @lresende, @Zsailer