-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First pass at making RemoteKernelManager independent #810
Conversation
Thanks for submitting the pull request. I'd rather not take the trailet approach as this will break existing configurations. Is there an issue with a property/accessor approach?
Can you elaborate on this? Are you referring to
You're welcome to use something like this in your wrapper code, but probably doesn't belong in EG. So the idea would be to set a placeholder for I think the kernelspec issue you brought up on #803 is something we need to address before proceeding. |
I guess the only problem with the property approach is that it's a lot of repeated code. How does using traitlets like this break configuration? I'm not too familiar with traitlets.
The idea there would be to continue as is, removing parent and parent.parent items.
The problem being that the processproxy RKM uses is ignoring config from the kernelspec, right? |
One thing I don't like is calling out to uuid.uuid4, but I see that done in several other places as well. Do you think it would make sense to make a function for everything to use, or keep it here, since technically the way RKM sets kernel_id should be agnostic of everything else. |
Do you mind moving the conversation back to the issue? I'm getting scattered with a bunch of other stuff going on and I'm kinda wondering if we should try to jump on a web session to get on the same page here. Would you be available for something like that? If so, please drop me a note on my github email and I'll forward you a link. |
I was going to ask about switching back to the issue haha. I'll send you an email |
TODO: Fix tests, add tests, squash into nice commit |
OK got it to work with mixins. A tad hackish maybe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@golf-player - these changes look really good. I haven't taken things for a spin, but plan to once updates are completed. I really like the way this cleans up the readability despite the non-intuitive approach taken to accomplish that. Thank you.
@@ -56,7 +54,7 @@ | |||
}) | |||
|
|||
|
|||
class EnterpriseGatewayApp(JupyterApp): | |||
class EnterpriseGatewayApp(JupyterApp, EnterpriseGatewayConfigMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that mixin-kinds of classes should be declared prior to the true superclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
|
||
|
||
class RemoteKernelManager(IOLoopKernelManager): | ||
class RemoteKernelManager(IOLoopKernelManager, EnterpriseGatewayConfigMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous - declaration order.
@@ -224,6 +233,27 @@ def __init__(self, **kwargs): | |||
if hasattr(self, "cache_ports"): | |||
self.cache_ports = False | |||
|
|||
if not self.connection_file and not self.kernel_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe self.kernel_id
will always be None since this is the initializer and we set it to None above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. I'll remove the line setting it to None since it doesn't really help.
@@ -423,3 +455,24 @@ def _get_process_proxy(self): | |||
format(self.kernel_spec.display_name, process_proxy_class_name)) | |||
process_proxy_class = import_item(process_proxy_class_name) | |||
self.process_proxy = process_proxy_class(kernel_manager=self, proxy_config=process_proxy_cfg.get('config')) | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a comment block somewhere (thinking prior to these properties) that talks about when self.parent
would not exist. This might warrant an entry in the docs as well - although I don't mind keeping this on the down-low for the time being. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Is there somewhere specific in the docs where something like this might go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay deferring this for now, although if we were to add it, the following locations might be worthy of updates:
- We should add an entry (or two?) to Use Cases
- Perhaps add the details to the end of the deployments section (following IBM Spectrum Conductor)
I'm trying to think of what we'd call the topic. The term "Library Mode" popped into my head just now. I'm not enamored with that, but I'd like the topic to convey a more direct capability that utilizes the heart of EG. "Integration Mode" - something that allows third-parties to use RMK directly. Ideas welcome!
The topic should include an example - much like the one you opened the issue with - based on the final results. (Btw, I'd like to have those details anyway so I could try things out against my Hadoop YARN cluster.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like library mode. Just anything that conveys the meaning that this is JEG but without the gateway webapp.
Standalone Remote Kernel Execution if verbosity isn't an issue.
The example actually becomes really simple.
import nbformat
from nbclient import NotebookClient
from enterprise_gateway.services.kernels.remotemanager import RemoteKernelManager
with open("/home/ish/notebooks/Untitled.ipynb") as fp:
test_notebook = nbformat.read(fp, as_version=4)
client = NotebookClient(nb=test_notebook, kernel_manager_class=RemoteKernelManager)
client.execute()
If you'd like a full docker with kernelspecs and config, I can do that, but I don't have experience with Yarn so it may take a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "Standalone Remote Kernel Execution" as the topic. Perhaps in the opening statement, we can toss in an "a.k.a., Library Mode". 😄 This way we have a way to quickly identify the context when communicating this functionality or discussing it in general.
The sample is excellent - I'm really happy with how this is turning out. Let's not worry about a docker image for the moment. Given the example, I can run this against my YARN cluster, although probably not today (sorry).
Hi - I finally got around to trying this, and I'm not able to make it off the launch pad. Using nbclient 0.1.0 (0.2.0 switches to async kernel management, which EG isn't able to move to yet due to notebook), I get the following because
Are you using a config file of any sorts to set that up? I was able to run EG fine with this build, but get this in 'library mode'. My driver is essentially identical to your example code. Also, I think we're going to want some form of documentation for this. We're you still planning on adding that? |
@kevin-bates apologies for getting back to you so late. Got wrapped up in another project. I will have time this week to debug this issue. I'm guessing it's the nbclient version difference; I'm using the latest nbclient (which works I guess surprisingly, given JEG isn't async) What's the timeline for the JEG deploy btw? I may be able to reprioritize things depending in order to have this ready by then if possible. I will be writing documentation for this. |
Hi @golf-player - thanks for the update. Regarding the timeline, as soon as Notebook 6.1 is available, we will merge #794 and start building the next EG release. I'm hopeful that some of the front-end notebook issues will be reviewed/merged in the next week or two so that release can move forward. I would love to get this into the next EG release, so, should things get to that point, I don't mind waiting to ensure this is ready, although I don't expect that to be necessary. |
Looks like nbclient 0.3.0 also just dropped. I've gotten both 0.3.0 and 0.1.0 to run the kernel, but there's another issue creeping up on both versions where they're not shutting down the kernels after. These may unfortunately need to be addressed on their end, but I'm still looking into it. Will update the MR for now with the changes to just add a kernelspec_manager in the constructor if one isn't gotten from the parent (RMKM explicitly passes a kernelspec_manager) |
@golf-player - thanks for the updates. I took this for another spin against my YARN cluster and got the following...
I went ahead and added a few more entries to In the YARN application log, we should see a shutdown indicator like the last line here. Instead, we see all but that line.
I've pushed the |
Looking at nbclient 0.2.0, if I add this line following L335
I get a graceful shutdown with a It seems like nbclient should also allow the kernel-manager to perform a graceful shutdown prior to forcing it. |
Sorry about that, I didn't go through the other processproxy subclasses. I kind of assumed the baseprocessproxy would be enough Seems like the problem here is that graceful shutdown (kernel client shutdown) isn't shutting down the process. Looking into it. |
The process is shutdown but nbclient isn't giving the kernel manager an initial chance to do it gracefully. This is a difference in behavior from what notebook does. |
https://github.com/jupyter/nbclient/blob/0.2.0/nbclient/client.py#L334-335 Here the client is given the chance to shutdown gracefully rather than the manager (this appears to do the same thing as the manager's graceful shutdown) |
Right. This was my very comment from earlier today: #810 (comment) |
Sorry I'm a bit confused here. Is |
That sends the shutdown message to the kernel, but we leverage the kernel manager for interacting with the hosting resource manager and the shutdown method on the kernel manager includes both graceful and immediate options. For the graceful route we send a comparable shutdown request to the listener but for immediate we enlist the help of the resource manager to do whatever is necessary. On YARN that equates to calling their KILL API. |
So in the case of a docker container running a kernel process, the kernel gets the shutdown message, and at that point the process should exit 0, stopping the running container. I guess what's happening here is that the kill request happens while it's shutting down. I'm still trying to see that that's what's happening in the logs |
Yeah that's what's happening; the shutdown message is received, then the pod gets killed. |
We have the listener as well. It's a sibling to the kernel to handle things like interrupts. I haven't looked at the container process proxies lately but they probably terminate the container when the immediate route is taken. This is probably ok in container land but not in YARN where the status is killed rather than finished. |
i think this warrants a discussion of its own with the nbclient devs. nbclient 0.3.0 I believe also does a force shutdown
Agreed. distributed mode as well. And even in container land, if it's being |
jupyter/nbclient#60 for that. These changes seem to be from the nbclient side. Documentation, IMO, should only be merged after nbclient supports clean shutdowns. The rest can go in after I fix the tests. What do you think? |
Thanks @golf-player. I guess I'm okay with the docs. They're really just an example and the code won't change even after nbclient fixes things. At most I guess we'd update the nbclient version in the example, but since there's already 0.3.0, even that isn't a big deal. I guess one of us should submit a PR to nbclient if we don't hear back from them soon. Aside from that, how are we doing here? Was the conductor endpoint commit the test-fix you were referring to? |
Well as of nbclient 0.3.1, it works at least as well as 0.2.0, so that's good. I'm working on a rough implementation to get some conversation going with them. I think this PR is looking done more or less. Yeah it was a minor config change breaking tests, but no issues now. |
Sounds good. Thanks for opening the nbclient PR. I just ran this both as EG and Independent RMK (like Independent George) and things look great. I plan on performing a squash and merge first thing tomorrow. If you'd prefer to squash to get the comment as you'd like please feel free, otherwise, I'll likely leave the meaningful commit comments in place or derive one myself. I think is going to be really useful! Thank you. |
I've pushed a squashed commit. Thanks so much for all the help! Hopefully the nbclient PR goes well. |
Make RemoteKernelManager (RKM) able to be run without a (grand)parent EnterpriseGatewayApp (EGA) instance so it can be used by nbclient. The config which RKM with EGA was being referred to using traitlet lineage like `self.parent.parent.property_name`. Change both to inherit from a Configurable mixin which contains all of EGA's previous config. Link the attributes of the RKM instance with EGA if available to keep old behaviour. Modify other properties RKM uses that are not traits to become `@property`s which fall back to sane defaults if running independently. Change RKM to be able to generate a kernel id if necessary. Change ProcessProxy to use provided kernel ids. Move kernel id generation logic out of RemoteMappingKernelManager. Resolves jupyter-server#803
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for this great pull request!
This is a WiP.
So first pass, I just wanted to get things working, and they do (with nbclient, not sure if gatewayapp still works, but it should). There's way too much asking for permission in the code right now, and there's probably a lot that could be done to separate code where mappingkernelmanagers are needed. Another option is to make a single kernel mapping manager (AKA a completely useless stupid class) just for convenience.
What do you think about doing traitlets like this? To properly share it with EG App, I'll need to add this mixin and remove the included traitlets from EG App.