Kernel Shutdown Proposal #618

mlucool · 2021-02-05T21:45:40Z

Shutdown kernel states the following in it's doc:

This attempts to shutdown the kernels cleanly by:

Sending it a shutdown message over the control channel.

If that fails, the kernel is shutdown forcibly by sending it a signal.

While this seems like the right thing to do, it does not work well for kernels like IPykernel (but may be fine for kernels like Xeus-Python). The problem is the signal sent via the control channel may be blocked by a running process, so there is no real chance to cleanly shut it down.

I propose that we add a step between 1 and 2 which tries SIGINT and then tries SIGTERM before finally using SIGKILL (step 2 today). What do you think?

kevin-bates · 2021-02-05T23:56:28Z

hi @mlucool, this makes sense to me - thanks for raising it.

I have some questions, comments...

I'm hoping we could just invoke interrupt_kernel() for the SIGINT step. Chances are that a kernel that supports message-based interrupts have probably responded to the message-based termination anyway, and this would be cleaner.
I'm assuming interrupt would occur after the 5-second period to determine if the message-based termination succeeded, correct?
- Is the SIGTERM even necessary at that point? I see that ipykernel and IPython don't handle SIGTERM anyway, so, relative to that kernel, there's really no need, although SIGTERM prior to SIGKILL is more correct and should probably be honored.
- I'd prefer we not add another 5-second period following the SIGTERM and prior to SIGKILL. Perhaps we could split the poll loops in 60/40 or something - so that the overall time for these edge cases isn't necessarily increased.

Another approach would be to unconditionally issue an interrupt_kernel() call prior to request_shutdown() and be done. Although cleaner, this would lengthen the shutdown sequence by a message.

((I'm hoping we could drop the if hasattr(signal, 'SIGKILL'): code and just use self.kernel.kill() and self.kernel.terminate(). Sending the signal and calling the method is the same thing anyway.))

mlucool · 2021-02-06T13:00:18Z

Another approach would be to unconditionally issue an interrupt_kernel() call prior to request_shutdown() and be done. Although cleaner, this would lengthen the shutdown sequence by a message.

I like this idea. Still, it seems like SIGTERM is needed for things inside the kernel to be able to cleanly shutdown.

New proposal:

Interrupt then shutdown message over the control channel (with no delay between the two)
SIGTERM
SIGKILL

I don't have a good intuition as to how long we need to wait between the steps. Is the 60/40 idea to wait 3 seconds after the control channel and 2 seconds after SIGTERM?

cc @Carreau - what do you think w.r.t. ipykernel?

kevin-bates · 2021-02-06T17:02:05Z

Still, it seems like SIGTERM is needed for things inside the kernel to be able to cleanly shutdown.

Assuming the interrupt request would proceed today's graceful shutdown sequence and restore the kernel's ability to respond to control messages, I would argue that the request-shutdown message is tantamount to issuing SIGTERM prior to SIGKILL and we don't need the additional SIGTERM.

Is the 60/40 idea to wait 3 seconds after the control channel and 2 seconds after SIGTERM?

Yes, but only if we don't perform the interrupt as a prerequisite to the request-shutdown message.

I'm not familiar with kernel message handling in general, so would like to hear other opinions as well.

Carreau · 2021-02-06T18:06:16Z

I agree that sending interrupt before request_shutdown make sens.
I think it is ok to also send sigterm shortly after request_shutdown, before sigkill, just in case the kernel is blocked in a compiled-routine that may ignore sigint.

mlucool · 2021-02-06T19:45:10Z

In case xeus authors have any thoughts here as well - cc @martinRenou @SylvainCorlay

SylvainCorlay · 2021-02-06T20:27:30Z

While this seems like the right thing to do, it does not work well for kernels like IPykernel (but may be fine for kernels like Xeus-Python).

Hey @mlucool, thanks for the ping!

FYI, I opened a PR to ipykernel last week to run the control channel in a separate thread, so that ipykernel can properly handle shutdown request while processing execute requests. We are also working on plugging the debugger on top of that PR.

I think it makes sense for kernels to implement signal handlers, but I am not sure it is really necessary from a jupyter_client perspective, although I am not opposed to this change.

mlucool · 2021-02-08T19:26:22Z

Based on above, I think it's still best to send SIGTERM in case some code is blocking the shutdown. What if we set the default to 7s and split the time between proposed steps 1&2 and 1&3 - so 3.5 for the control channel to shut down and 3.5 for graceful SIGTERM?

kevin-bates · 2021-02-09T01:23:39Z

I think this seems fine but would just like to clarify the sequencing. Here's my understanding...

Issue interrupt via interrupt_kernel()
Issue shutdown-request via request_shutdown()
Poll for up 50% of waittime to see if kernel is still alive
If alive:
1. Issue SIGTERM via self.kernel.terminate()
  Poll for up 50% of waittime to see if kernel is still alive
  If alive:
  1. Issue SIGKILL via self.kernel.kill()

Is that the plan?

(Thanks for dealing with this.)

mlucool · 2021-02-09T12:56:34Z

Correct @kevin-bates. Only final question is should we increase the total waittime or not. Moving to 6 or 7 seconds seems like a good idea to me at least, but I don't feel strongly.

kevin-bates · 2021-02-09T14:48:22Z

I'm beginning to think it might be best to keep waittime at its current default of 5. Use that value for the graceful shutdown attempt (after interrupt and request-shutdown), then use, say, 50-60% of that value for the forceful shutdown attempt sequence (after SIGTERM).

The help text for shutdown_wait_time says: "Time to wait for a kernel to terminate before killing it, in seconds." which is still all true. Just that the time to kill can now take 50% more of that time. 😄

Does that sound reasonable?

mlucool · 2021-02-09T15:41:34Z

I think users care about the total max time they would have to wait to have the kernel be stopped (SIGKILL). So long as the language in the documentation is updated to make it clear it's all the same to me.

kevin-bates · 2021-02-09T15:44:06Z

ok - I'm fine with that.

mlucool · 2021-02-16T18:34:49Z

Great. @kevin-bates, in terms of implementing this change, are you planning to do it? If not, I'm happy to have someone on my end contribute it.

kevin-bates · 2021-02-16T18:45:54Z

I wasn't planning on implementing this; a contribution would be welcome. Thanks Marc.

kevin-bates mentioned this issue Feb 22, 2021

Interrupt and sigterm before sigkill kernel. #620

Merged

Carreau added this to the 6.1.11 milestone Mar 10, 2021

Carreau linked a pull request Mar 10, 2021 that will close this issue

Interrupt and sigterm before sigkill kernel. #620

Merged

Carreau modified the milestones: 6.1.11, 6.1.12 Mar 14, 2021

Carreau closed this as completed Mar 14, 2021

Zsailer mentioned this issue Aug 19, 2021

Jupyter Server Notes 2021 jupyter-server/team-compass#4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kernel Shutdown Proposal #618

Kernel Shutdown Proposal #618

mlucool commented Feb 5, 2021

kevin-bates commented Feb 5, 2021

mlucool commented Feb 6, 2021

kevin-bates commented Feb 6, 2021

Carreau commented Feb 6, 2021

mlucool commented Feb 6, 2021

SylvainCorlay commented Feb 6, 2021

mlucool commented Feb 8, 2021

kevin-bates commented Feb 9, 2021

mlucool commented Feb 9, 2021

kevin-bates commented Feb 9, 2021

mlucool commented Feb 9, 2021

kevin-bates commented Feb 9, 2021

mlucool commented Feb 16, 2021

kevin-bates commented Feb 16, 2021

Kernel Shutdown Proposal #618

Kernel Shutdown Proposal #618

Comments

mlucool commented Feb 5, 2021

kevin-bates commented Feb 5, 2021

mlucool commented Feb 6, 2021

kevin-bates commented Feb 6, 2021

Carreau commented Feb 6, 2021

mlucool commented Feb 6, 2021

SylvainCorlay commented Feb 6, 2021

mlucool commented Feb 8, 2021

kevin-bates commented Feb 9, 2021

mlucool commented Feb 9, 2021

kevin-bates commented Feb 9, 2021

mlucool commented Feb 9, 2021

kevin-bates commented Feb 9, 2021

mlucool commented Feb 16, 2021

kevin-bates commented Feb 16, 2021