
Drop Tornado #876

Closed · wants to merge 1 commit

Conversation

davidbrochart (Collaborator):

Closes #656

ipykernel/ipkernel.py (review thread: outdated, resolved)
fcollonval (Contributor) left a comment:

Thanks @davidbrochart for this. I have a question and a comment.

ipykernel/control.py (review thread: outdated, resolved)
ipykernel/iostream.py (review thread: outdated, resolved)
ipykernel/iostream.py (review thread: outdated, resolved)
davidbrochart (Collaborator, Author):

> Thanks @davidbrochart for this. I have a question and a comment.

Thanks for the review @fcollonval.
This is still a WIP, so I'll make this PR a draft. I can see that the kernel is functional, e.g. in JupyterLab, but there are still some issues, at least with shutdown.


         self.log.debug('Stopping shell ioloop')
         shell_io_loop = self.shell_stream.io_loop
-        shell_io_loop.add_callback(shell_io_loop.stop)
+        # FIXME: shouldn't need to be threadsafe
+        shell_io_loop.call_soon_threadsafe(shell_io_loop.stop)
davidbrochart (Collaborator, Author):

I thought that we shouldn't need call_soon_threadsafe because the shell event loop runs in the main thread, but actually we do, because we are processing the shutdown request from the control channel, which runs in another thread.
Correct me if I'm wrong 😄

Member:

I think you're right
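The point above can be reproduced in isolation. This is a minimal sketch, not ipykernel's actual code: an asyncio loop running in one thread is stopped from a second thread, which must go through call_soon_threadsafe, since loop.stop() and plain call_soon are not safe to call across threads.

```python
import asyncio
import threading
import time

# Sketch (illustrative, not ipykernel code): stopping a loop from another
# thread requires call_soon_threadsafe, as in the diff above.

def stop_from_other_thread(loop):
    time.sleep(0.1)  # simulate a shutdown request arriving on the control thread
    loop.call_soon_threadsafe(loop.stop)

loop = asyncio.new_event_loop()
threading.Thread(target=stop_from_other_thread, args=(loop,)).start()
loop.run_forever()  # returns once the other thread schedules loop.stop()
loop.close()
print("loop stopped cleanly")
```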

@@ -382,7 +382,7 @@ def start(self):
         self.session.send(self.shell_socket, 'execute_request', content,
                           None, (self.shell_socket.getsockopt(ROUTING_ID)))

-        ident, msg = self.session.recv(self.shell_socket, mode=0)
+        ident, msg = await self.session.async_recv(self.shell_socket, mode=0)
davidbrochart (Collaborator, Author):

-        # FIXME: original code was:
-        # self._io_loop.call_later(self.flush_interval, self._flush)
-        self._io_loop.call_soon_threadsafe(self._flush)
+        self._io_loop.call_soon_threadsafe(self._io_loop.call_later, self.flush_interval, self._flush)
davidbrochart (Collaborator, Author):

Not sure if call_soon_threadsafe is needed. If not, we could keep the original code.
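For reference, here is a minimal sketch of the pattern in the diff above, with illustrative names rather than ipykernel's actual code: loop.call_later must run on the loop's own thread, so a thread that wants to arm a delayed flush first hops onto the loop via call_soon_threadsafe.

```python
import asyncio
import threading

# Sketch: arm a call_later timer from another thread by scheduling it
# through call_soon_threadsafe, as the diff above does.

results = []

def flush():
    results.append("flushed")
    loop.stop()

def schedule_flush(loop, interval):
    # Safe from any thread: call_later is invoked on the loop's own thread.
    loop.call_soon_threadsafe(loop.call_later, interval, flush)

loop = asyncio.new_event_loop()
threading.Thread(target=schedule_flush, args=(loop, 0.05)).start()
loop.run_forever()  # flush() runs after ~0.05 s and stops the loop
loop.close()
```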

@@ -367,7 +367,7 @@ def close(self):
             if socket and not socket.closed:
                 socket.close()
         self.log.debug("Terminating zmq context")
-        # FIXME
+        # FIXME: this line breaks shutdown
         # self.context.term()
davidbrochart (Collaborator, Author):

Not sure why terminating the ZMQ context creates issues here.


davidbrochart (Collaborator, Author):

Thanks @blink1073, I can see that currently we don't close the heartbeat socket before terminating the ZMQ context, but the other way around: we wait for the context to be terminated before closing the socket.
Maybe this is the root cause, @minrk?

minrk (Member):

Context.term blocks until all sockets are closed and all unsent messages are delivered or discarded. A socket may be closed after the context term starts if the close is done from another thread, but it can't be done from a coroutine, for instance.

The things to check when a context hangs are always:

  1. are there sockets that haven't been closed, and
  2. was LINGER set on those sockets so they discard messages eventually

You can debug leftover sockets with self.context._sockets which is a set of weakrefs to sockets (this is private and subject to change, but works for debugging).

The heartbeat socket doesn't share this context, so I don't think that's relevant.
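The shutdown order described above can be sketched with plain pyzmq. The endpoint and port below are assumptions (nothing needs to be listening there); the point is that setting LINGER and closing every socket before Context.term keeps term() from blocking on undelivered messages.

```python
import zmq

# Sketch of the advice above: close all sockets (with LINGER set) before
# terminating the context, otherwise Context.term blocks.

ctx = zmq.Context()
sock = ctx.socket(zmq.PUSH)
sock.connect("tcp://127.0.0.1:5555")  # assumed endpoint; connect is asynchronous
sock.send(b"never delivered")         # queued; without LINGER=0, term() would hang on it
sock.setsockopt(zmq.LINGER, 0)        # discard pending messages on close
sock.close()
ctx.term()  # returns promptly: all sockets closed, nothing left to linger
print("context terminated")
```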

    def send_multipart(self, msg_list, flags=0, copy=True, track=False, **kwargs):
        return self.socket.send_multipart(msg_list, copy=copy)

    def flush(self, flag=None, limit=None):
minrk (Member):

Flush has historically been very important behavior for ipykernel to behave appropriately. Making it a no-op doesn't seem great to me (it may not be as relevant today with everything being async, but this is a big change).

davidbrochart (Collaborator, Author):

But flush seems to be supported only for Tornado's IOLoop and ZMQStream, right? I haven't found any reference of it outside of this context.

minrk (Member), Mar 2, 2022:

That's true, because send is async, but not awaitable with ZMQStream.send. We don't need this anymore as long as we await all our sends, since asyncio Socket.send is awaitable.
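A minimal sketch of that point, with an illustrative inproc endpoint: with zmq.asyncio, Socket.send_multipart returns an awaitable, so awaiting every send is what replaces ZMQStream's flush().

```python
import asyncio
import zmq
import zmq.asyncio

# Sketch: awaiting an asyncio Socket.send makes flush() unnecessary.

async def main():
    ctx = zmq.asyncio.Context()
    push = ctx.socket(zmq.PUSH)
    pull = ctx.socket(zmq.PULL)
    push.bind("inproc://demo")
    pull.connect("inproc://demo")
    await push.send_multipart([b"header", b"body"])  # awaitable; no flush needed
    frames = await pull.recv_multipart()
    push.close()
    pull.close()
    ctx.term()
    return frames

frames = asyncio.run(main())
print(frames)
```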




davidbrochart (Collaborator, Author):

Although ipykernel seems functional with these changes, I don't really understand the test failures.
I think the next step is getting rid of the ZMQStreams with their on_recv callbacks (which were really specific to Tornado), and just having tasks that receive messages and handle them. I would also like to get rid of the jupyter_client dependency; it seems weird for a "server" to depend on a "client".

minrk (Member) commented Mar 11, 2022:

> I would also like to get rid of the jupyter_client dependency

I wouldn't do that. jupyter-protocol would have been a better name for that repo, since that's really what it defines. It is meant to be a dependency for any Python use of the Jupyter protocol, and that includes ipykernel.

minrk (Member) commented Mar 11, 2022:

> I think the next step is getting rid of the ZMQStream with their on_recv callbacks

👍

a simpler while True: poll/recv/process/send coroutine ought to work, and might even make channel priority easier to ensure.
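Such a loop might look like the following sketch. The names (channel_loop, the echo handler, the inproc endpoint) are illustrative, not ipykernel's actual API; the shape is the suggested while True: recv/process/send coroutine in place of ZMQStream.on_recv callbacks.

```python
import asyncio
import zmq
import zmq.asyncio

# Hedged sketch of a recv/process/send coroutine replacing on_recv callbacks.

async def channel_loop(sock, handler):
    while True:
        frames = await sock.recv_multipart()   # replaces the on_recv callback
        reply = await handler(frames)
        if reply is not None:
            await sock.send_multipart(reply)   # awaitable send, no flush

async def demo():
    ctx = zmq.asyncio.Context()
    server = ctx.socket(zmq.REP)
    client = ctx.socket(zmq.REQ)
    server.bind("inproc://shell")
    client.connect("inproc://shell")

    async def echo(frames):
        return [b"reply:" + f for f in frames]

    task = asyncio.create_task(channel_loop(server, echo))
    await client.send_multipart([b"ping"])
    result = await client.recv_multipart()

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    server.close()
    client.close()
    ctx.term()
    return result

out = asyncio.run(demo())
print(out)
```

One advantage of this shape over callbacks is that message handling is sequential by construction, which is what makes channel priority easier to reason about.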

davidbrochart (Collaborator, Author):

Closing in favor of #1079.

Linked issue: Dropping the dependency to tornado
4 participants