DM-38279: Do all spawning work in start #14
Conversation
The original idea of having start return immediately once the lab start has been initiated doesn't work with JupyterHub's assumptions about spawners. Its timeouts and error handling expect all of the work to happen in the start method, and progress must not raise exceptions, or JupyterHub reports uncaught exceptions and breaks its UI and API.

Follow the design of KubeSpawner: have the start method hold a copy of its task and do all the progress monitoring and event creation. The progress method then just waits for that task to complete, reporting all the events it generates as an iterator.

Do a bit of refactoring to move spawn events to a separate class, and add the severity to the start of the message, similar to what KubeSpawner does.
Add tests to the controller mock to ensure that the spawner sends the correct tokens to the different routes.
This temporarily removes the wrapper around httpx errors because I was hoping the stringification of the httpx exception would be enough. It wasn't, so that comes back in an upcoming PR.
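To make the design concrete, here is a minimal sketch of that split, assuming JupyterHub's usual `start`/`progress` spawner interface. The `SketchSpawner`, `Event`, and `_start_future` names are illustrative, not the real RSPRestSpawner code, though `_start` and `_events` do appear in the diff below.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class Event:
    """Illustrative stand-in for the spawn event class added in this PR."""

    progress: int
    message: str


class SketchSpawner:
    """Not the real RSPRestSpawner; just the shape of the start/progress split."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._start_future: asyncio.Task | None = None

    async def start(self) -> str:
        # All spawning work happens here, inside JupyterHub's start timeout.
        # Keep a reference to the running task so progress can tell when
        # spawning has finished.
        self._events = []
        self._start_future = asyncio.current_task()
        return await self._start()

    async def _start(self) -> str:
        # The real method would call the lab controller and watch its event
        # stream; this just records one event and returns a URL.
        self._events.append(Event(progress=50, message="[info] Lab requested"))
        await asyncio.sleep(0)
        return "http://lab.example.com/"

    async def progress(self) -> AsyncIterator[dict[str, int | str]]:
        # Report events as they accumulate and stop once start is done.
        # This must never raise, or JupyterHub breaks its UI and API.
        seen = 0
        while True:
            for event in self._events[seen:]:
                yield {"progress": event.progress, "message": event.message}
            seen = len(self._events)
            if self._start_future is None or self._start_future.done():
                return
            await asyncio.sleep(1)
```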
Test failures during spawn, delays in returning the progress messages, and multiple event watchers.
Be kind to our future selves and return more details in unusual spawn situations via the event stream. Every bit of debugging information we can see is helpful. Separate the severity from the message for spawn events for clearer construction. Flesh out the docstrings and document which exceptions may be raised.
Some of the comments in the progress method were out of date or incomplete. Flesh them out.
The actual work is done in `_start`. This is a tiny wrapper to do
bookkeeping on the event stream and record the running task so that
`progress` can notice when the task is complete and return.
Ah, ok, taken from KubeSpawner. I'd really hoped we could leave more of the KubeSpawner grottiness behind with the RESTSpawner but alas.
except Exception:
    # We see no end of problems caused by stranded half-created pods,
    # so whenever anything goes wrong, try to delete anything we may
    # have left behind before raising the fatal exception.
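The same pattern in isolation, as a hedged sketch: the `create` and `delete` callables stand in for whatever the spawner actually uses to create and delete labs, and are not names from this PR.

```python
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger(__name__)


async def start_with_cleanup(
    create: Callable[[], Awaitable[str]],
    delete: Callable[[], Awaitable[None]],
) -> str:
    """Run lab creation, deleting any half-created lab if it fails."""
    try:
        return await create()
    except Exception:
        # Whatever went wrong, try to remove anything left behind before
        # re-raising, so a failed spawn doesn't strand a half-created pod.
        try:
            await delete()
        except Exception:
            logger.exception("Cleanup after failed spawn also failed")
        raise
```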
I like this design choice.
----------
spawner
    Another copy of the spawner (not used). It's not clear why
    JupyterHub passes this into this method.
I once convinced myself it was something to do with functools and partial application, but I wouldn't swear I was right.
tests/spawner_test.py
Outdated
from rsp_restspawner.spawner import LabStatus, RSPRestSpawner

from .support.controller import MockLabController


async def gather_progress(
I find the overloading of "gather" a little confusing (yes, even though this is only ever used in a gather() call). I would have used collect_progress, I think.
Good idea. Done.
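For readers outside the diff, a guess at what the renamed helper might look like; the real test helper's body isn't shown here, so its signature and return type are assumptions.

```python
from rsp_restspawner.spawner import RSPRestSpawner


async def collect_progress(spawner: RSPRestSpawner) -> list[dict[str, object]]:
    """Drain the spawner's progress iterator into a list for assertions."""
    return [event async for event in spawner.progress()]
```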
            self._events.append(event)
            return await self._get_internal_url()
        else:
            r.raise_for_status()
I always forget raise_for_status() exists.
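For reference, raise_for_status() is the stock httpx way to turn any 4xx or 5xx response into an httpx.HTTPStatusError. The URL and wrapper function below are made up for illustration.

```python
import httpx


async def fetch_lab_status(url: str) -> dict[str, object]:
    async with httpx.AsyncClient() as client:
        r = await client.get(url)
        # Raises httpx.HTTPStatusError for any 4xx or 5xx response.
        r.raise_for_status()
        return r.json()
```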
@@ -35,6 +39,63 @@ class LabStatus(str, Enum):
    FAILED = "failed"


@dataclass(frozen=True, slots=True)
Much better.
            return cls(
                progress=90, message=sse.data, severity="info", complete=True
            )
        elif sse.event == "info":
I would probably have done something silly with functools.partial and then updated the instance object thus created by changing progress or severity if needed, and adding complete/failed if needed, but this is clearer.
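A hedged reconstruction of the event class being discussed, pieced together from the visible fragments: the field names (progress, message, severity, complete) come from the diff, but the class name, the SSEMessage stand-in, the failed field, and the from_sse and to_dict helpers are assumptions rather than the code as merged.

```python
from dataclasses import dataclass


@dataclass
class SSEMessage:
    """Stand-in for a server-sent event, with only the fields used here."""

    event: str
    data: str


@dataclass(frozen=True, slots=True)
class SpawnEvent:
    """One spawn progress event, with severity kept separate from the message."""

    progress: int
    message: str
    severity: str
    complete: bool = False
    failed: bool = False

    @classmethod
    def from_sse(cls, sse: SSEMessage, progress: int) -> "SpawnEvent":
        """Classify a lab controller event by its SSE event type."""
        if sse.event == "complete":
            return cls(
                progress=90, message=sse.data, severity="info", complete=True
            )
        elif sse.event == "info":
            return cls(progress=progress, message=sse.data, severity="info")
        else:
            return cls(progress=progress, message=sse.data, severity=sse.event)

    def to_dict(self) -> dict[str, int | str]:
        """Format for JupyterHub, prefixing the severity onto the message."""
        return {
            "progress": self.progress,
            "message": f"[{self.severity}] {self.message}",
        }
```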
Looks good.
Avoid confusion with asyncio.gather.