JupyterLab Plugin #55

Open
jcrist opened this issue Jul 9, 2019 · 6 comments

@jcrist
Member

jcrist commented Jul 9, 2019

In #54 we added an API that allows users to configure admin-defined fields when starting a cluster. This API provides an endpoint for discovering what options exist (name, type, description). In the Python client library we use this information to build an ipywidget interface for configuring the cluster. We could do the same here with a JupyterLab plugin.

I'm not sure what this would look like. I suspect it'd be easier to build a new plugin than modify the existing dask plugin, as we have features they don't need, and vice versa (e.g. we don't need any of the proxying support).

@jcrist
Member Author

jcrist commented Oct 7, 2019

cc @ian-r-rose for thoughts here. In summary, the ideal lab extension I'd like for dask-gateway would:

  • Support adding tabs for the dashboards as the existing dask-labextension does. Since we don't need to proxy anything in the extension itself, I think this is just viewing a separate URL in a tab.
  • Provide a list view of all running clusters for the user (using the Gateway.list_clusters method). This could likely be the sidebar in the existing one.
  • Use Gateway.cluster_options (see #54, "Allow users to configure clusters") to generate a webform for creating a new cluster. This would allow users to change parameters graphically when creating a new cluster.
  • Have the ability to insert a cell for connecting to an existing cluster. The existing template doesn't work as it doesn't forward security information, so the implementation of this would have to be cluster backend specific.
  • Have the ability to leave clusters up upon jupyter-lab shutdown. This is the shutdown_on_close kwarg to new_cluster/GatewayCluster. I'm not sure what the UI experience should be here - perhaps just a checkbox in the creation form?

I think this could be done by modifying the existing extension, but am not sure if it's the best method. Some of these views would be alternative implementations of existing code, which should be doable with some config logic. The input form on cluster creation would be new functionality, but we could likely make this work with other cluster managers as well.

@mrocklin
Member

mrocklin commented Oct 7, 2019

My understanding of @ian-r-rose's original plan was that Dask-Gateway itself might replace the server-side component in dask-labextension. That way JupyterLab might talk directly to routes on the Gateway server.

@jcrist
Member Author

jcrist commented Oct 7, 2019

The trick there would be handling auth. If you could handle auth all browser-side then there wouldn't need to be any server-side component for the lab extension.

@ian-r-rose

Overall, my preference would be to adapt dask-labextension to be able to play nicely with dask-gateway, rather than create a fork/separate project.

> * Support adding tabs for the dashboards as the existing `dask-labextension` does. Since we don't need to proxy anything in the extension itself, I _think_ this is just viewing a separate URL in a tab.

Yes, this should be pretty straightforward. Can you expand on what you mean about not needing to proxy anything in the extension? I should think it would still be valuable to proxy the bokeh server under the notebook server in this case, and most of the existing logic for that could be reused.

> * Provide a list view of all running clusters for the user (using the `Gateway.list_clusters` method). This could likely be the sidebar in the existing one.
> * Use `Gateway.cluster_options` (see #54) to generate a webform for creating a new cluster. This would allow users to change parameters graphically when creating a new cluster.

I don't know much about Gateway.cluster_options. Is it general enough to cover all the intended use-cases of dask-labextension? Creating a webform out of an arbitrary schema would be tricky, but if it is well-specified, this should be doable, if a bit of work.

> * Have the ability to insert a cell for connecting to an existing cluster. The existing template doesn't work as it doesn't forward security information, so the implementation of this would have to be cluster backend specific.

Can a cluster backend send a code snippet over the wire for this? If it is sending security information, can it cache it in some kind of environment variable/store to be referenced later?

> * Have the ability to leave clusters up upon jupyter-lab shutdown. This is the `shutdown_on_close` kwarg to `new_cluster`/`GatewayCluster`. I'm not sure what the UI experience should be here - perhaps just a checkbox in the creation form?

It's really difficult to guarantee performing some action upon shutdown of the server. I do think it would be great to be able to have persistent clusters that run independently of the server.

> I think this could be done by modifying the existing extension, but am not sure if it's the best method. Some of these views would be alternative implementations of existing code, which should be doable with some config logic. The input form on cluster creation would be new functionality, but we could likely make this work with other cluster managers as well.

Yeah, I think the input form may be the trickiest part, depending upon how much it could be backend-specific. In dask-labextension we customize cluster creation via config, which makes things like auth easier to reason about. The downside is that it is much more static at runtime. Do you think that could be a reasonable first step for integrating dask-gateway? We could make it so that the user can include config for multiple backends, and allow them to select which one to use at creation time.

@ian-r-rose

> My understanding of @ian-r-rose's original plan was that Dask-Gateway itself might replace the server-side component in dask-labextension. That way JupyterLab might talk directly to routes on the Gateway server.

If we can use dask-gateway for starting/stopping/scaling clusters instead of the cluster manager in dask-labextension, that would be fantastic. I still think that a server extension is probably worthwhile for proxying and keeping the connection predictable (the errors experienced by many users with non-managed bokeh servers have been a real nightmare to debug). But if we can get the python side of dask-labextension as lean as possible I'd be very happy.

@jcrist
Member Author

jcrist commented Oct 8, 2019

> Can you expand on what you mean about not needing to proxy anything in the extension? I should think it would still be valuable to proxy the bokeh server under the notebook server in this case, and most of the existing logic for that could be reused.

Dask-gateway runs the dask schedulers on a different node than the notebook server, and already proxies out the dashboards through a common shared proxy (that is intended to be publicly accessible). If the proxy in the labextension can proxy non-localhost routes, then there's no harm in keeping it, but it shouldn't be necessary in the common case.

> I don't know much about Gateway.cluster_options. Is it general enough to cover all the intended use-cases of dask-labextension?

This is specific to dask-gateway, but the options model is general enough that I'd expect we could generalize the interface to other cluster managers as well.
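As a rough illustration of how a form could be generated from such an options listing, here is a minimal sketch. The dict layout (`name`, `type`, `description`, `default`), the type names, the option name `worker_cores`, and the helper `render_form` are all assumptions made for this example; they are not the actual dask-gateway wire format or the labextension's implementation.

```python
from html import escape

# Hypothetical mapping from an option's declared type to an HTML input type.
_INPUT_TYPES = {"int": "number", "float": "number", "string": "text", "bool": "checkbox"}


def render_form(options):
    """Render a list of option descriptions as a minimal HTML form."""
    rows = []
    for opt in options:
        name = escape(opt["name"], quote=True)
        # Unknown types fall back to a plain text input.
        input_type = _INPUT_TYPES.get(opt["type"], "text")
        # Use the description as the label, falling back to the option name.
        label = escape(opt.get("description", opt["name"]))
        default = opt.get("default")
        if input_type == "checkbox":
            value_attr = " checked" if default else ""
        elif default is None:
            value_attr = ""
        else:
            value_attr = f' value="{escape(str(default), quote=True)}"'
        rows.append(
            f'<label>{label} <input type="{input_type}" name="{name}"{value_attr}></label>'
        )
    return "<form>\n" + "\n".join(rows) + "\n</form>"


if __name__ == "__main__":
    print(render_form([
        {"name": "worker_cores", "type": "int",
         "description": "Cores per worker", "default": 2},
    ]))
```

Because the renderer only depends on a name, a type, and a description per field, the same approach could in principle serve other cluster managers that expose a comparable listing.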

> Can a cluster backend send a code snippet over the wire for this? If it is sending security information, can it cache it in some kind of environment variable/store to be referenced later?

We wouldn't need to do any of this; the snippet would just need to be different. Something like:

snippet_template = """
from dask_gateway import Gateway

client = Gateway().connect("{cluster_name}").get_client()
"""
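For illustration, a template along these lines could be filled in with `str.format`. This is only a sketch: the helper name `render_connect_snippet` is hypothetical and not part of dask-labextension. The `!r` conversion inserts the cluster name as a quoted string literal, so the generated cell passes a string to `connect` rather than a bare identifier.

```python
# Hypothetical helper; neither the function name nor the constant name
# comes from dask-labextension. It only illustrates the templating idea.
SNIPPET_TEMPLATE = """\
from dask_gateway import Gateway

client = Gateway().connect({cluster_name!r}).get_client()
"""


def render_connect_snippet(cluster_name: str) -> str:
    """Return cell source that reconnects to an existing gateway cluster."""
    # `!r` applies repr(), producing a quoted Python string literal.
    return SNIPPET_TEMPLATE.format(cluster_name=cluster_name)


if __name__ == "__main__":
    print(render_connect_snippet("abc123"))
```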

> It's really difficult to guarantee performing some action upon shutdown of the server. I do think it would be great to be able to have persistent clusters that run independently of the server.

To be clear, this isn't something that the labextension would need to worry about; we'd just want to expose the option on cluster creation (probably as a checkbox). This would set the shutdown_on_close kwarg in the cluster constructor. Dask-gateway would then handle managing shutdowns itself.
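A sketch of how the checkbox value might be separated from the other submitted form values before creating a cluster. The function name, the form field names, and the choice to default the flag to `True` when the checkbox is absent are assumptions of this example, not documented behavior.

```python
def to_new_cluster_kwargs(form_values):
    """Split submitted form values into keyword arguments for cluster creation."""
    options = dict(form_values)  # copy so the caller's dict is left untouched
    # The checkbox is not a cluster option; it controls the cluster's lifetime.
    # Defaulting to True when it is missing is an assumption of this sketch.
    shutdown_on_close = bool(options.pop("shutdown_on_close", True))
    return {"shutdown_on_close": shutdown_on_close, **options}


if __name__ == "__main__":
    # "worker_cores" is a made-up option name used only for illustration.
    print(to_new_cluster_kwargs({"worker_cores": 2, "shutdown_on_close": False}))
```

The resulting dict could then be unpacked into the cluster constructor, assuming the remaining entries are accepted there as keyword options.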

> Do you think that could be a reasonable first step for integrating dask-gateway?

This already works, and is likely sufficient for now. In the long run I'd like to expose a form to change the parameters dynamically, which I suspect would be useful for other cluster managers as well.

I think the main tasks are (ordered by importance/ease):

  • Release dask-labextension so that its support for asynchronous scale calls becomes available. At this point dask-gateway can be used like any other cluster manager.
  • Add a separate code path for dask-gateway for listing clusters, allowing non-connected clusters to be managed externally and only show up in a list view.
  • Add support for a web-form for configuring clusters on creation.


3 participants