-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable SSL on forwarded requests #169
base: main
Are you sure you want to change the base?
Conversation
jupyter_server_proxy/__init__.py
Outdated
ssl_options = None | ||
if serverproxy.https: | ||
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile) | ||
ssl_context.load_cert_chain(serverproxy.certfile, serverproxy.keyfile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a client certificate and key? If so could it be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was intending that the serverproxy.https
configurable would make it optional but perhaps you mean something else? Thanks for taking a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably me misunderstanding how this works.
Is this so that jupyter-server-proxy can proxy a notebook URL such as https://example.org/notebook/proxied-service/
through to a backend service that's only accessible over https? Or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concrete case we are thinking of is where a Dask cluster is running on a HPC system and we turn on TLS to encrypt the traffic. The Bokeh dashboard can also be configured to use TLS, and we'd like to proxy that dashboard through jupyter-server-proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not following this up earlier.
For example you'd use it to proxy a site such as https://grafana.mybinder.org/ (though internal) via jupyter? This means:
- jupyter-server-proxy connects to https://grafana.mybinder.org/. For SSL validation this requires the root CA public certificates (either bundled with Python if you're using an externally provided cert, or if it's an internal CA then configured in jupyter-server-proxy).
- This is then passed through to the user's browser via jupyter notebook. If the notebook is protected by HTTPS (e.g. by Nginx) anything proxied by jupyter-server-proxy should also be protected since it all goes through the notebook server.
What I don't understand is where the requirement for a private key serverproxy.keyfile
arises.
This is IMHO an important pull-request. Any chance it can be merged to master? |
There are questions about how this should work/be implemented and what the exact goal/use-case is. I think to get the review process of this moved forward we need an example that people can try out to help understand the context. At least I think that would help sort out some of the questions that have come up. If you can prepare one that we can try at home that would be a great contribution. |
# Configure SSL support | ||
ssl_options = None | ||
if serverproxy.https: | ||
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we setup the context like this, do we also get all the "normal" CAs or only the one in the cafile
?
"Normal" CAs in this case would be those that your OS trusts.
We have a situation where compute nodes on our supercomputer are normally insulated from inbound connections from the outside world. The inside just has 10.x.x.x interfaces. Notebooks normally run outside those batch-scheduled nodes and users can keep notebook servers around persistently on those "outside" nodes, which are really just repurposed login nodes. These nodes do have an interface on the same network as the compute nodes. We use jupyter-server-proxy to forward things through to/from the computes, it's pretty neat. Our users want to run Dask on the compute nodes in job allocations and talk to them from their persistent JupyterLab sessions, and we'd like to help them run SSL on the cluster and the dashboard. Yes it's an option to set up the scheduler and dashboard alongside the notebook server on the front end but network-wise this is less optimal. We are in the process of implementing JupyterHub end-to-end SSL as well. The Dask dashboard is the main focus for us with this pull request. There is a way to run the dashboard with SSL but the requests that jupyter-server-proxy needs to forward also need SSL support, hence the SSL context here (it is not for the scheduler+workers itself). The main goal is for each user to be able to be sure the entire paths are running under SSL. |
Thanks for expanding your use case. I think a diagram of the requests between each of the components would be helpful. In particular I still don't understand why jupyter-server-proxy requires a private key. Typically the key would only be required for SSL if you're running a server. jupyter-server-proxy isn't a server, it's an add-on to jupyter-notebook which is the server. If the aim is to encrypt traffic between the dashboard and jupyter wouldn't that need to be configured on the dashboard side? |
I'll try to get a diagram going. Indeed the aim is to encrypt traffic between the dashboard and Jupyter, and it is configured on the dashboard side. But the client (jupyter-server-proxy) needs an SSL context to talk to it. That's how I envisioned this working at least... |
The hub and hub proxy are running on some servers managed by Rancher (Docker containers), and a load balancer not shown does the front-end termination. The hub starts Jupyter notebooks on a "login" node using a custom spawner. Now suppose a user submits a job with Slurm and starts up a Dask cluster inside the job allocation, including a scheduler, the dashboard, and a bunch of workers. These nodes running Dask are compute nodes and are not publicly accessible, so to see the dashboard running there we need Jupyter server proxy, in fact we needed #154 for that to work. We'd like to be able to talk to the dashboard if it's using TLS, and I think that means we need the SSL context to be set up as a client in jupyter-server-proxy. |
Thanks for the diagram. I agree you only need a SSL client. I'm not convinced it requires a key though. For example suppose you wanted to proxy https://grafana.mybinder.org/ (there's no reason why this would be different from proxying an internal URL), which key and certificate would you use? |
I wasn't thinking about that? My specific use case was for internal-ssl style self-signed certs (similar to how the Hub does it, e.g. via certipy or possibly even trying to re-use some of the infrastructure). At this point I suppose there are a huge number of other use cases we might want to worry about but I didn't realize I'd signed up to handle all those... If it's just a simple change of making an argument optional or something I don't see an issue at all but if we need a more fully-featured PR than that perhaps someone with more experience with SSL needs to do it? |
A self-signed cert would require the internal CA or the public self-signed certificate to verify it, but it still shouldn't require the private key. |
See for example this Stack Overflow when using the requests module: Only the server's public certificate should be required |
OK so you mean
If the keyfile is left as empty it falls back to the default for |
According to the What do you think? |
When originally drafting this PR I investigated how JupyterHub does this. There's a function there called During that read-up I also saw the documentation you're referencing but when I saw that JupyterHub was already doing this one way, I figured if it was good enough for JupyterHub to use the SSL context then maybe it's good enough for jupyter-server-proxy! |
Fair enough! Let's stick with the ssl-context for now then. In addition to the
|
Just wanted to bump this to see where this is :) There's a merge conflict already, would be great to get this in before there are too many of those! Thanks for the thorough review, @manics. |
I can add an |
Instead of another property
|
Thanks @manics I'll try that, probably Wed or Thu for the update. |
I think there might be an issue with keyfile being set but not certfile. Man I hate this SSL stuff. |
jupyter_server_proxy/handlers.py
Outdated
@@ -330,7 +334,11 @@ def proxy_request_headers(self): | |||
def proxy_request_options(self): | |||
'''A dictionary of options to be used when constructing | |||
a tornado.httpclient.HTTPRequest instance for the proxy request.''' | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rcthomas this might be your culprit, pesky git 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh bother, thought I got them all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for persevering with this! I've made a couple of minor suggestions, but it generally looks good 👍
I'm having trouble testing this though, could you give me an example config? I tried to add an integration test in a PR against your fork:
rcthomas#1
but it's failing. I think I've traced it down to this line:
url = 'http://localhost:{}'.format(self.port) |
# Configure SSL support | ||
ssl_options = None | ||
if serverproxy.https: | ||
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile) | |
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile or None) |
The default value of serverproxy.cafile
is ""
. If this is handled as None
then it should use the system CA:
cafile, capath, cadata represent optional CA certificates to trust for certificate verification, as in SSLContext.load_verify_locations(). If all three are None, this function can choose to trust the system’s default CA certificates instead.
https://docs.python.org/3.6/library/ssl.html#ssl.create_default_context
help=""" | ||
Whether to use SSL for forwarded client requests. | ||
|
||
If this is set to `True` then you should provide a path to an SSL key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is set to `True` then you should provide a path to an SSL key, | |
If this is set to `True` then you can optionally provide a path to an SSL key, |
I'll see if I can push some changes so that my test will pass. |
I think SSL is cursed! I managed to get my test passing: To do this I moved the SSL configuration from the top-level global config to a per-server config (so https can be enabled for individual proxied servers). I then (finally!?) understood @rcthomas's use case, which is to enable SSL proxying using something like Suggestion: Use https for ports that aren't using a manage process by extend the remote-host format to optionally allow a protocol:
Alternatively instead of Current URL handlers: jupyter-server-proxy/jupyter_server_proxy/handlers.py Lines 562 to 569 in b9b4d22
@yuvipanda what do you think? |
Hey @manics thanks for the help during my absence here, I am back. Sigh. SSL. Sigh. You make a good point about how I would be forcing SSL on every remote_host:port combo. In my case I'm mostly worried (that's too strong a word, concerned maybe?) about one use very dominant use case for me (Dask). That said there are other things out there and I don't want to make things horrible for those users just because. I have no opinion on :// (except that as an emoji it summarizes my feelings about SSL); I do remember that before I encountered the {remote_host}:{port} PR I had just implemented it with a / in between. That is, I don't think we have to make it look like a URL in there though I see how it makes it clearer what's going on. I'd be happy with just /'s between everything. |
To perform some thread necromancy, I think forcing SSL on a per-server basis would mean we don't have to mangle URLs, right? I want folks to be able to use the dask dashboard without having to know if it's over https or http - especially without us redirecting properly. So we can give admins control over this on a per-server basis, and force https whenever it is set. I think that's what I see in https://github.com/jupyterhub/jupyter-server-proxy/compare/master...manics:pr169?expand=1? If so, I think we can just merge this and iterate. |
We would have a similar need for SSL support in jupyter-server-proxy. |
Hi, |
@davidbrochart https is only required for external connections between your browser and mybinder. If your webserver is running inside your Docker container then plain http should work since communication will be fully within the container (between the notebook process and your webserver). External connections will automatically be https. |
But I think this is the case, my server is running in the kernel which is inside the Docker container, but the requests are made from the browser. So I understand that my server must support SSL, and that it should get the certificate from mybinder, but I'm not sure if this PR will solve my issue. |
keyfile = Unicode( | ||
"", | ||
help=""" | ||
Path to optional SSL key. | ||
|
||
Use with `https=True` and `certfile`. | ||
""", | ||
config=True | ||
) | ||
|
||
certfile = Unicode( | ||
"", | ||
help=""" | ||
Path to optional SSL cert. | ||
|
||
Use with `https=True` and `keyfile`. | ||
""", | ||
config=True | ||
) | ||
|
||
cafile = Unicode( | ||
"", | ||
help=""" | ||
Path to optional CA file. | ||
|
||
Use with `https=True`. | ||
""", | ||
config=True | ||
) | ||
|
||
https = Bool( | ||
False, | ||
help=""" | ||
Whether to use SSL for forwarded client requests. | ||
|
||
If this is set to `True` then you should provide a path to an SSL key, | ||
cert, and CA. Use this if the proxied service expects to service | ||
requests over SSL. | ||
""", | ||
config=True | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to get all these SSL informations from mybinder (which uses SSL)?
@davidbrochart If the server you are talking about is running in the container alongside the notebook server, you shouldn't need to worry about SSL, you just proxy that server's port normally. You only care if the server's off somewhere else and you want to encrypt traffic between your notebook and the service. That's what this was originally about, in particular the case where the network path between jupyter-server-proxy and the back-end service is not going over the internet |
Yes the server is running in the container, and I would be fine with no SSL, but because the browser is going to make requests from an |
@davidbrochart In order to keep this discussion focused on the PR would you mind re-asking your question and adding the additional information on the Jupyter Community Forum https://discourse.jupyter.org/ instead and I'll follow up there? In doing so you'll also be helping others working with jupyter-server-proxy, I'm sure others have run into the same problem so it'd be valuable to have this as a community post. Thanks 😀. |
Sound good @manics, I'll do that. Thanks! |
Are there any plans to merge this PR? I use this plugin to start a Theia backend for each user. It turned out that Theia's webview recently started enforcing https on webviews and Related: #332 |
This is my initial pass at #89. Happy to start iterating.
The main source of uncertainty for me is around
check_hostname
on the SSL context. I tested without that set and then fretted about it for 3 weeks now, then added it as something you could configure. I looked at both what JupyterHub and Dask Distributed do, and they both have actual opinions. I went with making the default look like what Dask does because that's closest to our set of assumptions. I'm not sure the "True" case has everything else needed to work.