Offloaded HTTPS: Redirect HTTP port traffic to HTTPS #1811
Conversation
```
{{- else if $offloadHTTPS }}
- --port=8443
- --redirect-port=8000
- --redirect-to=443
```
What flags should be used here? I figure the goal is really to send back a redirect whose Location header points at the original request URL, but with an https:// scheme.
Flags at our disposal according to the CHP readme:

```
--redirect-port <redirect-port>  Redirect HTTP requests on this port to the server on HTTPS
--redirect-to <port>             Redirect HTTP requests from --redirect-port to this port
--auto-rewrite                   Rewrite the Location header host/port in redirect responses
--protocol-rewrite <proto>       Rewrite the Location header protocol in redirect responses to the specified protocol
```
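For illustration, combining the redirect flags with a protocol rewrite, the proxy container's args under offloading might look like this (a sketch, not the actual chart template; the flags are taken from the CHP readme above, the ports from the chart's conventions):

```yaml
# Hypothetical CHP args for proxy.https.type=offload
args:
  - --port=8443               # serve decrypted (ex-HTTPS) traffic here
  - --redirect-port=8000      # plain HTTP arrives here...
  - --redirect-to=443         # ...and is redirected to the public HTTPS port
  - --protocol-rewrite=https  # make the Location header use the https:// scheme
```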
I don't have any experience with Z2JH's built-in https; I've always used a cluster ingress controller configured with HTTPS. https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#off-loading-ssl-to-a-load-balancer
@manics thank you for your time considering this! ❤️ I'm generally in need of review at the moment, so I love it! Thank you!

@manics that is a very good question that I've forgotten to raise, but I remember having raised this before and becoming quite frustrated with the situation. I believe the reason is specifically to be prepared for already-decrypted HTTPS traffic arriving on 443, as that is the key logic I see implemented in the chart. To me, it sounds like a workaround that doesn't belong in the chart, hmm...
If unencrypted traffic is being sent from the external load balancer to port (8)443 on the proxy, that sounds like a bug in one of the components rather than something the helm chart should have to work around. Does anyone have a reproducible example?
Hi everyone. I originally wrote the `offload` part.

To listen on port 443 using a Kubernetes managed cloud provider's load balancer we have to define the port in the service resource; disabling https removes that port, while setting `proxy.https.type=offload` keeps it.

@manics - When you're using an ingress controller, offloading and disabled are essentially the same thing. Rather than having the service object create and set things up, you've set up an HTTPS listener and attached the certificate in your ingress controller configuration (my production install uses Traefik as an ingress controller and we do exactly this). We redirect HTTP to HTTPS in our ingress controller because we found it easier than modifying JupyterHub.

Another added benefit of using an ingress controller is that we can run multiple applications through a single load balancer and save some money (we host multiple JupyterHub instances that way too).
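For reference, the load-balancer-terminated setup pcfens describes would look roughly like this in the chart's values (a sketch; the `proxy.https` keys are the chart's own, the annotation is an illustrative AWS placeholder):

```yaml
# Sketch: terminating TLS on a cloud load balancer (offload mode)
proxy:
  https:
    enabled: true
    type: offload
  service:
    annotations:
      # Cloud-provider specific TLS config goes here; ARN is a placeholder
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:...example...
```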
Thank you for helping out @pcfens! ❤️ I currently can't quite manage to follow what you are saying, @pcfens =/
Oh yikes... I feel like I just don't understand anything... I'll give up for now as I want to head for bed, but I hope to raise this question given that I fail to answer it myself: is this Helm chart still required to have this kind of k8s Service logic, pointing two separate service ports to a single pod port, in order to let k8s clusters on AWS use AWS systems to manage TLS termination?
Let me try and clarify a little bit.

I say Kubernetes managed load balancer to differentiate from a load balancer that you configure yourself and point at a Kubernetes cluster. In this case it's any Service object/resource of `type: LoadBalancer`.

Annotations are used to attach metadata to objects. Since the Service object doesn't have any Amazon (or other cloud service) specific fields where you'd set things like SSL termination, logging, etc., the authors of most integrations choose to use annotations for the extra config.

The pattern you describe, where there are pods that terminate SSL and route traffic, is what Ingress controllers do when combined with the Ingress resource. The current service logic (or perhaps some simplified version of it) is required to support SSL termination on any cloud load balancer without an Ingress controller, not just on Amazon.
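To make the annotation pattern concrete, here is a sketch of a `type: LoadBalancer` Service using the legacy in-tree AWS annotations for TLS termination (the certificate ARN is a placeholder):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: proxy-public
  annotations:
    # Terminate TLS on the AWS load balancer using this certificate
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/example
    # Only the 443 listener speaks TLS; traffic to the pods is plain HTTP
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8000
    - name: https
      port: 443
      targetPort: 8000   # decrypted HTTPS also lands on the pod's HTTP port
```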
Thank you for taking the time to help me understand @pcfens, I'm still not really understanding, but I'm closer. I need to process this further, and I'll write some notes in this comment while I'm doing it.

Relevant links

Terminology

Rubber duck

So, the traffic arrives at a LB outside the k8s cluster, which is set up as triggered by the k8s Service of `type: LoadBalancer`. I think the k8s external LB has what AWS calls listeners. I see from an example that these can actually listen on 443, terminate TLS, and then send decrypted traffic to port 80.

So - my confusion here is why we need port 443 configured on the k8s Service as well as port 80, and not only port 80. But at the same time, all examples indicate this so far, and I've not seen any annotation that describes something like this for AWS - allowing, for example, a k8s Service to only have port 80, with an annotation saying that we want the frontend port to be 443 but map to the k8s Service port 80.
Another question in my mind: if we have an AWS LB configured through a k8s Service like this, does that mean there will be end-to-end HTTP traffic arriving at port 80, and HTTPS->HTTP traffic arriving at port 443? Arrrgh... Ah well... Off to bed now.
@pcfens Thanks for your additional explanation. Could you perhaps sketch out a diagram of encrypted and unencrypted data flows between the load-balancer, CHP, and JupyterHub, including the relevant ports on each component? I think that would help us understand what the requirements are when `proxy.https.type=offload` is set.

As you can tell there's some confusion here, and it'd be good to have a clear explanation we can use as a reference for future maintainers.
The link between the CHP and the JupyterHub service always runs over http, so I'm going to ignore everything beyond the proxy.

Let's start with what things look like when HTTPS is disabled.

When HTTPS is enabled and you're terminating SSL on the CHP (not offloaded), we end up with things looking like this.

Setting `proxy.https.type=offload` changes the picture again.

In this case, the CHP can't tell whether the client connected to the load balancer using HTTP or HTTPS without using headers, which the CHP can't do today. To redirect HTTP to HTTPS when we offload SSL termination, we need to modify the CHP to look at headers; the options used in this PR (`--redirect-port` and `--redirect-to`) only work per port.

@consideRatio - Can you help us understand your use case a little bit better? If we have a better idea of what you're trying to do I might be able to help with a patch.
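A rough sketch of those three flows, as I understand them (assuming the chart's default ports; this is my reading, not a reproduction of pcfens's original diagrams):

```
HTTPS disabled:
  client --http--> cloud LB:80 --> Service:80 --> CHP pod:8000

HTTPS terminated on the CHP (not offloaded):
  client --https--> cloud LB:443 --> Service:443 --> CHP pod:8443 (TLS ends here)
  client --http---> cloud LB:80  --> Service:80  --> CHP pod:8000

HTTPS offloaded (proxy.https.type=offload):
  client --https--> cloud LB:443 (TLS ends here) --http--> Service:443 --> CHP pod:8000
  client --http---> cloud LB:80                  --http--> Service:80  --> CHP pod:8000
```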
Thanks for the diagram!
Is this something we could fix in CHP and avoid the complexity of the Helm chart workaround? CHP has some support for X-Forwarded headers; I've got an open PR to modify the behaviour slightly due to how Tornado (and therefore JupyterHub) handles the header: jupyterhub/configurable-http-proxy#267
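For illustration, a header-aware redirect would work roughly like this (a hypothetical exchange; as noted above, CHP does not implement this today):

```
# Plain HTTP request, as forwarded by the offloading load balancer:
GET /hub/login HTTP/1.1
Host: hub.example.org
X-Forwarded-Proto: http

# A header-aware CHP could answer with a redirect to the https:// scheme:
HTTP/1.1 301 Moved Permanently
Location: https://hub.example.org/hub/login
```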
Adding the ability to handle `X-Forwarded-Proto` headers in CHP sounds like the cleaner fix.
#1813 adds the ability to specify additional CHP arguments, which I think could cover everything this PR does and more. If we can't figure out what the correct behaviour should be here, this could be an alternative?
Thanks for pointing that one out - I didn't realize there was an option to do that already. #1813 exposes the options here and more. Since redirecting to HTTPS should be a common use case, we might want to add some documentation around the redirect flags in the HTTPS section.
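If #1813 lands, the redirect flags could then be supplied from values, along the lines of this sketch (the `proxy.chp.extraCommandLineFlags` key name is my assumption of what that PR adds - check #1813 for the actual option - and this still presumes decrypted and plain traffic arrive on different CHP ports, per the discussion below):

```yaml
proxy:
  https:
    enabled: true
    type: offload
  chp:
    # Assumed option name from #1813: pass extra CLI flags straight to CHP
    extraCommandLineFlags:
      - --redirect-port=8000
      - --redirect-to=443
```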
Ah hmmm, my interpretation is that the Cloud Load Balancer is network infrastructure external to k8s, which maps into k8s through a k8s Service with NodePort and ClusterIP configured automatically to work well with the externally set up LoadBalancer.

Interpretation evidence 1

Interpretation evidence 2

Interpretation overview

This is how I think of it.

Conclusion

I think the AWS external load balancer doesn't provide sufficient configuration options to avoid being forced to map both port 80 and 443 of the k8s proxy-public Service to the proxy pod, and that we are forced to support the proxy.https.type=offload setting due to this. This seems like a shortcoming of the AWS LB, and I'm not sure if other cloud providers have such a limitation as well, or if they can be configured to only pass onward decrypted HTTPS traffic by doing something like I describe below, which I think is a better configuration.

My wish for an external cloud LB
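As I read it, the wish amounts to something like the following (the frontend-port annotation below is hypothetical - no such AWS option exists as far as I know):

```yaml
# Wished-for configuration (hypothetical annotation, does not exist)
apiVersion: v1
kind: Service
metadata:
  name: proxy-public
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:...example...
    # Hypothetical: expose a 443 TLS listener on the LB, but send the
    # decrypted traffic to the Service's only port, 80
    example.hypothetical/aws-load-balancer-frontend-port: "443"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8000
```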
We have discussed why we have both HTTP and HTTPS ports on the k8s Service when `proxy.https.type=offload` is set.

I think the redirect this PR suggests, though, may be problematic. The issue is that the TLS termination is done outside of the k8s cluster, and if we redirect to https://domain-unchanged:443 then a k8s-local request will go wrong: anyone wanting to send HTTP directly to proxy-public would need to send traffic to http://proxy-public:443, which is quite weird.

I think this PR makes some sense, but I'm open to closing it unless someone can verify it works properly and champions it. I mostly opened it to close #1190.
Closing this PR as I don't see it getting merged. |
Summary
In #1190 the question was raised whether there was a way for this Helm chart to take care of HTTP -> HTTPS redirects when `proxy.https.type=offload`. I concluded that this was possible in theory, but would require some changes, because currently we mash all incoming traffic on ports 80 and 443 to port 8000 on the CHP pod, and then we can't rely on the ports to know what to redirect.

We don't have CI tests for this, and I have never worked with a `proxy.https.type=offload` deployment, so I'd really love for someone with experience with those to contribute a review.

Closes #1190.
Exploratory notes
I'm not sure if we can do this redirect. All traffic in the application goes through the `proxy-public` k8s Service, and if we have configured offloading of the HTTPS part, then the `proxy-public` Service will go to the pod named `proxy`, running the ConfigurableHTTPProxy (CHP) NodeJS server outside of and separate from the `hub` pod, so perhaps that can be configured to send out a 301 redirect on HTTP traffic.

In the CHP changelog I found an indication that this would be possible.

In the CHP readme I found confirmation that this is possible (the `--redirect-port` / `--redirect-to` flags quoted in the discussion above).
In the configuration of the CHP pod I found the flags are determined like this:

zero-to-jupyterhub-k8s/jupyterhub/templates/proxy/deployment.yaml, lines 52 to 78 in a2c3ef8
But there is a crux, now that I think of it: if all traffic arriving at the CHP server is either originally HTTP, or originally HTTPS that has been decrypted and is now HTTP, redirecting still requires the two kinds of traffic to arrive on different ports, because otherwise this logic would also redirect the already-decrypted HTTPS traffic.
Networking investigation
How does traffic flow to the CHP server within Kubernetes?
We need to consider the `proxy-public` Service and the Pod, assuming `proxy.https.type=offload`, and the `port` and `targetPort` redirections.

1 - `proxy-public` Service -> CHP Pod

zero-to-jupyterhub-k8s/jupyterhub/templates/proxy/service.yaml, lines 44 to 65 in a2c3ef8
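Roughly, the offload branch of that template produces a Service like the following (a sketch from my reading of those lines, not a verbatim copy):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: proxy-public
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: http   # the CHP pod's port named http (8000)
    - name: https
      port: 443
      targetPort: http   # offload: decrypted traffic also goes to 8000
```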
2 - CHP Pod -> CHP container

zero-to-jupyterhub-k8s/jupyterhub/templates/proxy/deployment.yaml, lines 105 to 113 in a2c3ef8
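And in this mode the CHP container only declares the plain HTTP port, something like (again a sketch of my reading):

```yaml
containers:
  - name: chp
    ports:
      - name: http
        containerPort: 8000   # both original HTTP and decrypted HTTPS
                              # arrive here under offload
```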
Conclusion
Traffic arriving at the entrypoint of the Helm chart, the `proxy-public` Service, on either the `443` port or the `80` port, is directed to the port named `http`, which is port 8000, when the chart is configured with `proxy.https.type=offload`. Due to this, we cannot configure CHP with flags to redirect HTTP to HTTPS, because all traffic is the same port-wise.

Solution idea - PR needed
The gist is to ensure, when `proxy.https.type=offload`, that we have a network route for traffic that is decrypted HTTPS, separate from a network route for original HTTP traffic, where the decrypted HTTPS traffic flows under the 443 or 8443 ports, and the original HTTP traffic flows under the 80 or 8000 ports (see the sketch after the list below).
1 - Add an `https` named port `8443` on the CHP pod.
2 - Stop mapping `https` port traffic and `http` port traffic to the same CHP pod port in the `proxy-public` Service; instead let `https/443` map to `https/8443`, and `http/80` map to `http/8000`.
3 - Make CHP redirect traffic arriving on `http/8000` to a location which is the same, but with the HTTPS scheme.

This change is a bit sensitive, because we don't have a CI system to verify we don't break functionality for those with `proxy.https.type=offload` configurations. I think the change makes a lot of sense, but since it's theory all the way rather than based on practical experience, it would be really good to have feedback from those with offload deployments already running.
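Sketched in YAML, the proposal above would amount to something like this (not an actual diff; ports and flags as discussed in this PR):

```yaml
# proxy-public Service: separate routes for decrypted HTTPS vs original HTTP
ports:
  - name: https
    port: 443
    targetPort: https   # new named pod port, 8443
  - name: http
    port: 80
    targetPort: http    # pod port 8000

# CHP container args: serve on 8443, redirect plain HTTP on 8000 to 443
args:
  - --port=8443
  - --redirect-port=8000
  - --redirect-to=443
```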
Review considerations

Assuming `proxy.https.type=offload`, what happens if...

- ...something sends traffic via `$PROXY_PUBLIC_SERVICE_PORT`? I think nothing would change: instead of traffic going through the k8s Service port 443 to the Pod's port 8000, it goes to the Pod's port 8443.
- ...something sends traffic via `$PROXY_PUBLIC_SERVICE_HTTP_PORT`? I think this will mean a redirect is added, and assuming the sender tried to send HTTPS traffic it will fail, because the actual TLS termination is further out; so unless CHP is configured to know its public URL, it won't be able to redirect to itself properly when addressed internally. I think this is okay.
- ...someone relies on the current `proxy.https.type=offload` configuration? What does it do? I think the answer is that it simply accepts HTTP traffic on another port in the Service (443). But what was that meant to accomplish? Is it a workaround for logic outside the Helm chart, or for logic within the Helm chart?