Add option to disable tls certificate #1421

Merged
merged 7 commits into from
Sep 1, 2022

iameskild
Member

Fixes | Closes | Resolves #1418

Changes introduced in this PR:

  • This change will allow qhub users to disable TLS certificates altogether. This might be needed for those with unique deployment requirements.

@iameskild iameskild added the type: enhancement 💅🏼, area: user experience 👩🏻‍💻, and needs: investigation 🔍 labels Aug 27, 2022
@iameskild iameskild added this to the Release v0.4.4 milestone Aug 27, 2022
@iameskild
Member Author

Existing functionality is preserved (i.e. adding a certificate of type existing, self-signed or lets-encrypt works as normal). However, I have had some trouble getting past stage 05-kubernetes-keycloak. The helm chart seems to deploy, but the keycloak-0 pod fails its startup probe with the following error message:

Startup probe failed: Get "http://10.244.0.7:8080/auth/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The logs for this pod don't seem to indicate any particular problem either.

Changing or updating the http redirects here doesn't seem to resolve the issue either, unfortunately. I've attempted a few other workarounds, to no avail.

@costrouc
Member

costrouc commented Sep 1, 2022

@iameskild bummer this looks like the blocking issue that we need to solve. It may be that keycloak requires https.

@iameskild
Member Author

iameskild commented Sep 1, 2022

@costrouc this might be a keycloak issue but I'm also starting to wonder if this is a traefik issue (or possibly both).

@viniciusdc and I worked on this together (and async), here are some of our findings thus far:

  1. Removing the certificates altogether

Remove:

--entrypoints.websecure.http.tls.certResolver=default
--entrypoints.minio.http.tls.certResolver=default

Endpoints become unreachable, although the servers/pods are running.

404 page not found

  2. Remove the certificates and http redirects

Remove:

--entrypoints.web.http.redirections.entryPoint.to=websecure
--entrypoints.web.http.redirections.entryPoint.scheme=https

--entrypoints.websecure.http.tls.certResolver=default
--entrypoints.minio.http.tls.certResolver=default

Failure.

  3. Remove the certificates, http redirects, 443 entrypoint, then update the ingressroute entrypoints from websecure to web

Remove:

--entryPoints.websecure.address=:443

--entrypoints.web.http.redirections.entryPoint.to=websecure
--entrypoints.web.http.redirections.entryPoint.scheme=https

--entrypoints.websecure.http.tls.certResolver=default
--entrypoints.minio.http.tls.certResolver=default

and

Update:

spec:
  entryPoints:
  - web

Failure, but with a different (possibly worse) error message:

ERR_TUNNEL_CONNECTION_FAILED

  4. Set the certResolver=disabled

Update:

--entrypoints.websecure.http.tls.certResolver=disabled
--entrypoints.minio.http.tls.certResolver=disabled

(also tried with =disable)

This seems to work but I'm not confident TLS is truly disabled. This is because if I replace =disabled with anything else (like =asdf), the endpoints are still accessible.

The traefik-ingress deployment does complain though:

time="2022-08-31T22:28:18Z" level=error msg="the router dev-forwardauth-17bf08e1c554eea34bf2@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-keycloak-http-68dca7b5055256300e80@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-argo-workflows-9032f8af14559694bafb@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-conda-store-server-e61454fcee3ec688fda9@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-grafana-ingress-route-d4489275eb142883ebf6@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-minio-api-56d485c35e2316f9d1d9@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-dask-gateway-9f2bc98852d42df99b96@kubernetescrd uses a non-existent resolver: disabled"
time="2022-08-31T22:28:18Z" level=error msg="the router dev-jupyterhub-b9049a2cdcdad7a0e1cd@kubernetescrd uses a non-existent resolver: disabled"

  5. Remove the certificates and add --serverstransport.insecureskipverify=true
  • A few permutations of this:
    • Remove http redirect
    • Remove 443 entrypoint
    • update ingressroute from websecure to web

This also resulted in a 404 error.
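
The permutations listed above all amount to dropping subsets of the Traefik arguments. As a rough sketch, they could be scripted for repeat testing along these lines. The flag strings are copied from the findings above; `BASE_ARGS`, `traefik_args`, and its keyword parameters are hypothetical names for illustration, not code from this PR. Per the results above, none of these permutations produced a working non-TLS deployment, so this only helps reproduce the experiments.

```python
# Hypothetical helper: names and structure are illustrative, not qhub code.
# The flag strings are the ones quoted in the findings above.
BASE_ARGS = [
    "--entryPoints.websecure.address=:443",
    "--entrypoints.web.http.redirections.entryPoint.to=websecure",
    "--entrypoints.web.http.redirections.entryPoint.scheme=https",
    "--entrypoints.websecure.http.tls.certResolver=default",
    "--entrypoints.minio.http.tls.certResolver=default",
]


def traefik_args(cert_resolver=True, redirects=True, websecure=True):
    """Return BASE_ARGS minus the groups of flags that are switched off."""
    args = []
    for arg in BASE_ARGS:
        lowered = arg.lower()  # the flags above mix entryPoints/entrypoints
        if not cert_resolver and ".tls.certresolver=" in lowered:
            continue  # drop the certificate resolvers
        if not redirects and ".http.redirections." in lowered:
            continue  # drop the http -> https redirect
        if not websecure and lowered.startswith("--entrypoints.websecure.address"):
            continue  # drop the 443 entrypoint
        args.append(arg)
    return args


if __name__ == "__main__":
    # The "remove the certificates and http redirects" permutation.
    for arg in traefik_args(cert_resolver=False, redirects=False):
        print(arg)
```

Matching on lowercased substrings is a deliberate shortcut here, since the quoted flags are not consistently cased.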

@viniciusdc
Contributor

viniciusdc commented Sep 1, 2022

If I am correct, this is the current structure of the routes, without any changes:
[Figure: diagram of the current route structure]

I guess that by disabling the TLS configuration, we are telling all the ports to expect a non-secure connection. However, because of the main redirection caused by the domain, we still reach those resources over an HTTPS connection (which, in my current guess, is being ignored), hence the 404 pages.

  • It's worth noting that the TCP connection does succeed, as does port-forwarding the resources to localhost:8080
  • Also, the redirection still happens even after removing the http -> https redirection config for :80

I would like to know if it is possible to create a route from <domain>/auth directly to keycloak-headless. I also tried creating a new port and rule for it to avoid TLS completely, but the redirection still holds:

From a basic deployment with TLS enabled:

Request URL: https://github-actions.qhub.dev/auth/
Request Method: GET
Status Code: 200
Remote Address: 172.18.1.100:443
Referrer Policy: strict-origin-when-cross-origin
cache-control: no-cache, must-revalidate, no-transform, no-store
content-length: 4080
content-security-policy: frame-src 'self'; frame-ancestors 'self'; object-src 'none';
content-type: text/html;charset=utf-8
date: Wed, 31 Aug 2022 23:45:41 GMT
referrer-policy: no-referrer
strict-transport-security: max-age=31536000; includeSubDomains
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-robots-tag: none
x-xss-protection: 1; mode=block
:authority: github-actions.qhub.dev
:method: GET
:path: /auth/
:scheme: https

After removing the redirection and defining a new port/entrypoint for the service:

Request URL: https://github-actions.qhub.dev/auth
Request Method: GET
Status Code: 404 
Remote Address: 172.18.1.100:443
Referrer Policy: strict-origin-when-cross-origin
content-length: 19
content-type: text/plain; charset=utf-8
date: Thu, 01 Sep 2022 02:26:31 GMT
x-content-type-options: nosniff
:authority: github-actions.qhub.dev
:method: GET
:path: /auth
:scheme: https
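
One way to read the two dumps above is to diff their header sets. The sketch below does that with two hypothetical helpers, `parse_headers` and `missing_headers`, on abbreviated copies of the dumps. The 404 response lacks the security headers present on the 200 response, which may suggest it never reached Keycloak, though the thread does not confirm this. Its content-length of 19 is also consistent with the plain-text "404 page not found" body seen earlier (18 characters plus a newline).

```python
def parse_headers(dump: str) -> dict:
    """Parse 'name: value' lines into a dict, skipping HTTP/2 pseudo-headers."""
    headers = {}
    for line in dump.strip().splitlines():
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # lines such as ':authority' describe the request
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def missing_headers(before: dict, after: dict) -> list:
    """Header names present in `before` but absent from `after`."""
    return sorted(set(before) - set(after))


# Abbreviated copies of the two dumps above.
WITH_TLS = """
content-type: text/html;charset=utf-8
strict-transport-security: max-age=31536000; includeSubDomains
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
:scheme: https
"""

AFTER_CHANGE = """
content-length: 19
content-type: text/plain; charset=utf-8
x-content-type-options: nosniff
:scheme: https
"""

if __name__ == "__main__":
    print(missing_headers(parse_headers(WITH_TLS), parse_headers(AFTER_CHANGE)))
    # prints ['strict-transport-security', 'x-frame-options']
```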

Member

@costrouc costrouc left a comment

I think that your PR #1421 is good. I don't have any way to test this, but I believe it will work for the client, since in their case the load balancer automatically enables a certificate (for our tests we don't have a good way to do this). I am also confident that this PR will not break anything existing, so I'm happy with approving your PR and merging it. If we go down the road of making the keycloak connection work without https, we should likely just include in the docs what this setting is for and that it assumes the user is responsible for providing TLS via the load balancer, e.g. https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-tls-listener.html.

@@ -14,6 +14,7 @@ class CertificateEnum(str, enum.Enum):
     letsencrypt = "lets-encrypt"
     selfsigned = "self-signed"
     existing = "existing"
+    disabled = "disabled"
Member

I like how you've integrated this into the existing configuration options.
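
For readers following along, here is a minimal sketch of how the extended enum behaves when a config value is looked up. The enum members are those shown in the diff; the `tls_managed_by_qhub` helper is a hypothetical name added for illustration and is not part of this PR.

```python
import enum


class CertificateEnum(str, enum.Enum):
    # Members as shown in the diff above.
    letsencrypt = "lets-encrypt"
    selfsigned = "self-signed"
    existing = "existing"
    disabled = "disabled"


def tls_managed_by_qhub(certificate_type: str) -> bool:
    """Hypothetical helper: True unless the config value is "disabled".

    Raises ValueError for values that are not CertificateEnum members.
    """
    return CertificateEnum(certificate_type) is not CertificateEnum.disabled


if __name__ == "__main__":
    for value in ("lets-encrypt", "self-signed", "existing", "disabled"):
        print(value, tls_managed_by_qhub(value))
```

Because lookup is by value, an unknown string such as "asdf" is rejected at the enum level with a ValueError. That differs from the Traefik experiment earlier in the thread, where an unknown certResolver name was only logged as an error.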

Contributor

@viniciusdc viniciusdc left a comment

LGTM. All the available options are working as expected, and we don't have any issues with the prior behavior for self-signed or lets-encrypt.

When using disabled, we get an error on the Keycloak OpenID Connect login, but this is to be expected due to the way the routing currently works (I guess):

[terraform]: │ Error: error initializing keycloak provider
[terraform]: │ 
[terraform]: │   with provider["registry.terraform.io/mrparkers/keycloak"],
[terraform]: │   on providers.tf line 1, in provider "keycloak":
[terraform]: │    1: provider "keycloak" {
[terraform]: │ 
[terraform]: │ failed to perform initial login to Keycloak: error sending POST request to
[terraform]: │ https://github-actions.qhub.dev/auth/realms/master/protocol/openid-connect/token:
[terraform]: │ 404 Not Found

@costrouc costrouc merged commit 2654b47 into main Sep 1, 2022
@costrouc costrouc deleted the fix_1418 branch September 1, 2022 19:52
Successfully merging this pull request may close these issues.

Update terraform scripts to improve deployment
3 participants