
Change from kubernetes-alpha provider to kubernetes provider #978

Closed
tylerpotts opened this issue Dec 17, 2021 · 13 comments
Labels
needs: investigation 🔍 · provider: Azure · type: bug 🐛

Comments

@tylerpotts
Contributor

GCP: Keycloak has been verified working for both GitHub and Auth0.

Azure: When deploying from commit 0c21c8c on main, I experienced the following error:

module.kubernetes-keycloak-config.keycloak_openid_user_property_protocol_mapper.user_property_mapper: Refreshing state... [id=627e9160-c15e-4407-8807-b086ca2604b9]
╷
│ Error: Failed to determine GroupVersionResource for manifest
│ 
│   with module.qhub.module.kubernetes-dask-gateway.kubernetes_manifest.main,
│   on modules/kubernetes/services/dask-gateway/crds.tf line 1, in resource "kubernetes_manifest" "main":
│    1: resource "kubernetes_manifest" "main" {
│ 
│ cannot select exact GV from REST mapper

It seems that this may be related to this issue, but I haven't had a chance to investigate further.
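For context, the resource named in the error applies the Dask Gateway CRDs through the kubernetes-alpha provider. A minimal sketch of that pattern (the file layout below is an assumption, not the actual crds.tf):

```hcl
# The kubernetes-alpha provider must resolve the manifest's apiVersion/kind
# to a GroupVersionResource via the cluster's REST mapper at plan time;
# "cannot select exact GV from REST mapper" means that lookup failed
# against this cluster's API discovery data.
resource "kubernetes_manifest" "main" {
  # Assumed layout: the Dask Gateway CRD YAML kept next to the module.
  manifest = yamldecode(file("${path.module}/daskcluster-crd.yaml"))
}
```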

@danlester
Contributor

I have also been able to reproduce this error on Azure.

BTW it's not Keycloak-related - that's just the last successful 'refresh'. The problem is somewhere in Dask Gateway.

If I cut the Dask Gateway pieces out of QHub, it deploys successfully.

The "GroupVersionResource" problem seems familiar... I'll see if I can work out where I've seen it before!

@danlester
Contributor

@shannon saw it here (private link).

It also came up here.

@danlester
Contributor

Possibly related to Kubernetes versions (which were recently updated for Azure).

We might need to be clearer about which versions we are targeting.

@danlester
Contributor

danlester commented Dec 18, 2021

The original version was 1.22.2 (on which it failed).

I'll try on Azure's Central US default version (1.20.9) to see if that works. Plus maybe 1.21.7 (default) on East US.

@danlester
Contributor

danlester commented Dec 19, 2021

Azure's Central US default version (1.20.9) gives similar, though not identical, errors (attached):
azureerrors.txt

@danlester
Contributor

Kubernetes 1.21.7 (default) on East US works fine...

@danlester
Contributor

danlester commented Dec 19, 2021

And the integration test succeeded 12 days ago on:
region: Central US
kubernetes_version: 1.19.11

Of course, these version changes have come about now that we query Azure for the 'best' Kubernetes version.

@danlester danlester changed the title Keycloak testing Kubernetes versions causing problems in deploying to Azure Dec 19, 2021
@trallard trallard added the provider: Azure, type: bug 🐛, and needs: investigation 🔍 labels Dec 22, 2021
@trallard trallard added this to the Release v0.4.0 milestone Jan 4, 2022
@costrouc costrouc changed the title Kubernetes versions causing problems in deploying to Azure Change from kubernetes-alpha provider to kubernetes provider Jan 4, 2022
@costrouc
Member

costrouc commented Jan 4, 2022

For history: the reason we adopted the kubernetes-alpha provider was its support for custom resource definitions (CRDs). As of roughly six months ago, the Terraform kubernetes provider supports CRDs as well. We will also need to restrict the kubernetes provider to at least the version that added that support.
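For illustration, a hedged sketch of the kind of version constraint meant here; the exact minimum is an assumption and should be verified against the provider changelog:

```hcl
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
      # kubernetes_manifest only exists in newer releases of the provider;
      # ">= 2.7.0" is an assumed floor, not a confirmed one.
      version = ">= 2.7.0"
    }
  }
}
```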

@costrouc
Member

costrouc commented Jan 4, 2022

https://registry.terraform.io/providers/hashicorp/kubernetes/latest

@iameskild
Member

@tylerpotts and I were able to get QHub to deploy successfully on Minikube by removing most of the fields under tls in the resource "kubernetes_manifest" "ingress_route" block in ingress/crds.tf; see this commit.
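For reference, a hypothetical sketch of the trimmed resource - the name, namespace, host, and service below are placeholders, and the exact fields removed are in the linked commit:

```hcl
resource "kubernetes_manifest" "ingress_route" {
  manifest = {
    apiVersion = "traefik.containo.us/v1alpha1"
    kind       = "IngressRoute"
    metadata = {
      name      = "qhub" # placeholder
      namespace = "dev"  # placeholder
    }
    spec = {
      entryPoints = ["websecure"]
      routes = [{
        kind  = "Rule"
        match = "Host(`qhub.example.com`)" # placeholder host
        services = [{
          name = "proxy" # placeholder service
          port = 80
        }]
      }]
      # Most nested tls fields removed; keeping only the resolver (assumed).
      tls = {
        certResolver = "letsencrypt"
      }
    }
  }
}
```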

The next step is to test these changes by deploying QHub on Azure.

@danlester, @costrouc do you see any unintended consequences of these changes?

@danlester
Contributor

It's helpful to know this worked, but I don't think we should move forward with this until we understand why it helped.

It's possible the CRDs may need updating to match the version of Traefik that gets deployed. I'm not sure where they were obtained in the first place, but one way would be to take the Traefik YAML definitions and run them through something like tfk8s to convert them from native Kubernetes YAML to Terraform's HCL.
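To illustrate the tfk8s route (a hedged sketch - the command follows tfk8s's documented usage, and the CRD body is abbreviated rather than Traefik's full schema):

```hcl
# Converting Traefik's CRD YAML, e.g.:
#   tfk8s --file traefik-crds.yaml --output traefik-crds.tf
# yields kubernetes_manifest resources of roughly this shape:
resource "kubernetes_manifest" "ingressroutes_crd" {
  manifest = {
    apiVersion = "apiextensions.k8s.io/v1"
    kind       = "CustomResourceDefinition"
    metadata = {
      name = "ingressroutes.traefik.containo.us"
    }
    spec = {
      group = "traefik.containo.us"
      names = {
        kind   = "IngressRoute"
        plural = "ingressroutes"
      }
      scope = "Namespaced"
      versions = [{
        name    = "v1alpha1"
        served  = true
        storage = true
        schema = {
          openAPIV3Schema = {
            # Abbreviated: the real Traefik CRD carries a full schema.
            type                                   = "object"
            "x-kubernetes-preserve-unknown-fields" = true
          }
        }
      }]
    }
  }
}
```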

Or we could borrow from this repo - or just use that module directly in our code perhaps.

Anyway, I think there was quite a serious error that Tyler showed me, and it would be good to know why it was happening. Traefik itself may have contributed to the problem, but I think the problem is really independent of Traefik: it's just a question of CRDs in Kubernetes, and there was a problem setting the CRD record.

I have tried to produce a basic test case here but I don't get the error yet.

You would set up Minikube, run terraform init, then terraform apply. But it needs to be a multi-stage apply: after the first apply (which should create the definition of IngressRoute itself), comment out count = 0 on line 216 and run terraform apply again to create the specific IngressRoute record.
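In other words, the test case follows a two-stage pattern roughly like this (a hedged sketch, not the actual gist; the variable and file names are assumptions):

```hcl
# Stage 1: apply with the default (the record's count evaluates to 0),
# so only the CRD is created and the IngressRoute kind gets registered.
# Stage 2: set create_record = true and apply again, so the record can
# be resolved against the now-registered kind.
variable "create_record" {
  type    = bool
  default = false
}

resource "kubernetes_manifest" "ingressroute_crd" {
  manifest = yamldecode(file("${path.module}/ingressroute-crd.yaml"))
}

resource "kubernetes_manifest" "ingress_route_record" {
  count      = var.create_record ? 1 : 0
  manifest   = yamldecode(file("${path.module}/ingressroute.yaml"))
  depends_on = [kubernetes_manifest.ingressroute_crd]
}
```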

@danlester
Contributor

What was var.external-url when you were trying to deploy with Minikube? (i.e. just the domain of your QHub site as specified in qhub-config.yaml)

@iameskild iameskild mentioned this issue Jan 27, 2022
@costrouc
Member

costrouc commented Feb 4, 2022

Resolved with #1003. kubernetes-alpha is no longer being used. The primary complication was the CRDs: they must be applied in a separate step, and using targets to do so was making this complicated.

@costrouc costrouc closed this as completed Feb 4, 2022