
Create routes on OpenShift and ingresses on Kubernetes #50

Merged (4 commits) on Apr 17, 2020

Conversation

@amisevsk (Collaborator) commented on Apr 9, 2020:

What does this PR do?

Change WorkspaceRoutings controller to always create routes on OpenShift and ingresses on Kubernetes.

One of the goals in this PR was to eliminate the need for ingress.global.domain when running on OpenShift by not specifying hostnames when creating routes. However, OpenShift's automatically-generated route hostnames are <route-name>-<namespace>.<routing-suffix>, which means they're frequently invalid (e.g. when deployed in the namespace che-workspace-controller, the generated hostname is too long). As a result, I've renamed ingress.global.domain to cluster.routing.suffix and added setting the value via the makefile.
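For context on why those generated hosts go invalid: the first DNS label of a generated host is <route-name>-<namespace>, and DNS labels are limited to 63 characters, so a long namespace overflows the limit for typical route names. A minimal illustration (hypothetical helper, not code from this PR):

	// The first DNS label of an auto-generated OpenShift route host is
	// "<route-name>-<namespace>"; DNS labels max out at 63 characters,
	// so a long namespace makes the generated host invalid.
	func generatedHostLabelIsValid(routeName, namespace string) bool {
		return len(routeName+"-"+namespace) <= 63
	}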

Note: the controller does contain logic to automatically detect the routing suffix on OpenShift; however, I have seen it fail or fall out of sync in the past, so the separate option is still useful.

Is it tested? How?

Tested on crc with the usual matrix of settings. I have not yet tested on minikube, but will before merging.

}
}
}
return "", fmt.Errorf("could not get URL for endpoint %s", endpoint.Name)
@sleshchenko (Member) commented on the code above:

Is it really an error? While the workspace is starting, I get quite a lot of spam:
(screenshot: Screenshot_20200414_152116)

Could we return only those endpoints which are already exposed, and schedule a requeue? The others (without an ingress/route) could be skipped, or returned with status Exposing (not sure whether we have endpoint statuses now).
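A minimal sketch of that idea (hypothetical names, not code from this PR): collect URLs only for endpoints that already have an exposed route/ingress, and report whether anything is still pending so the caller can requeue instead of erroring:

	// Gather URLs for already-exposed endpoints; endpoints without a URL
	// yet are skipped rather than treated as failures.
	func collectExposedURLs(endpoints []Endpoint, urlFor func(Endpoint) (string, error)) (map[string]string, bool) {
		urls := map[string]string{}
		allExposed := true
		for _, endpoint := range endpoints {
			url, err := urlFor(endpoint)
			if err != nil || url == "" {
				allExposed = false // still being exposed; retry later
				continue
			}
			urls[endpoint.Name] = url
		}
		return urls, allExposed
	}

When allExposed is false, the caller would return reconcile.Result{RequeueAfter: ...} rather than an error.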

@amisevsk (Collaborator, author) replied:

Makes sense -- where are you running the operator? It may be a route/ingress exposure issue (i.e. it takes longer than expected). I don't see this on crc.

I'll add this logic to the PR.

@amisevsk (Collaborator, author) replied:

Looking into this a bit more, I'm not sure what's causing the issue; we only return that error when we cannot find any ingress with the appropriate label. If the URL field is empty, we just return an empty string.

@sleshchenko (Member) replied:

I tried it on local crc.

@sleshchenko (Member) replied:

Looking into this a bit more, I'm not sure what's causing the issue; we only return that error when we cannot find any ingress with the appropriate label

Exactly. Now I see the same error, but only once:
(screenshot: route-not-found)

So, instead of failing the reconcile loop, it would be better to mark the endpoint as problematic (without a host) and retry later. I think it's worth a dedicated PR, because it requires introducing some phases/conditions for endpoints.

@amisevsk (Collaborator, author) replied:

@sleshchenko I think the error you describe initially (could not get URL...) is a failure; we should never reach that point in the reconcile and not have routes/ingresses available. The previous step, prior to attempting this matching, is to make sure routes/ingresses are in sync between the cluster and spec. Not being able to match the two means that we have an endpoint (which defines the spec) that does not have a route/ingress associated with it. Until we have a concrete case where this can occur, it should mark the workspace as failed.

The latter failure, already exists, is a familiar issue -- I haven't gotten around to implementing a fix, but we shouldn't just log that error and continue. It only means the state of the cluster has changed since we started our reconcile loop. The correct solution there would be to requeue when errors.IsConflict(err) or errors.IsAlreadyExists(err).
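A sketch of that handling (the error checks are the real k8s.io/apimachinery/pkg/api/errors helpers; the wrapper function is hypothetical):

	import (
		k8sErrors "k8s.io/apimachinery/pkg/api/errors"
		"sigs.k8s.io/controller-runtime/pkg/reconcile"
	)

	// A conflict or already-exists error just means the cluster changed under
	// us mid-reconcile, so requeue quietly instead of reporting a failure.
	func resultForSyncError(err error) (reconcile.Result, error) {
		if k8sErrors.IsConflict(err) || k8sErrors.IsAlreadyExists(err) {
			return reconcile.Result{Requeue: true}, nil
		}
		return reconcile.Result{}, err
	}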

@amisevsk (Collaborator, author) replied:

Exactly. Now I see the same error, but only once

This is a different error message, to be clear.

@sleshchenko (Member) replied:

This is a different error message, to be clear.

My fault. But it seems there were two attempts to create the ingress, which could indicate an issue in the reconcile loop, because I did not create such a route by hand.

Will test more precisely. BTW, it's not a blocker.

@sleshchenko (Member) replied:

Yeah, it's the already exists issue, and it seems routes are not created immediately:
(screenshot: Screenshot_20200416_151950)

A RequeueAfter of 200ms solves the issue for me; 100ms is not enough...

	routesInSync, clusterRoutes, err := r.syncRoutes(instance, routes)
	if err != nil || !routesInSync {
		reqLogger.Info("Routes not in sync")
		return reconcile.Result{RequeueAfter: 100 * time.Millisecond}, err
	}

But I'm not sure it's really a good solution, since the right delay depends on the infrastructure; I'm OK with leaving this error propagated for the time being.

@amisevsk (Collaborator, author) replied:

The main thing stopping us from implementing this is that it will require a fair bit of boilerplate error checking; it might also be improved by eventually filtering reconciles (i.e. determining whether changes are necessary before reconciling).
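For the reconcile-filtering part, controller-runtime predicates are the usual mechanism; a rough sketch (field names assume a recent controller-runtime; older versions expose MetaOld/MetaNew on the event instead):

	import (
		"sigs.k8s.io/controller-runtime/pkg/event"
		"sigs.k8s.io/controller-runtime/pkg/predicate"
	)

	// Skip update events that don't bump the object's generation (i.e.
	// status-only writes), so we only reconcile on actual spec changes.
	var specChanged = predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
		},
	}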

@amisevsk (Collaborator, author) commented:

Added:

  • Use annotations instead of labels for storing the endpoint name, and store the original endpoint name rather than the DNS-sanitized version
  • Fix the makefile not resetting the registry ingress hostname when deploying to k8s
  • Don't try to sync routes when running on k8s, since we cannot list Routes there
  • Add some validation to the controller config: fail startup if we're not on OpenShift but openshift-oauth is the default routing class (see the sketch after this list)
  • Set the workspace routing phase to "Preparing" when we can't get route/ingress URLs
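A rough sketch of that validation (type and method names are hypothetical):

	// Fail startup when the default routing class requires OpenShift but the
	// controller is running on plain Kubernetes.
	func (c *ControllerConfig) Validate(isOpenShift bool) error {
		if !isOpenShift && c.GetDefaultRoutingClass() == "openshift-oauth" {
			return fmt.Errorf("default routing class 'openshift-oauth' is only supported on OpenShift")
		}
		return nil
	}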

@sleshchenko (Member) left a review:

👍 Tested on crc and it works fine, except for one error logged during workspace start.

The comments I left can be addressed/discussed in dedicated issues.

Makefile: review thread resolved
@@ -252,7 +260,7 @@ func fillOpenShiftRouteSuffixIfNecessary(nonCachedClient client.Client, configMa
host := testRoute.Spec.Host
if host != "" {
prefixToRemove := "che-workspace-controller-test-route-" + configMap.Namespace + "."
@sleshchenko (Member) commented on the diff above:

It may fail if the default route host generation changes... So maybe we should detect the hostname only when it's missing from the configmap?

Also, che-workspace-controller-test-route-che-workspace-controller is close to the 63-character DNS label limit :) Consider making the test route name shorter.

@amisevsk (Collaborator, author) commented on Apr 15, 2020:

It's legacy functionality I didn't even know existed until I worked on this PR; I'm inclined to remove it entirely if there are doubts about its usefulness. So far, the concerns are:

  • It depends on the deployed namespace being che-workspace-controller, and will fail if we change that
  • It will fail if how default hostnames are computed changes
  • It only works on OpenShift

This points towards "we shouldn't support this at the moment". Executing it only when the entry in the configmap is blank just moves all those failure cases down the line, except it makes it look like we're changing config requirements rather than the cluster changing how it resolves hostnames.

Regarding

Also, che-workspace-controller-test-route-che-workspace-controller is close to the 63-character DNS label limit :) Consider making the test route name shorter.

This is actually not a problem; OpenShift will generously generate an invalid hostname and only tell you it's a problem later :) :

$ oc get route che-plugin-registry-abcdefghijklmnopqrstuvwxyz -o yaml | grep host:
  host: che-plugin-registry-abcdefghijklmnopqrstuvwxyz-che-workspace-controller.apps-crc.testing
      message: 'host name validation errors: spec.host: Invalid value: "che-plugin-registry-abcdefghijklmnopqrstuvwxyz-che-workspace-controller.apps-crc.testing":
    host: che-plugin-registry-abcdefghijklmnopqrstuvwxyz-che-workspace-controller.apps-crc.testing

However, it might make sense to try and use .status.ingress[].routerCanonicalHostname:

$ oc get route che-plugin-registry-abcdefghijklmnopqrstuvwxyz -o yaml | yq '.status.ingress[].routerCanonicalHostname'
"apps-crc.testing"

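A sketch of reading that field (the route/v1 types are real; the helper is hypothetical). Note that the status is filled in asynchronously by the router, so callers would need to retry while it is still empty:

	import routev1 "github.com/openshift/api/route/v1"

	// Pull the router's canonical hostname from the route's status, if the
	// router has populated it yet.
	func routerCanonicalHostname(route *routev1.Route) (string, bool) {
		for _, ingress := range route.Status.Ingress {
			if ingress.RouterCanonicalHostname != "" {
				return ingress.RouterCanonicalHostname, true
			}
		}
		return "", false
	}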
@sleshchenko (Member) replied:

This is actually not a problem; OpenShift will generously generate an invalid hostname and only tell you it's a problem later :)

:-D It's true, and it may be strange, but it helps us in this case.

@amisevsk (Collaborator, author) replied:

I looked into using the status field, but it's not populated immediately. For now I'm leaving the functionality in place, but we should test its reliability in the future and potentially remove or improve it.

@amisevsk force-pushed the use-routes-on-openshift branch 2 times, most recently from 1f21471 to a3b95fe, on April 15, 2020 at 19:01
Change solvers behavior in workspaceroutings controller to always create
routes when running on OpenShift and ingresses otherwise, regardless of
routingClass.
Add Validate function to controller config to be used during set-up.
Currently only fails if default routing class is openshift-oauth on a
kubernetes cluster

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
- Use preparing state when not all ingresses/routes have URL set on
cluster
- Improve error message when we can't get a URL for an endpoint

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Signed-off-by: Angel Misevski <amisevsk@redhat.com>