Create routes on OpenShift and ingresses on Kubernetes #50
Conversation
```go
return "", fmt.Errorf("could not get URL for endpoint %s", endpoint.Name)
```
Makes sense -- where are you running the operator? It may be a route/ingress exposure issue (i.e. it takes longer than expected). I don't see this on crc.
I'll add this logic to the PR.
Looking into this a bit more, I'm not sure what's causing the issue; we only return that error when we cannot find any ingress with the appropriate label. If the `url` field is empty, we just return empty.
I tried it on local crc
> Looking into this a bit more, I'm not sure what's causing the issue; we only return that error when we cannot find any ingress with the appropriate label

Exactly. Now I see the same error, but only once.
So, instead of failing the reconcile loop, it would be better to mark the endpoint as problematic (without a host) and retry later. I think it's worth a dedicated PR, because it requires introducing some phases/conditions for endpoints.
@sleshchenko I think the error you describe initially (`could not get URL...`) is a failure; we should never reach that point in the reconcile and not have routes/ingresses available. The previous step, prior to attempting this matching, is to make sure routes/ingresses are in sync between the cluster and spec. Not being able to match the two means that we have an endpoint (which defines the spec) that does not have a route/ingress associated with it. Until we have a concrete case where this can occur, it should mark the workspace as failed.
The latter failure, `already exists`, is a kind of familiar issue -- I haven't gotten around to implementing it, but we shouldn't log that error and just continue. It just means the state of the cluster has changed since we started our reconcile loop. The correct solution there would be to just requeue if `errors.IsConflict(err)` or `errors.IsAlreadyExists(err)`.
> Exactly. Now I see the same error but only once

This is a different error message, to be clear.
> This is a different error message, to be clear.

My fault. But it seems there were two attempts to create the ingress, which may indicate an issue in the reconcile loop, because I did not create such a route by hand.
Will test more precisely. BTW, it's not a blocker.
Yeah, it's the `already exists` issue, and it seems routes are not created immediately.
`RequeueAfter: 200` helps me solve the issue; 100 is not enough:

```go
routesInSync, clusterRoutes, err := r.syncRoutes(instance, routes)
if err != nil || !routesInSync {
	reqLogger.Info("Routes not in sync")
	return reconcile.Result{RequeueAfter: 100 * time.Millisecond}, err
}
```

But I'm not sure it's a really good solution, since this limit depends on the infrastructure. I'm OK with leaving this error propagated constantly for the time being.
The main thing stopping us from implementing this is that it's going to be a fair bit of boilerplate error checking; it might also be improved by eventually filtering reconciles (by determining whether changes are necessary).
Added:
👍 Tested on crc, and it works pretty well except for one error logged during workspace start.
The remaining comments can be addressed/discussed in the scope of dedicated issues.
```diff
@@ -252,7 +260,7 @@ func fillOpenShiftRouteSuffixIfNecessary(nonCachedClient client.Client, configMa
 host := testRoute.Spec.Host
 if host != "" {
 	prefixToRemove := "che-workspace-controller-test-route-" + configMap.Namespace + "."
```
It may fail if the default routing host generation changes... So, maybe we should detect the hostname only when it's missing in the configmap?
Also, `che-workspace-controller-test-route-che-workspace-controller` is near the 64-char limit :) Consider making the test route name shorter.
It's legacy functionality I didn't even know existed until I worked on this PR; I'm inclined to remove it entirely if there are doubts as to its usefulness. Thus far, the concerns are:

- It depends on the deployed namespace being `che-workspace-controller`, and will fail if we change that
- It will fail if how default hostnames are computed is changed
- It only works on OpenShift

This kind of points towards "we shouldn't support this at the moment". Executing this only when the entry in the configmap is blank means that we just move all those failure cases down the line, except it makes it look like we're changing config requirements, rather than the cluster changing how it resolves things.
Regarding:

> Also, che-workspace-controller-test-route-che-workspace-controller is near the 64-char limit ) Consider making test route name shorter.

This is actually not a problem; OpenShift will generously generate an invalid hostname and only tell you it's a problem later :) :

```
$ oc get route che-plugin-registry-abcdefghijklmnopqrstuvwxyz -o yaml | grep host:
  host: che-plugin-registry-abcdefghijklmnopqrstuvwxyz-che-workspace-controller.apps-crc.testing
    message: 'host name validation errors: spec.host: Invalid value: "che-plugin-registry-abcdefghijklmnopqrstuvwxyz-che-workspace-controller.apps-crc.testing":
  host: che-plugin-registry-abcdefghijklmnopqrstuvwxyz-che-workspace-controller.apps-crc.testing
```
However, it might make sense to try to use `.status.ingress[].routerCanonicalHostname`:

```
$ oc get route che-plugin-registry-abcdefghijklmnopqrstuvwxyz -o yaml | yq '.status.ingress[].routerCanonicalHostname'
"apps-crc.testing"
```
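The label-length concern can also be checked mechanically. A minimal stdlib-only sketch (the helper names are mine, not the operator's; note that RFC 1123 DNS labels allow at most 63 characters, slightly stricter than the "64 chars" mentioned above):

```go
package main

import "fmt"

// hostFirstLabel builds the first DNS label of an auto-generated
// OpenShift route host (<route-name>-<namespace>); helper name is
// illustrative, not from the operator.
func hostFirstLabel(routeName, namespace string) string {
	return routeName + "-" + namespace
}

// hostLabelTooLong reports whether that label exceeds the 63-character
// DNS label limit from RFC 1123.
func hostLabelTooLong(routeName, namespace string) bool {
	return len(hostFirstLabel(routeName, namespace)) > 63
}

func main() {
	name := "che-plugin-registry-abcdefghijklmnopqrstuvwxyz"
	ns := "che-workspace-controller"
	label := hostFirstLabel(name, ns)
	fmt.Printf("%d chars, too long: %v\n%s\n", len(label), hostLabelTooLong(name, ns), label)
}
```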
> This is actually not a problem; OpenShift will generously generate an invalid hostname and only tell you it's a problem later :) :

:-D It's true, and it may be strange, but it helps us in this case.
Looked into using the status field, but it's not populated immediately. For now I'm leaving the functionality in place, but we should test its reliability in the future and potentially remove/improve it.
Force-pushed 1f21471 to a3b95fe.
- Change solvers behavior in workspaceroutings controller to always create routes when running on OpenShift and ingresses otherwise, regardless of routingClass.
- Add Validate function to controller config to be used during set-up. Currently only fails if default routing class is openshift-oauth on a Kubernetes cluster.
- Use preparing state when not all ingresses/routes have URL set on cluster; improve error message when we can't get a URL for an endpoint.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Force-pushed a3b95fe to 1679b6a.
What does this PR do?
Change WorkspaceRoutings controller to always create routes on OpenShift and ingresses on Kubernetes.
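The "routes on OpenShift, ingresses on Kubernetes" behavior boils down to a platform check; a trivial stdlib-only sketch of the selection (type and function names are illustrative, not the controller's API):

```go
package main

import "fmt"

// objectKind names the exposure object the solver would create;
// illustrative stand-in for the controller's actual types.
type objectKind string

const (
	kindRoute   objectKind = "Route"
	kindIngress objectKind = "Ingress"
)

// exposureKind picks Routes on OpenShift and Ingresses otherwise,
// regardless of routingClass, as this PR describes.
func exposureKind(isOpenShift bool) objectKind {
	if isOpenShift {
		return kindRoute
	}
	return kindIngress
}

func main() {
	fmt.Println(exposureKind(true))  // on OpenShift: Route
	fmt.Println(exposureKind(false)) // on Kubernetes: Ingress
}
```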
One of the goals of this PR was to eliminate the need for `ingress.global.domain` when running on OpenShift, by not specifying hostnames when creating routes. However, OpenShift's automatically-generated route hostnames are `<route-name>-<namespace>.<routing-suffix>`, which means they're frequently invalid (if, e.g., deployed in namespace `che-workspace-controller`). As a result, I've renamed `ingress.global.domain` to `cluster.routing.suffix` and added setting the value via the makefile.

Note: the controller does contain logic to automatically set the routing suffix on OpenShift; however, I have had it fail/fall out of sync in the past, so the separate option is still useful.
Is it tested? How?
Tested on `crc` with the usual matrix of settings. Have not yet tested on minikube, but will before merge.