-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingress: Add mutable-publishing-scope enhancement #876
ingress: Add mutable-publishing-scope enhancement #876
Conversation
7f91fb8
to
5e43cc8
Compare
Last push abbreviates the status condition message in the example ingresscontroller to satisfy the lint check. |
load-balancer between internal and external without deleting and recreating the | ||
load balancer, by setting an annotation on the Kubernetes Service object. On | ||
these platforms, the operator merely sets the annotation to the desired scope, | ||
and the operation of changing it scope is complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the operation of changing it scope is complete. | |
and the operation of changing its scope is complete. |
reason: MinimumReplicasAvailable | ||
status: "True" | ||
type: Available | ||
- message: Have load balancer with scope "External", want load balancer with scope "Internal". You can delete the openshift-ingress/router-default service to proceed [...]. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value [...]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is really long and probably too much information. Could we get away with something like: "Scope changed from ___ to ___, requires delete of service ___"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The detailed wording is the result of some discussion here: openshift/cluster-ingress-operator#582 (comment)
The explicit and detailed message is intended to prevent mistakes and reduce support cases. I can try to make this more concise, but I think it is important to communicate the options and their consequences because otherwise users will break their clusters' ingress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change it to not use the "have" and "want" I think that's much better. The rest is ok.
- message: Have load balancer with scope "External", want load balancer with scope "Internal". You can delete the openshift-ingress/router-default service to proceed [...]. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value [...]. | |
- message: You changed load balancer scope from "External", to "Internal" and need to adjust the service. You can delete the openshift-ingress/router-default service to proceed [...]. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value [...]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworded the message to avoid the "Have ___, want ___" phrasing.
to its previous value. That means the user must take one of two actions: | ||
|
||
1. Delete the Service referenced in the status condition. | ||
2. Revert the change to the IngressController. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think of #2 as a side note, not an option we need to emphasize here or in user messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we introduced an earlier version of this feature (which we since reverted) that automatically deleted and recreated the service, we had at least one user who changed the scope and then realized it would require deprovisioning and recreating the LB and wanted to revert the change. The option to back out the change is closely tied to the motivation for this enhancement; I'll revise the motivation section to call this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
|
||
In addition to deleting the IngressController explicitly, it is possible to | ||
annotate it with the newly defined | ||
`ingress.operator.openshift.io/auto-delete-load-balancer` annotation. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different than the annotation mentioned earlier in Line 59? If not, can you add the name of the annotation the first place it appears? If so, can you name the one on Line 59?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a different annotation. I'll try to clarify that on line 59.
it was before the user first changed the value of | ||
`spec.endpointPublishingStrategy.loadBalancer.scope`. | ||
|
||
In addition to deleting the IngressController explicitly, it is possible to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to deleting the IngressController explicitly, it is possible to | |
In addition to deleting the Service explicitly, it is possible to |
operator observes that this annotation is set and that its | ||
`spec.endpointPublishingStrategy.loadBalancer.scope` field has been changed, the | ||
operator automatically deletes the Service if needed to complete a scope-change | ||
operation. This purpose of this annotation is to simplify automation: A tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation is sort of awkward. Seems like it would be better specifiied as a flag on the oc patch
operation, something like oc patch ... --auto-update
or --propagate
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the oc patch
command have custom logic to add the ingress.operator.openshift.io/auto-delete-load-balancer
annotation when the user provided the --auto-update
command and an IngressController? I'd rather avoid adding IngressController-specific or (OpenShift-specific) logic to oc patch
, and anyway, the annotation is intended for automation such as the cloud-ingress-operator's publishingstrategy controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
operation. This purpose of this annotation is to simplify automation: A tool | ||
can simultaneously update `spec.endpointPublishingStrategy.loadBalancer.scope` | ||
and set the annotation to instruct the operator to perform the operation | ||
automatically. This annotation is not intended for end-users to use directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not intended for end-users, then maybe it should not be an annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an alternative in mind? The annotation was requested by Service Delivery; we need something that cloud-ingress-operator could use.
Again, the cluster administrator can check the IngressController's status | ||
conditions and may need to delete a Service to complete the change in scope. | ||
|
||
#### As a cluster administrator, I want to cancel a change to scope before it is completed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this use case will be of little value, and though it is easy to implement and describe, it somewhat muddies an otherwise straightforward documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important because we've had support cases where users changed the scope on the IngressController and then realized that completing the change would cause the service load-balancer's IP address to change and wanted to back out.
|
||
```shell | ||
oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}' | ||
oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer= | |
oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer=true |
Or some value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you need to add the annotation first, before the change is made?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation value doesn't actually matter.
Would you need to add the annotation first, before the change is made?
No, so automation could either add the annotation when creating the IngressController or add it when updating the scope. I'll try to clarify this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dangling =
still looks odd but it's not a show-stopper.
field. | ||
|
||
Additionally, if the operator is running on Azure or GCP, with this enhancement, | ||
the controller updates the annotations on the Kubernetes Service object for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which annotations, just some pre-existing ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right; I'll add clarify that point.
case once the user deletes the Service), the operator recreates the service with | ||
the scope indicated in `status.endpointPublishingStrategy.loadBalancer.scope`. | ||
|
||
Crucially, by default, the operator *never* deletes the Service as long as the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crucially, by default, the operator *never* deletes the Service as long as the | |
Crucially, by default, the operator ***never*** deletes the Service as long as the |
the disruptive action of deleting the Service in order to complete the | ||
operation. The only exception is if the | ||
`ingress.operator.openshift.io/auto-delete-load-balancer` annotation is set on | ||
the IngressController, in which case the operator *does* delete the Service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the IngressController, in which case the operator *does* delete the Service. | |
the IngressController, in which case the operator **does** delete the Service. |
|
||
If the user changes the scope and then downgrades to a version without this | ||
enhancement while the operator is reporting `Progressing=True`, the downgraded | ||
operator does not take any action (other than possibly setting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be cleaner if it were guaranteed to properly update the Progessing status in this case (reason and message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some language to clarify that the downgraded operator will continue updating status conditions. Does that make things clearer?
5e43cc8
to
36ab199
Compare
Updated per @candita's comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had one question around the wording of a user story but aside from that, this proposal looks reasonable and will likely be the most customer amenable option.
internal (private) as follows: | ||
|
||
```shell | ||
oc -n openshift-ingress-operator patch ingresscontrollers/private --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if its me or the wording but this story seems a little confusing. I readt his has have: internal IngressController, want external IngressController
. If that's accurate, shouldn't the statement be: ...change the scope of an IngressController to be external (public) ...
along with an update to External
in the patch command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! You're right. I've fixed this in the latest push.
/lgtm |
36ab199
to
1d7ffea
Compare
1d7ffea
to
8809e30
Compare
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few comments/observations to clear up or clarify.
reason: MinimumReplicasAvailable | ||
status: "True" | ||
type: Available | ||
- message: The IngressController scope was changed from "External" to "Internal". To put this change into effect, you must delete the openshift-ingress/router-default service; the service load-balancer will then be deprovisioned and a new one created. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also emit oc commands that you can simply cut & paste? I'm assuming we have enough context (e.g., namespace and names, et al). Might be helpful should you want to restore "its previous value" too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can emit oc commands, but the patch command alone makes the message too long to pass the markdownlint CI job. I'll try using double quotes with a backslash-escaped newline to see whether the linter allows that. I believe that it'll still be valid yaml, although it won't be exactly what oc get ... -o yaml
would actually output.
the cluster administrator needs to delete the Service, like so: | ||
|
||
```shell | ||
oc -n openshift-ingress delete services/router-default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just stating that if you went from External=>Internal and then issued the delete the operator tracks the the scope you wanted and it will come back as Internal - correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More or less. The main point that I'm trying to convey is that the user must explicitly perform the delete operation if one is needed, but to your question, yes, the operator will recreate the service with the scope specified in the ingresscontroller. Is the current phrasing clear, or does it need some wordsmithing?
annotation](https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer) | ||
on the Kubernetes Service object for the IngressController. Kubernetes's | ||
service controller and cloud-provider implementation perform the necessary | ||
changes to change the service load-balancer's scope, and no further action is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this operation fails where would it be reported? Although the Kubernetes service controller makes the changes I'd like to know where we diagnose failures should that fail in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of failure, the service controller should emit an event with reason "SyncLoadBalancerFailed", which the ingress operator would observe and reflect in the ingresscontroller's status conditions as LoadBalancerReady=False
and ultimately Available=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this information under the new "Support Procedures" section.
4. If the operator is running on AWS and IBM Cloud, verify that the IngressController reports `Progressing=True`. | ||
5. If the operator is running on Azure or GCP, verify that the Service is annotated for internal scope. | ||
6. Set the IngressController's endpoint publishing strategy's scope to "External". | ||
7. Verify that the IngressController reports `Progressing=False`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it also feasible to have test setups that make progress but ultimately fail so that Progressing=False
does not occur? I'm asking for both the happy path and the unhappy path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Progressing=True
indicates that the operator is in an intermediate state: the user has specified one scope, and the service currently has a different scope. Once this discrepancy is resolved (by deleting the service or by updating the ingresscontroller to have the same scope as the service), then the operator returns to its baseline of reporting Progressing=False
. For reference, here is the proof-of-concept implementation of the status reporting: openshift/cluster-ingress-operator@7daeb6f#diff-56b131774a926e7a0e30a9be7dac7bf5c5cec11ff709aa6604cecc9ef117ede2R547-R584. If the service controller raises an error while updating or recreating the service load-balancer, then it will report an event that the operator will observe and report as described in my previous comment. Is that sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new "Operational Aspects of API Extensions" section should clarify this.
version of OpenShift without this enhancement and upgrades to a version of | ||
OpenShift with this enhancement, the operator annotates the Service or sets | ||
`Progressing=True` on the IngressController as appropriate. Thus the operator | ||
may effectuate a latent scope change, but it does not delete the Service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How/Why will it not delete the service in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The how and why are the same in the upgrade case as in the normal case: If the ingresscontroller and the service specify different scopes and we don't know that the platform supports scope changes in situ, then the operator does not delete the service because doing so could disrupt traffic. The behavior on upgrade or downgrade follows from and is consistent with the logic described in the proposal and implementation details. Should I add something to that effect to the enhancement text? I sought to adhere to the enhancement format without being too verbose or repetitive.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
8809e30
to
c260fc8
Compare
The CI job failed with no obvious reason:
/test markdownlint |
Wait, those are obvious reasons. |
c260fc8
to
51e0f70
Compare
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: frobware The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This enhancement defines an approach for allowing users to modify the scope of a service load-balancer for an IngressController that uses the LoadBalancerService endpoint publishing strategy type.