Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingress: Add mutable-publishing-scope enhancement #876

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented Aug 23, 2021

This enhancement defines an approach for allowing users to modify the scope of a service load-balancer for an IngressController that uses the LoadBalancerService endpoint publishing strategy type.

@openshift-ci openshift-ci bot requested review from juliakreger and sdodson August 23, 2021 14:19
@Miciah Miciah force-pushed the ingress-add-mutable-publishing-scope-enhancement branch from 7f91fb8 to 5e43cc8 Compare September 13, 2021 17:50
@Miciah
Copy link
Contributor Author

Miciah commented Sep 13, 2021

Last push abbreviates the status condition message in the example ingresscontroller to satisfy the lint check.

load-balancer between internal and external without deleting and recreating the
load balancer, by setting an annotation on the Kubernetes Service object. On
these platforms, the operator merely sets the annotation to the desired scope,
and the operation of changing it scope is complete.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and the operation of changing it scope is complete.
and the operation of changing its scope is complete.

reason: MinimumReplicasAvailable
status: "True"
type: Available
- message: Have load balancer with scope "External", want load balancer with scope "Internal". You can delete the openshift-ingress/router-default service to proceed [...]. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value [...].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is really long and probably too much information. Could we get away with something like: "Scope changed from ___ to ___, requires delete of service ___"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The detailed wording is the result of some discussion here: openshift/cluster-ingress-operator#582 (comment)

The explicit and detailed message is intended to prevent mistakes and reduce support cases. I can try to make this more concise, but I think it is important to communicate the options and their consequences because otherwise users will break their clusters' ingress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change it to not use the "have" and "want" I think that's much better. The rest is ok.

Suggested change
- message: Have load balancer with scope "External", want load balancer with scope "Internal". You can delete the openshift-ingress/router-default service to proceed [...]. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value [...].
- message: You changed load balancer scope from "External", to "Internal" and need to adjust the service. You can delete the openshift-ingress/router-default service to proceed [...]. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value [...].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded the message to avoid the "Have ___, want ___" phrasing.

to its previous value. That means the user must take one of two actions:

1. Delete the Service referenced in the status condition.
2. Revert the change to the IngressController.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of #2 as a side note, not an option we need to emphasize here or in user messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we introduced an earlier version of this feature (which we since reverted) that automatically deleted and recreated the service, we had at least one user who changed the scope and then realized it would require deprovisioning and recreating the LB and wanted to revert the change. The option to back out the change is closely tied to the motivation for this enhancement; I'll revise the motivation section to call this out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


In addition to deleting the IngressController explicitly, it is possible to
annotate it with the newly defined
`ingress.operator.openshift.io/auto-delete-load-balancer` annotation. If the
Copy link
Contributor

@candita candita Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different than the annotation mentioned earlier in Line 59? If not, can you add the name of the annotation the first place it appears? If so, can you name the one on Line 59?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a different annotation. I'll try to clarify that on line 59.

it was before the user first changed the value of
`spec.endpointPublishingStrategy.loadBalancer.scope`.

In addition to deleting the IngressController explicitly, it is possible to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In addition to deleting the IngressController explicitly, it is possible to
In addition to deleting the Service explicitly, it is possible to

operator observes that this annotation is set and that its
`spec.endpointPublishingStrategy.loadBalancer.scope` field has been changed, the
operator automatically deletes the Service if needed to complete a scope-change
operation. This purpose of this annotation is to simplify automation: A tool
Copy link
Contributor

@candita candita Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation is sort of awkward. Seems like it would be better specifiied as a flag on the oc patch operation, something like oc patch ... --auto-update or --propagate or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the oc patch command have custom logic to add the ingress.operator.openshift.io/auto-delete-load-balancer annotation when the user provided the --auto-update command and an IngressController? I'd rather avoid adding IngressController-specific or (OpenShift-specific) logic to oc patch, and anyway, the annotation is intended for automation such as the cloud-ingress-operator's publishingstrategy controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

operation. This purpose of this annotation is to simplify automation: A tool
can simultaneously update `spec.endpointPublishingStrategy.loadBalancer.scope`
and set the annotation to instruct the operator to perform the operation
automatically. This annotation is not intended for end-users to use directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not intended for end-users, then maybe it should not be an annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an alternative in mind? The annotation was requested by Service Delivery; we need something that cloud-ingress-operator could use.

Again, the cluster administrator can check the IngressController's status
conditions and may need to delete a Service to complete the change in scope.

#### As a cluster administrator, I want to cancel a change to scope before it is completed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this use case will be of little value, and though it is easy to implement and describe, it somewhat muddies an otherwise straightforward documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important because we've had support cases where users changed the scope on the IngressController and then realized that completing the change would cause the service load-balancer's IP address to change and wanted to back out.


```shell
oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}'
oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer=
oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer=true

Or some value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you need to add the annotation first, before the change is made?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation value doesn't actually matter.

Would you need to add the annotation first, before the change is made?

No, so automation could either add the annotation when creating the IngressController or add it when updating the scope. I'll try to clarify this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dangling = still looks odd but it's not a show-stopper.

field.

Additionally, if the operator is running on Azure or GCP, with this enhancement,
the controller updates the annotations on the Kubernetes Service object for the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which annotations, just some pre-existing ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right; I'll add clarify that point.

case once the user deletes the Service), the operator recreates the service with
the scope indicated in `status.endpointPublishingStrategy.loadBalancer.scope`.

Crucially, by default, the operator *never* deletes the Service as long as the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Crucially, by default, the operator *never* deletes the Service as long as the
Crucially, by default, the operator ***never*** deletes the Service as long as the

the disruptive action of deleting the Service in order to complete the
operation. The only exception is if the
`ingress.operator.openshift.io/auto-delete-load-balancer` annotation is set on
the IngressController, in which case the operator *does* delete the Service.
Copy link
Contributor

@candita candita Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the IngressController, in which case the operator *does* delete the Service.
the IngressController, in which case the operator **does** delete the Service.


If the user changes the scope and then downgrades to a version without this
enhancement while the operator is reporting `Progressing=True`, the downgraded
operator does not take any action (other than possibly setting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be cleaner if it were guaranteed to properly update the Progessing status in this case (reason and message).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some language to clarify that the downgraded operator will continue updating status conditions. Does that make things clearer?

@Miciah Miciah force-pushed the ingress-add-mutable-publishing-scope-enhancement branch from 5e43cc8 to 36ab199 Compare October 4, 2021 12:56
@Miciah
Copy link
Contributor Author

Miciah commented Oct 4, 2021

Updated per @candita's comments.

Copy link
Contributor

@brandisher brandisher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had one question around the wording of a user story but aside from that, this proposal looks reasonable and will likely be the most customer amenable option.

internal (private) as follows:

```shell
oc -n openshift-ingress-operator patch ingresscontrollers/private --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if its me or the wording but this story seems a little confusing. I readt his has have: internal IngressController, want external IngressController. If that's accurate, shouldn't the statement be: ...change the scope of an IngressController to be external (public) ... along with an update to External in the patch command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! You're right. I've fixed this in the latest push.

@candita
Copy link
Contributor

candita commented Oct 18, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
@Miciah Miciah force-pushed the ingress-add-mutable-publishing-scope-enhancement branch from 36ab199 to 1d7ffea Compare October 18, 2021 23:09
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
@Miciah Miciah force-pushed the ingress-add-mutable-publishing-scope-enhancement branch from 1d7ffea to 8809e30 Compare October 18, 2021 23:15
@candita
Copy link
Contributor

candita commented Oct 19, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2021
Copy link
Contributor

@frobware frobware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few comments/observations to clear up or clarify.

reason: MinimumReplicasAvailable
status: "True"
type: Available
- message: The IngressController scope was changed from "External" to "Internal". To put this change into effect, you must delete the openshift-ingress/router-default service; the service load-balancer will then be deprovisioned and a new one created. Alternatively, you can change the IngressController's spec.endpointPublishingStrategy.loadBalancer.scope field value back to its previous value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also emit oc commands that you can simply cut & paste? I'm assuming we have enough context (e.g., namespace and names, et al). Might be helpful should you want to restore "its previous value" too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can emit oc commands, but the patch command alone makes the message too long to pass the markdownlint CI job. I'll try using double quotes with a backslash-escaped newline to see whether the linter allows that. I believe that it'll still be valid yaml, although it won't be exactly what oc get ... -o yaml would actually output.

the cluster administrator needs to delete the Service, like so:

```shell
oc -n openshift-ingress delete services/router-default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just stating that if you went from External=>Internal and then issued the delete the operator tracks the the scope you wanted and it will come back as Internal - correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More or less. The main point that I'm trying to convey is that the user must explicitly perform the delete operation if one is needed, but to your question, yes, the operator will recreate the service with the scope specified in the ingresscontroller. Is the current phrasing clear, or does it need some wordsmithing?

annotation](https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer)
on the Kubernetes Service object for the IngressController. Kubernetes's
service controller and cloud-provider implementation perform the necessary
changes to change the service load-balancer's scope, and no further action is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this operation fails where would it be reported? Although the Kubernetes service controller makes the changes I'd like to know where we diagnose failures should that fail in any way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of failure, the service controller should emit an event with reason "SyncLoadBalancerFailed", which the ingress operator would observe and reflect in the ingresscontroller's status conditions as LoadBalancerReady=False and ultimately Available=False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this information under the new "Support Procedures" section.

4. If the operator is running on AWS and IBM Cloud, verify that the IngressController reports `Progressing=True`.
5. If the operator is running on Azure or GCP, verify that the Service is annotated for internal scope.
6. Set the IngressController's endpoint publishing strategy's scope to "External".
7. Verify that the IngressController reports `Progressing=False`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it also feasible to have test setups that make progress but ultimately fail so that Progressing=False does not occur? I'm asking for both the happy path and the unhappy path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progressing=True indicates that the operator is in an intermediate state: the user has specified one scope, and the service currently has a different scope. Once this discrepancy is resolved (by deleting the service or by updating the ingresscontroller to have the same scope as the service), then the operator returns to its baseline of reporting Progressing=False. For reference, here is the proof-of-concept implementation of the status reporting: openshift/cluster-ingress-operator@7daeb6f#diff-56b131774a926e7a0e30a9be7dac7bf5c5cec11ff709aa6604cecc9ef117ede2R547-R584. If the service controller raises an error while updating or recreating the service load-balancer, then it will report an event that the operator will observe and report as described in my previous comment. Is that sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new "Operational Aspects of API Extensions" section should clarify this.

version of OpenShift without this enhancement and upgrades to a version of
OpenShift with this enhancement, the operator annotates the Service or sets
`Progressing=True` on the IngressController as appropriate. Thus the operator
may effectuate a latent scope change, but it does not delete the Service.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How/Why will it not delete the service in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The how and why are the same in the upgrade case as in the normal case: If the ingresscontroller and the service specify different scopes and we don't know that the platform supports scope changes in situ, then the operator does not delete the service because doing so could disrupt traffic. The behavior on upgrade or downgrade follows from and is consistent with the logic described in the proposal and implementation details. Should I add something to that effect to the enhancement text? I sought to adhere to the enhancement format without being too verbose or repetitive.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 16, 2021
@Miciah Miciah force-pushed the ingress-add-mutable-publishing-scope-enhancement branch from 8809e30 to c260fc8 Compare November 17, 2021 03:16
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 17, 2021
@Miciah
Copy link
Contributor Author

Miciah commented Nov 19, 2021

The CI job failed with no obvious reason:

 + markdownlint-cli2 '**/*.md'
markdownlint-cli2 v0.3.2 (markdownlint v0.24.0)
Finding: **/*.md
Linting: 343 file(s)
Summary: 0 error(s)
++ dirname hack/markdownlint.sh
+ hack/template-lint.sh
Checking enhancements/ingress/mutable-publishing-scope.md
enhancements/ingress/mutable-publishing-scope.md missing "### API Extensions"
enhancements/ingress/mutable-publishing-scope.md missing "### Operational Aspects of API Extensions"
enhancements/ingress/mutable-publishing-scope.md missing "#### Failure Modes"
enhancements/ingress/mutable-publishing-scope.md missing "#### Support Procedures"
{"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"prow/entrypoint/run.go:80","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2021-11-17T03:22:45Z"} 

/test markdownlint

@Miciah
Copy link
Contributor Author

Miciah commented Nov 19, 2021

Wait, those are obvious reasons.

@Miciah Miciah force-pushed the ingress-add-mutable-publishing-scope-enhancement branch from c260fc8 to 51e0f70 Compare November 20, 2021 00:36
@Miciah
Copy link
Contributor Author

Miciah commented Nov 20, 2021

c260fc851e0f70 adds the newly required "API Extensions", "Operational Aspects of API Extensions", "Failure Modes", and "Support Procedures" sections.

@frobware
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 22, 2021
@frobware
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 24, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 24, 2021
@openshift-merge-robot openshift-merge-robot merged commit 913cf41 into openshift:master Nov 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants