-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Policy discoverability #2128
Improve Policy discoverability #2128
Conversation
/hold for lots of review on this one. |
Most of the new content is best viewed in the deployment preview: |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
a591567
to
73e1f55
Compare
I've just removed the change to the site page about Policy Attachment, I'll split that into a seperate PR. |
|
||
**Ana**: _Thanks for the help. Wish we’d found better answers._ 😢 | ||
|
||
### The Problem, restated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me the problem is that Ana deployed a buggy version of the service, which needs to be rolled back. The only role that the policy attachment(s) played was to somewhat mitigate the effect of the bad service. Maybe the problem would have been detected more quickly without the retry policy? Given that Ana didn't explicitly set a retry policy for her service and was therefore relying on the implementation default for her service (however that's determined) does she really need to know where exactly the default comes from? Would the easy "fix" have been to explicitly turn off the retries for her service (which is what she was expecting), which would then expose her bug more clearly?
Maybe there's a better parable that would show how a policy attachment can actually break a working service and the service owner can't disable it?
I'm just poking at the question of does it really matter where an inherited policy attachment is, that is causing problems, only that it's affecting the default value of some field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem isn't that the Policy object mitigated the effects, it's that Ana had no way of knowing that's what was happening.
As I say below, there are really three things that any owner of an object that's affected by a Policy wants to know (and the later ones mostly imply that you know the earlier ones:
- That some Policy is affecting a particular object
- Which Policy is (or Policies are) affecting a particular object
- What settings in the Policy are affecting the object.
As we go on to discuss later, there are issues with just dropping these details into the status of objects, and so this is trying to give us a range of options based on how bad the need is, and what resources are available.
the Policy Admin, to Gateway API’s persona list, and without a solution to the | ||
discoverability problem, their actions are largely invisible to the Application | ||
Developer. Not only that, but their concerns cut across the previously established | ||
levels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this problem would be less severe if there were certain restrictions on where and how policy attachments are used?
Seems like one thing that should always be possible is that the owner of a resource should be able to override any inherited policy attachments. Does it really make sense that a policy attachment can override an explicit setting in a resource? What is the use case for having explicit settings in the user's resource that get ignored and changed under the covers because of some policy attachment somewhere?
I'm not sure what properties of a resource the spec says can be controlled by inherited policy attachments, but I would say there "should be" only two things:
- default values of fields in the resource (first class API fields)
- fields that can be set with direct policy attachments (these are fields that presumably have been added to support things that are not directly supported in the resources API itself)
If that was the case, then the owner of a resource would always be able to explicitly override any inherited policy attachments that don't work, either in the resource itself or in a direct policy attachment sitting next to the resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being able to override specific settings can be very useful in clusters with strict rules about things, although I agree that it's probably better in those cases to reject the create or change that would have an undesirable setting using a Validating Admission Controller - faster feedback is much better than silently changing a value.
But, this override need comes from direct experience @robscott and I have had with previous iterations of similar ideas, sysadmins (in Gateway API's personas, Cluster Admins or even Infra Admins) sometimes want to override settings so that users can't take certain actions, and this is a mechanism to allow that to happen.
|
||
(There’s also another use case to consider, in that Charlie should have been able | ||
to see that the Policy on the namespace was in use in many places before deleting | ||
it.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't that be a default assumption if there are other resources in the namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but again, it's easy to miss in a stressful situation. Explicitly telling people this is better than leaving it implicit.
Thanks for the comments @frankbu, I'm about to go on leave until July 18, so I will address them when I'm back. I'd encourage others to take a look at this proposal too in the meantime. |
73e1f55
to
8d1535e
Compare
Updated this one, it's ready for more review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see the latest updates here.
This is most definitely a separate GEP, but I'm wondering about the concept of a policy implementer/provider that is separate from the gateway provider.
Would that decoupling make sense? Perhaps this has been discussed before.
Asking in the context of seeing the GatewayClass extension types mentioned and how that might work.
For example, if I'm an expert in API caching, I'd like to write a controller for reconciling CachePolicy resources, but I don't want to/not able to provide a full gateway api impelmentation with my caching policy feature.
I'd like to register my controller in a standard way with a Gateway API provider, and then it lists the CachePolicy CRD in the GatewayClass status.
The gateway provider then has a standard mechanism for integrating with my controller (or vice versa).
/check-cla |
/check-cla |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reality of this effort is that we must move forward because we are in a predicament for delivery. Given the reality and the need and the fact that this change is overall an improvement, I think we should accept this as the next iteration and keep iterating from here.
/approve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @youngnick! Lots of great updates here.
geps/gep-713.md
Outdated
#### On targeted resources | ||
|
||
Implementations that use Policy objects MUST ensure that a Condition exists in | ||
the `status.Conditions` of any objects affected by a Policy. If that Condition is | ||
already set to `status: true`, then the implementation must ensure that | ||
the `observedGeneration` field matches the `metadata.generation` field of the | ||
object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this specific solution might be best represented by a follow-up GEP. There are a lot of edge cases here that are going to be hard to handle. For example it will be fairly straightforward for implementations to determine when they should set this condition, but near impossible for them to determine when they should unset the condition, assuming more than one implementation is active in the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the wording here and in the Solution Cookbook section, but I'll also update both with the "this needs a followup GEP" wording and change to Experimental.
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
b1baa1a
to
1c80569
Compare
Signed-off-by: Nick Young <nick@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @youngnick! You've got some great ideas here, really good to seem them all articulated. I've got a few comments, mostly non-blocking. The overall theme is that I think this GEP has gotten too big and I think it would be good to split it out ASAP or at least add new content in new places. That's more of an organizational thing, but I think I'm aligned with all the major points in here.
/approve
@@ -1,4 +1,4 @@ | |||
# GEP-713: Metaresources and Policy Attachment_metaresource_ | |||
# GEP-713: Metaresources and Policy Attachment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general note, this GEP is massive. I think the overall size of it makes an already complicated topic even more overwhelming to understand and/or contribute to. One potential way to mitigate this problem would be to have a separate page that collects and documents problems with policy attachment and links out to potential solutions and how we're trying to improve things here. For example, we could keep something like #2012 as an interactive forum for proposing solutions/improvements that would eventually turn into GEPs of their own, but the problem statements could be clearly stated somewhere in our docs...
This is not blocking feedback, I just think we're going to need find some ways to either split this GEP up and/or cut some of the content out so this can be a bit more manageable + approachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've logged #2307, and assigned it to myself.
@@ -649,6 +1182,33 @@ const ( | |||
) | |||
``` | |||
|
|||
#### On targeted resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the size of this GEP and the provisional nature of this idea, maybe we can just link out to somewhere here instead of duplicating the content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I'd prefer to get this merged, so that we have the text stored somewhere, and then move it around after. This PR does not finish the work to tidy up Policy Attachment, but it's become a bit unwieldy, so I'd like to concentrate on getting it merged and fixing afterwards.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: robscott, shaneutt, youngnick The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Nick Young <nick@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @youngnick!
/lgtm
/hold cancel |
What type of PR is this?
/kind cleanup
/kind documentation
/kind gep
What this PR does / why we need it:
This PR updates GEP-713 - Policy Attachment - to clarify how implementations using the pattern should handle the discoverability problems it creates.
Some solutions need further GEPs - these are marked as EXPERIMENTAL DO NOT USE (to prevent similar problems from arising in the future).
Additionally, this PR removes most content from the existing "Policy Attachment" documentation page and adds a link to this GEP so that there's only one place to keep up to date.
Which issue(s) this PR fixes:
Updates #2014
Does this PR introduce a user-facing change?: