Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Policy discoverability #2128

Merged
merged 8 commits into from
Aug 15, 2023

Conversation

youngnick
Copy link
Contributor

What type of PR is this?
/kind cleanup
/kind documentation
/kind gep

What this PR does / why we need it:
This PR updates GEP-713 - Policy Attachment - to clarify how implementations using the pattern should handle the discoverability problems it creates.

Some solutions need further GEPs - these are marked as EXPERIMENTAL DO NOT USE (to prevent similar problems from arising in the future).

Additionally, this PR removes most content from the existing "Policy Attachment" documentation page and adds a link to this GEP so that there's only one place to keep up to date.

Which issue(s) this PR fixes:

Updates #2014

Does this PR introduce a user-facing change?:

The Policy Attachment pattern has new requirements and recommendations to help with Policy discoverability.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. kind/gep PRs related to Gateway Enhancement Proposal(GEP) labels Jun 19, 2023
@k8s-ci-robot k8s-ci-robot requested review from bowei and robscott June 19, 2023 13:37
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 19, 2023
@youngnick
Copy link
Contributor Author

/hold for lots of review on this one.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2023
@youngnick
Copy link
Contributor Author

Most of the new content is best viewed in the deployment preview:

https://deploy-preview-2128--kubernetes-sigs-gateway-api.netlify.app/geps/gep-713/#status-and-the-discoverability-problem

@k8s-triage-robot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

geps/gep-713.md Outdated Show resolved Hide resolved
@youngnick youngnick force-pushed the policy-discoverability branch from a591567 to 73e1f55 Compare June 20, 2023 16:20
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 20, 2023
@youngnick
Copy link
Contributor Author

I've just removed the change to the site page about Policy Attachment, I'll split that into a seperate PR.


**Ana**: _Thanks for the help. Wish we’d found better answers._ 😢

### The Problem, restated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me the problem is that Ana deployed a buggy version of the service, which needs to be rolled back. The only role that the policy attachment(s) played was to somewhat mitigate the effect of the bad service. Maybe the problem would have been detected more quickly without the retry policy? Given that Ana didn't explicitly set a retry policy for her service and was therefore relying on the implementation default for her service (however that's determined) does she really need to know where exactly the default comes from? Would the easy "fix" have been to explicitly turn off the retries for her service (which is what she was expecting), which would then expose her bug more clearly?

Maybe there's a better parable that would show how a policy attachment can actually break a working service and the service owner can't disable it?

I'm just poking at the question of does it really matter where an inherited policy attachment is, that is causing problems, only that it's affecting the default value of some field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem isn't that the Policy object mitigated the effects, it's that Ana had no way of knowing that's what was happening.

As I say below, there are really three things that any owner of an object that's affected by a Policy wants to know (and the later ones mostly imply that you know the earlier ones:

  • That some Policy is affecting a particular object
  • Which Policy is (or Policies are) affecting a particular object
  • What settings in the Policy are affecting the object.

As we go on to discuss later, there are issues with just dropping these details into the status of objects, and so this is trying to give us a range of options based on how bad the need is, and what resources are available.

the Policy Admin, to Gateway API’s persona list, and without a solution to the
discoverability problem, their actions are largely invisible to the Application
Developer. Not only that, but their concerns cut across the previously established
levels.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this problem would be less severe if there were certain restrictions on where and how policy attachments are used?

Seems like one thing that should always be possible is that the owner of a resource should be able to override any inherited policy attachments. Does it really make sense that a policy attachment can override an explicit setting in a resource? What is the use case for having explicit settings in the user's resource that get ignored and changed under the covers because of some policy attachment somewhere?

I'm not sure what properties of a resource the spec says can be controlled by inherited policy attachments, but I would say there "should be" only two things:

  1. default values of fields in the resource (first class API fields)
  2. fields that can be set with direct policy attachments (these are fields that presumably have been added to support things that are not directly supported in the resources API itself)

If that was the case, then the owner of a resource would always be able to explicitly override any inherited policy attachments that don't work, either in the resource itself or in a direct policy attachment sitting next to the resource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to override specific settings can be very useful in clusters with strict rules about things, although I agree that it's probably better in those cases to reject the create or change that would have an undesirable setting using a Validating Admission Controller - faster feedback is much better than silently changing a value.

But, this override need comes from direct experience @robscott and I have had with previous iterations of similar ideas, sysadmins (in Gateway API's personas, Cluster Admins or even Infra Admins) sometimes want to override settings so that users can't take certain actions, and this is a mechanism to allow that to happen.


(There’s also another use case to consider, in that Charlie should have been able
to see that the Policy on the namespace was in use in many places before deleting
it.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be a default assumption if there are other resources in the namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but again, it's easy to miss in a stressful situation. Explicitly telling people this is better than leaving it implicit.

@youngnick
Copy link
Contributor Author

Thanks for the comments @frankbu, I'm about to go on leave until July 18, so I will address them when I'm back.

I'd encourage others to take a look at this proposal too in the meantime.

@youngnick youngnick force-pushed the policy-discoverability branch from 73e1f55 to 8d1535e Compare July 26, 2023 05:37
@youngnick
Copy link
Contributor Author

Updated this one, it's ready for more review.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 26, 2023
Copy link
Contributor

@david-martin david-martin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see the latest updates here.

This is most definitely a separate GEP, but I'm wondering about the concept of a policy implementer/provider that is separate from the gateway provider.
Would that decoupling make sense? Perhaps this has been discussed before.

Asking in the context of seeing the GatewayClass extension types mentioned and how that might work.
For example, if I'm an expert in API caching, I'd like to write a controller for reconciling CachePolicy resources, but I don't want to/not able to provide a full gateway api impelmentation with my caching policy feature.
I'd like to register my controller in a standard way with a Gateway API provider, and then it lists the CachePolicy CRD in the GatewayClass status.
The gateway provider then has a standard mechanism for integrating with my controller (or vice versa).

@youngnick
Copy link
Contributor Author

/check-cla

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 2, 2023
@youngnick
Copy link
Contributor Author

/check-cla

Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reality of this effort is that we must move forward because we are in a predicament for delivery. Given the reality and the need and the fact that this change is overall an improvement, I think we should accept this as the next iteration and keep iterating from here.

/approve

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! Lots of great updates here.

geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated
Comment on lines 1138 to 1144
#### On targeted resources

Implementations that use Policy objects MUST ensure that a Condition exists in
the `status.Conditions` of any objects affected by a Policy. If that Condition is
already set to `status: true`, then the implementation must ensure that
the `observedGeneration` field matches the `metadata.generation` field of the
object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this specific solution might be best represented by a follow-up GEP. There are a lot of edge cases here that are going to be hard to handle. For example it will be fairly straightforward for implementations to determine when they should set this condition, but near impossible for them to determine when they should unset the condition, assuming more than one implementation is active in the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the wording here and in the Solution Cookbook section, but I'll also update both with the "this needs a followup GEP" wording and change to Experimental.

geps/gep-713.md Show resolved Hide resolved
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
Signed-off-by: Nick Young <nick@isovalent.com>
@youngnick youngnick force-pushed the policy-discoverability branch from b1baa1a to 1c80569 Compare August 7, 2023 11:19
Signed-off-by: Nick Young <nick@isovalent.com>
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! You've got some great ideas here, really good to seem them all articulated. I've got a few comments, mostly non-blocking. The overall theme is that I think this GEP has gotten too big and I think it would be good to split it out ASAP or at least add new content in new places. That's more of an organizational thing, but I think I'm aligned with all the major points in here.

/approve

geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
@@ -1,4 +1,4 @@
# GEP-713: Metaresources and Policy Attachment_metaresource_
# GEP-713: Metaresources and Policy Attachment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general note, this GEP is massive. I think the overall size of it makes an already complicated topic even more overwhelming to understand and/or contribute to. One potential way to mitigate this problem would be to have a separate page that collects and documents problems with policy attachment and links out to potential solutions and how we're trying to improve things here. For example, we could keep something like #2012 as an interactive forum for proposing solutions/improvements that would eventually turn into GEPs of their own, but the problem statements could be clearly stated somewhere in our docs...

This is not blocking feedback, I just think we're going to need find some ways to either split this GEP up and/or cut some of the content out so this can be a bit more manageable + approachable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've logged #2307, and assigned it to myself.

geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Show resolved Hide resolved
@@ -649,6 +1182,33 @@ const (
)
```

#### On targeted resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the size of this GEP and the provisional nature of this idea, maybe we can just link out to somewhere here instead of duplicating the content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I'd prefer to get this merged, so that we have the text stored somewhere, and then move it around after. This PR does not finish the work to tidy up Policy Attachment, but it's become a bit unwieldy, so I'd like to concentrate on getting it merged and fixing afterwards.

geps/gep-713.md Outdated Show resolved Hide resolved
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: robscott, shaneutt, youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Nick Young <nick@isovalent.com>
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 15, 2023
@robscott
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 15, 2023
@k8s-ci-robot k8s-ci-robot merged commit a975fd6 into kubernetes-sigs:main Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants