Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Approval controller improvements #398

Open
johnbelamaric opened this issue Sep 20, 2023 · 13 comments
Open

Approval controller improvements #398

johnbelamaric opened this issue Sep 20, 2023 · 13 comments
Labels

Comments

@johnbelamaric
Copy link
Member

No description provided.

@johnbelamaric
Copy link
Member Author

cc @arora-sagar

@johnbelamaric
Copy link
Member Author

related: kptdev/kpt#4040

@johnbelamaric
Copy link
Member Author

johnbelamaric commented Oct 11, 2023

/retitle Approval controller improvements

In the long run, we could design a more robust approval controller that has ApprovalPolicy resources that we can attach to packages. However, for now, we can make some iterative improvements. As a quick hack, we could make approval.nephio.org/policy a comma-delimited list (or create multiple annotations?), with the following policies which can be each independently enabled or disabled:

  1. initial - the current policy, which will propose & approve a PackageRevision if there is no published version of that pacakage in the repository.
  2. proposed - a new policy which will auto-approve any PackageRevision in a Proposed state
  3. deletion-proposed - a new policy which will auto-delete any PackageRevision in a DeletionProposed state

See https://github.com/nephio-project/nephio/blob/main/controllers/pkg/reconcilers/approval/ for the code. This could be a nice, well contained piece of functionality for someone wanting to get a taste of controller development.

Another little tweak would be to send an event when we approve. Right now, we only send events for failures or when we are "waiting". It would be a better UX to see an approval event as well. That would be here: https://github.com/nephio-project/nephio/blob/main/controllers/pkg/reconcilers/approval/reconciler.go#L202

cc @adetalhouet

@nephio-prow nephio-prow bot changed the title Auto-deletion approval in approval controller Approval controller improvements Oct 11, 2023
@johnbelamaric
Copy link
Member Author

Oh - for 3) this will allow a cascading delete from PVS, because when you delete the PVS, it deletes the PV resources, which in turn results in the PackageRevision going to DeletionProposed.

@gvbalaji gvbalaji added area/porch Porch related issues area/platform labels Apr 3, 2024
@gvbalaji gvbalaji added this to the R3 milestone Apr 3, 2024
@liamfallon
Copy link
Member

Triaged
Triage Comment: This is to do with the conditions the approval controller is waiting for do not happen, so it's not actually an approval controller issue, related to #462

@johnbelamaric
Copy link
Member Author

Looking at #797, I think we need another policy that just auto-approves everything. It should be really easy to implement. Basically, instead of "initial", we "always" or "ready" which approves any thing that has met the readiness gates. That will allow our controllers to make multiple revisions and avoid some of the race conditions inherent in the "initial" policy use.

@kispaljr
Copy link
Contributor

kispaljr commented Aug 22, 2024

Continuing the conversation from the closed #797:
I agree with @johnbelamaric that it is most probably the approval controller that sometime approves the PackageRevision before the PackageVariant controller applied all mutations to it, thus the mutations (or at least some of them) are applied to a newly created 'v2' draft (as @efiacor noticed).

If this is the case, wouldn't it help, if the PackageVariant controller would create the new v1 PackageRevision with a false readiness gate in the very first place, and only set the readiness gate to true after all the mutations are applied? This would help us avoid race conditions between the PackageVariant and Approval controllers. This seems to me as a better way of solving the problem than tweaking the timing.

@johnbelamaric
Copy link
Member Author

I think the PackageVariant is not marked ready until all mutations are done, and the approval controller checks that. Maybe there is a bug there?

Note that package variant readiness is distinct from the package revision readiness gates. Both are supposed to be true for the approval controller to do its thing.

@johnbelamaric
Copy link
Member Author

If this is the case, wouldn't it help, if the PackageVariant controller would create the new v1 PackageRevision with a false readiness gate in the very first place, and only set the readiness gate to true after all the mutations are applied?

So, based on the approval controller code, approval shouldn't kick in until mutations are done, because the approval requires both PackageVariant and PackageRevision to be ready, and PackageVariant readiness implies all mutations are done.

However, maybe there is a bug where one (or both) of the PackageVariant/PackageRevision are showing "ready" briefly when they shouldn't be?

@kispaljr
Copy link
Contributor

Yes, according to my theory there is such a moment, when both PackageVariant and PackageRevision is "ready", but not all the mutations were applied to the PackageRevision.
I suspect that it is happening, when the PackageVariant controller already updated the PackageRevisionResources of the newly created v1 PackageRevision with the necessary changes (so the PackageVariant is ready), but the package hasn't been rendered yet (the kpt pipeline hasn't run yet at all), so there are no readiness gates set in the PackageRevision (the list of readinessGates is empty in the Kptfile, so the PackageRevision is also ready).
This is the short time interval when the approval controller can approve the package, but after the pipeline ran the first time, it makes changes to the package, (and potentially adds false readiness gates to the package). This triggers the PackageVariant, it creates a v2 Packagerevision and re-applies the mutations, and the initial approval policy won't approve v2.

So using the "always" approval policy indeed solves this, and I agree we should implement that.
But I also think that it would make the solution cleaner if the PackageVariant wouldn't be considered ready until the package was rendered after updating the PackageRevisionResources. This would also make the whole system more robust, and would also make the initial delay in the approval controller unnecessary.

@kispaljr
Copy link
Contributor

Please note that I have no proof yet, that what I wrote above is actually happening, it is just a theory

@johnbelamaric
Copy link
Member Author

But I also think that it would make the solution cleaner if the PackageVariant wouldn't be considered ready until the package was rendered after updating the PackageRevisionResources. This would also make the whole system more robust, and would also make the initial delay in the approval controller unnecessary.

Sounds like a good idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants