Special behavior for generators #2528

bgrant0607 · 2021-10-01T01:30:28Z

I've concluded that generators need special treatment.

Preprocessors (#2420) are ok, but are a workaround. #2466 is one issue that has been reported, but there are more issues.

Let's look at a generator like https://github.com/GoogleContainerTools/kpt-functions-sdk/blob/master/ts/demo-functions/src/expand_team_cr.ts.

The expansion function is associated with the pseudo-resource-type, not really with a particular package. That type may be used in many packages. It would be helpful to have a registry for that type-to-function mapping (versioning TBD).
I want to use the functions in the Kptfile pipeline similarly to Kubernetes mutating and validating admission control. They should re-run on generated resources. Other functions I'll just invoke imperatively using kpt fn eval.
I want to be able to edit generated resources and use merge logic similar to kpt pkg update. In this example, Namespaces frequently need to be labeled and annotated, so the function in its current form is useless. One needs to create a separate package just with the Team resource, re-run the generator, commit and push that package, then kpt pkg get that into the main package and update that each time the function is run. Because kpt doesn't support local packages AFAIK, there can't really be a review/approval workflow around updates to the package containing the generator function's output.
If resources are no longer generated, they need to be pruned from the filesystem.
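
To make the registry idea in the first point concrete: it could be as small as a lookup table keyed by resource type. This is only a sketch under assumptions. The `example.com/v1` group, the image name, and the helper names are hypothetical, and versioning is left open, as above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: (apiVersion, kind) -> function image.
# The entry is made up for illustration; kpt does not ship this table.
REGISTRY: Dict[Tuple[str, str], str] = {
    ("example.com/v1", "Team"): "example.com/functions/expand-team-cr:v0.1",
}


def function_for(resource: dict) -> Optional[str]:
    """Return the generator function registered for a resource's type, if any."""
    key = (resource.get("apiVersion", ""), resource.get("kind", ""))
    return REGISTRY.get(key)


def functions_for_package(resources: List[dict]) -> List[str]:
    """Collect the distinct functions a package needs, in first-seen order."""
    seen: List[str] = []
    for r in resources:
        fn = function_for(r)
        if fn is not None and fn not in seen:
            seen.append(fn)
    return seen
```

With a table like this, any package containing a Team resource would resolve to the same function without each Kptfile naming it.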

cc @justinsb

droot · 2021-10-20T22:17:06Z

> It would be helpful to have a registry for that type-to-function mapping

@mengqiy what is the easiest way to support this in the function catalog?

> I want to use the functions in the Kptfile pipeline similarly to Kubernetes mutating and validating admission control. They should re-run on generated resources.

Thinking of some possible options here:

1. As soon as new resources are detected after executing a function, restart the pipeline with the currently modified resources, including the new ones (note that resources are not written to disk yet; they are written at the end of a successful pipeline execution).
2. As soon as new resources are detected after executing a function, continue with the pipeline execution, then re-run the pipeline with only the generated resources. [Executing functions with only the new resources may not be desirable, because functions assume that the entire pkg (resourceList) is available to them.]
3. If new resources are generated by the end of a pipeline run (note that resources are written to disk after a successful pipeline run), re-run the pipeline on the pkg (this behavior could be configurable at the top pkg level).

The 3rd option is the least disruptive and the easiest to reason about.
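
A rough sketch of that third option, assuming functions are plain callables over a list of resource dicts. The `render` helper, the `max_passes` guard, and the identity tuple are illustrative assumptions, not how kpt fn render is implemented.

```python
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]
Function = Callable[[List[Resource]], List[Resource]]


def resource_id(r: Resource) -> tuple:
    """Identify a resource by apiVersion, kind, namespace and name."""
    meta = r.get("metadata", {})
    return (r.get("apiVersion"), r.get("kind"), meta.get("namespace"), meta.get("name"))


def render(resources: List[Resource], pipeline: List[Function], max_passes: int = 5) -> List[Resource]:
    """Run the whole pipeline, and run it again while a pass adds new resources."""
    for _ in range(max_passes):
        before = {resource_id(r) for r in resources}
        for fn in pipeline:
            resources = fn(resources)
        after = {resource_id(r) for r in resources}
        if after <= before:  # nothing new was generated in this pass
            return resources
    raise RuntimeError("pipeline kept generating new resources")
```

The pass limit is there because a generator that emits a fresh resource on every run would otherwise loop forever. Functions also need to be idempotent for the extra pass to be harmless.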

> I want to be able to edit generated resources and use merge logic similar to kpt pkg update. In this example, Namespaces frequently need to be labeled and annotated, so the function in its current form is useless. One needs to create a separate package just with the Team resource, re-run the generator, commit and push that package, then kpt pkg get that into the main package and update that each time the function is run. Because kpt doesn't support local packages AFAIK, there can't really be a review/approval workflow around updates to the package containing the generator function's output.

Some early thoughts:

- The expand_team function can be written in an ensure_team form that is idempotent and takes the Namespace and IAMRoles resources that are part of the input resourceList. The ensure form of the function acts only on the attributes of resources that are driven by the pseudo-resource. I see this behavior as closer to how Kubernetes controllers work (multiple controllers acting in a coordinated manner and patching the bits they care about). Obviously, writing the ensure form of a function is more work than writing a purely generative function. It's a way of saying that functions take care of the merge part here.
- kpt could support merge behavior for functions, similar to server-side apply, where expand_team and user edits are treated as changes by different actors on the same resources. To do this cleanly, we will have to come up with some way of tracking ownership of changes.
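
To illustrate the first idea, here is a minimal sketch of an ensure-style function. It only handles the Namespace (IAM resources are omitted), and the `example.com/team` label key and the `ensure_team` signature are assumptions made for the example, not the actual expand_team_cr code.

```python
import copy
from typing import Any, Dict, List

Resource = Dict[str, Any]
OWNED_LABEL = "example.com/team"  # hypothetical label owned by the function


def ensure_team(resources: List[Resource]) -> List[Resource]:
    """For each Team, make sure a Namespace exists and carries the owned label.

    Existing Namespaces are patched rather than regenerated, so fields the
    function does not own (other labels, annotations) survive re-runs.
    """
    out = copy.deepcopy(resources)
    namespaces = {r["metadata"]["name"]: r for r in out if r.get("kind") == "Namespace"}
    for team in [r for r in out if r.get("kind") == "Team"]:
        name = team["metadata"]["name"]
        ns = namespaces.get(name)
        if ns is None:
            ns = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}
            out.append(ns)
            namespaces[name] = ns
        # Touch only the field this function is responsible for.
        ns["metadata"].setdefault("labels", {})[OWNED_LABEL] = name
    return out
```

Running it twice gives the same result as running it once, and a label added by hand between runs is left in place.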

droot · 2021-10-20T22:19:28Z

I realized, our SDKs can also make writing ensure (or idempotent) functions easier.

morgante · 2021-10-20T23:11:20Z

> If new resources are generated at the end of pipeline run (note resources are written to the disk post successful pipeline run), then re-run the pipeline on the pkg (this behavior could be configurable at the top pkg level)

This is the direction I think would make the most sense, and would be the simplest for other functions to reason about.

> I realized, our SDKs can also make writing ensure (or idempotent) functions easier.

Yes, mainly I think we need an ensure/generate SDK that helps with:

Merging changes between the function, other functions, and user manual edits
Generation and removal of resources. generate-folders is another function that needs to delete managed resources if they are removed from the master resource.

mikebz · 2021-11-04T18:24:45Z

Additional context #2435

bgrant0607 · 2021-11-17T13:25:00Z

kustomize has issues with the generator pattern also:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-cli/2299-kustomize-plugin-composition

johnbelamaric · 2021-12-14T19:22:24Z

> Yes, mainly I think we need an ensure/generate SDK that helps with:
>
> Merging changes between the function, other functions, and user manual edits

Ok, so I am adding this caveat after writing everything below, which is just a sort of thought process thinking through some of these issues, and I am still not clear on how we achieve this and don't have time to continue thinking about it now. If the musings below are not useful, then feel free to ignore them. Perhaps you already have some more ideas on a better way to do this.

I have a use case for a variant generator that would, for example, punch out a set of variants of a workload that are specialized to particular clusters. It works something like this:

Variant Generator Function
Inputs

A selector for a set of resources (think: all resources in a given directory; could even be a package)
An input CRD with an array of structs (think: fields with setter values or inputs to bits of more complex logic)

Behavior

Select the resources according to the selector, resulting in a set of resources we will call A
For each entry in the array
- Duplicate A
- Apply transformations based on the struct
- Store result A_n in its own directory (i.e., modify the directory annotation)

I'd like the user to be able to subsequently modify A_n to produce A_n'. When the function is re-run I do not want to overwrite those edits. This can't be done with the approach above because it simply regenerates the entire A_n with each run, and so I cannot differentiate a delta between A_n and A_n' due to a user vs that due to a change in function inputs (or code).
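
The naive behavior can be sketched as follows, to show where the edits get lost. The variant struct (`name`, `labels`) and the file naming are hypothetical; `internal.config.kubernetes.io/path` is the path annotation from the KRM functions spec, used here as the "directory annotation".

```python
import copy
from typing import Any, Dict, List

Resource = Dict[str, Any]
PATH_ANNOTATION = "internal.config.kubernetes.io/path"


def generate_variants(base: List[Resource], variants: List[Dict[str, Any]]) -> List[Resource]:
    """Duplicate the selected set A once per variant entry and specialize it."""
    out: List[Resource] = []
    for v in variants:
        for r in base:
            dup = copy.deepcopy(r)  # duplicate A
            meta = dup.setdefault("metadata", {})
            # Apply the transformation described by the struct (labels only here).
            meta.setdefault("labels", {}).update(v.get("labels", {}))
            # Store A_n in its own directory via the path annotation.
            filename = f"{dup['kind'].lower()}_{meta['name']}.yaml"
            meta.setdefault("annotations", {})[PATH_ANNOTATION] = f"{v['name']}/{filename}"
            out.append(dup)
    return out
```

Because every run starts again from A, anything a user changed in A_n is simply absent from the next output.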

My initial thought for solving that was to use patch files instead of an array of structs - one set of patches per variant. That is, have the user create a patch that when applied to A, produces A_n. The function then would simply process each of these patches, and apply those deltas to A_n' if it exists (or create it by applying the patch to A).

I think that would work, but it shifts the burden of variant generation to the user. Perhaps we can have two functions: one that generates the patch, and one that applies it to create A_n?

Patch Generator Function
Inputs:

A selector for a set of resources
An input CRD with an array of structs

Behavior:

Select the resources according to the selector to get the set A
For each entry in the array
- Duplicate A
- Apply transformations based on the struct to get A_n
- Calculate a patch P_n to produce A_n

Variant Generator Function
Inputs:

Hmm...some sort of selectors to identify P_n, and A_n, A_n' resources?

Behavior:

For each P_n
- If A_n' exists, apply the patch, otherwise create A_n' as equal to A_n

I think with this approach, re-running the function will not overwrite non-conflicting user edits of A_n'. Conflicting edits are still an issue, though. Worse still, we have issues if we modify the generator function or the set of resources in A, because our patches may then omit some new resource or value introduced by the modification, so that change will never be added to the generated resources. That is, if we add a resource so that the selected set, say A1, now contains resources that A did not, the next run will calculate a patch between A1 and A1_n; that patch won't contain the new resource, so it will never be applied to A_n'.
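
A small sketch of the two-function idea, with each resource flattened to a dict of field path to value so the diff stays trivial. `make_patch` and `apply_patch` are hypothetical helpers, not an existing kpt or kustomize API.

```python
from typing import Any, Dict

Fields = Dict[str, Any]  # flattened resource: field path -> value
_DELETE = object()       # marker for "remove this field"


def make_patch(base: Fields, variant: Fields) -> Fields:
    """P_n: the changes that turn A into A_n."""
    patch = {k: v for k, v in variant.items() if base.get(k, _DELETE) != v}
    patch.update({k: _DELETE for k in base if k not in variant})
    return patch


def apply_patch(target: Fields, patch: Fields) -> Fields:
    """Apply P_n on top of A or A_n', leaving fields outside the patch as they are."""
    out = dict(target)
    for k, v in patch.items():
        if v is _DELETE:
            out.pop(k, None)
        else:
            out[k] = v
    return out
```

A label the user added to A_n' is not mentioned in P_n, so it survives. The same property causes the problem described above: a field that is new in both A1 and A1_n is not in the patch either, so it never reaches A_n'.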

I think we have to take it a step further, and store the patches, along with some sequencing so that we can re-apply them in the proper order. Or maybe even that won't work, and instead we need to store the original A_n from the previous version of the function. I need to think about it more, but this seems to be getting out of hand.

Other approaches?

morgante · 2021-12-14T19:33:02Z

@johnbelamaric Have you considered modeling this as a set of subpackages? (See go/package-sets.)

johnbelamaric · 2021-12-14T19:39:19Z

Yes, now that you mention it I did at some point look at that and it looked promising. I'll have to go back and revisit it. Has anything with respect to that been implemented?


morgante · 2021-12-14T19:57:05Z

Not yet. The main blocker right now is the challenge of cyclical kpt fn render: when generators insert new resources (including a Kptfile with new functions), we need a mechanism to run functions on those resources.

johnbelamaric · 2021-12-14T20:05:17Z

Ah, yes, I handle that in pre-v1.0 kpt with a two-pass pipeline. I haven't looked at v1.0 kpt closely enough; hopefully a similar solution will be available.

johnbelamaric · 2022-01-04T17:58:44Z

Thinking about this a little more, it does seem that subpackage is the right model. The explicit version in the Kptfile gives us the ancestor for the 3-way merge, so we can make edits via the generator function, via other functions, and via manual user edits of the resulting generated instance.
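
For reference, the shape of a 3-way merge with a recorded ancestor, again on flattened field dicts. This is a toy sketch: kpt's real resource merge is structural, and the "keep the local edit on conflict" policy here is just an assumption for the example.

```python
from typing import Any, Dict, List, Tuple

Fields = Dict[str, Any]
_MISSING = object()


def three_way_merge(base: Fields, local: Fields, upstream: Fields) -> Tuple[Fields, List[str]]:
    """Merge flattened fields; returns (merged, conflicting keys).

    base     - what the generator produced last time (the recorded ancestor)
    local    - what is on disk now, including manual edits
    upstream - what the generator produces now
    """
    merged: Fields = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(local) | set(upstream)):
        anc, loc, up = (d.get(key, _MISSING) for d in (base, local, upstream))
        if loc == up or up == anc:  # same result, or only local changed
            value = loc
        elif loc == anc:            # only upstream changed
            value = up
        else:                       # both changed, differently
            conflicts.append(key)
            value = loc             # sketch policy: keep the local edit
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts
```

Without the ancestor there is no way to tell a user's addition from a generator's removal, which is why the version recorded in the Kptfile matters.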

One issue for generators - shouldn't they end up needing to work something like apply, with respect to pruning, for example? If we alter the generator function or its inputs such that you are getting fewer instances of a given subpackage, you'll need to prune the old packages. We could manage this with function-specific logic (for example, "this generator function owns this directory"), but it seems we should have some utility or at least conventions here.

morgante · 2022-01-04T21:19:00Z

> One issue for generators - shouldn't they end up needing to work something like apply, with respect to pruning, for example?

Yes, though they're actually also often similar to owner references. The approach we've taken thus far is to use an annotation to track the source resource and prune if it's changed.

We should definitely standardize this sort of functionality into the SDK though. It's a pain for each function author to write/maintain the logic and implementations might become inconsistent.
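
As a sketch of what such an SDK helper might do, assuming a hypothetical `example.com/generated-by` annotation that records the source resource (the real functions use their own annotation keys, and `reconcile` is not an existing SDK call):

```python
from typing import Any, Dict, List

Resource = Dict[str, Any]
OWNER_ANNOTATION = "example.com/generated-by"  # hypothetical annotation key


def key(r: Resource) -> str:
    """Short identity for a resource: kind/name."""
    return f"{r['kind']}/{r['metadata']['name']}"


def reconcile(existing: List[Resource], generated: List[Resource], owner: Resource) -> List[Resource]:
    """Stamp generated resources with their owner and drop stale ones.

    Resources previously stamped by this owner are replaced by the fresh
    output, so ones that are no longer generated disappear. Resources
    without this owner's stamp are left alone.
    """
    owner_id = key(owner)
    for r in generated:
        r["metadata"].setdefault("annotations", {})[OWNER_ANNOTATION] = owner_id
    kept = [
        r for r in existing
        if r["metadata"].get("annotations", {}).get(OWNER_ANNOTATION) != owner_id
    ]
    return kept + generated
```

This version replaces owned resources wholesale, so it handles pruning but not the merge problem discussed earlier; a real helper would need both.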

bgrant0607 · 2022-01-20T20:09:53Z

Back to the original topic:

Ensure functions make sense to me. That's similar to https://github.com/metacontroller/metacontroller functions.

It could be a supported or at least recommended pattern, like variant constructors (#2590).

However, my eventual conclusion on expand_team_cr was that it wasn't sufficiently useful for declarative use. Instead, I switched to a minimalistic variant-constructor approach, using the ensure pattern for just the Namespace: #2184 (comment)

For abstractions to be worthwhile, they need to dramatically reduce configuration complexity or implement some significant business logic. Otherwise, we should think in terms of manipulating the underlying resource types, with resource-type-specific functions and tools, as imperative tools typically do.

I agree that variant generators could be modeled as generated subpackages, which could leverage the variant constructor pattern and specify variant resources declaratively rather than generating them from scratch. We should spawn another issue for that, if there isn't one already. The variant generation function could generate Kptfiles for the subpackages. We'd then need to trigger cloning of the subpackages, and reinvoke the pipeline to run the variant constructors.

I am generally not a fan of monolithic arrays or maps as ways to express desired variants. They seem appealing for specifying small numbers of variants with small numbers of varying attributes by hand, but are difficult to manage at non-trivial scale. Probably the set of variants would be derived from some input source using some automation. I'd represent them like other generators, using client-side custom resources, one per variant or per dimensional variation value (e.g., region and environment). Convention over configuration is our friend for these scenarios so that users don't need to specify where the varying attributes come from for each variant. Storing sets of varying attributes or "facts" is common in hierarchical parameter stores. We can use CRs (pseudo or actual) or ConfigMaps for this, which also should make custom UX for such use cases simpler to build.

bgrant0607 · 2022-01-28T23:46:27Z

For creating individual resources, imperative functions, CLI (e.g. kubectl create -o yaml), UI, etc. seem like the way to go.

bgrant0607 · 2022-02-04T22:14:11Z

An example:
https://github.com/GoogleContainerTools/kpt-functions-catalog/tree/master/functions/go/enable-gcp-services

This sets a blueprints.cloud.google.com/ownerReference annotation, which is used to prune resources that are no longer needed:
https://github.com/GoogleContainerTools/kpt-functions-catalog/tree/master/examples/enable-gcp-services-advanced#overview

bgrant0607 · 2022-05-13T19:53:43Z

Ah, the merge behavior was previously requested in #954

bgrant0607 · 2022-06-13T14:16:07Z

The question inevitably comes up regarding what to do when a generator function is updated.

It depends on whether changing the behavior of existing uses is desirable or not. It also depends on whether the generator would just support additional optional, parameterized attributes. Helm charts often conditionalize new attributes so that they don't affect existing uses of the chart automatically. Changes of default behavior are also typically undesirable for existing uses. So changing a generator is often not the best way to enforce new behaviors.

That said, we may want to support similar behaviors for upgrades to new generator versions similarly to new upstream package versions. One way to do that would be to update the function version in upstream packages first, then update their downstream packages. We'd want to surface available function upgrades similar to available package upgrades. This would be easier if we separated upstream and downstream functions (#2544) as well as generators and admission control (#3121).

droot · 2022-06-13T22:35:26Z

> That said, we may want to support similar behaviors for upgrades to new generator versions similarly to new upstream package versions.

One quick thought: I can imagine a generator function representing a subset of the package's resources (a sort of slice of the package) sourced by running a specific version of the function, similar to pkg get <pkg-version>. This is very similar to having a subpkg whose upstream is a generator-function:version and its input, where changing the version is similar to upgrading to a new version of that upstream.

bgrant0607 · 2022-07-21T01:59:47Z

The issue of pruning also applies to transformers that change resource identifiers, in the case that the identifier transformation is changed across revisions.

bgrant0607 · 2023-03-30T15:30:28Z

Drive-by observation: Infrastructure from Code projects that generate Infrastructure as Code formats are going to have similar challenges. Users will want to customize the generated IaC code/templates, but those changes will be stomped the next time the IaC is regenerated. The idea of generating app config from app code is not a terrible one. "oc new-app" is a simple version of that.

johnbelamaric · 2023-04-06T17:46:30Z

Something of interest here is what @henderiw and others are doing in Nephio. There, we are using functions to generate resources, as well as using the package conditions function of Porch. This means we need to operate a lot like a controller; we need to create, update, and prune child resources based upon one or more inputs. So, they are working on a controller-runtime like function SDK that handles various aspects of that for the function authors.

Wim - can you link in your slides / maybe the PR? This is very early work, but in a month or so it may be good to discuss in kpt office hours.

henderiw · 2023-04-06T17:48:04Z

Attached the link to the slides.

https://docs.google.com/presentation/d/1xKZM4Q_auoUMb6M4I_OTNlS7xu7K6cVm2F_FQg217lU/edit?usp=sharing

bgrant0607 added the enhancement label Oct 1, 2021

droot added this to the Q4-2021 milestone Oct 20, 2021

droot added the area/hydrate label Oct 20, 2021

droot self-assigned this Oct 20, 2021

droot added the triaged label Oct 20, 2021

bgrant0607 mentioned this issue Nov 17, 2021

friction report: developing a simple generator function #2435

Open

justinsb mentioned this issue Apr 21, 2022

WIP: Introduce generator as option for base #3037

Open

This was referenced May 6, 2022

guide documenting techniques for variant constructor #3083

Merged

Create a way to pass non-KRM files through function input/output #3118

Open

ConfigMap generation #3119

Open

Could we just use kustomize for transformations? #3121

Open

droot added the p1 label May 27, 2022

bgrant0607 mentioned this issue Jun 4, 2022

Document how to approach abstraction #3288

Open

droot removed this from the Q4-2021 milestone Jun 7, 2022

bgrant0607 mentioned this issue Jun 8, 2022

CUJ: How to implements defaults for resourcequota and limitrange in namespace provisioning use-case #3294

Open

bgrant0607 mentioned this issue Jun 13, 2022

Bulk function upgrades #3309

Open

This was referenced Jun 30, 2022

Map function inputs to functions automatically via a catalog mechanism #3339

Open

Figure out and document the recommended way to establish organizational defaults for individual resource types for Config Connector #3344

Open

bgrant0607 mentioned this issue Jul 20, 2022

Flesh out the input data model and patterns #3396

Open

bgrant0607 mentioned this issue Jul 29, 2022

Develop a way to handle application configuration #3210

Open

yuwenma mentioned this issue Aug 8, 2022

set-image function is optimized for out-of-place and found it practically unusable forin-place mode #3444

Open

mortent added this to kpt Jan 21, 2023

mortent moved this to Backlog in kpt Jan 25, 2023

droot removed their assignment Jan 31, 2023

liamfallon mentioned this issue Apr 8, 2024

Bulk function upgrades nephio-project/porch-issue-transfer#100

Closed

liamfallon mentioned this issue Apr 23, 2024

Bulk function upgrades nephio-project/nephio#617

Closed
