Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special behavior for generators #2528

Open
bgrant0607 opened this issue Oct 1, 2021 · 22 comments
Open

Special behavior for generators #2528

bgrant0607 opened this issue Oct 1, 2021 · 22 comments
Labels
area/hydrate enhancement New feature or request p1 triaged Issue has been triaged by adding an `area/` label

Comments

@bgrant0607
Copy link
Contributor

I've concluded that generators need special treatment.

Preprocessors (#2420) are ok, but are a workaround. #2466 is one issue that has been reported, but there are more issues.

Let's look at a generator like https://github.com/GoogleContainerTools/kpt-functions-sdk/blob/master/ts/demo-functions/src/expand_team_cr.ts.

  • The expansion function is associated with the pseudo-resource-type, not really with a particular package. That type may be used in many packages. It would be helpful to have a registry for that type-to-function mapping (versioning TBD).
  • I want to use the functions in the Kptfile pipeline similarly to Kubernetes mutating and validating admission control. They should re-run on generated resources. Other functions I'll just invoke imperatively using kpt fn eval.
  • I want to be able to edit generated resources and use merge logic similar to kpt pkg update. In this example, Namespaces frequently need to be labeled and annotated, so the function in its current form is useless. One needs to create a separate package just with the Team resource, re-rerun the generator, commit and push that package, then kpt pkg get that into the main package and update that each time the function is run. Because kpt doesn't support local packages AFAIK, there can't really be a review/approval workflow around updates to the package containing generator function's output.
  • If resources are no longer generated, they need to be pruned from the filesystem.

cc @justinsb

@bgrant0607 bgrant0607 added the enhancement New feature or request label Oct 1, 2021
@droot droot added this to the Q4-2021 milestone Oct 20, 2021
@droot droot self-assigned this Oct 20, 2021
@droot
Copy link
Contributor

droot commented Oct 20, 2021

It would be helpful to have a registry for that type-to-function mapping

@mengqiy what is the easiest way to support this in function catalog.

I want to use the functions in the Kptfile pipeline similarly to Kubernetes mutating and validating admission control. They should re-run on generated resources.

Thinking of some possible options here:

  • As soon as new resources are detected after executing a function, re-start the pipeline with currently modified resources that includes new resources (note the resources are not written to the disk yet, they are written at the end of successful pipeline execution).
  • As soon as new resources are detected after executing a function, continue with the pipeline execution, and re-run the pipeline with only the generated resources. [Executing functions with only the new resources may not be desired because functions assumes that entire pkg (resourceList) to be available to them]
  • If new resources are generated at the end of pipeline run (note resources are written to the disk post successful pipeline run), then re-run the pipeline on the pkg (this behavior could be configurable at the top pkg level)

3rd option is the least disruptive and easy to reason with.

I want to be able to edit generated resources and use merge logic similar to kpt pkg update. In this example, Namespaces frequently need to be labeled and annotated, so the function in its current form is useless. One needs to create a separate package just with the Team resource, re-rerun the generator, commit and push that package, then kpt pkg get that into the main package and update that each time the function is run. Because kpt doesn't support local packages AFAIK, there can't really be a review/approval workflow around updates to the package containing generator function's output.

some early thoughts:

  • expand_team function can be written as ensure_team form that is idempotent and takes namespace and IAMRoles resources that are part of the input resourceList. So ensure form of the function is sort of acting only on the attributes of resources that are driven by pseudo resource. I see this behavior closer to how kubernetes controllers work. (multiple controllers acting in a coordinated manner and patching the bits they care about). Now obviously, writing ensure form of the function is more work over purely generative function. It's a way of saying functions are taking care of merge part here.
  • kpt supporting merge behavior for functions which is similar to server-side-apply functionality where expand_team and user-edits are treated as changes by different actors on same resources. To do this cleanly, will have to come up with some way of tracking/ownership of changes.

@droot droot added the triaged Issue has been triaged by adding an `area/` label label Oct 20, 2021
@droot
Copy link
Contributor

droot commented Oct 20, 2021

I realized, our SDKs can also make writing ensure (or idempotent) functions easier.

@morgante
Copy link
Contributor

If new resources are generated at the end of pipeline run (note resources are written to the disk post successful pipeline run), then re-run the pipeline on the pkg (this behavior could be configurable at the top pkg level)

This is the direction I think would make the most sense, and would be the simplest for other functions to reason about.

I realized, our SDKs can also make writing ensure (or idempotent) functions easier.

Yes, mainly I think we need an ensure/generate SDK that helps with:

  1. Merging changes between the function, other functions, and user manual edits
  2. Generation and removal of resources. generate-folders is another function that needs to delete managed resources if they are removed from the master resource.

@mikebz
Copy link
Contributor

mikebz commented Nov 4, 2021

Additional context #2435

@bgrant0607
Copy link
Contributor Author

@johnbelamaric
Copy link
Contributor

Yes, mainly I think we need an ensure/generate SDK that helps with:

  1. Merging changes between the function, other functions, and user manual edits

Ok, so I am adding this caveat after writing everything below, which is just a sort of thought process thinking through some of these issues, and I am still not clear on how we achieve this and don't have time to continue thinking about it now. If the musings below are not useful, then feel free to ignore them. Perhaps you already have some more ideas on a better way to do this.

I have a use case for a variant generator that would, for example, punch out a set of variants of a workload that are specialized to particular clusters. It works something like this:

Variant Generator Function
Inputs

  • A selector for a set of resources (think: all resources in a given directory; could even be a package)
  • An input CRD with a array of structs (think: fields with setter values or inputs to bits of more complex logic)

Behavior

  • Select the resources according to the selector, resulting in a set of resources we will call A
  • For each entry in the array
    • Duplicate A
    • Apply transformations based on the struct
    • Store result An in its own directory (i.e., modify the directory annotation)

I'd like the user to be able to subsequently modify An to produce An'. When the function is re-run I do not want to overwrite those edits. This can't be done with the approach above because it simply regenerates the entire An with each run, and so I cannot differentiate a delta between An and An' due to a user vs that due to a change in function inputs (or code).

My initial thought for solving that was to use patch files instead of an array of structs - one set of patches per variant. That is, have the user create a patch that when applied to A, produces An. The function then would simply process each of these patches, and apply those deltas to An' if it exists (or create it by applying the patch to A).

I think that would work, but it shifts the burden of variant generation to the user. Perhaps we can have two functions: one that generates the patch, and one that applies it to create An?

Patch Generator Function
Inputs:

  • A selector for a set of resources
  • An input CRD with a array of structs

Behavior:

  • Select the resources according to the selector to get the set A
  • For each entry in the array
    • Duplicate A
    • Apply transformations based on the struct to get An
    • Calculate a patch Pn to produce An

Variant Generator Function
Inputs:

  • Hmm...some sort of selectors to identify Pn, and An, An' resources?

Behavior:

  • For each Pn
    • If An' exists, apply the patch, otherwise create An' as equal to An

I think with this approach, re-running the function will not overwrite non-conflicting user edits of An'. Conflicting edits are still an issue though. Worse still, we have issues if we modify the generator function or the set of resources in A, because our patches will now potentially not contain some new resource or value from the modification, so that change will never get added to the generated resources. That is, if we add a resource to such that the selected resources, say A1 now have some additional resources than A, the next run will calculate a patch between A1 and A1n, and the patch won't have the new resource, so it will never be applied to An'.

I think we have to take it a step further, and store the patches, along with some sequencing so that we can re-apply them properly ordered. Or maybe even that won't work, but instead we need to store the original An from the previous version of the function. I need to think about it more, but this seems to be getting out-of-hand.

Other approaches?

@morgante
Copy link
Contributor

@johnbelamaric Have you considered modeling this as a set of subpackages? (See go/package-sets.)

@johnbelamaric
Copy link
Contributor

johnbelamaric commented Dec 14, 2021 via email

@morgante
Copy link
Contributor

Not yet, the main blocker right now is the challenges with cyclical kpt fn render — when generators insert new resources (including a Kptfile with new functions) we need a mechanism to add functions on those resources.

@johnbelamaric
Copy link
Contributor

Ah, yes, I handle that in pre-v1.0 kpt with a two-pass pipeline, haven't looked at v1.0 kpt closely enough, hopefully there will be a similar solution available.

@johnbelamaric
Copy link
Contributor

Thinking about this a little more, it does seem that subpackage is the right model. The explicit version in the Kptfile gives us the ancestor for the 3-way merge, so we can make edits via the generator function, via other functions, and via manual user edits of the resulting generated instance.

One issue for generators - shouldn't they end up needing to work something like apply, with respect to pruning, for example? If we alter the generator function or its inputs such that you are getting fewer instances of a given subpackage, you'll need to prune the old packages. We could manage this with function-specific logic (for example, "this generator function owns this directory"), but it seems we should have some utility or at least conventions here.

@morgante
Copy link
Contributor

morgante commented Jan 4, 2022

One issue for generators - shouldn't they end up needing to work something like apply, with respect to pruning, for example?

Yes, though they're actually also often similar to owner references. The approach we've taken thus far is to use an annotation to track the source resource and prune if it's changed.

We should definitely standardize this sort of functionality into the SDK though. It's a pain for each function author to write/maintain the logic and implementations might become inconsistent.

@bgrant0607
Copy link
Contributor Author

Back to the original topic:

Ensure functions make sense to me. That's similar to https://github.com/metacontroller/metacontroller functions.

It could be a supported or at least recommended pattern, like variant constructors (#2590).

However, my eventual conclusion on expand_team_cr was that it wasn't sufficiently useful for declarative use. Instead, I switched to a minimalistic variant-constructor approach, using the ensure pattern for just the Namespace: #2184 (comment)

In order for abstractions need to be worthwhile, they need to dramatically reduce configuration complexity or implement some significant business logic. Otherwise, we should think in terms of manipulating the underlying resource types, with resource-type-specific functions and tools, as imperative tools typically do.

I agree that variant generators could be modeled as generated subpackages, which could leverage the variant constructor pattern and specify variant resources declaratively rather than generating them from scratch. We should spawn another issue for that, if there isn't one already. The variant generation function could generate Kptfiles for the subpackages. We'd then need to trigger cloning of the subpackages, and reinvoke the pipeline to run the variant constructors.

I am generally not a fan of monolithic arrays or maps as ways to express desired variants. They seem appealing for specifying small numbers of variants with small numbers of varying attributes by hand, but are difficult to manage at non-trivial scale. Probably the set of variants would be derived from some input source using some automation. I'd represent them like other generators, using client-side custom resources, one per variant or per dimensional variation value (e.g., region and environment). Convention over configuration is our friend for these scenarios so that users don't need to specify where the varying attributes come from for each variant. Storing sets of varying attributes or "facts" is common in hierarchical parameter stores. We can use CRs (pseudo or actual) or ConfigMaps for this, which also should make custom UX for such use cases simpler to build.

@bgrant0607
Copy link
Contributor Author

For creating individual resources, imperative functions, CLI (e.g. kubectl create -o yaml), UI, etc. seems like the way to go.

@bgrant0607
Copy link
Contributor Author

@bgrant0607
Copy link
Contributor Author

Ah, the merge behavior was previously requested in #954

@droot droot added the p1 label May 27, 2022
@bgrant0607
Copy link
Contributor Author

The question inevitably comes up regarding what to do when a generator function is updated.

It depends on whether changing the behavior of existing uses is desirable or not. It also depends on whether the generator would just support additional optional, parameterized attributes. Helm charts often conditionalize new attributes so that they don't affect existing uses of the chart automatically. Changes of default behavior are also typically undesirable for existing uses. So changing a generator is often not the best way to enforce new behaviors.

That said, we may want to support similar behaviors for upgrades to new generator versions similarly to new upstream package versions. One way to do that would be to update the function version in upstream packages first, then update their downstream packages. We'd want to surface available function upgrades similar to available package upgrades. This would be easier if we separated upstream and downstream functions (#2544) as well as generators and admission control (#3121).

@droot
Copy link
Contributor

droot commented Jun 13, 2022

That said, we may want to support similar behaviors for upgrades to new generator versions similarly to new upstream package versions.

One quick thought -- I can imagine generator function to represent a subset of resources of the package (sort of a slice of the package) sourced by running a specific version of function similar to pkg get <pkg-version. This is very similar to having a subpkg where upstream is a generator-function:version and it's input and changing the version is similar to upgrading to new version of that upstream.

@bgrant0607
Copy link
Contributor Author

The issue of pruning also applies to transformers that change resource identifiers, in the case that the identifier transformation is changed across revisions.

@bgrant0607
Copy link
Contributor Author

Drive-by observation: Infrastructure from Code projects that generate Infrastructure as Code formats are going to have similar challenges. Users will want to customize the generated IaC code/templates, but those changes will be stomped the next time the IaC is regenerated. The idea of generating app config from app code is not a terrible one. "oc new-app" is a simple version of that.

@johnbelamaric
Copy link
Contributor

Something of interest here is what @henderiw and others are doing in Nephio. There, we are using functions to generate resources, as well as using the package conditions function of Porch. This means we need to operate a lot like a controller; we need to create, update, and prune child resources based upon one or more inputs. So, they are working on a controller-runtime like function SDK that handles various aspects of that for the function authors.

Wim - can you link in your slides / maybe the PR? This is very early work, but in a month or so it may be good to discuss in kpt office hours.

@henderiw
Copy link

henderiw commented Apr 6, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hydrate enhancement New feature or request p1 triaged Issue has been triaged by adding an `area/` label
Projects
None yet
Development

No branches or pull requests

6 participants