Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to retain resource object origin data as a label #3979

Closed
monopole opened this issue Jun 10, 2021 · 21 comments
Closed

Option to retain resource object origin data as a label #3979

monopole opened this issue Jun 10, 2021 · 21 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@monopole
Copy link
Contributor

monopole commented Jun 10, 2021

Is your feature request related to a problem? Please describe.

The command

kustomize build {foo}

produces a stream of resources to stdout, while

kustomize build {foo} -o {dir}

writes resources to the given directory, one per file, with a generated name like deplyment_foo.yaml.

In either case, it's hard to track an object in the output back to its point of origin (a filepath, or a filepath in some repo).

Describe the solution you'd like

There are many ways to do this, but perhaps a new build flag --add-label-origin that adds the label

metadata:
  labels:
    config.kubernetes.io/origin/repo: http://github.com/userName/repoName
    config.kubernetes.io/origin/path: path/to/file.yaml

Use a label rather than an annotation as origin is an identifying or selectable characteristic.

This would be a best effort. Some resources would might all come from the same file, some will be generated and not come from a file, some will be generated from data in a file (e.g. a helm chart), etc.

This feature could be the basis of recreating file structure in the output, which would ease code review of changing configuration data.

@monopole monopole added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 10, 2021
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 10, 2021
@monopole monopole changed the title Option to retain resource object origin data as an annotation Option to retain resource object origin data as a label Jun 10, 2021
@bgrant0607
Copy link
Member

In what situation would label selectors for complex paths be useful? Are length and character-set limitations a concern? Annotations would make more sense to me.

Regardless, retaining the file structure, similar to helm template, would be a lot more user-friendly.

@natasha41575 natasha41575 added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jun 10, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 10, 2021
@natasha41575 natasha41575 added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 10, 2021
@natasha41575 natasha41575 self-assigned this Jun 10, 2021
@natasha41575
Copy link
Contributor

natasha41575 commented Jun 14, 2021

I agree that length and character-set limitations might be a concern. This limit seems to apply to both labels and annotations, so I'm not sure if I see any good workarounds for that.

EDIT: I misread the documentation; these limitations are just for label values, not annotation values.

@bgrant0607
Copy link
Member

I don't understand the use case for labels, as opposed to annotations.

@natasha41575
Copy link
Contributor

I think annotations make more sense, I'm wondering if @monopole had a specific use case in mind?

@KnVerey
Copy link
Contributor

KnVerey commented Jun 17, 2021

I agree that this sounds like an annotation rather than a label.

perhaps a new build flag --add-label-origin

Doesn't this violate our "no build-time side-effects" policy? Should it be a field instead?

@bgrant0607
Copy link
Member

+1 to fields rather than flags.

@natasha41575
Copy link
Contributor

natasha41575 commented Jul 7, 2021

If it is a field, how should it look?

A single separate field?

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- ../base

addAnnoOrigin: true

Alternatively, should it be placed under an options field in case we have a number of boolean options that we want to consolidate into a single field?

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- ../base

options:
  addAnnoOrigin: true
  addLabelManagedBy: true

@KnVerey WDYT? Or is it better to keep these fields at the top level?

@natasha41575
Copy link
Contributor

natasha41575 commented Jul 8, 2021

For the sake of thoroughness, here is my full proposal for this issue:

Input

There should be a new field in the kustomization to toggle the option to add this annotation:

options:
  addAnnoOrigin: true

Output

Local files

Resources that are provided to kustomize via local files under the resources field in the kustomization file (or in the kustomization file of any bases) should have the following annotation:

metadata:
  annotations:
    config.kubernetes.io/origin/path: deployment/deployment.yaml

where the path specified in the annotation is relative to the directory upon which kustomize build was invoked. That is, if someone runs kustomize build <DIR>, the path in the annotation will be relative to <DIR>.

Remote files

Resources that are provided to kustomize as a remote base should have the following annotations:

metadata:
  annotations:
    config.kubernetes.io/origin/repo: http://github.com/userName/repoName
    config.kubernetes.io/origin/path: deployment/deployment.yaml

where the repo specified in the annotation is the repository specified under the resources field in the kustomization file, and the path annotation is the path from the root of the repo to the resource file.

Generated resource files

Resources that are generated by kustomize will be a best-effort. For resources that are generated from data within a file, kustomize should add the following annotations (where possible):

metadata:
  annotations:
    config.kubernetes.io/origin/generated-by: path/to/helmChart.yaml

Resources that are generated by fields in the kustomization file should have the following annotations (where possible):

metadata:
  annotations:
    config.kubernetes.io/origin/generated-by: path/to/kustomization.yaml

In both these cases, the path should be relative to the directory upon which kustomize build was invoked.

@KnVerey
Copy link
Contributor

KnVerey commented Jul 9, 2021

How will config.kubernetes.io/origin/path: relate to internal.config.kubernetes.io/path? Same thing, but not dropped as internal reader annotations are? If the writer is set to keep reader annotations, do we output both?

Instead of a variable number of annotations, how about a single origin annotation containing structured data, like we use for function configuration?

Re: the field name, I'm a little worried that options is too generic on its own, especially since we already have a [transformer] configurations field. I'm not feeling inspired with an alternative at the moment, but I agree that the managed by label does feel related enough to be grouped with this.

@natasha41575
Copy link
Contributor

natasha41575 commented Jul 9, 2021

Instead of a variable number of annotations, how about a single origin annotation containing structured data, like we use for function configuration?

I think that's a good idea. How about:

metadata:
  annotations:
    config.kubernetes.io/origin: |
      path: examples/multibases/base/pod.yaml
      ref: v1.0.6
      repo: https://github.com/kubernetes-sigs/kustomize

If we do this, is the relation of this path field to internal.config.kubernetes.io/path still a concern? I imagine if the writer is set to keep reader annotations we would want to output both.

I'm a little worried that options is too generic on its own

Since the original proposal was to have it as a flag to kustomize build, how about buildOptions?

buildOptions:
  addAnnoOrigin: true
  addLabelManagedBy: true

I made a PR for this option: #4065.

@monopole
Copy link
Contributor Author

monopole commented Jul 25, 2021

Sorry for the late review, I was on leave.

I wrote this issue in terms of labels since paths and repo names make sense as label values. K8s uses labels for identifying attributes, for imposing an organizational structure on objects, and for fast selection (e.g. show the resources from repo x).

However, Brian's correct that the 253 character limit on label values can be exceeded here, so annotations it is.

Great discussion above about the form the annotations would take; completely agree. Nice!

The only change I'd suggest is the proposed new kustomization field name. Since everything in a kustomization file is effectively a 'build option', we shouldn't have a field called buildOptions.

  • We're adding information to resource metadata only.
  • It's always an add, so saying add is redundant.
  • It happens during a build (not, say, an out-of-build edit operation), so yes, having build in there makes sense.
  • The information being added is owned by kustomize. File names obviously should be tracked by kustomize rather than manually added by the user to the raw material. The name and value of the managedBy label won't be generally trustworthy in analytics if left to the user to set. Etc.

How about:

buildMetadata: [originAnnotations,  managedByLabels]
  • This field is about adding metadata during a build.

  • The upside to a list of things to enable is that it's briefer than a map of string to boolean (the latter obligates the users to add a correctly spelled true). Downside is that we (maintainers) are obligated to validate list entries post-marshalling, rather than rely on a YAML unmarshaling error like

    "unknown field "managedByLABles"
    

Code becomes

if utils.StringSliceContains(kustomization.buildMetadata, 'managedByLabels') {

instead of

if kustomization.buildOptions.managedByLabels {

but makes life easier for the user.

We'd also have to add a check in kustomizattion.FixKustomizationPostUnmarshalling to make sure the user-provided list has only valid entries.

@KnVerey
Copy link
Contributor

KnVerey commented Jul 26, 2021

buildMetadata: [originAnnotations,  managedByLabels]

👍 I like this

@natasha41575
Copy link
Contributor

buildMetadata: [originAnnotations, managedByLabels]

I like this too, I will update the PR

@marshall007
Copy link

marshall007 commented Aug 18, 2021

Sorry for being a bit late to this discussion. Related to @KnVerey's original point regarding the annotations, if we go with config.kubernetes.io/origin are we going to make corresponding changes to the KRM Function ResourceList interface? Is the expectation that kpt functions would switch from using config.kubernetes.io/path as well?

I'm also unclear on the motivation for buildMetadata fields instead of CLI flags. My opinion is that it would make more sense to have a flag for formatting kustomize build output as a valid ResourceList (conforming to the config.kubernetes.io annotation guidelines by default). This would be an easily consumable representation to implement things like #4090 on top of and, more importantly, something third-party tools could interop with.

Immediate value here would be the ability to pipe the output of kustomize build . --resource-list (or similar) to kpt live apply -, kpt fn eval ..., etc.

I do see the value in the buildMetadata fields as a way to preserve config.kubernetes.io annotations when outputting vanilla resource YAML, it's just not obvious how we might accommodate the runtime flag too.


From an API perspective, I really like the direction that was taken with the new labels transformer. Perhaps it would be more internally consistent (and give the user more control) if we could expose these new options there rather than a completely new field. As a rough example:

# only include managed-by labels on Deployments
labels:
- includeManagedBy: true
  fields:
  - kind: Deployment

annotations:
  includeOrigin: true

@KnVerey
Copy link
Contributor

KnVerey commented Aug 18, 2021

I'm also unclear on the motivation for buildMetadata fields instead of CLI flags.

The motivation is that a flag would violate our policy that there should be no build-time side-effects: https://kubectl.docs.kubernetes.io/faq/kustomize/eschewedfeatures/#build-time-side-effects-from-cli-args-or-env-variables. As a rule of thumb, anything that will change the output (produce a diff that needs to be committed if using gitops) must be driven by a field, whereas flags can be used for things that change whether the output is successfully produced at all (e.g. feature enablement). Note that the existing --reorder flag violates this and will be moved to a field per #3947 .

if we go with config.kubernetes.io/origin are we going to make corresponding changes to the KRM Function ResourceList interface? Is the expectation that kpt functions would switch from using config.kubernetes.io/path as well?

My understanding is that internal.config.kubernetes.io/path retains its current meaning and is independent. Although kio can be configured to retain it, Kustomize considers it internal and does not output it. This new annotation on the other hand is intended for output, not internal use.

My opinion is that it would make more sense to have a flag for formatting kustomize build output as a valid ResourceList

I'm not understanding where this is coming from. This issue isn't suggesting any new output formats, or really anything related to functions.

@marshall007
Copy link

marshall007 commented Aug 19, 2021

I'm not understanding where this is coming from. This issue isn't suggesting any new output formats, or really anything related to functions.

@KnVerey sorry, I came here from #4090 and didn't explain my reasoning well. The motivation for my suggestions was related to tooling inter-op and in particular the cfg commands.

Instead of having an arbitrary internal representation in addition to the config.kubernetes.io annotation guidelines specified for ResourceList, we could just make them the same thing.

Whether or not (and which) "internal" annotations are preserved becomes a concern of whatever process writes to disk. kpt fn sink is the only implementation I'm aware of currently.

To borrow terms from kpt, you could imagine kustomize source|sink (both operating on ResourceLists) as the new API with "legacy" kustomize build being the equivalent of kustomize source . | kustomize sink -.

I hope that makes it clear what I'm thinking in terms of it being part of the CLI vs Kustomization API.

We can get a lot of value out of standardizing on ResourceList. Consider, for example, kustomize source . | kustomize cfg grep .... Further note that kustomize/kpt fn source would be made interchangeable in that example, which I think is rather awesome!

@natasha41575
Copy link
Contributor

As an update for this issue, #4065 has been merged but I am holding off on closing this as there is ongoing discussion.

Further note that kustomize/kpt fn source would be made interchangeable in that example, which I think is rather awesome!

@marshall007 We are actually planning to deprecate kustomize fn source and kustomize fn sink - and we've deprecated many commands in the cfg command group. It is good to know that we can redirect people to kpt fn if they need the fn functionality.

Is the expectation that kpt functions would switch from using config.kubernetes.io/path

Yes, this is on my todo list, to migrate both kpt and kustomize to using internal.config.kubernetes.io/path instead.

Immediate value here would be the ability to pipe the output of kustomize build . --resource-list (or similar) to kpt live apply -, kpt fn eval ..., etc.

I think I'm not following why you are making this feature request on this issue. Could you perhaps provide a more concrete example of the use case with config.kubernetes.io/origin?

@marshall007
Copy link

marshall007 commented Aug 20, 2021

We are actually planning to deprecate kustomize fn source and kustomize fn sink - and we've deprecated many commands in the cfg command group.

I'm aware of and agree with the deprecations that are in work. My understanding from #3953 (comment) was that the read-only commands will/may be updated to perform in-place hydration (so that things like cfg tree print out generated/transformed resources).

Things to note on that:

  1. cfg tree output currently includes the local filepath of the resource
  2. most/all of of the cfg commands can read from either stdin or a directory
  3. a cfg grep command that can be used in imperative function pipelines would be very useful
  4. we could surface even more metadata (in things like cfg tree) as long as the annotations are well specified

If we assume both (1) and (2) are desirable properties of the CLI that we wish to maintain, it follows that:

  1. the Kustomization resource itself needs to be present in hydrated output (but excluded in the final output)
  2. the annotations containing provence information (i.e config.kubernetes.io/*) need be present in hydrated output

Luckily we already have a specification for a data format that makes these guarantees, which is ResourceList and the related annotations.

My naming of kustomize source was deliberate as this would be a new command that performs hydration. Basically the same as kustomize build | kustomize fn source, but with internal annotations preserved.

I think I'm not following why you are making this feature request on this issue.

I will write up a more formal proposal in a separate issue as soon as I have a chance, just pressed for time this week. I originally chimed in here because I am still concerned about the buildMeta addition to the Kustomization API.

The build metadata should always be available for other processes in a pipeline to consume no matter what the user's kustomization looks like.

As I said before, I think specifying the build metadata you include when writing out (to the filesystem or a cluster) should fall well within the responsibilities of the process that writes, not the one(s) that hydrate.

@marshall007
Copy link

/cc @natasha41575

@KnVerey
Copy link
Contributor

KnVerey commented Aug 30, 2021

Since #4065 has merged, let's close this issue. Feel free to continue discussion of how the read-only cfg commands should work in #4090, or open a new issue with any independent proposal. Please note that kustomize build itself is not intended to behave as a KRM function, and exposing internal build metadata by default or emitting a ResourceList instead of a YAML stream would not be feasible changes.

@ringerc
Copy link

ringerc commented Nov 23, 2021

This might not interact ideally with kustomize cfg tree's --field selector. The annotation key config.kubernetes.io/origin does not play nice with the --field option AFAICS and there are no relevant examples in the kustomize cfg tree docs that show how it might be done.

e.g. for object:

kind: ConfigMap
metadata:
  annotations:
    config.kubernetes.io/origin: |
      path: configmap-custom-metrics.yaml
....

I didn't find docs on the field syntax. The sources show that cmd/config/internal/commands/tree.go calls ParseFieldPath in cmd/config/runner/runner.go, from which I eventually worked out that a backslash must be used to escape a . but is not required or permitted for a /, so if you want

 --field 'metadata.annotations.config.kubernetes.io/origin'

you must write

 --field 'metadata.annotations.config\.kubernetes\.io/origin'

Hope this helps somebody else.

P.S. I tried to make a docs fix PR, but it requires me to sign a CRA and it'll take forever to get that through my employer, so I can't just patch the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

7 participants