Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patchStrategicMerge elides null and {} #2734

Closed
hornpolish opened this issue Jul 18, 2020 · 25 comments
Closed

patchStrategicMerge elides null and {} #2734

hornpolish opened this issue Jul 18, 2020 · 25 comments

Comments

@hornpolish
Copy link
Contributor

@monopole - sorry for the delay, i was trying to narrow it down, but probably should bring y'all up to date

This is the follow on to #2697

Dropping the -emptyDir: {} leads to a malformed volume declaration in the Deployment object
Dropping creationTimestamp: null might not matter semantically, but it seems odd that a patch that merges is able to remove something

cat >kustomization.yaml <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- resource.yaml
patchesStrategicMerge:
- patch.yaml
EOF
cat >resource.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sas-crunchy-data-postgres-operator
spec:
  replicas: 1
  template:
    metadata:
      creationTimestamp: null
    spec:
      serviceAccountName: postgres-operator
      containers:
      - name: apiserver
        image: sas-crunchy-data-operator-api-server
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8443
        envFrom: []
        volumeMounts:
        - name: security-ssh
          mountPath: /security-ssh
        - mountPath: /tmp
          name: tmp
      imagePullSecrets: []
      volumes:
      - emptyDir: {}
        name: security-ssh
      - emptyDir: {}
        name: tmp
EOF
cat >patch.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sas-crunchy-data-postgres-operator
  labels:
    workload.sas.com/class: stateless
spec:
  template:
    metadata:
      labels:
        workload.sas.com/class: stateless
    spec:
      tolerations:
        - key: "workload.sas.com/class"
          operator: "Equal"
          value: "stateful"
          effect: "NoSchedule"
        - key: "workload.sas.com/class"
          operator: "Equal"
          value: "stateless"
          effect: "NoSchedule"
---
EOF

if i kustomize build . on this, - emptyDir: {} as well as creationTimestamp: null gets removed from the object that is built, even though i'm not patching that part of the object.

if i remove/comment the patchesStrategicMerge from kustomization.yaml, then the - emptyDir: {} and creationTimestamp: null passes thru into the result

almost as if those null empty fields are not retained as you build the object in preparation (or during) the patchesStrategicMerge process

@lnovara
Copy link

lnovara commented Jul 23, 2020

I've stumbled upon the same issue, worth noting that 3.8.1 is still affected.

Probably, this issue was introduced with the recent migration from api-machinery to kyaml.

Right now, I've switched back to 3.7.0 which is not affected.

@monopole
Copy link
Contributor

Yes, it's the kyaml.

By default kyaml processing does cleanups that should have no impact on KRM use.
E.g. indents are made canonical, trailing blanks dropped,
quoting might be added or removed based on field types,
map fields may be canonically sorted.

Dropping fields with value null feels like it belongs in the cleanup
category. Dropping a field that has an explicitly empty map or array
also feels like a cleanup, but we've established that the latter
can cause problems for code that wants an array to append to.
We've had no reports that this is a problem for maps.

So options are

  1. retain [] (which we already do) but drop {} and null.
  2. retain [] and {}, but drop null.
  3. retain all three.

With any of these options, a user would still have the option to run the yaml
through a filter that aggressively removed any or all of these cases.

It's just a question of what the default should be.

I like 2 (drop null), but sounds like people like 3?

@hornpolish
Copy link
Contributor Author

i don't have an example with null that leads to a malformed kubernetes object if it is dropped, so i'm indifferent on that one.
the emptydir volume example speaks to retaining {} -- dropping it produces objects that don't meet the spec
retaining [] allows people to provide bases that are more easily extended by others in a durable (over time) way

definitely 2. 3 gets us closer to the behaviour we had up the kustomize.370. What do others think?

@monopole
Copy link
Contributor

@Shell32-Natsu , let's let this sit for a week to gather comments.

Default decision is option 2 (which requires a PR to allow {}).

@Shell32-Natsu
Copy link
Contributor

@monopole Got that

hornpolish pushed a commit to hornpolish/kustomize that referenced this issue Jul 25, 2020
hornpolish pushed a commit to hornpolish/kustomize that referenced this issue Jul 28, 2020
monopole added a commit that referenced this issue Aug 3, 2020
@linkvt
Copy link

linkvt commented Aug 13, 2020

Hi @monopole ,

yes this is definitely a bug, an empty object is completely different compared to null which specifies the absense of something. This is the same e.g. in JSON and is also defined like that in the YAML Spec: https://yaml.org/spec/1.2/spec.html#id2803362
Edit: I agree with option 2, [] and {} should be kept, null can be discarded.

Also examples of https://yaml.org/spec/1.2/spec.html#id2805071

A null: null
Also a null: # Empty
Not a null: ""

For us this causes a huge problem with Prometheus Operator as they require an empty object so that rules are picked up everywhere:

            ruleNamespaceSelector:
              description: A label selector is a label query over a set of resources.
                The result of matchLabels and matchExpressions are ANDed. An empty
                label selector matches all objects. A null label selector matches
                no objects.

See

The interesting thing is:
patching an empty object works, but as soon as another patch is applied on the same resource the previously empty object is discarded, I guess that a cleanup happens before the patch.

We would really appreciate a fast fix, thanks!

@monopole
Copy link
Contributor

monopole commented Aug 14, 2020

I believe HEAD is consistent with option 2 right now, but i'd like to review the tests again to confirm this.

We'll be doing a new release shortly

@icy
Copy link

icy commented Aug 27, 2020

I come across here after an outage on our system. When kustomize 3.8.1 is used to deploy gloo (https://github.com/solo-io/gloo), the default setting entries are removed from the rendered output. Below the diff. generated with kustomize-3.4.0 which shows the removed entries:

diff /tmp/LIVE-522819448/gloo.solo.io.v1.Settings.gloo.default /tmp/MERGED-616571511/gloo.solo.io.v1.Settings.gloo.default
8c8
<   generation: 7
---
>   generation: 8
33a34,36
>   kubernetesArtifactSource: {}
>   kubernetesConfigSource: {}
>   kubernetesSecretSource: {}

gloo then goes with the new setting that missed 3 entries (kubernetesArtifactSource*), and it's actually down.

I'm looking forward to a new release of kustomize that addresses this issue. In the mean time, I'd go excluding some manifests from kustomize-3.8.1 output.

Update:

Fyi, gloo doesn't have very strict CRD/setting validation here (FIMXE). When some setting is wrong they can keep running but not functioning. On the otherhand, if kustomize-3.4.0 is used, there would be another problem with gloo, though not critical

Below is the full object manitest they need.

apiVersion: v1
items:
- apiVersion: gloo.solo.io/v1
  kind: Settings
  metadata:
    annotations:
    labels:
      app: gloo
    name: default
    namespace: gloo
  spec:
    discovery:
      fdsMode: WHITELIST
    discoveryNamespace: gloo
    gateway:
      readGatewaysFromAllNamespaces: false
      validation:
        allowWarnings: true
        alwaysAccept: true
        proxyValidationServerAddr: gloo:9988
    gloo:
      disableKubernetesDestinations: false
      disableProxyGarbageCollection: false
      invalidConfigPolicy:
        invalidRouteResponseBody: Gloo Gateway has invalid configuration. Administrators
          should run `glooctl check` to find and fix config errors.
        invalidRouteResponseCode: 404
      xdsBindAddr: 0.0.0.0:9977
    kubernetesArtifactSource: {}
    kubernetesConfigSource: {}
    kubernetesSecretSource: {}
    refreshRate: 60s

@icy
Copy link

icy commented Sep 1, 2020

I believe HEAD is consistent with option 2 right now, but i'd like to review the tests again to confirm this.

We'll be doing a new release shortly

Hi @monopole,

is that fixed in 3.8.2 https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv3.8.2 ?

Many thanks

@Shell32-Natsu
Copy link
Contributor

@icy Yes the fix is in 3.8.2.

@icy
Copy link

icy commented Sep 1, 2020

@icy Yes the fix is in 3.8.2.

Thanks a lot for your information, and thanks the team for the fix.

I can confirm that my gloo-related problem is gone with 3.8.2

@afirth
Copy link
Contributor

afirth commented Sep 3, 2020

Thanks for fixing this so quickly. Was deploying prometheus operator too and had a hell of a time figuring out what happened as I upgraded both kustomize and operator in the same PR. Lesson learned. 3.8.2 fixed

@Shell32-Natsu
Copy link
Contributor

/close

Please reopen if you want to discuss further.

@k8s-ci-robot
Copy link
Contributor

@Shell32-Natsu: Closing this issue.

In response to this:

/close

Please reopen if you want to discuss further.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@goooogs
Copy link

goooogs commented Aug 25, 2021

@Shell32-Natsu , let's let this sit for a week to gather comments.

Default decision is option 2 (which requires a PR to allow {}).

Hi @monopole , when I want to delete some default values in helm chart, it MUST use the null value.

for example:

# Helm Chart
...
readinessProbe:
  httpGet:
    path: /index.html
    port: 8080
...

httpGet probe is conflict with tcpSocket, so if I want to use tcpSocket probe, I must set httpGet to null to delete the default vaule (Although it can be done in the command line with --set xxx=null):

# values.yaml
...
readinessProbe:
  tcpSocket:
    port: 8080
  httpGet: null
...

So, I NEED option 3 ^_^

@langecode
Copy link

@goooogs did you find a solution? Just stumbled upon that very same issue - trying to reset a Helm release vault using null

@mkoertgen
Copy link

Hits me too. Can we reopen?

@coperator
Copy link

Hits me too. :-( Has anybody an idea if there is a workaround?

@afirth
Copy link
Contributor

afirth commented May 6, 2022

if you are trying to patch the output of helm template to remove something from the final manifest, just use a json6902 patch with behavior remove

example:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
[...]
patches:
  - target: # remove all Deployment container args since we override everything using a configmap
      kind: Deployment
      name: external-dns
    patch: |-
      - op: remove
        path: "/spec/template/spec/containers/0/args"

I think that's what all the subsequent commenters are asking for. If not, please provide a concrete use case with code
@goooogs @langecode @mkoertgen @coperator

@cep21
Copy link

cep21 commented May 6, 2022

@afirth I believe they want to create the helm values itself via kustomize. My use case is using kustomize to generate a flux2 HelmRelease CRD. I normally would null out values I don't want set in the helm chart, but if the output goes via kustomize, the null is removed.

@cep21
Copy link

cep21 commented May 6, 2022

I have a reproduction case here: #4628

@afirth
Copy link
Contributor

afirth commented May 6, 2022

Holy cannoli. I see your use case (fluxcd helmRelease values in a CRD) but not sure it's going to get fixed for this, which is essentially providing flat files to go template from a inside a CR.
Proposed workarounds for your case:

  1. use flux ValuesFrom and provide a static yaml file as a configmap.
  2. call helm template | kustomize and send the final manifests to flux instead of driving it, well, backwards. Sample makefile
  3. use post-render support in fluxcd to call kustomize again and undo the mess
  4. patch the upstream helm chart to treat the empty string "" or object {} as null, if it doesn't already

Hope one of these helps you and others.

@coperator
Copy link

@cep21 Thanks for explaining the use case. I have exactly the same ( flux2 HelmRelease CRD)
@afirth Thanks for the proposed workarounds! I was not aware of point 3 (post-render support in fluxcd) and will give it a try.

@MaurGi
Copy link

MaurGi commented Jun 2, 2022

please reactivate this -

currently reproing:

When trying to change a variable in a deployment from :

        - name: VAR
          valueFrom:
            configMapKeyRef:
              key: VAR
              name: cm-config

to:

        - name: VAR
          value: "value"

if you try to kustomize build | kubectl apply you get the error:

spec.template.spec.containers[0].env[0].valueFrom: Invalid value: "": may not be specified when value is not empty

the only way to make that patch work is to have:

        - name: VAR
          value: "value"
          configMapKeyRef: null

but of course configMapKeyRef: null gets removed by kustomize, so we can never get that without deleting the deployment, which causes a downtime to my customers.

thx!

@afirth
Copy link
Contributor

afirth commented Jun 2, 2022

please reactivate this -

currently reproing:

When trying to change a variable in a deployment from :

        - name: VAR
          valueFrom:
            configMapKeyRef:
              key: VAR
              name: cm-config

to:

        - name: VAR
          value: "value"

if you try to kustomize build | kubectl apply you get the error:

spec.template.spec.containers[0].env[0].valueFrom: Invalid value: "": may not be specified when value is not empty

the only way to make that patch work is to have:

        - name: VAR
          value: "value"
          configMapKeyRef: null

but of course configMapKeyRef: null gets removed by kustomize, so we can never get that without deleting the deployment, which causes a downtime to my customers.

thx!

@MaurGi
Values defined in env override envFrom, and inside either the last defined takes precedence. So in your case you can just add another key below it.

        - name: VAR
          valueFrom:
            configMapKeyRef:
              key: VAR
              name: cm-config
        - name: VAR
          value: "I will override you!"

Note this is for the final pod spec, I'm not sure how strategic merge patch will cope. But for example a json6902 add on the end (-) should work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests