Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize YAML manifests before running diff #273

Merged
merged 2 commits into from
Jun 4, 2021

Conversation

tommasopozzetti
Copy link
Contributor

@tommasopozzetti tommasopozzetti commented Apr 14, 2021

The current behavior of helm-diff is to show diffs even when the YAML files are semantically identical but they contain style
differences, such as indentation or key ordering.

As an example, the following two YAMLs

apiVersion: v1
kind: Service
metadata:
  name: app-name
  labels:
    app.kubernetes.io/name: app-name
    app.kubernetes.io/managed-by: Helm
spec:
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    app.kubernetes.io/name: app-name
  type: ClusterIP
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: app-name
  name: app-name
spec:
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app.kubernetes.io/name: app-name
  type: ClusterIP

would be showing the following diffs

  apiVersion: v1
  kind: Service
  metadata:
-   name: app-name
    labels:
-     app.kubernetes.io/name: app-name     
      app.kubernetes.io/managed-by: Helm
+     app.kubernetes.io/name: app-name
+   name: app-name
  spec:
    ports:
-     - name: http
-       port: 80
-       targetPort: http
+   - name: http
+     port: 80
+     targetPort: http
    selector:
      app.kubernetes.io/name: app-name
    type: ClusterIP

even though they are semantically the same.

This commit exploits the reordering and normalization that is performed by the yaml package when marshaling, by unmarshaling and re-marshaling all manifests before running the diff, thus eliminating the issue.
This PR should address and fix #257.

@davidholsgrove
Copy link

Hi @databus23 any chance we could get this PR and #268 in a new plugin release?

@scott-grimes
Copy link

👍 this would be fantastic, greatly decrease signal to noise ratio when comparing diffs

@tommasopozzetti
Copy link
Contributor Author

@mumoshu Any chance you could take a look at this and possibly create a new plugin release?
Thanks!

@databus23
Copy link
Owner

databus23 commented May 10, 2021

Hello, big sorry for being late to the party. I just haven't found the time recently to work on this project.

I have some concerns about this PR.

  1. In what scenario is this a problem? We have this plugin in use on a daily basis for a lot of charts and I have never noticed this to be an issue. The templates used to render the chart should produce a consistent output. I never experienced something like ordering issues unless the order is changed explicitly in the template.
  2. Normalizing the output by default has the downside the the rendered diff does not match the source template (or the rendered output from the template). Especially for big objects like the container spec it can become difficult to reconcile between the diff and the template. Unless the source template is written in the normalized form, which almost never is the case in my experience.

@scott-grimes
Copy link

  1. In my case we are migrating a series of charts to use a shared library chart which normalizes a substantial portion of each k8s objects configuration (labels, annotations, etc). In this case there are often several dozen k8s objects, some with quite large diffs, that need to be verified. Each object I encounter has dozens of lines which show a "difference" when in fact there is none. I want to ensure when I swap the chart version out that none of the components running in k8s will "really" be different.

Especially for big objects like the container spec it can become difficult to reconcile between the diff and the template. Unless the source template is written in the normalized form, which almost never is the case in my experience.

  1. I would actually consider this an upside. It's very rare that I come across a chart that's written in the normalized form. Having some way of normalizing would be a great way to ensure consistency when writing charts. In order to not introduce breaking changes perhaps this feature could be added as a command flag? I'm not concerned with the diff matching the original template, but I do want to ensure that the objects are equivalent when shipped up to our k8s cluster. Am I wrong in assuming thats the primary use here?

@tommasopozzetti
Copy link
Contributor Author

tommasopozzetti commented May 10, 2021

@databus23 Thank you for taking a look at this!
My specific scenario is that we use helm diff a lot during development of new charts to check diffs with what is deployed but we have some automation performing the actual helm upgrade. This automation also performs some post-processing of the templates which sometimes results in key ordering and/or indentation changes. This makes it extremely hard to run helm diff because the output becomes huge due to these changes being displayed even though they are not actual changes.
A more common scenario might also be the case in which some refactoring is done on the chart which introduces key ordering/indentation changes. These again would pop up in the helm diff output even though Kubernetes wouldn't really react to these changes, making it harder to spot real diffs.
Point 2 sounds like a reasonable concern which I had not thought about since I mainly use the plugin to figure out actual changes in deploying a new version, rather than needing to match these diffs with the actual template. It sounds like there are pros and cons to this approach. Would it then make sense, rather than making this behavior absolute, to make it configurable via some helm diff flag like --normalize? If the flag is passed in, the plugin will perform the normalization, otherwise it will behave as it used to. This way the user can choose based on his/her specific needs.

@tommasopozzetti
Copy link
Contributor Author

Looks like @scott-grimes beat me to the punch with the same suggestion 😆

@davidholsgrove
Copy link

Hi @databus23 any chance we can have this merged and a new cut of helm-diff released?

@databus23
Copy link
Owner

I still have concerns. I get the stated point about occasional noisy diffs when something is refactored but that's not the majority of diffs I'm experencing daily in CI pipelines. Most of the times the diffs are a couple of lines small. In those cases having everything reordered and not looking like the original template seems confusing to me.

I guess it all depends on the actual charts that are used and how frequent substantial changes to the chart are committed.

I would much prefer this to be an option and not change the output by default.

The current behavior of helm-diff is to show diffs even when the
YAML files are semantically identical but they contain style
differences, such as indentation or key ordering.

As an example, the following two YAMLs would be showing diffs:
```
apiVersion: v1
kind: Service
metadata:
  name: app-name
  labels:
    app.kubernetes.io/name: app-name
    app.kubernetes.io/managed-by: Helm
spec:
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    app.kubernetes.io/name: app-name
  type: ClusterIP
```
```
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: app-name
  name: app-name
spec:
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app.kubernetes.io/name: app-name
  type: ClusterIP
```
even though they are semantically the same.

This commit exploits the reordering and normalization that is performed
by the yaml package when marshaling by unmarshaling and re-marshaling
all manifests before running the diff, thus eliminating the issue.
@tommasopozzetti
Copy link
Contributor Author

@databus23 I pushed another commit making the normalization behavior configurable with the --normalize-manifests flag which defaults to false to maintain backwards compatibility.

@databus23 databus23 merged commit fae3716 into databus23:master Jun 4, 2021
@databus23
Copy link
Owner

Thanks!

@123BLiN
Copy link

123BLiN commented Jul 19, 2021

thanks! with this change all issues I was aware of fixed for me, I think it worst to have new release with it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Normalising YAML before diffing
5 participants