Empty yaml documents should be ignored when importing lists #3464

Open
uhthomas opened this issue Sep 25, 2024 · 11 comments

Comments

@uhthomas
Contributor

What version of CUE are you using (cue version)?

$ cue version
cue version 0.9.2

go version go1.22.5
      -buildmode pie
       -compiler gc
       -trimpath true
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.9.2

Does this issue reproduce with the latest stable release?

Yes.

What did you do?

Import a list of yaml documents and transform it with -l.

❯ cue import -l "strings.ToLower(kind)" --list abc.yaml
error evaluating label strings.ToLower(kind): reference "kind" not found
abc.yaml
---
kind: Deployment
---
---
kind: Deployment

What did you expect to see?

It would be nice if this just didn't add anything to the list instead of failing. I use cue import a lot to import Helm charts, and because Helm is just a bad text templating engine, its output often includes empty YAML documents. I then have to find and remove the empty documents myself and run the import again.

What did you see instead?

error evaluating label strings.ToLower(kind): reference "kind" not found

@uhthomas uhthomas added NeedsInvestigation Triage Requires triage/attention labels Sep 25, 2024
@mvdan
Member

mvdan commented Sep 26, 2024

This seems reasonable. Do you want to send a patch with a test?

@mvdan mvdan removed the Triage Requires triage/attention label Sep 26, 2024
@haoqixu
Contributor

haoqixu commented Sep 30, 2024

I would like to take a look at this issue

@haoqixu
Contributor

haoqixu commented Oct 1, 2024

The YAML decoder decodes an empty YAML as null, making it indistinguishable from an actual null. Would it be reasonable to ignore null when importing lists?
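To illustrate the point, here is a minimal Python sketch built on a toy document splitter. It is not a real YAML parser and not how cue decodes YAML. The helper names `parse_document` and `split_documents` are made up for this example, and only `---` separators and flat `key: value` lines are handled. It shows an empty document and an explicit `null` ending up as the same value.

```python
def parse_document(lines):
    """Parse one toy document made of flat 'key: value' lines (assumption)."""
    body = [ln.strip() for ln in lines if ln.strip()]
    if not body or body in (["null"], ["~"]):
        # An empty document and an explicit null become the same value.
        return None
    return dict(ln.split(": ", 1) for ln in body)


def split_documents(stream):
    """Split a multi-document stream on '---' lines and parse each part."""
    docs, current, started = [], [], False
    for line in stream.splitlines():
        if line.strip() == "---":
            # A separator closes the previous document, if there was one.
            if started or current:
                docs.append(parse_document(current))
            started, current = True, []
        else:
            current.append(line)
    if started or current:
        docs.append(parse_document(current))
    return docs


stream = "---\nkind: Deployment\n---\n---\nkind: Deployment\n"
print(split_documents(stream))
# [{'kind': 'Deployment'}, None, {'kind': 'Deployment'}]
```

Once decoded, nothing in the resulting list records whether a `None` came from an empty document or from a literal `null`.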

@mvdan
Member

mvdan commented Oct 1, 2024

@haoqixu yes, I think that's fine. The YAML spec strongly hints that an empty document is equivalent to a null value.

@uhthomas
Contributor Author

uhthomas commented Oct 1, 2024

What is the expected behavior when a document is not empty? Is it possible to handle missing keys gracefully for the general case? Say:

❯ cue import -l "strings.ToLower(metadata.namespace)" --list abc.yaml
---
metadata:
    namespace: some-namespace
---
metadata:
    name: some-name # missing namespace
---
metadata:
    namespace: some-namespace

@mvdan
Member

mvdan commented Oct 1, 2024

If multiple behaviors are wanted depending on the use case, we could always rethink the flag slightly so that --list is equivalent to --list=all, and add another mode like --list=nonnull or --list=nonempty. Hopefully the added UX complexity is not needed though.

@mvdan
Member

mvdan commented Nov 28, 2024

I wrote a testscript to think through this a bit more before we merge the change:

exec cue import --outfile=- --list                              input-noempty.yaml
exec cue import --outfile=-        --path=strings.ToLower(kind) input-noempty.yaml
exec cue import --outfile=- --list --path=strings.ToLower(kind) input-noempty.yaml
exec cue import --outfile=- --list --path=fixedpath:            input-noempty.yaml

exec cue import --outfile=- --list                              input-withempty.yaml
exec cue import --outfile=-        --path=strings.ToLower(kind) input-withempty.yaml
exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
exec cue import --outfile=- --list --path=fixedpath:            input-withempty.yaml

-- input-noempty.yaml --
kind: Deployment
---
kind: Deployment
---
kind: Other

-- input-withempty.yaml --
---
kind: Deployment
---
---
kind: Deployment
---
kind: Other

As of master, the cases with empty documents fail when using --path, as reported:

> exec cue import --outfile=- --list                              input-withempty.yaml
[stdout]
[{
	kind: "Deployment"
}, null, {
	kind: "Deployment"
}, {
	kind: "Other"
}]
> exec cue import --outfile=-        --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:7: unexpected command failure
> exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:8: unexpected command failure
> exec cue import --outfile=- --list --path=fixedpath:            input-withempty.yaml
[stdout]
fixedpath: [{
	kind: "Deployment"
}, null, {
	kind: "Deployment"
}, {
	kind: "Other"
}]

With https://review.gerrithub.io/c/cue-lang/cue/+/1202049 at patchset 3 on top of master, the results for input-withempty.yaml are:

> exec cue import --outfile=- --list                              input-withempty.yaml
[stdout]
[{
	kind: "Deployment"
}, null, {
	kind: "Deployment"
}, {
	kind: "Other"
}]
> exec cue import --outfile=-        --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:7: unexpected command failure
> exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
[stdout]
deployment: [{
	kind: "Deployment"
}, {
	kind: "Deployment"
}]
other: [{
	kind: "Other"
}]
> exec cue import --outfile=- --list --path=fixedpath:            input-withempty.yaml
[stdout]
fixedpath: [{
	kind: "Deployment"
}, {
	kind: "Deployment"
}, {
	kind: "Other"
}]

The first case looks oddly inconsistent now, because we're not ignoring the empty document in that case. On one hand, keeping the nulls in means that we're more directly representing the list of documents from the original YAML. On the other hand, it's a bit odd that adding --path makes some of the documents disappear, even when the path is fixed, like the fourth example.

The second case looks like it's still a bug; we should not fail, for the sake of consistency.

The third case looks OK; we are now ignoring the empty document as discussed.

My thinking is as follows: --list and --files should always ignore empty documents in multi-document inputs. --path, when used on its own, needs to be a bit more intelligent: if all of the input documents fail with the --path expression, then it should report an error, because the user probably made a typo or mistake with the expression. However, if just some but not all of the documents fail to look up the path, then we should ignore those documents as they could be null or otherwise missing the field.
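The "all fail versus some fail" rule described above can be sketched in Python. This only models the idea and is not cue's implementation: `group_by_label` is a hypothetical name, documents are plain dicts with `None` standing in for an empty document, and collecting documents into lists per label is a simplification.

```python
def group_by_label(docs, label):
    """Group documents by label(doc).

    Documents where the label lookup fails are skipped, unless the
    lookup fails for every document, which is treated as a user error.
    """
    groups, failures = {}, 0
    for doc in docs:
        try:
            key = label(doc)
        except (KeyError, TypeError):
            # KeyError: field missing; TypeError: doc is None (empty document).
            failures += 1
            continue
        groups.setdefault(key, []).append(doc)
    if docs and failures == len(docs):
        raise ValueError("label expression failed for every document")
    return groups


docs = [{"kind": "Deployment"}, None, {"kind": "Deployment"}, {"kind": "Other"}]
# Groups under 'deployment' and 'other'; the None entry is skipped.
print(group_by_label(docs, lambda d: d["kind"].lower()))
```

A misspelled field such as `d["knd"]` fails for every document and so raises an error instead of silently producing an empty result.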

@haoqixu
Contributor

haoqixu commented Dec 2, 2024

> My thinking is as follows: --list and --files should always ignore empty documents in multi-document inputs. --path, when used on its own, needs to be a bit more intelligent: if all of the input documents fail with the --path expression, then it should report an error, because the user probably made a typo or mistake with the expression. However, if just some but not all of the documents fail to look up the path, then we should ignore those documents as they could be null or otherwise missing the field.

This approach seems reasonable to me.

@mvdan
Member

mvdan commented Dec 2, 2024

After discussing with @rogpeppe and @mpvl we agreed on a simpler approach and smaller fix: make --path skip over null values (such as empty YAML documents) when the --path argument involves a reference. That is, with --path=foo or --path=strings.ToLower(kind), but not with --path=fixedpath:.

We came to the conclusion that always ignoring null values in --list is not ideal, because that loses some information, and one can always filter out the nulls later if they wish to as part of interpreting the data.
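As a rough model of the agreed behavior, the Python sketch below treats three cases separately: no path, a fixed path, and a path that references a field. It is an illustration under assumptions, not cue's code: `import_list` is a hypothetical name, a string stands in for a fixed path like `fixedpath:`, and a callable stands in for an expression like `strings.ToLower(kind)`.

```python
def import_list(docs, path=None):
    """Model of --list with an optional --path.

    path is None     -> plain list, nulls kept
    path is a str    -> fixed label, nulls kept
    path is callable -> label derived from each document, nulls skipped
    """
    if path is None:
        return list(docs)
    if isinstance(path, str):
        return {path: list(docs)}
    groups = {}
    for doc in docs:
        if doc is None:
            continue  # nothing to evaluate the reference against
        groups.setdefault(path(doc), []).append(doc)
    return groups


docs = [{"kind": "Deployment"}, None, {"kind": "Deployment"}, {"kind": "Other"}]

plain = import_list(docs)                                   # keeps the None
fixed = import_list(docs, "fixedpath")                      # keeps the None
by_kind = import_list(docs, lambda d: d["kind"].lower())    # skips the None

# Nulls kept by the plain list can still be filtered out afterwards.
non_null = [d for d in plain if d is not None]
print(len(plain), len(non_null))  # 4 3
```

Keeping the nulls in the first two cases preserves the original document count, and dropping them stays a one-line step for callers who do not need it.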

@haoqixu are you happy to update https://review.gerrithub.io/c/cue-lang/cue/+/1202049 accordingly? Please also add more tests, in line with the testscript I shared above, to make sure we do what we would expect in each scenario.

@mvdan
Member

mvdan commented Dec 3, 2024

I also filed #3608 for an idea we had to extend the YAML encoding a bit too, so that it is able to omit empty or null documents entirely from an input. However, that's not needed to resolve the issue here with cue import --path.

@haoqixu
Contributor

haoqixu commented Dec 3, 2024

> After discussing with @rogpeppe and @mpvl we agreed on a simpler approach and smaller fix: make --path skip over null values (such as empty YAML documents) when the --path argument involves a reference. That is, with --path=foo or --path=strings.ToLower(kind), but not with --path=fixedpath:.
>
> We came to the conclusion that always ignoring null values in --list is not ideal, because that loses some information, and one can always filter out the nulls later if they wish to as part of interpreting the data.
>
> @haoqixu are you happy to update https://review.gerrithub.io/c/cue-lang/cue/+/1202049 accordingly? Please also add more tests, in line with the testscript I shared above, to make sure we do what we would expect in each scenario.

No problem. I have submitted another CL to add tests and will update CL 1202049 soon.
