tooling: native embed and export #2031

Open
myitcv opened this issue Oct 24, 2022 · 18 comments
Labels
FeatureRequest New feature or request

@myitcv
Member

myitcv commented Oct 24, 2022

Two very common use cases are:

  1. Using cue import -l "$path" to "place" some JSON/Yaml/... at a given path in a CUE configuration.
  2. Using cue export -e "$expr" --out $format to take part of a CUE wider config graph and export it to JSON/Yaml/... ready for consumption by a non-CUE tool/system.

(Here, $path is a shell-like variable syntax used to indicate "some path", similarly for $expr standing in for "some CUE expression").

Note that whilst in some situations both use cases appear in tandem, this is not a requirement. The two use cases are orthogonal.

If use case 1 were a one-off import to CUE, it would be relatively straightforward to run cue import and be done, removing the original JSON/Yaml file. However, in some situations it may be necessary to leave the original JSON/Yaml file as the source of truth, for example if some other process generates or maintains that file. In this scenario, it becomes a burden to have to repeatedly re-run cue import to ensure that the CUE configuration is current with respect to the source JSON/Yaml.

In a similar way, use case 2 is in practice never a one-off command. If CUE is being used to maintain the source of truth for the target tool/system, then cue export will need to be run on every change of the source CUE to ensure that the target tool/system sourcing the output JSON/Yaml reads a current version. The burden in this scenario comes from needing to maintain multiple cue export commands somewhere, either in a bash script (or similar) or in a cue cmd script. Whilst the cue cmd solution will absolutely work, there are two drawbacks:

  • Writing a cue cmd command is incredibly verbose.
  • The declaration of intent is a long way from the actual configuration itself (i.e. in a different file).
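For comparison, here is a minimal sketch of the cue cmd workaround for use case 2. It assumes a concrete field named xconfig (a hypothetical name) is defined elsewhere in a package called config (also hypothetical). The file name must end in _tool.cue, and the command would be run as cue cmd exportxconfig.

```cue
// config_tool.cue: hypothetical tool file in the same package as xconfig.
package config

import (
	"encoding/json"
	"tool/file"
)

// `cue cmd exportxconfig` marshals the (assumed) xconfig field to JSON
// and writes it to xconfig.json next to this file.
command: exportxconfig: {
	write: file.Create & {
		filename: "xconfig.json"
		contents: json.Marshal(xconfig)
	}
}
```

Even this single export needs its own file, two imports and a command declaration, and it lives away from where xconfig itself is defined, which illustrates both drawbacks.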

This issue tracks adding support for a more native embed (use case 1) and export (use case 2) facility.

The word "embed" draws inspiration from Go's embed, and the use of //go:embed directives. It is likely desirable that CUE support a similar directive (comment)-based approach for both use cases 1 and 2.

@myitcv myitcv added the FeatureRequest New feature or request label Oct 24, 2022
@myitcv
Member Author

myitcv commented Oct 24, 2022

Previous brainstorming with @mpvl around use case 1 resulted in the following ideas:

// The use of imports to cover use case 1 appears problematic.
// Including for completeness' sake.
import bar "yaml+jsonschema:./foo/bar.yaml" // Is this a URL?
import bar "yaml+jsonschema+fs:./**/bar.yaml" // Is this a URL?

// An import builtin also seems "wrong"
bar: import("./foo/bar.yaml", encoding: "yaml", interpretation: "jsonschema")
bar: import("./foo/bar.yaml", type: "yaml+jsonschema")

// An approach that uses attributes as directives is attractive. Exploration
// of various different forms below. 

bar: _ @embed(foo/bar.toml)

bar: {
      @embed(string, bar/file.json)
      @embed(bytes, bar/file.json)
      @embed(glob, bar/file.json)
      @embed(fs, bar/file.json)

      @embed(bar/file.json)
      @embed(jsonschema: bar/*.json, mapping: fs) // <pathString>: <value>
      @embed(foo/*.yaml, mapping: flat)  // path/to/files: <value>
      [string]: #Foo
}

bar: _ @embedfs(foo/*.yaml, jsonschema:bar/*.json)
bar: [string]: #Foo

bar: foo: [Filename=string]: _

Use case 2 could similarly be solved via attributes:

xconfig: #Config & {
    @export(json, "xconfig.json")
    
    // ...
}

@cedricgc
Contributor

I took the time to read about the Go embed package, which was enlightening.

One of the main differences between Go and CUE is that Go, as a programming language, has a set compile step (which is when Go embeds data into a program). With CUE, that timing is not clear, and CUE users may want control over when IO happens.

@cedricgc
Contributor

Question: One point that is unclear is when the syncing IO happens. Would that not be a violation of CUE's hermeticity?

@cedricgc
Contributor

cedricgc commented Nov 5, 2022

Question: One point that is unclear is when the syncing IO happens. Would that not be a violation of CUE's hermeticity?

Answered by @mpvl:

@cedricgc - I think it is because other files would be parsed the same way as CUE files themselves
@mpvl - Yes. The set of files must be within the cue.mod purview, just like the cue files, and are thus equally static.

@cedricgc
Contributor

cedricgc commented Nov 5, 2022

For the attribute arguments, I think taking URI schemes would be more flexible.

e.g. @embed(json:///bar/file.json) for embedding a JSON graph (assuming cue.mod is considered the root directory)

I think designing it this way allows for more schemes in the future that read not only from the filesystem but also from a network or database. For example, a scheme could embed data from a remote system, using its query language to pick out the 'subgraph'.

Referring to cue help filetypes, I think this can be compatible with how we treat filetypes on the command line, and can also qualify inputs with multiple tags (e.g. openapi+yaml:// or json+data://).

@myitcv
Member Author

myitcv commented Nov 7, 2022

Quoting @cedricgc: "For the attributes arguments, I think taking protocol schemas would be more flexible"

As discussed offline, we definitely want/need to support specifying the filetype in some way. However, using a URI scheme has (to my mind at least) the unfortunate side effect of suggesting we support (module) absolute paths. Using the Go embed approach as a reference, only relative file paths are supported:

"The patterns are interpreted relative to the package directory containing the source file."

@kghenderson
Copy link

kghenderson commented Nov 13, 2022

Personally, I'm also open to an ultra-simplified variant where you can only embed/import to an unevaluated string.
This gets around the evaluation-order problem and treats the content as just a value, which you can process and validate using regular CUE constructs. I still consider the above to be an "import" case, as opposed to a simple, raw "embed".

These should just load and error like any other value, outside of tools (which separates this use case).
This particular case isn't for transforms or data processing.


PackageDoc: string  @embed _about.md 

TemplateText: string  @embed mytext.tmpl
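For comparison, a sketch of these two fields in the shape used by the experimental @embed implementation that later shipped with v0.10.0-alpha.1 (proposal #3264). The file-level @extern(embed) attribute, the file= and type=text arguments, and the need to enable the experiment (CUE_EXPERIMENT=embed at the time) are assumptions based on that proposal and should be checked against it; the package name docs is hypothetical.

```cue
// Assumed v0.10 experiment syntax; requires the embed experiment to be enabled.
@extern(embed)

package docs

// type=text (assumed option) keeps each file's raw contents as a plain
// CUE string, without decoding them as JSON, YAML, etc.
PackageDoc:   string @embed(file=_about.md, type=text)
TemplateText: string @embed(file=mytext.tmpl, type=text)
```

With type=text the file contents are not interpreted, which matches the "unevaluated string" variant described above.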

@myitcv myitcv added the Discuss Requires maintainer discussion label Feb 8, 2023
@myitcv
Copy link
Member Author

myitcv commented Apr 14, 2023

Adding a further note here: this solution should at least consider the case presented in #2346.

@myitcv
Member Author

myitcv commented Apr 14, 2023

Also noting an exchange with @kghenderson in which he observed there is a parallel between the concept of @embed and the read functions in Go's os package, such as those for reading environment variables. That's not to say the concept of @embed should be abstracted to something more general, just an observation that we might also want/need something similar for those read functions. Because ultimately, @embed is os.ReadFile().

@nyarly

nyarly commented Apr 14, 2023

An @embed directive would be extremely welcome. We're definitely using something like the cue import command above to make this work already, and reducing the processing that has to happen would be ideal.

One use case we have, though, involves extracting data from CRDs: currently we use yq to extract the schema part, separately pull out the GVK (group/version/kind) information, and then integrate them. We might be able to get something useful if @embed took a subresource path? e.g.

let schema = @embed(jsonschema+yaml, "./crds/stringsecrets.yaml", "spec.versions.v1alpha1.schema")

or something?

Maybe what's called for is an extra tag - @interpret or something?

let crds = @embed(yaml, "./crds/*.yaml")

for path, crd in crds {
  for version, schema in crd.spec.versions {
    (version): (crd.names.kind): @interpret(jsonschema, schema.openAPIV3Schema)
    (version): (crd.names.kind): {
      apiVersion: "\(crd.spec.group)/\(version)"
      kind: "\(crd.names.kind)"
    }
  }
}

Alternatively, maybe it wants to be encoding/jsonschema -> jsonschema.Parse(...) or something?

The critical thing, I think, is that I have (in this case) JSONSchema embedded in YAML, so to work with it in CUE, I'd want to @embed the YAML to get CUE, extract the JSONSchema, and then process it again to get (different) CUE.

@myitcv
Member Author

myitcv commented Apr 18, 2023

Thanks for the use case, @nyarly. I think that fits with a later iteration of @embed and @export.

I'm tentatively marking an initial version of this proposal as v0.7.0, pending working on a design draft with @mpvl and @rogpeppe this week at KubeCon EU.

The initial version would be bytes only for embed and export. That would mean (@nyarly and others) that any transformation inbound/outbound would need to happen via other fields or let declarations.

The main goal of the design draft for the initial version is:

  • Get something out in a timely fashion, goal of v0.7.0 for implementation
  • Not back ourselves into a corner with respect to future enhancements

The second point is key. We should not preclude the kind of syntactic sugar that @nyarly imagines above, and that we have speculated about in the original notes in this discussion. Because ultimately it should, for example, be possible and easy for someone to say "embed this file at this point treating its contents as JSON".
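As an illustration of the bytes-only approach, here is a sketch of @nyarly's CRD case using only features CUE already has. A hidden string field stands in for the raw contents an embed would supply (the CRD text is invented for illustration), and a let declaration decodes it with encoding/yaml. The final step, interpreting schema as JSON Schema, is not shown.

```cue
package crds

import "encoding/yaml"

// Stand-in for the raw bytes an initial, bytes-only embed would provide.
// The CRD text below is made up for illustration.
_crdText: """
    spec:
      group: example.com
      names:
        kind: StringSecret
      versions:
      - name: v1alpha1
        schema:
          openAPIV3Schema:
            type: object
    """

// Decoding is ordinary CUE; the let keeps the parsed CRD out of the export.
let crd = yaml.Unmarshal(_crdText)

kind: crd.spec.names.kind
apiVersions: [for v in crd.spec.versions {"\(crd.spec.group)/\(v.name)"}]
schema: crd.spec.versions[0].schema.openAPIV3Schema
```

Any later syntactic sugar, such as "embed this file treating its contents as YAML", would then replace the hidden field and the explicit yaml.Unmarshal call.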

@myitcv myitcv removed Discuss Requires maintainer discussion Re-milestone labels Apr 18, 2023
@myitcv
Member Author

myitcv commented Apr 21, 2023

Adding an observation here related to cue import --recursive, and issues like #1209. Recursive import is imprecise because it relies on heuristics that guess, for example, whether a string field is YAML. To make such imports precise we could follow a similar approach to the text proto adapter, where a schema makes precise what is being imported, i.e. the schema specifies types, etc., at various paths. This would be similar to cue vet -d. Noting this observation here in case there is any overlap with the embed/export discussion.

@myitcv
Member Author

myitcv commented Jun 14, 2023

Milestone moved to v0.7.x, in order that we focus on performance and disjunction related changes first in the v0.7 series, and so that we get to this whilst people are trying out early alphas of v0.7.0.

@phoban01

Any further progress to share on this?

@myitcv
Member Author

myitcv commented May 22, 2024

@mpvl is currently entirely focussed on performance work as part of the evaluator rewrite. See #2850 for the ~fortnightly updates on progress on that front. When work on that front settles we are going to publish a design doc/proposal on how native embed will work with CUE, along with an experimental implementation.

cueckoo pushed a commit that referenced this issue Jun 25, 2024
Exposing ParseFileAndType. For the embedding
proposal, file and type are specified separately
and do not need to be parsed as on the
command line.

Issue #2031

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: Ib02f845d503edf1d78834a1ff2a0c224cc936748
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196716
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
cueckoo pushed a commit that referenced this issue Jun 25, 2024
Right now, we only allow one type of extern attribute in
a file. At the file level, we define @extern(kind). Fields
within the file can then be associated with an @extern
attribute that is interpreted as defined by kind.

This approach may work for lower-level functionality like
support for WASM, but it seems a bit unintuitive for embed.
Instead, we suggest that after a file-level @extern(kind)
declaration the field attributes take the form @kind(). This
is what is implemented here.

This has the additional benefit that we could more easily
allow different types of extern fields within a single file.

Note that the original reason to reuse @extern for field
attributes was to avoid a proliferation of attributes.
This namespace encroachment is still somewhat mitigated by the
@extern(kind) attribute. In the future we can find a different
mechanism to define attributes scoped by domain.

Issue #2031

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: I28b1fdd0f0a85c46a544f71bbff40a7772e60873
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196717
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue Jun 27, 2024
This is a first-stab and partial implementation of the
embedding proposal. See the TODO list included in
embed.go to see what is outstanding.

Issue #2031

This issue is not closed, as it also refers to the
complementary export attribute.

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: Ic296a28aa009509f9a17913c7e5a0794de5a7a35
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196718
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Aram Hăvărneanu <aram@cue.works>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
cueckoo pushed a commit that referenced this issue Jul 9, 2024
This could later be allowed as an option.

This is a security feature. We also disallow hidden files on
Windows for the same reason, which is a bit more involved.

Note that this results in potentially slightly different
behavior under Windows and Unix. This is already the case.
For instance, the set of valid filenames is different on
the different supported OSes. So we accept this discrepancy
in favor of added security.

Verified that without adding the new logic, the hidden file
that was added to embed.txtar gets included in the output.

Issue #2031

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: Iacff803f4c388d1f2792665ed5adb32f68f00ffa
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196775
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
@myitcv
Member Author

myitcv commented Jul 12, 2024

As part of the v0.10.0-alpha.1 release, we have just published an embed proposal #3264. Please give it a try out. We would very much welcome feedback via the linked discussion.
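A sketch of what trying the experiment might look like, assuming the syntax described in #3264 at the time of v0.10.0-alpha.1: a file-level @extern(embed) attribute (matching the @extern(kind) scheme from the commits above), field-level @embed attributes taking file= or glob= arguments, and the embed experiment enabled via CUE_EXPERIMENT=embed. All file names and the #Foo schema are hypothetical; the proposal itself is the authority on the current form.

```cue
// Assumed v0.10 experiment syntax; requires CUE_EXPERIMENT=embed.
@extern(embed)

package config

// A single file, decoded according to its extension (JSON here).
// Paths are assumed to be relative to the directory of this CUE file.
bar: _ @embed(file=foo/bar.json)

// Every file matching the glob, assumed to be keyed by its relative path.
// Per the commit above, hidden files are not included.
all: [string]: #Foo
all: _ @embed(glob=foo/*.yaml)

// Hypothetical schema that each embedded YAML file must satisfy.
#Foo: {
	name: string
	...
}
```

Because the embedded values are ordinary CUE values, constraints such as all: [string]: #Foo validate them in the same way as hand-written configuration.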

@myitcv
Member Author

myitcv commented Sep 5, 2024

Adding a drive-by comment regarding the hypothesised @export. The shape of such a thing is becoming clearer thanks to the @embed experiment. But also as a result of playing with https://github.com/cue-lang/cuelang.org/blob/master/internal/cmd/writefs/main.go.

Indeed the writefs experiment has flagged one important thing that @export would need to support: writing "code generated by" headers to files, if specified via an option.
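Until something like @export exists, such a header has to be added by hand in a cue cmd task. A minimal sketch, assuming a concrete field named xconfig (hypothetical) and a YAML target, since strict JSON has no comment syntax. The header adapts Go's "Code generated ... DO NOT EDIT." convention to YAML's # comments; the command name genyaml is also hypothetical.

```cue
// export_tool.cue: hypothetical tool file in the same package as xconfig.
package config

import (
	"encoding/yaml"
	"tool/file"
)

// Marks the output as generated, following Go's convention.
let header = "# Code generated by cue cmd genyaml; DO NOT EDIT.\n"

command: genyaml: write: file.Create & {
	filename: "xconfig.yaml"
	contents: header + yaml.Marshal(xconfig)
}
```

An @export option for headers would need to know the comment syntax of each output encoding, which is one reason it is tied to the encoding rather than being a plain string prefix.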

@myitcv
Copy link
Member Author

myitcv commented Oct 1, 2024

A further drive-by comment on the hypothesised @export. The following is a real life situation that comes from writing the configuration for the vscode-cue extension, to support use of cuepls, the CUE LSP. In that situation I want to export a CUE data value to JSON. Within that value, however, is a schema which I want to be exported as JSON Schema. So, in pseudo-code:

extension: {
    @export(config.json)

    value1: "hello"
    value2: 5

    // This value should be exported as JSON Schema
    configuration: {
        @export(type=jsonschema)
        #config 
    }
}

#config: {
    name?: string
    age?: int
}

In the pseudo-code above, I have marked the "inner" value for the configuration field with a further @export attribute in order to indicate the encoding type for that value.

Note: this pseudo-code is not a suggested design or syntax; it is instead, hopefully, a simple means of indicating that the resulting contents of config.json are some hybrid of JSON values and JSON Schema under a certain field.

I suspect this requirement extends more generally to other situations.
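To make the target of the pseudo-code concrete, here is a hand-written sketch, expressed as plain CUE data, of roughly what config.json would need to contain. The JSON Schema fragment for #config is translated manually and is only an approximation of what a JSON Schema encoder would produce; for example, it omits $schema and any handling of closedness.

```cue
package extension

extension: {
	// Plain data values, exported as-is.
	value1: "hello"
	value2: 5

	// Hand-translated JSON Schema for #config. Both fields are optional
	// in #config, so there is no "required" list.
	configuration: {
		type: "object"
		properties: {
			name: {type: "string"}
			age: {type: "integer"}
		}
	}
}
```

Writing this by hand duplicates #config, which is exactly the maintenance burden a per-field export encoding would remove.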
