doc: update README to explain how CUE differs from jsonnet? #33

Closed
cueckoo opened this issue Jul 3, 2021 · 16 comments

Comments


cueckoo commented Jul 3, 2021

Originally opened by @ngrilly in cuelang/cue#33

How is CUE different from jsonnet, both coming from Google, and the latter being largely promoted especially in the context of Kubernetes?

@cueckoo cueckoo closed this as completed Jul 3, 2021

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#33 (comment)

Jsonnet is based on BCL, an internal language at Google. It fixes a few things relative to BCL, but is mostly the same. This means it copies the biggest mistakes of BCL. Even though BCL is still widely used at Google, its issues are clear. It was just that the alternatives weren't that much better.

There are a myriad of issues with BCL (and Jsonnet and pretty much all of its descendants), but I will mention a couple:

  1. Most notably, the basic operation of composition of BCL/Jsonnet, inheritance, is not commutative and idempotent in the general case. In other words, order matters. This makes it, for humans, hard to track where values are coming from. But also, it makes it very complicated, if not impossible, to do any kind of automation. The complexity of inheritance is compounded by the fact that values can enter an object from one of several directions (super, overlay, etc.), and the order in which this happens matters. The basic operation of CUE is commutative, associative and idempotent. This order independence helps both humans and machines. The resulting model is much less complex.

  2. Typing: most of the BCL offshoots do not allow for schema definitions. This makes it hard to detect any kind of typos or user errors. For large code bases, no one will question a requirement to have a compiled/typed language. Why should we not require the same kind of rigor for data? Some offshoots of BCL, both internal to Google and external, have tried to address this a bit, but none quite satisfactorily. In CUE, types and values are the same thing. This makes things both easier than schema-based languages (fewer concepts to learn) and more powerful. It allows for intuitive but also precise typing.
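
To make the order-independence point in item 1 concrete, here is a rough Python sketch. It is not CUE's implementation; `unify`, `Conflict` and `conflicts` are hypothetical names invented for illustration, and values are modelled as plain nested dicts with scalar leaves.

```python
class Conflict(Exception):
    """Raised when two values cannot be unified."""


def unify(a, b):
    """Unify two values: dicts merge field by field, equal leaves merge,
    unequal leaves are a conflict (nothing is ever overridden)."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(out[key], value) if key in out else value
        return out
    if a == b:
        return a
    raise Conflict(f"conflicting values {a!r} and {b!r}")


def conflicts(a, b):
    """Return True if unifying a and b fails."""
    try:
        unify(a, b)
    except Conflict:
        return True
    return False


base = {"spec": {"replicas": 3}}
extra = {"metadata": {"name": "foo"}}
other = {"spec": {"replicas": 5}}

# Commutative: the order of the operands does not matter.
assert unify(base, extra) == unify(extra, base)
# Idempotent: unifying a value with itself changes nothing.
assert unify(base, base) == base
# Disagreeing values are an error in either order, never a silent override.
assert conflicts(base, other) and conflicts(other, base)
```

The design choice this is meant to show: because a disagreement is an error rather than "last writer wins", a reader (or a tool) never has to know in which order the pieces were combined.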

There are many other issues, like handling cycles, unprincipled workarounds for hermeticity, poor tooling and so forth, that often make BCL and its offspring awkward.

So why CUE? Configuration is still largely an unsolved problem. We have tried using code to generate configs, or hybrid languages, but that often results in a mess. Using generators on databases doesn't allow keeping it in sync with revision control. Simpler approaches like HCL and Kustomize recognize the complexity issue by removing a lot of it, but then sometimes become too weak, and actually also reintroduce some of this complexity with overlays (a poor man's inheritance, if you will, but with some of the same negative consequences). Other forms of removing complexity, for instance by just introducing simpler forms or abstraction layers of configuration, may work within certain contexts but are domain-specific and relatively hard to maintain.

So inheritance-based languages, for all their flaws, were the best we had. The idea behind CUE is to recognize that a declarative language is the best approach for many (not all) configuration problems, but to tackle the fundamental issues of these languages.

The idea for CUE is actually not new. It was invented about 30 years ago and has been in use and further developed since that time in the field of computational linguistics, where the concept is used to encode entire lexicons as well as very detailed grammars of human languages. If you think about it, these are huge configurations that are often maintained by both computer scientists and linguists. You can see this as a proof of concept that large-scale, declarative configuration for a highly complex domain can work.

CUE is a bit different from the languages used in linguistics and more tailored to the general configuration issue as we've seen it at Google. But under the hood it adheres strictly to the concepts and principles of these approaches, and we have been careful not to make the same mistakes made in BCL (which then were copied in all its offshoots). It also means that CUE can benefit from 30 years of research on this topic. For instance, under the hood, CUE uses a first-order unification algorithm, allowing us to build template extractors based on anti-unification (see issues #7 and #15), something that is not very meaningful or even possible with languages like BCL and Jsonnet.


cueckoo commented Jul 3, 2021

Original reply by @ngrilly in cuelang/cue#33 (comment)

I didn't expect such a thorough answer. Thank you so much. It would be great to add this to the README.


cueckoo commented Jul 3, 2021

Original reply by @ngrilly in cuelang/cue#33 (comment)

By the way, is the project already stable/mature enough to be an adequate replacement for jsonnet?


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#33 (comment)

@ngrilly: it is very new and I would consider it alpha.

That said, see for instance CL 1723. This extracts CUE templates from Kubernetes based on how the Go code interprets JSON (which K8s considers the source of truth). Using such templates on the 2000 line YAML data in doc/tutorials/kubernetes seems to work fine. So getting there.

But there are certainly areas that may be lacking still. The standard lib is mostly generated from the Go standard lib, but may have some gaps. The <label> notation for templates is probably not adequate if we want to be able to put constraints on field names as well and we need to find an alternative (could be done in a backwards compatible way, though). There is no way to refer to the top of the file, until we fully understand the consequences of various alternatives. And we are tweaking the semantics of default values. The evaluation engine is far from optimized and could still be made a lot faster.

The next step is to start using this on very large configurations. I expect that we will run into some more issues to iron out doing that.


cueckoo commented Jul 3, 2021

Original reply by @sparkprime in cuelang/cue#33 (comment)

My take on this (I can only speak for Jsonnet, not other config languages):

Jsonnet follows the conventional OO model (albeit generalized to mixins) but this generalization doesn't make a practical difference for tooling if you don't use them and can be enormously useful in certain cases. The non-commutativity of OO is a fundamental aspect of the OO ideology - when you compose A and B you want B to override A. They are not equal citizens.

Properties are nonetheless held in certain situations:

  • Idempotence: A + A === A (iff A does not contain super)
  • Commutativity: A + B === B + A (iff A and B have no fields in common)
  • Associativity: (A + B) + C === A + (B + C) (true all the time in both Cue and Jsonnet and very important to have).
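
The three properties above can be checked on a deliberately simplified Python model. This is an assumption-laden sketch, not Jsonnet: `plus` is a hypothetical stand-in for `+` on objects that have no `super` and no `+:` fields, so the right-hand operand simply replaces fields of the left-hand one.

```python
def plus(a, b):
    """Shallow override merge: fields of b win over fields of a."""
    return {**a, **b}


A = {"x": 1}
B = {"y": 2}
C = {"x": 3}

# Idempotence holds here because A contains no super.
assert plus(A, A) == A
# Commutativity holds when the operands share no fields ...
assert plus(A, B) == plus(B, A)
# ... and fails as soon as they do: the right-hand side wins.
assert plus(A, C) != plus(C, A)
# Associativity holds regardless of overlap.
assert plus(plus(A, B), C) == plus(A, plus(B, C))
```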

I chose Jsonnet to follow OO not because of BCL or GCLx (its more conventional redesign that was never released outside Google) but because I have always believed in OO, and it was also the focus of my PhD work at Imperial College way back in 2006 with Sophia Drossopoulou and Susan Eisenbach (who were pioneers in formal modelling of OO languages).

Cue doesn't follow OO and is therefore completely different to Jsonnet. FWIW I have always been fond of intersection types so I find the approach intriguing. It is definitely simpler; the question for me is whether it's expressive enough. But that can only be answered by looking at a sufficient corpus of real world examples written idiomatically in both languages.


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#33 (comment)

@sparkprime Yes, that is correct. Not being OO in the sense Jsonnet is makes it a completely different beast. It is more focussed on validation than templating. Constraints have a generative aspect to them, though. So as you can see in the tutorial, CUE can remove boilerplate quite effectively (and automatically). But the design principles are fundamentally very different.

Where Jsonnet is OO, CUE falls in the logic programming camp (graph unification style). Another way to look at it, in CUE instances are strict subsets of their parent (you cannot create a dog out of a cat), whereas in Jsonnet instances are modifications of their parents. The latter is more flexible, the former conceptually simpler and more conducive to analysis.

> Jsonnet follows the conventional OO model (albeit generalized to mixins) but this generalization doesn't make a practical difference for tooling if you don't use them

In practice it doesn't work that way. If it is available it will be used and tooling needs to deal with it (or give up).


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#33 (comment)

A more visual way of describing how the differences pan out in practice is perhaps as follows:

In Jsonnet one defines data by specifying from which (possibly multiple) templates it inherits and then modifies the contents to get the desired results. (OO, essentially)

In CUE one selects nodes within a graph and then defines which constraints (templates) should apply for all of these nodes.

For instance, let's say we have the JSON object equivalent deployments: { foo: x, bar: y }. In general a JSON object could be described as a set of path-leaf pairs. So let's first adopt this notation to aid the example:

deployments foo: x
deployments bar: y

Now remember that CUE is about constraints. Whereas in Jsonnet one would write x and y to be composed from a combination of values, in CUE we are first and foremost concerned that x and y meet certain constraints.

Assuming familiarity with Jsonnet, let's focus on the CUE example. For example, to say that all deployments must be Kubernetes deployments, one could write:

import "k8s.io/api/extensions/v1beta1"

deployments <Name>: v1beta1.Deployment

where the definitions in v1beta1 are created using cue get go k8s.io/api/extensions/v1beta1 (or cue get go github.com/prometheus/prometheus/pkg/rulefmt, etc). This says, in our example, that x and y must both be instances of a Kubernetes Deployment.

The cue tool (not the language) assumes a certain directory layout that then allows teams, groups, or an entire org to apply different constraints (constraints from the current directory and its parent directories apply). For example:

deployments <Name>: {
  metadata name: *Name | string // the default name for a deployment is its map key, but may be anything (if not further restrained somewhere else, such as in the Kubernetes template).
  spec replicas: <=50           // never allow more than 50 replicas

  metadata labels app: metadata.name  // there must be a label called "app" matching the name
}

As constraints are not overridable, they provide a powerful way to specify policy and validation.

The funny thing about constraints in CUE, though, is that they also act as templates in the sense that they can reduce boilerplate. In the above example, since I specify that each deployment must have a label with the deployment's name, I don't need to write it anymore wherever this constraint applies. This is where the logic programming part of CUE kicks in.
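
As a rough illustration of constraints doubling as templates, here is a Python sketch of the three rules from the example above. It is only a model under stated assumptions: `apply_constraints`, `is_valid` and `MAX_REPLICAS` are hypothetical names, deployments are plain dicts, and real CUE evaluation works by unification rather than by hand-written checks like these.

```python
MAX_REPLICAS = 50  # mirrors `spec replicas: <=50`


def apply_constraints(deployments):
    """Return deployments with implied fields filled in.

    Raises ValueError if any deployment violates a constraint.
    """
    result = {}
    for key, dep in deployments.items():
        metadata = dict(dep.get("metadata", {}))

        # metadata name: *Name | string  -> defaults to the map key
        name = metadata.setdefault("name", key)
        if not isinstance(name, str):
            raise ValueError(f"{key}: metadata.name must be a string")

        # spec replicas: <=50
        replicas = dep.get("spec", {}).get("replicas")
        if replicas is not None and replicas > MAX_REPLICAS:
            raise ValueError(f"{key}: replicas {replicas} exceeds {MAX_REPLICAS}")

        # metadata labels app: metadata.name
        labels = dict(metadata.get("labels", {}))
        if labels.setdefault("app", name) != name:
            raise ValueError(f"{key}: label app must equal metadata.name")
        metadata["labels"] = labels

        result[key] = {**dep, "metadata": metadata}
    return result


def is_valid(deployments):
    """Return True if every deployment satisfies the constraints."""
    try:
        apply_constraints(deployments)
    except ValueError:
        return False
    return True


filled = apply_constraints({"foo": {"spec": {"replicas": 3}}})
assert filled["foo"]["metadata"] == {"name": "foo", "labels": {"app": "foo"}}
assert not is_valid({"bar": {"spec": {"replicas": 51}}})
```

Note how the same rule both rejects a wrong `app` label and supplies a missing one, which is the "generative aspect" of constraints mentioned earlier in the thread.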

In CUE's Kubernetes tutorial one can see that constraints are actually very effective at reducing boilerplate. The basic workflow of CUE w.r.t. templating is:

  1. Create/modify templates
  2. Validate your data against the templates
    1. Are added/removed fields okay?
    2. Are conflicts bugs or do the templates need adjustment?
  3. Optionally: remove boilerplate (possibly automatically using cue trim)
  4. Repeat for each org/team/subdirectory
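
Step 3 of the workflow above can be pictured with a small Python sketch. This is only the idea behind boilerplate removal, not how cue trim is implemented; `trim` and the `implied` argument (the values a template would fill in anyway) are hypothetical.

```python
def trim(data, implied):
    """Drop every leaf of data that the template already implies."""
    out = {}
    for key, value in data.items():
        if key in implied:
            if isinstance(value, dict) and isinstance(implied[key], dict):
                rest = trim(value, implied[key])
                if rest:  # keep the struct only if something is left
                    out[key] = rest
                continue
            if value == implied[key]:
                continue  # redundant: the template supplies this value
        out[key] = value
    return out


deployment = {
    "metadata": {"name": "foo", "labels": {"app": "foo", "tier": "web"}},
    "spec": {"replicas": 3},
}
implied = {"metadata": {"name": "foo", "labels": {"app": "foo"}}}

trimmed = trim(deployment, implied)
assert trimmed == {
    "metadata": {"labels": {"tier": "web"}},
    "spec": {"replicas": 3},
}
```

Only fields that agree with the template are removed; a field that differs is left in place, where validation (step 2) would report it as a conflict.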

So a perfectly fine use case is to use CUE only for validation, and not for reducing boilerplate.

So in the end, CUE and Jsonnet remove boilerplate in very different and incompatible ways. The goal of CUE is not to be better at boilerplate removal than Jsonnet. The point is that it adds validation and makes it easier to follow where values come from (complex inheritance hierarchies can be quite inscrutable). If I see a constraint saying that for all deployments in a certain group replicas is at most 50, then this is final and I know that no instance can override this. CUE allows one to make inferences about a large number of elements at a glance. This is useful for humans, but also machines.

That CUE constraints are additive is also useful for all kinds of code generation and validation pipelines, but that is a different story altogether.


cueckoo commented Jul 3, 2021

Original reply by @masaeedu in cuelang/cue#33 (comment)

@mpvl Is there some place where the theoretical underpinnings of cue are laid out? It seems very much like it is based in set theory, but some pointers to existing research, or even a rough overview of the mathematical concepts that underlie the idea would be very helpful.

Thanks!


cueckoo commented Jul 3, 2021

Original reply by @copumpkin in cuelang/cue#33 (comment)

Somewhat relatedly, I'm curious if you could add some thoughts on how CUE differs from Nix (and in particular the NixOS configuration "language" built on top of it)


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#33 (comment)

@masaeedu : references to the theoretical underpinnings are mentioned in https://github.com/cuelang/cue/blob/master/doc/ref/impl.md (doc is WIP). It is based on Graph Unification, or more precisely unification of typed feature structures.

There are indeed many overlaps with set theory, type theory, and the like.


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#33 (comment)

@copumpkin Nix is a neat and exciting language. There are a lot of similarities. Evaluation seems to be nearly identical, for instance. I'll try to point out some of the key differences:

Nix has its base in lambda calculus/functional programming; CUE has its basis in logic programming, and more specifically the graph unification flavor of constraint-based languages. This seems rather theoretical, but has some important consequences. But also small things: I'm not sure if { a = b + 1; b = a - 1;} is legal Nix, but the equivalent in CUE would be.

A key difference is that in CUE types are values. Types and values alike are ordered in a value lattice. This is a very powerful tool for type checking. Whereas most configuration languages focus on boilerplate removal or data generation, CUE focusses on type checking. Not just whether something is a string or int, but detailed validation.
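
A tiny Python sketch of what "types and values in one lattice" can mean. It covers only three kinds of element (the type `int`, an upper bound like `<=50`, and concrete integers) and is an illustrative assumption, not CUE's actual lattice; `meet`, `AtMost` and `BOTTOM` are hypothetical names.

```python
from dataclasses import dataclass

BOTTOM = "_|_"  # the conflict element at the bottom of the lattice


@dataclass(frozen=True)
class AtMost:
    """All integers less than or equal to `limit`, like `<=50`."""
    limit: int


def is_concrete_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


def meet(a, b):
    """Greatest lower bound of two elements: the most general value
    that is an instance of both, or BOTTOM if there is none."""
    if a == b:
        return a
    if BOTTOM in (a, b):
        return BOTTOM
    if a is int:  # the type int sits above bounds and concrete ints
        return b if isinstance(b, AtMost) or is_concrete_int(b) else BOTTOM
    if b is int:
        return meet(b, a)
    if isinstance(a, AtMost) and isinstance(b, AtMost):
        return AtMost(min(a.limit, b.limit))
    if isinstance(a, AtMost):
        return b if is_concrete_int(b) and b <= a.limit else BOTTOM
    if isinstance(b, AtMost):
        return meet(b, a)
    return BOTTOM  # two different concrete values


# A type, a constraint and a value combine with the same operation.
assert meet(int, 3) == 3
assert meet(AtMost(50), 3) == 3
assert meet(AtMost(50), AtMost(10)) == AtMost(10)
assert meet(AtMost(50), 51) == BOTTOM
assert meet(3, 4) == BOTTOM
```

In this model "type checking" and "detailed validation" are the same operation: both are just a `meet` that either narrows the value or ends in `BOTTOM`.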

Also, as a consequence of this, merging configurations/definitions in CUE is associative, commutative and idempotent. In practice this means you can merge two configurations in any order you want and the result will be unique and well-defined. For many of CUE's foreseen applications, such as code generation/validation pipelines, this is a key property.

I'm not sure whether Nix has this property, though. GCL and Jsonnet definitely don't. But Nix's inheritance is much more limited (a good thing), to the point that it may actually possess these properties. If you do want to add inheritance (as in overrides/updates), the Nix approach is a good one.

Another key point where CUE may differ from Nix in philosophy is what kind of complexity a language should allow. CUE has no inheritance (as in no overrides/updates) as this is a common source of complexity and confusion (especially with hierarchical data!). It has no conditional expressions and doing recursion is intentionally hard (and I plan to further artificially limit it soon, to the point that CUE is no longer primitive recursive). The design decisions for these were all based on lessons learned.

One often uses configurations in settings where one may need to quickly change a setting for a bunch of values. This may become intractable if your configuration is too complex. At that point, you might as well have used a programming or scripting language. The same holds for tooling. CUE is designed to make automated rewriting and analysis tractable. That goes down the drain if the language constructs get too complex.

The idea of CUE is to keep the configuration simple. If one really needs to do a computation as input to the configuration, then this should be done by a program, possibly invoked by CUE scripting. This keeps the easy-to-change logic separate. This is also where the commutativity comes in handy. One can combine any number of sources, dynamic or static, in any order and the result is the same, even with cross references, lazy evaluation, cycles, etc.

Nix on the other hand is proudly Turing complete, has conditional expressions, etc. Not that there is anything wrong with that in principle, but it is not for the applications foreseen with CUE.


cueckoo commented Jul 3, 2021

Original reply by @mboes in cuelang/cue#33 (comment)

@mpvl thanks for the detailed explanation. Just two quick questions:

> Evaluation seems to be nearly identical, for instance.

AFAIU CUE, the only functions are the primitive functions. There is no lambda and no user-defined functions from within the language. I find this a very interesting point in the design space by the way, but did I understand correctly? If so, added to the lazy evaluation of Nix, this would be quite different.

> { a = b + 1; b = a - 1;}

This is legal in Nix, but the value of both a and b is ⊥ (bottom, i.e. the undefined value). What semantics does dereferencing either of those fields have in CUE?


cueckoo commented Jul 3, 2021

Original reply by @xinau in cuelang/cue#33 (comment)

@mboes why do you think that the provided example results in _|_?
Evaluating the following

a: b + 1
b: a - 1
b: 2

// results in
// a: 3
// b: 2

while the following results in an error upon evaluation

a: b + 1
b: a - 2
b: 2

// results in
// a: conflicting values 1 and 2:
//     ./foo.cue:3:4
// b: conflicting values 1 and 2:
//    ./foo.cue:3:4

Note that the b: 2 is needed in order to check that the "formula" is valid.
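
The arithmetic behind the two results above can be replayed in a few lines of Python. This is a hand-rolled check for this one example only, not how the CUE evaluator resolves references or cycles; `evaluate` and `try_evaluate` are hypothetical helpers.

```python
def evaluate(b_value, offset):
    """Model `a: b + 1`, `b: a - offset`, `b: b_value` for concrete ints."""
    a = b_value + 1            # a: b + 1
    derived_b = a - offset     # b: a - offset
    if derived_b != b_value:   # both definitions of b must agree
        raise ValueError(f"conflicting values {derived_b} and {b_value}")
    return {"a": a, "b": b_value}


def try_evaluate(b_value, offset):
    """Return the evaluated struct, or the conflict message on failure."""
    try:
        return evaluate(b_value, offset)
    except ValueError as err:
        return str(err)


# b: a - 1 agrees with b: 2, so both fields resolve.
assert try_evaluate(2, 1) == {"a": 3, "b": 2}
# b: a - 2 yields 1, which cannot be unified with b: 2.
assert try_evaluate(2, 2) == "conflicting values 1 and 2"
```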


cueckoo commented Jul 3, 2021

Original reply by @steshaw in cuelang/cue#33 (comment)

@mboes I tried the expression in the Nix repl but couldn't work out how to get a and b to evaluate to bottom. I tried with rec too:

$ nix repl
Welcome to Nix version 2.3.1. Type :? for help.

nix-repl> { a = b + 1; b = a - 1;}
error: undefined variable 'b' at (string):1:7

nix-repl> rec { a = b + 1; b = a - 1;}
error: infinite recursion encountered, at (string):1:22
