Proposal: composable API definitions with CUE #294

rogpeppe · 2022-03-25T16:36:51Z

Overview

InfluxDB API specs are a collection of contracts generated from many YAML source files. The source files are a mix of individual OpenAPI nodes and complete specs. Contributors and consumers alike would benefit from improvements to: 1) how the spec sources are managed, and 2) the tooling used to generate valid contracts (see Background).

CUE is a configuration language that can produce YAML and JSON, and it could be a
good fit for our needs here: it provides much better composability than we
currently have, and it makes it easy to add declarative static validation and
linting checks.

Background

We use Swagger/OpenAPI to define our API surface area. The source of truth for the API specs is the openapi repository, where various YAML source files are merged using a combination of Swagger tooling and Bash scripting to produce 13 different schemas.

Many of these APIs share common subcomponents. One significant goal for this repository is to define APIs in a modular way, reducing repetition and maintenance overhead.

The main tool used to manage sharing of common subcomponents is the swagger-cli bundle command, which rewrites a file-based path
of the form:

    $ref: "../schemas/foo.yml"

to the form used inside OpenAPI:

    $ref: "#/components/schemas/foo"

In addition to this, we also apply some ad hoc postprocessing to the generated
OpenAPI schemas:

- we add common paths, parameters and schemas using a sed script that reads
  files and interpolates them into the result;
- we customise URLs in a similar way, by doing a textual substitution on the generated YAML.

We use a sed script in the contract generation process because the swagger-cli bundle functionality
(and JSON Schema $ref) can only reference and replace entire objects.
That means we can't have a reference that (for example) includes only a subset of paths,
with individual specs adding others.

This approach is somewhat fragile. For example, here is a piece of scripting that is not doing its intended job:

sed -e "s|^  - url: '/'|  - url: /api/v2/datasources|" src/svc-datasourcesd.yml > ./src/.svc.yml && \

The script is trying to change the value of servers[0].url to /api/v2/datasources, and it expects the / in the source to be quoted. In the source YAML it isn't quoted, so the substitution never takes place.

This could be easily fixed, but what if someone later decides that a quote is appropriate? Or adds an extra space?

It seems to me that this approach could be improved.
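For comparison, here is a minimal CUE sketch that makes the same change at the data level rather than on the text. It assumes the datasourcesd spec has been converted to CUE, with its server URL declared as a default. The `*` default marker and the `|` disjunction are explained in the CUE intro below. `#Datasourcesd` and `datasourcesd` are hypothetical names.

```cue
// Hypothetical base spec for the datasourcesd service. The server URL is
// declared as a default, so it can be overridden by unification instead of
// by a textual substitution. Other fields are omitted for brevity.
#Datasourcesd: {
    servers: [{url: *"/" | string}]
    ...
}

// Override the URL on the parsed value itself. The quoting or spacing used
// in the original YAML source plays no part in this.
datasourcesd: #Datasourcesd & {
    servers: [{url: "/api/v2/datasources"}]
}
```

Running `cue export -e datasourcesd` on this would emit the overridden URL. If the source had a different concrete URL instead of a default, CUE would report a conflict rather than silently doing nothing.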

CUE overview

CUE (see cuelang.org) is a relatively recent addition to the configuration language space. It's a superset of JSON, and
has some properties that I think could be very helpful here. It was written
by Marcel van Lohuizen at Google, drawing on his experience with GCL,
a configuration language used widely at Google. CUE’s main goal is to be maintainable at scale, but it's also
very lightweight syntactically.

For our purposes here, the most salient properties are:

- it allows unification of arbitrarily deep trees of JSON-like data with predictable results;
- it's easy to write constraints that enforce useful properties of the data;
- it's designed with tooling in mind.

We can use CUE to generate YAML and JSON OpenAPI specifications that are identical to our current specifications. It's also easy to convert YAML to CUE when required.

The unification properties make it possible to compose and merge coherent sections of the API in a way that isn't possible with the tooling we're using now.

Constraints can be used in several ways. For example, they can be used for validation and linting of OpenAPI schema properties across all our APIs. They can be applied independently, allowing checks to be as specific or as lax as appropriate.
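As an illustration of applying checks independently, here is a small sketch with two hypothetical definitions, one lax and one strict, applied to the same info block. The definition names and the specific rules (capitalised titles, major.minor.patch versions) are assumptions made for the example, not existing InfluxDB conventions.

```cue
// #LaxInfo only requires a string title; other fields are allowed.
#LaxInfo: {
    title: string
    ...
}

// #StrictInfo additionally constrains the format of both fields.
#StrictInfo: {
    title:   =~"^[A-Z]"                     // starts with a capital letter
    version: =~"^[0-9]+\\.[0-9]+\\.[0-9]+$" // major.minor.patch
    ...
}

info: {
    title:   "Annotations service"
    version: "0.2.3"
}

// Each check is just another unification, so we can choose how strict to be.
laxChecked:    info & #LaxInfo
strictChecked: info & #LaxInfo & #StrictInfo
```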

Brief intro to CUE

I'll describe a few of CUE's features in this section, hopefully enough to understand
the examples that follow.

Here's some CUE:

x: a: "hello"
x: b: 1234
x: b: 1234

It's equivalent to the JSON below. Note that it was fine to specify the same values redundantly.
When the values are the same (or compatible) they unify:

{
    "x": {
        "a": "hello",
        "b": 1234
    }
}

If the values are different, CUE recognizes a conflict. For example, this:

x: 9999
x: "hello"

produces an error like this:

x: conflicting values 9999 and "hello" (mismatched types int and string):
    ./example.cue:3:4
    ./example.cue:4:4

An identifier before a colon (the key) can later be used to refer to the value after the colon, as in the following example:

x: "foo"
y: x

evaluates to this JSON:

{
    "x": "foo",
    "y": "foo"
}

In CUE, types are values, so if we want to constrain possible values of x, we could write:

x: {
    a: string
    b: >1000
}

If we export that on its own, it's an error because we don't know the actual values for
x.a and x.b. We only know some constraints:

x.a: incomplete value string
x.b: incomplete value >1000:
    ./example.cue:3:5

It's common to define a constraint like the above as a definition
so that it doesn't get emitted. Definitions are closed by default, so any fields not mentioned will result
in an error. Here we're constraining the value of x with the definition #X.

#X: {
    a: string
    b: >1000
}
x: #X
x: a: "hello"
x: b: 9999
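To make the effect of a closed definition concrete, here is a small sketch showing one value that satisfies `#X` and two that don't. The field names `ok`, `tooSmall` and `extra` are illustrative. The failing lines are commented out so that the file still evaluates.

```cue
#X: {
    a: string
    b: >1000
}

// Satisfies #X: both fields are present and meet their constraints.
ok: #X & {a: "hello", b: 9999}

// Each of these is an error if uncommented:
// tooSmall: #X & {a: "hello", b: 10}            // 10 does not satisfy >1000
// extra:    #X & {a: "hello", b: 9999, c: true} // #X is closed, so c is not allowed
```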

We can provide default values by using *. This allows a definition to provide some defaults that can be overridden if necessary. Here we also use ... to open up the definition of #X to allow arbitrary fields:

#X: {
    a: *"default value for a" | string
    b: *1234 | >1000
    ...
}
x: #X
x: a: "a value"
x: c: ["some", "other", "content"]

This would give the JSON:

{
    "x": {
        "a": "a value",
        "b": 1234,
        "c": ["some", "other", "content"]
    }
}

The | operator is known as the "disjunction" operator. It allows us to constrain by
a set of possible values. For example:

color: "red" | "green" | "blue" | "purple"
stringOrStringList: string | [... string]
databaseOperation: {
    type: "add"
    identifier: string
    value: stringOrStringList
} | {
    type: "remove"
    identifier: string
}
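To show how a disjunction of structs resolves against concrete data, here is the same example rewritten with definitions, so that each branch is closed. The names `#StringOrStringList`, `#DatabaseOperation`, `addOp`, `removeOp` and `badOp` are hypothetical.

```cue
#StringOrStringList: string | [...string]

#DatabaseOperation: {
    type:       "add"
    identifier: string
    value:      #StringOrStringList
} | {
    type:       "remove"
    identifier: string
}

// Each concrete value is compatible with exactly one branch, so the
// disjunction resolves to that branch.
addOp:    #DatabaseOperation & {type: "add", identifier: "k1", value: ["a", "b"]}
removeOp: #DatabaseOperation & {type: "remove", identifier: "k2"}

// An error if uncommented: "remove" rules out the first branch, and the
// closed second branch has no value field.
// badOp: #DatabaseOperation & {type: "remove", identifier: "k3", value: "x"}
```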

There's also a "conjunction" operator, &, which constrains the result to match
all of the subexpressions. This is done implicitly when multiple values are
given for the same field, so an equivalent to the x fragment in the defaults example above is:

x: #X & {
    a: "a value"
    c: ["some", "other", "content"]
}

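All these operators work consistently, according to a general set of
algebraic rules. For example, unification is commutative, associative and idempotent, so the order in which values are combined doesn't affect the result.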

As you might expect for a JSON-like data model, we have string, number and bool types, as well as list and struct types.
There's also the type _ (known as "top"), which allows any value, along with others such as int, float, bytes and null.

This all amounts to a strong typing system for JSON that's much easier to express and rather
more general than JSON Schema.
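For comparison with JSON Schema, here is a small definition that expresses four kinds of field:

- a non-negative integer;
- a bounded integer with a default;
- a list of strings;
- an optional field.

`#Page` and its fields are invented for illustration. In JSON Schema, each of these would need separate `type`, `minimum`, `maximum`, `default`, `items` and `required` keywords.

```cue
#Page: {
    offset: int & >=0
    limit:  *20 | int & >0 & <=100 // defaults to 20 if not given
    items: [...string]
    next?: string // optional: omitted from the output when absent
}

page: #Page & {
    offset: 40
    items: ["a", "b"]
}
```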

CUE also provides the ability to define packages
and import them. Like Go packages, CUE packages
can be split up arbitrarily between separate files.

Using CUE to compose Swagger specs

Here's a definition for an OpenAPI spec. All our existing specs are compatible with this definition.

#OpenAPI: {
    openapi: "3.0.0"
    info: {
        title:        string
        version:      string
        description?: string
    }
    servers: [{
        url: string
    }]
    paths: [string]: _
    components: {
        parameters?: [string]:      _
        schemas?: [string]:         _
        responses?: [string]:       _
        securitySchemes?: [string]: _
        requestBodies?: [string]:   _
    }
    ...
}
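A quick way to see this definition at work is to unify it with a small concrete spec. This sketch assumes it lives in the same `contracts` package as the #OpenAPI definition above. `minimal`, `misspelt` and the `/ping` path are hypothetical.

```cue
package contracts

// A minimal spec that satisfies #OpenAPI (defined above in this package).
minimal: #OpenAPI & {
    info: {
        title:   "Example service"
        version: "0.0.1"
    }
    servers: [{url: "/api/v2"}]
    paths: "/ping": get: responses: "204": description: "No content"
    components: {}
}

// An error if uncommented: the info struct inside #OpenAPI is closed,
// so a misspelled field name is rejected.
// misspelt: minimal & {info: titel: "Typo service"}
```

Because #OpenAPI ends with `...`, its top level is open. A misspelled top-level key (for example `compnents`) would therefore not be caught by this definition alone. Catching it would need a closed top level or a separate lint check.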

Let's imagine that the various subcomponents of our API were defined as individual, self-contained OpenAPI specs, so each one could be imported individually.

This is similar to what we're doing with the src/svc directory, but if the subcomponents are defined in CUE, we can compose them arbitrarily. Different subcomponents have portions in common (for example, they all define the ServerError response). Even so, CUE's unification is enough to merge them together as one might hope.
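To illustrate that merging, here is a sketch in which two partial specs both declare the same shared response. `serverError`, `svcA`, `svcB` and the response body are hypothetical stand-ins for the real commonresponses package.

```cue
// Hypothetical shared response, standing in for commonresponses.ServerError.
serverError: {
    description: "Server error"
    content: "application/json": schema: "$ref": "#/components/schemas/Error"
}

// Two partial service specs that each declare the same response.
svcA: components: responses: ServerError: serverError
svcB: components: responses: ServerError: serverError

// Identical values unify cleanly, so the shared response appears once.
// If svcB had declared ServerError with a different description,
// merged would instead report a conflict.
merged: svcA & svcB
```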

For example, here's a definition of the existing annotationd service expressed in CUE. Note that we define various fields with defaults so that they can take on a different value when unified with another schema. For example, if we merge the annotationd service with something else, we won't want the overall title to remain "Annotations service".

package contracts

import (
    "github.com/influxdata/openapi/src/svc/annotationd/annotationdpaths"
    "github.com/influxdata/openapi/src/svc/annotationd/annotationdschemas"
    "github.com/influxdata/openapi/src/svc/annotationd/annotationdparameters"
    "github.com/influxdata/openapi/src/common/commonschemas"
    "github.com/influxdata/openapi/src/common/commonresponses"
)

#Annotationd: #OpenAPI & {
    info: {
        // Note: we define the title and version as defaults so that they can be overridden when this spec is merged with another.
        title:   *"Annotations service" | _
        version: *"0.2.3" | _
    }
    servers: [{
        url: *"/api/v2private" | _
    }]
    paths: {
        "/annotations":                annotationdpaths.annotations
        "/annotations/{annotationID}": annotationdpaths.annotation
        "/streams":                    annotationdpaths.streams
        "/streams/{streamID}":         annotationdpaths.stream
    }
    components: {
        parameters: {
            AnnotationListFilter:   annotationdparameters.AnnotationListFilter
            AnnotationDeleteFilter: annotationdparameters.AnnotationDeleteFilter
            StreamListFilter:       annotationdparameters.StreamListFilter
            StreamDeleteFilter:     annotationdparameters.StreamDeleteFilter
        }
        schemas: {
            AnnotationListFilter:   annotationdschemas.AnnotationListFilter
            AnnotationDeleteFilter: annotationdschemas.AnnotationDeleteFilter
            BasicFilter:            annotationdschemas.BasicFilter
            AnnotationList:         annotationdschemas.AnnotationList
            AnnotationEventList:    annotationdschemas.AnnotationEventList
            AnnotationEvent:        annotationdschemas.AnnotationEvent
            AnnotationCreateList:   annotationdschemas.AnnotationCreateList
            AnnotationCreate:       annotationdschemas.AnnotationCreate
            Annotation:             annotationdschemas.Annotation
            AnnotationResponse:     annotationdschemas.AnnotationResponse
            StreamListFilter:       annotationdschemas.StreamListFilter
            StreamDeleteFilter:     annotationdschemas.StreamDeleteFilter
            StreamList:             annotationdschemas.StreamList
            Stream:                 annotationdschemas.Stream
            ReadStream:             annotationdschemas.ReadStream
            Error:                  commonschemas.Error
        }
        responses: {
            NoContent:   commonresponses.NoContent
            ServerError: commonresponses.ServerError
        }
    }
}

Here's a similar spec for the notebooksd service:

package contracts

import (
    "github.com/influxdata/openapi/src/svc/notebooksd/notebooksdschemas"
    "github.com/influxdata/openapi/src/svc/notebooksd/notebooksdrequestBodies"
    "github.com/influxdata/openapi/src/svc/notebooksd/notebooksdpaths"
    "github.com/influxdata/openapi/src/common/commonschemas"
    "github.com/influxdata/openapi/src/common/commonresponses"
)

#Notebooksd: #OpenAPI & {
    info: {
        title:   *"notebooksd" | _
        version: *"1.0.0" | _
    }
    servers: [{
        url: *"/api/v2private" | _
    }]
    paths: {
        "/notebooks":                     notebooksdpaths.notebooks
        "/notebooks/{id}":                notebooksdpaths.notebooks_id
        "/notebooks/share":               notebooksdpaths.notebooks_share
        "/notebooks/share/{id}":          notebooksdpaths.notebooks_share_id
        "/api/share/{id}/query/{pipeID}": notebooksdpaths.api_share_id_query_pipeid
        "/api/share/{id}":                notebooksdpaths.api_share_id
    }
    components: {
        requestBodies: {
            NotebookParams: notebooksdrequestBodies.NotebookParams
            ShareParams:    notebooksdrequestBodies.ShareParams
        }
        schemas: {
            NotebookParams: notebooksdschemas.NotebookParams
            Notebook:       notebooksdschemas.Notebook
            NotebookArray:  notebooksdschemas.NotebookArray
            Notebooks:      notebooksdschemas.Notebooks
            ShareParams:    notebooksdschemas.ShareParams
            Share:          notebooksdschemas.Share
            Shares:         notebooksdschemas.Shares
            Error:          commonschemas.Error
        }
        responses: {
            NoContent:   commonresponses.NoContent
            ServerError: commonresponses.ServerError
        }
    }
}

Here's some CUE that merges the two into a single specification:

annotebooksd: {
    #Annotationd
    #Notebooksd
    info: {
        title: "Annotations and notebooks service"
        version: "1.0.0"
    }
}

Running the following command gives us a properly formed OpenAPI spec holding both portions of the API, written to the file annotebooks.yml:

cue export -e annotebooksd -o annotebooks.yml
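The merged spec above sets the title and version explicitly, and it needs to. When two fields with different defaults are unified, the resulting default is the unification of both defaults. Here those conflict, so no default survives. The sketch below uses two hypothetical definitions, `#A` and `#B`, mirroring the version defaults of #Annotationd and #Notebooksd.

```cue
#A: info: version: *"0.2.3" | string
#B: info: version: *"1.0.0" | string

// Unifying them leaves the version ambiguous ("0.2.3", "1.0.0" or any
// string), because the two defaults conflict. Exporting this field would
// fail if it were uncommented:
// noVersion: {#A, #B}

// Setting the value explicitly resolves the ambiguity.
withVersion: {
    #A
    #B
    info: version: "1.0.0"
}
```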

Different API versions

This works well when two portions of the API are compatible, but what about when we have two versions of an API that differ? For example, we currently have an annotationd API and an annotationd-oss API, which differ in relatively minor ways.

As long as the relevant parts of the spec are written so that they're amenable to change (for example by using a default value), we can use the same techniques as above. For example, here's how we can define the annotationd-oss API in terms of the #Annotationd definition above:

#AnnotationdOSS: {
    #Annotationd
    paths: {
        "/annotations": {
            get: parameters: [{
                annotationdparameters.AnnotationListFilter.#Ref
            }, {
                name: "orgID"
                in:   "query"
                schema: type: "string"
            }]
            delete: parameters: [{
                annotationdparameters.AnnotationDeleteFilter.#Ref
            }, {
                name: "orgID"
                in:   "query"
                schema: type: "string"
            }]
        }
        "/streams": {
            get: parameters: [{
                annotationdparameters.StreamListFilter.#Ref
            }, {
                name: "orgID"
                in:   "query"
                schema: type: "string"
            }]
            delete: parameters: [{
                annotationdparameters.StreamDeleteFilter.#Ref
            }, {
                name: "orgID"
                in:   "query"
                schema: type: "string"
            }]
        }
    }
}
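The OSS example relies on the base spec's parameter lists being overridable. Here is a minimal sketch of that pattern, assuming the base path declares its parameters as a default list. The `#Base`, `#BaseOSS`, `base` and `baseOSS` names are hypothetical, and the real annotationdpaths package may be structured differently.

```cue
// The GET parameters are a default: a one-element list that a variant may
// replace with any other list of structs.
#Base: {
    paths: "/annotations": get: parameters: *[
        {"$ref": "#/components/parameters/AnnotationListFilter"},
    ] | [...{...}]
    ...
}

// The OSS variant replaces the default with a longer list. Had #Base used a
// plain one-element list instead, the two lists would conflict.
#BaseOSS: {
    #Base
    paths: "/annotations": get: parameters: [
        {"$ref": "#/components/parameters/AnnotationListFilter"},
        {name: "orgID", in: "query", schema: type: "string"},
    ]
}

base:    #Base    // exports the default one-element list
baseOSS: #BaseOSS // exports the two-element list
```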

API validation and linting

Once our spec is composed in CUE, we can use constraints to enforce API consistency.
For example, we could check that every endpoint has a default error response of the same form, or that all endpoints have a JSON response. As CUE is a constraint language, this kind of check
can be expressed naturally in declarative form.
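As a sketch of what such a check might look like, here is a lint definition requiring every HTTP operation to use a shared ServerError response as its default response. `#DefaultErrorLint`, `spec`, `checked` and the `$ref` target are assumptions for illustration.

```cue
// The ... at each level keeps the definition open, so other fields
// (path-level parameters, other responses) are still allowed.
#DefaultErrorLint: {
    paths: [string]: {
        [=~"^(get|put|post|delete|options|head|patch|trace)$"]: {
            responses: {
                default: "$ref": "#/components/responses/ServerError"
                ...
            }
            ...
        }
        ...
    }
    ...
}

// A hypothetical spec fragment to check.
spec: paths: "/ping": {
    parameters: []
    get: responses: {
        "204": description: "No content"
        default: "$ref": "#/components/responses/ServerError"
    }
}

checked: spec & #DefaultErrorLint
```

Because this is unification, an operation with a different default response is a conflict. An operation with no default response simply has this one added. If that silent addition is not wanted, newer CUE releases support required fields (written `default!:`), which report an error when the field is missing.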

CUE can also be used for other tasks, such as automatically gathering all the operation tags
into a top-level groups.tags field. This is currently done for the API reference docs with OpenAPI decorators in JavaScript.
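Gathering the tags could be done with CUE comprehensions. This sketch assumes a hypothetical `spec` value and emits a flat, de-duplicated `groups.tags` list. The exact output shape needed by the API reference docs may differ.

```cue
// A hypothetical spec fragment whose operations carry tags.
spec: paths: {
    "/annotations": {
        parameters: []
        get: tags: ["Annotations"]
        post: tags: ["Annotations"]
    }
    "/streams": get: tags: ["Streams"]
}

// Collect the tags of every HTTP operation, skipping path-level fields
// such as parameters. Using each tag as a field name removes duplicates,
// since repeated fields with equal values unify.
_allTags: {
    for path, item in spec.paths
    for method, op in item
    if method =~ "^(get|put|post|delete|options|head|patch|trace)$"
    for tag in op.tags {
        (tag): true
    }
}

groups: tags: [for tag, present in _allTags {tag}]
```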

Interoperability

Note that although we're proposing CUE as the source of truth here, the resulting contracts are still OpenAPI YAML, amenable to all the usual OpenAPI tooling. CUE is designed to interoperate cleanly with both JSON and YAML: it can easily import from and export to both of those formats, and more.
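Besides the `cue import` and `cue export` commands, YAML can also be read from within CUE using the standard `encoding/yaml` package. In this sketch, `_serverYAML` is a hypothetical fragment. Once parsed, the value is ordinary data, so YAML quoting details (the kind of thing that tripped up the sed script in Background) no longer matter.

```cue
import "encoding/yaml"

// A hypothetical YAML fragment, as it might appear in an existing source
// file. Writing the URL quoted ('/api/v2/datasources') would parse to the
// same value.
_serverYAML: """
    url: /api/v2/datasources
    """

// Parse the YAML and check it against a constraint. The result is ordinary
// CUE data that can be unified and exported like any other value.
servers: [yaml.Unmarshal(_serverYAML) & {url: string}]
```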

The proposal here is to use CUE for composing and generating the OpenAPI schemas without affecting the schema contracts or how they’re consumed. That is, in the openapi repo, make generate would produce
the same artifacts as always and a CI flow would check that the artifacts are up to date with the source.

Proof of concept

By way of a proof of concept, I've pushed a branch which could work as an initial step. All the existing source
YAML files have been translated to CUE, and the output schemas have been verified to be the same as the
current schemas. The directory structure and the per-file data remains the same. From there, it could be
refactored to become more modular as desired.
