Overview
InfluxDB API specs are a collection of contracts generated from many YAML source files. Source files are a mix of OpenAPI nodes and complete specs. Contributors and consumers alike would benefit from improving: 1) the source management of the specs, and 2) the tooling used to generate valid contracts (see Background).
CUE is a configuration language designed for
producing YAML and JSON that could be a good fit for our needs here, providing
much better composability than we currently have and also making it easy
to add declarative static validation and linting checks.
Background
We use Swagger/OpenAPI to define our API surface area. The API spec’s source of truth is the openapi repository, where various YAML source files are merged using a combination of Swagger tooling and Bash scripting to produce 13 different schemas.
Many of these APIs share common subcomponents. One significant goal for this repository is to define APIs in a modular way, reducing repetition and maintenance overhead.
The main tool used to manage sharing of common subcomponents is the swagger-cli bundle command, which rewrites a file-based path
of the form:
$ref: "../schemas/foo.yml"
to the form used inside OpenAPI:
$ref: "#/components/schemas/foo"
In addition to this, we also apply some ad-hoc postprocessing to the generated
OpenAPI schemas:
we add common paths, parameters and schemas using a sed script that reads
files and interpolates them into the result
we customise URLs similarly by doing a textual substitution on the generated YAML.
We use a sed script in the contract generation process because the swagger-cli bundle functionality
(and JSON Schema $ref) is limited to referencing and replacing entire objects.
That means we can't have a reference that (for example) includes only some subset of paths,
with others added by individual specs.
This approach is somewhat fragile. For example, here is a piece of scripting that is not doing its intended job:
The script is trying to change the value of servers[0].url to /api/v2/datasources and expects the / to be quoted, but it's not. Therefore, the substitution never takes place.
This could be easily fixed, but what if someone later decides that a quote is appropriate? Or adds an extra space?
It seems to me that this approach could be improved.
CUE overview
CUE (see cuelang.org) is a relatively recent addition to the configuration language space. It's a superset of JSON, and
has some properties that I think could be very helpful here. It was written
by Marcel van Lohuizen at Google, after his experiences creating GCL,
a configuration language used widely at Google. CUE’s main goal is to be maintainable at scale, but it's also
very lightweight syntactically.
For our purposes here, the most salient properties are:
it allows unification of arbitrarily deep trees of JSON-like data with predictable results.
it's easy to write constraints that enforce useful properties of the data.
it's designed with tooling in mind.
We can use CUE to generate YAML and JSON OpenAPI specifications that are identical to our current specifications. It's also easy to convert YAML to CUE when required.
The unification properties make it possible to compose and merge coherent sections of the API in a way that isn't possible with the tooling we're using now.
Constraints can be used in multiple different ways. For example, they can be used for validation and linting of OpenAPI schema properties across all our APIs. They can be applied independently, allowing checks to be as specific or as lax as appropriate.
Brief intro to CUE
I'll describe a few of CUE's features in this section - hopefully enough to understand
the examples that follow.
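As a minimal sketch of the kind of CUE in question (the exact fields are an assumption, chosen to match the JSON output that follows), here the value of x.a is given twice:

```cue
x: {
    a: "hello"
    b: 1234
}

// Redundant: the same value for x.a again, in shorthand form.
x: a: "hello"
```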
It's equivalent to the JSON below. Note that it was fine to specify the same values redundantly.
When the values are the same (or compatible) they unify:
{
    "x": {
        "a": "hello",
        "b": 1234
    }
}
If the values are different, CUE recognizes a conflict. For example, this:
x: 9999
x: "hello"
produces an error like this:
x: conflicting values 9999 and "hello" (mismatched types int and string):
./example.cue:3:4
./example.cue:4:4
An identifier before a colon (the key) can later be used to refer to the value after the colon, as in the following example:
x: "foo"
y: x
evaluates to this JSON:
{
    "x": "foo",
    "y": "foo"
}
In CUE, types are values, so if we want to constrain possible values of x, we could write:
x: {
    a: string
    b: >1000
}
If we export that on its own, it's an error because we don't know the actual values for x.a and x.b. We only know some constraints:
x.a: incomplete value string
x.b: incomplete value >1000:
./example.cue:3:5
It's common to define a constraint like the above as a definition
so that it doesn't get emitted. Definitions are closed by default, so any fields not mentioned will result
in an error. Here we're constraining the value of x with the definition #X.
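A minimal sketch of that pattern, reusing the constraint from above (the concrete values are illustrative assumptions):

```cue
// #X is a definition: it constrains values but is not emitted on export.
#X: {
    a: string
    b: >1000
}

// x must satisfy #X. Because #X is closed, adding a field
// such as x.c here would be rejected.
x: #X
x: {
    a: "hello"
    b: 1234
}
```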
We can provide default values by using *. This allows a definition to provide some defaults that can be overridden if necessary. Here we also use ... to open up the definition of #X to allow arbitrary fields:
#X: {
    a: *"default value for a" | string
    b: *1234 | >1000
    ...
}
x: #X
x: a: "a value"
x: c: ["some", "other", "content"]
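The | operator used above is known as the "disjunction" operator: it constrains a value to one of a set of possible values.
There's also a "conjunction" operator, &, which constrains the result to match
all of the subexpressions. This is done implicitly when multiple values are
given for the same thing, so an equivalent to the above x-constraining fragment is: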
x: #X & {
    a: "a value"
}
All these operators work consistently and generally according to a set of algebraic rules.
As you might expect for a JSON-like data model, we have string, number and bool types,
but there's also the type _ (known as "top") that allows anything, list and struct types and more.
This all amounts to a strong typing system for JSON that's much easier to express and rather
more general than JSON Schema.
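As a rough sketch of how these pieces combine, the fragment below uses a disjunction to restrict a field to a few alternatives, a conjunction to stack numeric bounds, and _ for a field that accepts anything. All the names here (#Method, #Port, #Endpoint) are hypothetical and not taken from our specs:

```cue
// Disjunction: the value must be one of the listed alternatives.
#Method: "get" | "post" | "delete"

// Conjunction: every constraint must hold at once.
#Port: int & >=1 & <=65535

#Endpoint: {
    method: #Method
    port:   #Port
    extra:  _ // top: any value is accepted here
}

// A concrete value that satisfies all of the constraints above.
e: #Endpoint & {
    method: "get"
    port:   8086
    extra: ["anything", 1, true]
}
```

Changing method to "patch" or port to 0 would make the unification fail, which is the behaviour we would rely on for validation.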
CUE also provides the ability to define packages
and import them. Like Go packages, CUE packages
can be split up arbitrarily between separate files.
Using CUE to compose Swagger specs
Here's a definition for an OpenAPI spec. All our existing specs are compatible with this definition.
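As a rough idea of the shape such a definition could take, here is a much-reduced sketch. The name #OpenAPI and the exact set of fields are assumptions for illustration; a real definition would cover far more of the OpenAPI schema:

```cue
// Hypothetical, heavily simplified shape of an OpenAPI document.
#OpenAPI: {
    openapi: string
    info: {
        title:   string
        version: string
        ...
    }
    servers?: [...{url: string, ...}]

    // Any path is allowed; the contents are left unconstrained here.
    paths: [string]: _

    components?: {
        schemas?: [string]:    _
        parameters?: [string]: _
        responses?: [string]:  _
        ...
    }

    // Stay open so that fields not listed above are still accepted.
    ...
}
```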
Let's imagine that the various subcomponents of our API were defined as individual, self-contained OpenAPI specs, so each one could be imported individually.
This is similar to what we're doing with the src/svc directory, but if they're defined in CUE, we then have the possibility to compose them together arbitrarily. Although different subcomponents have portions in common (for example, they all define the ServerError schema), CUE's unification is sufficient to merge them together as one might hope.
For example, here's a definition of the existing annotationd service expressed in CUE. Note that we define various fields with defaults so that they can take on a different value when unified with another schema. For example, if we merge the annotationd service with something else, we won't want the overall title to remain "Annotations service".
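To make the idea concrete, here is a cut-down sketch with two services and a value that merges them. The paths, summaries and the notebooksd title are invented for illustration and are far smaller than the real specs; only the names #Annotationd and annotebooksd follow the text:

```cue
#Annotationd: {
    // A default title that can be overridden when merging.
    info: title: *"Annotations service" | string
    paths: {
        "/annotations": get: summary: "List annotations"
        ... // keep paths open so other services can add theirs
    }
}

#Notebooksd: {
    info: title: *"Notebooks service" | string
    paths: {
        "/notebooks": get: summary: "List notebooks"
        ...
    }
}

// The two defaults disagree, so the merged spec has to state
// its own title explicitly; the paths simply unify.
annotebooksd: #Annotationd & #Notebooksd & {
    info: title: "Annotations and notebooks service"
}
```

Without the defaults, the two titles would be conflicting concrete values and the unification would fail.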
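Given a similar spec for the notebooksd service and some CUE that merges the two into a single specification named annotebooksd, running the following command gives us a properly formed OpenAPI spec holding both portions of the API, written to the file annotebooks.yml: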
cue export -e annotebooksd -o annotebooks.yml
Different API versions
This works well when two portions of the API are compatible, but what about occasions when we have two versions that are different? For example, currently we have an annotationd API and an annotationd-oss API, which differ in relatively minor ways.
As long as the relevant parts of the spec are written so that they're amenable to change (for example by using a default value), we can use the same techniques as above. For example, here's how we can define the annotationd-oss API in terms of the above #Annotationd definition:
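A minimal sketch of that kind of derivation follows. The server URLs and the name annotationdOSS are hypothetical, and the real differences between the two APIs are not shown here:

```cue
#Annotationd: {
    info: title: *"Annotations service" | string

    // The server list has a default, so a variant can replace it.
    servers: *[{url: "/api/v2private"}] | [...{url: string}]

    paths: "/annotations": get: summary: "List annotations"
}

// The OSS variant reuses everything and overrides only what differs.
annotationdOSS: #Annotationd & {
    servers: [{url: "/api/v2"}]
}
```

Fields that were written as plain concrete values rather than defaults cannot be overridden this way, which is why the base spec needs to anticipate the points of variation.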
API validation and linting
Once our spec is composed in CUE, we gain the advantage of constraints to enforce API consistency.
For example, we could check that every endpoint has a default error response with the same form, or that all endpoints have a JSON response. As CUE is a constraint language, this kind of thing
can be expressed naturally in declarative form.
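As a sketch of what such a check might look like, the fragment below requires every operation to declare a default response. It assumes a CUE version that supports required fields (the ! marker, available from v0.6.0), and it ignores path-level keys such as parameters that are not operations; #Lint and the sample path are hypothetical:

```cue
// Hypothetical lint rule: every operation under every path
// must declare a "default" response.
#Lint: {
    paths: [string]: [string]: {
        responses: {
            default!: _
            ...
        }
        ...
    }
    ...
}

// A spec that passes the check. Removing the default response
// below would make evaluation fail with a missing-field error.
spec: #Lint & {
    paths: "/annotations": get: responses: {
        "200": description: "OK"
        default: description: "Unexpected error"
    }
}
```

Because the rule lives in its own definition, it can be unified with some specs and not others, so stricter checks can be rolled out gradually.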
It's also possible to use CUE for other tasks, such as automatically gathering all the operation tags
into a top level groups.tags field, something which is currently done for the API reference docs with OpenAPI decorators in JavaScript.
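One way this could be done is with comprehensions. The sketch below collects the distinct tag names used by any operation into a single list; the paths and tag names are invented, and the output field is simply called tags here rather than matching the exact field the docs tooling expects:

```cue
paths: {
    "/annotations": get: tags: ["Annotations"]
    "/annotations": post: tags: ["Annotations"]
    "/notebooks": get: tags: ["Notebooks"]
}

// Use a hidden struct as a set so each tag appears only once.
_tagSet: {
    for _, item in paths for _, op in item for t in op.tags {
        (t): true
    }
}

// Turn the set back into a list of tag names.
tags: [for name, _ in _tagSet {name}]
```

This assumes every operation has a tags field; a real version would need to guard against operations that omit it.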
Interoperability
Note that although we're proposing CUE as the source of truth here, the resulting contracts are still OpenAPI YAML, amenable to all the usual OpenAPI tooling. CUE is designed to interoperate cleanly with both JSON and YAML - it can easily import and export from both those formats and more.
The proposal here is to use CUE for composing and generating the OpenAPI schemas without affecting the schema contracts or how they’re consumed. That is, in the openapi repo, make generate would produce
the same artifacts as always and a CI flow would check that the artifacts are up to date with the source.
Proof of concept
By way of a proof of concept, I've pushed a branch which could work as an initial step. All the existing source
YAML files have been translated to CUE, and the output schemas have been verified to be the same as the
current schemas. The directory structure and the per-file data remain the same. From there, it could be
refactored to become more modular as desired.