Add implementer's guide #2454
Changes from 2 commits
265bf1d
09e2c65
a7faac8
bd28e96
2026c7b
1653156
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,367 @@ | ||
# Gateway API Implementer's Guide | ||
Nit: This location doesn't feel quite right. So far all our guides are targeted at users of the API, and this one is decidedly different than those other ones. Maybe this belongs under the "Reference" tab instead? |
||
|
||
aka | ||
|
||
Everything you wanted to know about building a Gateway API implementation | ||
but were too afraid to ask. | ||
|
||
This document is a place to collect tips and tricks for _writing a Gateway API | ||
implementation_ that have no straightforward place within the godoc fields of the | ||
underlying types. | ||
|
||
It's also intended to be a place to write down some guidelines to | ||
help implementers of this API avoid common mistakes. | ||
|
||
It may not be very relevant if you are intending to _use_ this API as an end | ||
user as opposed to _building_ something that uses it. | ||
|
||
!!! note | ||
This document is officially a part of the Gateway API specification. | ||
Requirements in this document labelled as MUST, SHOULD, or MAY must be treated | ||
the same as in the detailed specification page. (That is, the words MUST, SHOULD, | ||
and MAY must be interpreted as described in RFC 2119.) | ||
|
||
|
||
This is a living document; if you see something missing, PRs are welcome! | ||
|
||
## Important things to remember about Gateway API | ||
|
||
Hopefully most of these are not surprising, but they sometimes have non-obvious | ||
implications that we'll try and lay out here. | ||
|
||
### Gateway API is a `kubernetes.io` API | ||
|
||
Gateway API uses the `gateway.networking.k8s.io` API group. This means that, | ||
like the APIs delivered in the core Kubernetes binaries, the Gateway API types | ||
have been reviewed by upstream Kubernetes reviewers each time a release | ||
happens. | ||
|
||
### Gateway API is delivered using CRDs | ||
|
||
Gateway API is supplied as a set of CRDs, version controlled using our [versioning | ||
policy][versioning]. | ||
|
||
The most important part of that versioning policy is that what _appears to be_ | ||
the same object (that is, it has the same `group`,`version`, and `kind`) may have | ||
a slightly different schema. We make changes in ways that are _compatible_, so | ||
things should generally "just work", but there are some actions implementations | ||
need to take to make "just work"ing more reliable; these are detailed below. | ||
|
||
The CRD-based delivery also means that if an implementation tries to use (that is | ||
get, list, watch, etc) Gateway API objects when the CRDs have _not_ been installed, | ||
then it's likely that your Kubernetes client code will return serious errors. | ||
Tips to deal with this are also detailed below. | ||
|
||
The CRD definitions for Gateway API objects all contain two specific | ||
annotations: | ||
|
||
- `gateway.networking.k8s.io/bundle-version: <semver-release-version>` | ||
- `gateway.networking.k8s.io/channel: <channel-name>` | ||
|
||
The concepts of "bundle version" and "channel" (short for "release channel") are | ||
explained in our [versioning][versioning] documentation. | ||
|
||
Implementations may use these to determine what schema versions are installed in | ||
the cluster, if any. | ||
|
||
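As a minimal sketch of how an implementation might read these annotations, the Go snippet below pulls the bundle version and channel out of a CRD's annotation map. The `crdInfo` type and `readCRDInfo` function are hypothetical names for illustration only, and the snippet assumes the annotations have already been fetched from the cluster by whatever client the implementation uses.

```go
package main

import "fmt"

// Annotation keys carried by the Gateway API CRD definitions, as listed above.
const (
	bundleVersionAnnotation = "gateway.networking.k8s.io/bundle-version"
	channelAnnotation       = "gateway.networking.k8s.io/channel"
)

// crdInfo is a hypothetical holder for what one CRD's annotations tell us.
type crdInfo struct {
	BundleVersion string
	Channel       string
}

// readCRDInfo extracts the bundle version and channel from a CRD's
// annotations. ok is false when either annotation is missing, which a caller
// could treat as "not a recognisable Gateway API CRD".
func readCRDInfo(annotations map[string]string) (info crdInfo, ok bool) {
	version, hasVersion := annotations[bundleVersionAnnotation]
	channel, hasChannel := annotations[channelAnnotation]
	if !hasVersion || !hasChannel {
		return crdInfo{}, false
	}
	return crdInfo{BundleVersion: version, Channel: channel}, true
}

func main() {
	// Illustrative values only; a real implementation would read these from
	// the CRD objects installed in the cluster.
	annotations := map[string]string{
		bundleVersionAnnotation: "v0.8.1",
		channelAnnotation:       "standard",
	}
	if info, ok := readCRDInfo(annotations); ok {
		fmt.Printf("bundle version %s, channel %s\n", info.BundleVersion, info.Channel)
	}
}
```
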
[versioning]: /concepts/versioning | ||
|
||
### Changes to the Gateway API CRDs are backwards compatible | ||
One of my concerns here is that this could get out of sync with our versioning guide or vice versa, since a lot of this section is overlapping. Wherever possible, I'd like to have a single source of truth in our docs for a topic so we don't end up with accidentally conflicting docs in the future.
I understand, but I think that having a summary of the key parts is really important for implementers. I'm not sure how to trim this to tell people about the backward compatibility guarantees in any shorter way. |
||
|
||
Part of the contract for Gateway API CRDs is that changes _within an API version_ | ||
must be _compatible_. | ||
|
||
"Within an API Version" means changes to a CRD that occur while the same API version | ||
(`v1alpha2` or `v1` for example) is in use, and "compatible" means that any new | ||
fields, values, or validation will be added in a way that ensures that _previous_ | ||
objects _will still be valid objects_ after the change. | ||
|
||
This means that once Gateway API objects move to the `v1` API version, then _all_ | ||
changes must be compatible. | ||
|
||
This contract also means that an implementation will not fail when run against a higher version | ||
of the API than the version it was written with, because objects stored by | ||
Kubernetes under the newer schema can always be deserialized into the older version | ||
used in code by the implementation. | ||
|
||
Similarly, if an implementation was written with a _higher_ version, the newer | ||
values that it understands will simply _never be used_, as they are not present | ||
in the older version. | ||
|
||
A similar guarantee occurs between the "experimental" and "standard" channels for | ||
objects in the same API version, so an implementation may be written with the | ||
experimental API definitions, but work just fine with having only the standard | ||
definitions installed - there will be fields or values that will never be used. | ||
The same applies for an implementation written using the standard API definitions | ||
running in a cluster with the experimental definitions installed. | ||
|
||
|
||
## Implementation Rules and Guidelines | ||
|
||
### CRD Management | ||
|
||
For a Gateway API implementation to work, the Gateway API CRDs must be installed | ||
in the Kubernetes cluster the implementation is watching. | ||
|
||
Implementations have two options: automatically installing CRDs or requiring | ||
installation before working. Both have tradeoffs. | ||
|
||
In both cases, however, certain things SHOULD be true: | ||
|
||
Whatever method is used, cluster admins SHOULD attempt to ensure that | ||
the bundle version of the CRDs is not _downgraded_. Although we ensure that | ||
API changes are backwards compatible, changing CRD definitions can change the | ||
storage version of the resource, which could have unforeseen effects. Most of the | ||
time, things will probably work, but if they don't, they will most likely | ||
break in weird ways. | ||
|
||
|
||
Try your best to ensure that the bundle version doesn't roll backwards. It's safer. | ||
|
||
Implementations SHOULD also handle the Gateway API CRDs _not_ being present in | ||
the cluster without crashing or panicking. Exiting with a clear fatal error is | ||
acceptable in this case, as is disabling Gateway API support even if enabled in | ||
configuration. | ||
|
||
Practically, implementations built with `controller-runtime` or similar tooling | ||
may need to check for the _presence_ of the CRDs by getting the list of | ||
installed CRDs before attempting to watch those resources. | ||
(Note that this requires the implementation to have `read` access to the | ||
CRD resources, though.) | ||
|
||
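The presence check described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `missingCRDs` is a hypothetical helper, the list of installed CRD names is assumed to have been fetched already by the implementation's Kubernetes client, and the set of required resources is an example. It relies only on the fact that CRD object names take the form `<plural>.<group>`.

```go
package main

import (
	"fmt"
	"sort"
)

const gatewayAPIGroup = "gateway.networking.k8s.io"

// missingCRDs returns the CRD names (in "<plural>.<group>" form) that the
// implementation requires but that are absent from the installed list.
// installed is assumed to come from listing CustomResourceDefinitions.
func missingCRDs(installed, requiredPlurals []string) []string {
	present := make(map[string]bool, len(installed))
	for _, name := range installed {
		present[name] = true
	}
	var missing []string
	for _, plural := range requiredPlurals {
		name := plural + "." + gatewayAPIGroup
		if !present[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Illustrative input: HTTPRoute has not been installed in this cluster.
	installed := []string{
		"gatewayclasses.gateway.networking.k8s.io",
		"gateways.gateway.networking.k8s.io",
	}
	required := []string{"gatewayclasses", "gateways", "httproutes"}

	if missing := missingCRDs(installed, required); len(missing) > 0 {
		// Either exit with a clear fatal error here, or disable Gateway API
		// support; both are acceptable according to the text above.
		fmt.Println("Gateway API support disabled; missing CRDs:", missing)
		return
	}
	fmt.Println("all required Gateway API CRDs are present")
}
```

Running a check like this before starting any watches lets the implementation report a clear error instead of failing inside the client library.
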
#### Automatic CRD installation | ||
|
||
Automatic CRD installation also includes automatic installation mechanisms such | ||
as Helm, if the CRDs are included in a Helm chart with the implementation's | ||
installation. | ||
|
||
CRD definitions MAY be installed automatically by implementations, and if they do, | ||
they MUST have a way to ensure: | ||
|
||
- there are no other Gateway API CRDs installed in the cluster before starting, or | ||
- that the CRD definitions are only installed if they are a higher bundle version | ||
than any existing Gateway API CRDs | ||
Even this may not be safe in the case of experimental channel. I think if we say that this can be done, we also need to clearly state the risks of breaking other implementations. |
||
|
||
This avoids problems if another implementation is also installed in the cluster | ||
and expects a higher version of the CRDs to be installed. | ||
|
||
If the implementation can guarantee that no other implementation will interact | ||
with the cluster, then it MAY automatically install a relevant version of the CRDs. | ||
|
||
The ideal method for an automatic installation would require the implementation | ||
to: | ||
|
||
- Check if there are any Gateway API CRDs installed in the cluster. | ||
- If not, install its most compatible version of the CRDs. | ||
- If so, only install its version of the CRDs if the bundle version is higher | ||
than the existing one. | ||
|
||
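The three steps above can be sketched as a small decision function. This is a sketch under assumptions rather than a recommended implementation: `parseBundleVersion` and `shouldInstall` are hypothetical names, bundle versions are assumed to look like `vMAJOR.MINOR.PATCH`, and any pre-release or build suffix is ignored for simplicity. It does not consider the release channel at all, which the review discussion on this section flags as a separate risk.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBundleVersion turns "v1.2.3" into [1 2 3]. Simplifying assumption:
// anything after a "-" or "+" (pre-release or build metadata) is dropped.
func parseBundleVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("bundle version %q is not MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("bundle version %q has an invalid part %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// shouldInstall reports whether the implementation should install its own
// CRDs. installed is the bundle version found in the cluster, or "" when no
// Gateway API CRDs are present. Installation only happens when nothing is
// installed or when ours is strictly higher, so it is never a downgrade.
func shouldInstall(installed, ours string) (bool, error) {
	oursParsed, err := parseBundleVersion(ours)
	if err != nil {
		return false, err
	}
	if installed == "" {
		return true, nil
	}
	installedParsed, err := parseBundleVersion(installed)
	if err != nil {
		return false, err
	}
	for i := range oursParsed {
		if oursParsed[i] != installedParsed[i] {
			return oursParsed[i] > installedParsed[i], nil
		}
	}
	return false, nil // same version: nothing to do
}

func main() {
	for _, installed := range []string{"", "v0.7.1", "v1.0.0"} {
		install, err := shouldInstall(installed, "v0.8.1")
		fmt.Printf("installed=%q install=%t err=%v\n", installed, install, err)
	}
}
```

Treating an unparseable installed version as an error, rather than overwriting it, keeps the sketch on the conservative side of the "never downgrade" guidance.
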
|
||
Because of our backwards compatibility guarantees, it's also safe for a controller | ||
to flip the install channel between "standard" and "experimental", although | ||
This is only safe in one direction: standard -> experimental. Once you've gone that direction, I think it's impossible to safely transition back to standard channel. |
||
implementations MUST NOT do this without consulting the implementation owner. | ||
Who is the "implementation owner"? In my reading, if I substitute phrases, this says "although Anyhow, one risk with flipping the channel is another controller flipping it back endlessly
Thanks for pointing this out, I was intending to say "implementations shouldn't do this without asking a human first" - I've updated, PTAL. |
||
|
||
Automatic CRD installation has the advantage that there is less for the | ||
implementation user to do; any required version checking can be performed by | ||
code instead of by the cluster admin. | ||
|
||
|
||
#### Manual CRD installation | ||
|
||
|
||
Manual CRD installation has the advantage that the implementer needs to maintain | ||
less code; however, it pushes the responsibility for correctly managing the | ||
Gateway API CRDs back to the cluster admin, who may not have as much context | ||
as is provided here. | ||
|
||
Implementations MAY require the installation to be done manually; if so, the | ||
installation instructions SHOULD include commands to check if there are any other | ||
CRDs installed already and verify that the installation will not be a downgrade. | ||
As a followup maybe we should provide a common command to do this. Istio, for instance, has an example of a suboptimal approach: https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#setup
Agreed, this is a great followup. I'll create an issue a bit later for it. |
||
|
||
### Conformance and Version compatibility | ||
|
||
A conformant Gateway API implementation is one that passes the conformance tests | ||
that are included in each Gateway API bundle version release. | ||
|
||
An implementation MUST pass the conformance suite with _no_ skipped tests to be | ||
conformant. Tests may be skipped during development, but a version you want to | ||
be conformant MUST have no skipped tests. | ||
|
||
Extended features may, as per the contract for Extended status, be disabled. | ||
|
||
Gateway API conformance is version-specific. An implementation that passes | ||
conformance for version N may not pass conformance for version N+1 without changes. | ||
|
||
Implementations SHOULD submit a report from the conformance testing suite back | ||
to the Gateway API GitHub repo containing details of their testing. | ||
|
||
The conformance suite output includes the Gateway API version supported. | ||
|
||
#### Version compatibility | ||
|
||
Once v1.0 is released, implementations supporting Gateway and GatewayClass | ||
MUST set a new Condition, `SupportedVersion`, with `status: true` meaning | ||
that the installed CRD version is supported, and `status: false` meaning that it | ||
is not. | ||
|
||
### Standard Status fields and Conditions | ||
|
||
Gateway API has many resources, but when designing this, we've worked to keep | ||
the status experience as consistent as possible across objects, using the | ||
Condition type and the `status.conditions` field. | ||
|
||
Most resources have a `status.conditions` field, but some also have a namespaced | ||
field that _contains_ a `conditions` field. | ||
|
||
For the latter, Gateway's `status.listeners` and the Route `status.parents` | ||
fields are examples where each item in the slice identifies the Conditions | ||
associated with some subset of configuration. | ||
|
||
For the Gateway case, it's to allow Conditions per _Listener_, and in the Route | ||
case, it's to allow Conditions per _implementation_ (since Route objects can | ||
be used in multiple Gateways, and those Gateways can be reconciled by different | ||
implementations). | ||
|
||
In all of these cases, there are some relatively-common Condition types that have | ||
similar meanings: | ||
I'm worried that the definitions we have here will become out of sync with the definitions we have in the spec, do we need to define these in both places?
I don't see these changing any time soon, so I'm not worried about this. |
||
- `Accepted` - the resource or part thereof contains acceptable config that will | ||
produce some configuration in the underlying data plane that the implementation | ||
controls. This does not mean that the _whole_ configuration is valid, just that | ||
_enough_ is valid to produce some effect. | ||
- `Programmed` - this represents a later phase of operation, after `Accepted`, | ||
when the resource or part thereof has been Accepted and programmed into the | ||
underlying dataplane. Users should expect the configuration to be ready for | ||
traffic to flow _at some point in the near future_. This Condition does _not_ | ||
say that the dataplane is ready _when it's set_, just that everything is valid | ||
and it _will become ready soon_. "Soon" may have different meanings depending | ||
on the implementation. | ||
- `ResolvedRefs` - this Condition indicates that all references in the resource | ||
or part thereof were valid and pointed to an object that both exists and allows | ||
that reference. If this Condition is set to `status: false`, then _at least one_ | ||
reference in the resource or part thereof is invalid for some reason, and the | ||
`message` field should indicate which ones are invalid. | ||
|
||
Implementers should check the godoc for each type to see the exact details of | ||
these Conditions on each resource or part thereof. | ||
|
||
Additionally, the upstream `Condition` struct contains an optional | ||
`observedGeneration` field - implementations MUST use this field and set it to | ||
the `metadata.generation` field of the object at the time the status is generated. | ||
This allows users of the API to determine if the status is relevant to the current | ||
version of the object. | ||
|
||
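To illustrate the `observedGeneration` requirement, here is a minimal sketch. The `Condition` type below is a local stand-in that mirrors only a few fields of the upstream struct, and `setCondition` and `isStale` are hypothetical helpers; a real implementation would use the upstream Kubernetes types instead.

```go
package main

import "fmt"

// Condition is a local stand-in for the upstream Kubernetes Condition struct,
// reduced to the fields this sketch needs.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	ObservedGeneration int64
	Reason             string
	Message            string
}

// setCondition writes a Condition of the given type, stamping it with the
// object's metadata.generation at the time the status is generated. An
// existing Condition of the same type is replaced rather than duplicated.
func setCondition(conds []Condition, generation int64, condType, status, reason, message string) []Condition {
	next := Condition{
		Type:               condType,
		Status:             status,
		ObservedGeneration: generation,
		Reason:             reason,
		Message:            message,
	}
	for i := range conds {
		if conds[i].Type == condType {
			conds[i] = next
			return conds
		}
	}
	return append(conds, next)
}

// isStale reports whether a Condition was generated for an older version of
// the object than the one the reader is currently looking at.
func isStale(c Condition, currentGeneration int64) bool {
	return c.ObservedGeneration != currentGeneration
}

func main() {
	var conds []Condition
	conds = setCondition(conds, 3, "Accepted", "True", "Accepted", "resource accepted")

	// The object's spec is then edited, so metadata.generation moves to 4.
	fmt.Println("stale after edit:", isStale(conds[0], 4))

	// Once the implementation reconciles again, the status catches up.
	conds = setCondition(conds, 4, "Accepted", "True", "Accepted", "resource accepted")
	fmt.Println("stale after re-reconcile:", isStale(conds[0], 4))
}
```
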
|
||
### Resources details | ||
|
||
For each currently available conformance profile, there is a set of resources | ||
that implementations are expected to reconcile. | ||
|
||
The following section goes through each Gateway API object, indicating its expected | ||
behaviors and the conformance profiles that include it. | ||
|
||
#### GatewayClass | ||
How do we decide what belongs in here vs https://gateway-api.sigs.k8s.io/api-types/gatewayclass/ vs API Spec? I'm worried that we're going to end up with subtle differences in each individual source if we're not very careful.
I've tried to keep this very general, with only the main status being captured here. I don't see us changing the Maybe the guideline here is that we can only mention stable or core fields? |
||
|
||
GatewayClass has one main `spec` field - `controllerName`. Each implementation | ||
is expected to claim a domain-prefixed string value (like | ||
`example.com/example-ingress`) as its `controllerName`. | ||
|
||
Implementations MUST watch _all_ GatewayClasses, and reconcile GatewayClasses | ||
that have a matching `controllerName`. The implementation must choose at least | ||
one compatible GatewayClass out of the set of GatewayClasses that have a matching | ||
`controllerName`, and indicate that it accepts processing of that GatewayClass | ||
by setting an `Accepted` Condition to `status: true` in each. Any GatewayClasses | ||
that have a matching `controllerName` but are _not_ Accepted must have the | ||
`Accepted` Condition set to `status: false`. | ||
|
||
Implementations MAY choose only one GatewayClass out of the pool of otherwise | ||
acceptable GatewayClasses if they can only reconcile one, or, if they are capable | ||
of reconciling multiple GatewayClasses, they may also choose as many as they like. | ||
|
||
If something in the GatewayClass renders it incompatible (at the time of writing, | ||
the only possible reason for this is that there is a pointer to a `paramsRef` | ||
object that is not supported by the implementation), then the implementation | ||
SHOULD mark the incompatible GatewayClass as not `Accepted`. | ||
|
||
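The GatewayClass rules above can be sketched roughly as below. The `GatewayClass` struct is a simplified stand-in rather than the real API type, and `acceptedStatuses`, `ParamsRefKind`, and the set of supported parameter kinds are hypothetical names introduced for illustration. For simplicity, this sketch accepts every compatible GatewayClass; an implementation that can only reconcile one would add its own selection step.

```go
package main

import "fmt"

// GatewayClass is a simplified stand-in for the real resource, holding only
// what this sketch needs.
type GatewayClass struct {
	Name           string
	ControllerName string
	ParamsRefKind  string // empty when the GatewayClass has no paramsRef
}

// acceptedStatuses looks at all GatewayClasses and returns, for each one whose
// controllerName matches ours, whether the Accepted Condition should be true.
// GatewayClasses owned by other controllers are left out of the result so
// that their status is never touched.
func acceptedStatuses(classes []GatewayClass, controllerName string, supportedParamsKinds map[string]bool) map[string]bool {
	statuses := make(map[string]bool)
	for _, gc := range classes {
		if gc.ControllerName != controllerName {
			continue
		}
		// A paramsRef pointing at an unsupported kind makes the
		// GatewayClass incompatible, so it is marked as not Accepted.
		compatible := gc.ParamsRefKind == "" || supportedParamsKinds[gc.ParamsRefKind]
		statuses[gc.Name] = compatible
	}
	return statuses
}

func main() {
	classes := []GatewayClass{
		{Name: "ours", ControllerName: "example.com/example-ingress"},
		{Name: "ours-bad-params", ControllerName: "example.com/example-ingress", ParamsRefKind: "UnknownKind"},
		{Name: "someone-else", ControllerName: "other.example/controller"},
	}
	statuses := acceptedStatuses(classes, "example.com/example-ingress", map[string]bool{"ConfigMap": true})
	for _, gc := range classes {
		accepted, ours := statuses[gc.Name]
		if !ours {
			continue
		}
		fmt.Printf("%s: Accepted=%t\n", gc.Name, accepted)
	}
}
```
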
Watched in profiles: | ||
|
||
- HTTP | ||
- TLS | ||
|
||
|
||
#### Gateway | ||
|
||
Gateway objects MUST refer in the `spec.gatewayClassName` field to a GatewayClass | ||
that exists and is `Accepted` by an implementation for that implementation to | ||
reconcile them. | ||
|
||
Gateway objects that fall out of scope (for example, because the GatewayClass | ||
they reference was deleted) for reconciliation MAY have their status removed by | ||
the implementation as part of the delete process, but this is not required. | ||
|
||
Watched in profiles: | ||
|
||
- HTTP | ||
- TLS | ||
|
||
#### General Route information | ||
|
||
All Route objects share some properties: | ||
|
||
- They MUST be attached to an in-scope parent for the implementation to consider | ||
them reconcilable. | ||
- The implementation MUST update the status for each in-scope Route with the | ||
relevant Conditions, using the namespaced `parents` field. See the specific Route | ||
types for details, but this usually includes `Accepted`, `Programmed` and | ||
`ResolvedRefs` Conditions. | ||
- Routes that fall out of scope SHOULD NOT have status updated, since it's possible | ||
that these updates may overwrite any new owners. The `observedGeneration` field | ||
will indicate that any remaining status is out of date. | ||
|
||
|
||
#### HTTPRoute | ||
|
||
HTTPRoutes route HTTP traffic that is _unencrypted_ and available for inspection. | ||
This allows the HTTPRoute to use HTTP properties, like path, method, or headers | ||
in its routing directives. | ||
|
||
Watched in profiles: | ||
|
||
- HTTP | ||
- MESH | ||
|
||
#### TLSRoute | ||
|
||
TLSRoutes route encrypted TLS traffic using the SNI header, _without decrypting | ||
the traffic stream_, to the relevant backends. | ||
|
||
Watched in profiles: | ||
|
||
- TLS | ||
|
||
#### TCPRoute | ||
|
||
TCPRoutes route a TCP stream that arrives at a Listener to one of the given | ||
backends. | ||
|
||
Not currently included in any conformance profiles. | ||
|
||
#### UDPRoute | ||
|
||
UDPRoutes route UDP packets that arrive at a Listener to one of the given | ||
backends. | ||
|
||
Not currently included in any conformance profiles. | ||
|
||
#### ReferenceGrant | ||
|
||
ReferenceGrant is a special resource that is used by resource owners in one | ||
namespace to _selectively_ allow references from Gateway API objects in other | ||
namespaces. | ||
|
||
A ReferenceGrant is created in the same namespace as the thing it's granting | ||
reference access to, and allows access from other namespaces, from other Kinds, | ||
or both. | ||
|
||
Implementations that support cross-namespace references MUST watch ReferenceGrant | ||
and reconcile any ReferenceGrant that points to an object that's referred to by | ||
an in-scope Gateway API object. | ||
|
||
Watched in profiles: | ||
|
||
- HTTP | ||
- TLS | ||
- MESH |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,12 @@ | ||
# API Specification | ||
|
||
This page contains the API field specification for Gateway API. | ||
|
||
However, the [Implementer's Guide][implguide] also contains requirements for implementers | ||
that don't fit cleanly into a single field's documentation here. The API spec | ||
as a whole must be considered to be represented by both documents. | ||
|
||
[implguide]: /guides/implementers-guide | ||
|
||
|
||
REPLACE_WITH_GENERATED_CONTENT |
I started this PR reviewing this page, but it turned out that these guidelines are actually for us, the project contributors and maintainers, not for implementers. So "Design guidelines" felt more appropriate.