Add implementer's guide #2454

Merged
merged 6 commits into from
Oct 10, 2023
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -76,6 +76,7 @@ nav:
- TCP routing: guides/tcp.md
- gRPC Routing: guides/grpc-routing.md
- Migrating from Ingress: guides/migrating-from-ingress.md
- Implementer's Guide: guides/implementers-guide.md
- Reference:
- API Types:
GatewayClass: api-types/gatewayclass.md
2 changes: 1 addition & 1 deletion site-src/concepts/guidelines.md
@@ -1,4 +1,4 @@
# Implementation guidelines
Contributor Author

I started this PR reviewing this page, but it turned out that these guidelines are actually for us, the project contributors and maintainers, not for implementers. So "Design guidelines" felt more appropriate.

# Design guidelines

There are some general design guidelines used throughout this API.

367 changes: 367 additions & 0 deletions site-src/guides/implementers-guide.md
@@ -0,0 +1,367 @@
# Gateway API Implementer's Guide
Member

Nit: This location doesn't feel quite right. So far all our guides are targeted at users of the API, and this one is decidedly different than those other ones. Maybe this belongs under the "Reference" tab instead?


aka

Everything you wanted to know about building a Gateway API implementation
but were too afraid to ask.

This document is a place to collect tips and tricks for _writing a Gateway API
implementation_ that have no straightforward place within the godoc fields of the
underlying types.

It's also intended to be a place to write down some guidelines that
help implementers of this API avoid common mistakes.

It may not be very relevant if you are intending to _use_ this API as an end
user as opposed to _building_ something that uses it.

!!! note
    This document is officially a part of the Gateway API specification.
    Requirements in this document labelled as MUST, SHOULD, or MAY must be treated
    the same as in the detailed specification page. (That is, the words MUST, SHOULD,
    and MAY must be interpreted as described in RFC 2119.)

This is a living document; if you see something missing, PRs are welcome!

## Important things to remember about Gateway API

Hopefully most of these are not surprising, but they sometimes have non-obvious
implications that we'll try and lay out here.

### Gateway API is a `kubernetes.io` API

Gateway API uses the `gateway.networking.k8s.io` API group. This means that,
each time a release happens, the APIs have been reviewed by upstream Kubernetes
reviewers, just like the APIs delivered in the core Kubernetes binaries.

### Gateway API is delivered using CRDs

Gateway API is supplied as a set of CRDs, version controlled using our [versioning
policy][versioning].

The most important part of that versioning policy is that what _appears to be_
the same object (that is, it has the same `group`, `version`, and `kind`) may have
a slightly different schema. We make changes in ways that are _compatible_, so
things should generally "just work", but there are some actions implementations
need to take to make this more reliable; these are detailed below.

The CRD-based delivery also means that if an implementation tries to use (that is,
get, list, watch, and so on) Gateway API objects when the CRDs have _not_ been installed,
then it's likely that its Kubernetes client code will return serious errors.
Tips to deal with this are also detailed below.

The CRD definitions for Gateway API objects all contain two specific
annotations:

- `gateway.networking.k8s.io/bundle-version: <semver-release-version>`
- `gateway.networking.k8s.io/channel: <channel-name>`

The concepts of "bundle version" and "channel" (short for "release channel") are
explained in our [versioning][versioning] documentation.

Implementations may use these to determine what schema versions are installed in
the cluster, if any.
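
As a rough illustration of reading these annotations, the sketch below pulls the
bundle version and channel out of a CRD object represented as a plain dictionary
(for example, parsed from the JSON form of the CRD). The `bundle_info` helper and
the sample CRD are hypothetical and only the two annotation keys come from this
document.

```python
from typing import Optional, Tuple

# Annotation keys as listed above.
BUNDLE_VERSION_ANNOTATION = "gateway.networking.k8s.io/bundle-version"
CHANNEL_ANNOTATION = "gateway.networking.k8s.io/channel"


def bundle_info(crd: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (bundle_version, channel); each is None if its annotation is absent."""
    annotations = crd.get("metadata", {}).get("annotations") or {}
    return (
        annotations.get(BUNDLE_VERSION_ANNOTATION),
        annotations.get(CHANNEL_ANNOTATION),
    )


# Hypothetical CRD, trimmed to the fields this sketch reads.
example_crd = {
    "metadata": {
        "name": "gateways.gateway.networking.k8s.io",
        "annotations": {
            BUNDLE_VERSION_ANNOTATION: "v0.8.1",
            CHANNEL_ANNOTATION: "standard",
        },
    }
}

version, channel = bundle_info(example_crd)
```

A CRD without these annotations yields `(None, None)`, which an implementation
could treat as "not a Gateway API bundle CRD" or "unknown version".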

[versioning]: /concepts/versioning

### Changes to the Gateway API CRDs are backwards compatible
Member

One of my concerns here is that this could get out of sync with our versioning guide or vice versa, since a lot of this section is overlapping. Wherever possible, I'd like to have a single source of truth in our docs for a topic so we don't end up with accidentally conflicting docs in the future.

Contributor Author

I understand, but I think that having a summary of the key parts is really important for implementers. I'm not sure how to trim this to tell people about the backward compatibility guarantees in any shorter way.


Part of the contract for Gateway API CRDs is that changes _within an API version_
must be _compatible_.

"Within an API Version" means changes to a CRD that occur while the same API version
(`v1alpha2` or `v1` for example) is in use, and "compatible" means that any new
fields, values, or validation will be added to ensure that _previous_
objects _will still be valid objects_ after the change.

This means that once Gateway API objects move to the `v1` API version, then _all_
changes must be compatible.

This contract also means that an implementation will not fail with a higher version
of the API than the version it was written with, because the newer schema being
stored by Kubernetes will definitely be able to be serialized into the older version
used in code by the implementation.

Similarly, if an implementation was written with a _higher_ version, the newer
values that it understands will simply _never be used_, as they are not present
in the older version.

A similar guarantee occurs between the "experimental" and "standard" channels for
objects in the same API version, so an implementation may be written with the
experimental API definitions, but work just fine with having only the standard
definitions installed - there will be fields or values that will never be used.
The same applies for an implementation written using the standard API definitions
running in a cluster with the experimental definitions installed.

## Implementation Rules and Guidelines

### CRD Management

For a Gateway API implementation to work, the Gateway API CRDs must be installed
in the Kubernetes cluster the implementation is watching.

Implementations have two options: automatically installing CRDs or requiring
installation before working. Both have tradeoffs.

Whichever option is chosen, certain things SHOULD be true:

Whatever method is used, cluster admins SHOULD attempt to ensure that
the bundle version of the CRDs is not _downgraded_. Although we ensure that
API changes are backwards compatible, changing CRD definitions can change the
storage version of the resource, which could have unforeseen effects. Most of the
time, things will probably work, but if they don't, they will most likely
break in weird ways.

Try your best to ensure that the bundle version doesn't roll backwards. It's safer.

Implementations SHOULD also handle the Gateway API CRDs _not_ being present in
the cluster without crashing or panicking. Exiting with a clear fatal error is
acceptable in this case, as is disabling Gateway API support even if enabled in
configuration.

In practice, implementations using `controller-runtime` or similar tooling
may need to check for the _presence_ of the CRDs by
getting the list of installed CRDs before attempting to watch those resources.
(Note that this will require the implementation to have `read` access to those
resources though.)
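
The presence check could look something like the sketch below. It assumes the
implementation has already fetched the names of the installed CRDs (how it does
that depends on the client library) and only shows the comparison step. The
`missing_crds` helper and the list of required resources are illustrative
assumptions; CRD names follow the usual `<plural>.<group>` form.

```python
from typing import Iterable, List

GATEWAY_API_GROUP = "gateway.networking.k8s.io"


def missing_crds(installed_crd_names: Iterable[str],
                 required_plurals: Iterable[str]) -> List[str]:
    """Return the full names of required Gateway API CRDs that are not installed."""
    installed = set(installed_crd_names)
    missing = []
    for plural in required_plurals:
        full_name = f"{plural}.{GATEWAY_API_GROUP}"
        if full_name not in installed:
            missing.append(full_name)
    return missing


# Hypothetical cluster state: HTTPRoute has not been installed.
installed = [
    "gatewayclasses.gateway.networking.k8s.io",
    "gateways.gateway.networking.k8s.io",
    "certificates.cert-manager.io",
]
required = ["gatewayclasses", "gateways", "httproutes"]

absent = missing_crds(installed, required)
if absent:
    # Exit cleanly or disable Gateway API support instead of starting watches.
    print("Gateway API CRDs missing:", ", ".join(absent))
```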

#### Automatic CRD installation

Automatic CRD installation also includes automatic installation mechanisms such
as Helm, if the CRDs are included in a Helm chart with the implementation's
installation.

CRD definitions MAY be installed automatically by implementations, and if they do,
they MUST have a way to ensure:

- there are no other Gateway API CRDs installed in the cluster before starting, or
- that the CRD definitions are only installed if they are a higher bundle version
than any existing Gateway API CRDs
Member

Even this may not be safe in the case of experimental channel. I think if we say that this can be done, we also need to clearly state the risks of breaking other implementations.


This avoids problems if another implementation is also installed in the cluster
and expects a higher version of the CRDs to be installed.

If the implementation can guarantee that no other implementation will interact
with the cluster, then it MAY automatically install a relevant version of the CRDs.

The ideal method for an automatic installation would require the implementation
to:

- Check if there are any Gateway API CRDs installed in the cluster.
- If not, install its most compatible version of the CRDs.
- If so, only install its version of the CRDs if the bundle version is higher
than the existing one.
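
The decision in the steps above can be sketched as follows. This is a minimal
sketch, not a definitive implementation: it assumes bundle versions look like
`v0.8.1` or `v1.0.0-rc1`, treats a pre-release as lower than the matching
release, compares pre-release labels as plain strings, and ignores the release
channel entirely. `parse_bundle_version` and `should_install` are hypothetical
names.

```python
from typing import Iterable, Tuple


def parse_bundle_version(version: str) -> Tuple[int, int, int, bool, str]:
    """Turn 'v1.0.0' or 'v1.0.0-rc1' into a tuple that sorts in version order."""
    if version.startswith("v"):
        version = version[1:]
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    # A final release (no pre-release label) sorts after its pre-releases.
    return (major, minor, patch, prerelease == "", prerelease)


def should_install(existing_versions: Iterable[str], candidate: str) -> bool:
    """Install if no Gateway API CRDs exist, or if the candidate is strictly newer."""
    existing = [parse_bundle_version(v) for v in existing_versions]
    if not existing:
        return True
    return parse_bundle_version(candidate) > max(existing)
```

For example, `should_install([], "v0.8.1")` is true because nothing is installed,
while `should_install(["v1.0.0"], "v0.8.1")` is false because it would be a
downgrade. A real implementation would also need to consider the channel of the
installed CRDs before overwriting them.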


Because of our backwards compatibility guarantees, it's also safe for a controller
to flip the install channel between "standard" and "experimental", although
Member

This is only safe in one direction: standard -> experimental. Once you've gone that direction, I think it's impossible to safely transition back to standard channel.

implementations MUST NOT do this without consulting the implementation owner.
Contributor

Who is the "implementation owner"? In my reading, if I substitute phrases, this says "although
Istio MUST NOT do this without consulting the Istio maintainers" which seems odd? Or is it the user?

Anyhow, one risk with flipping the channel is another controller flipping it back endlessly

Contributor Author

Thanks for pointing this out, I was intending to say "implementations shouldn't do this without asking a human first" - I've updated, PTAL.


Automatic CRD installation has the advantage that there is less for the
implementation user to do; any required version checking can be performed by
code instead of by the cluster admin.

#### Manual CRD installation

Manual CRD installation has the advantage that the implementer needs to maintain
less code; however, it pushes the responsibility for correctly managing the
Gateway API CRDs back to the cluster admin, who may not have as much context
as is provided here.

Implementations MAY require the installation to be done manually; if so, the
installation instructions SHOULD include commands to check if there are any other
CRDs installed already and verify that the installation will not be a downgrade.
Contributor

As a followup maybe we should provide a common command to do this.

Istio, for instance, has an example of a suboptimal approach: https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#setup

Contributor Author

Agreed, this is a great followup. I'll create an issue a bit later for it.


### Conformance and Version compatibility

A conformant Gateway API implementation is one that passes the conformance tests
that are included in each Gateway API bundle version release.

An implementation MUST pass the conformance suite with _no_ skipped tests to be
conformant. Tests may be skipped during development, but a version you want to
be conformant MUST have no skipped tests.

Extended features may, as per the contract for Extended status, be disabled.

Gateway API conformance is version-specific. An implementation that passes
conformance for version N may not pass conformance for version N+1 without changes.

Implementations SHOULD submit a report from the conformance testing suite back
to the Gateway API GitHub repo containing details of their testing.

The conformance suite output includes the Gateway API version supported.

#### Version compatibility

Once v1.0 is released, implementations supporting Gateway and GatewayClass
MUST set a new Condition, `SupportedVersion`, with `status: true` meaning
that the installed CRD version is supported, and `status: false` meaning that it
is not.

### Standard Status fields and Conditions

Gateway API has many resources, but when designing this, we've worked to keep
the status experience as consistent as possible across objects, using the
Condition type and the `status.conditions` field.

Most resources have a `status.conditions` field, but some also have a namespaced
field that _contains_ a `conditions` field.

For the latter, Gateway's `status.listeners` and the Route `status.parents`
fields are examples where each item in the slice identifies the Conditions
associated with some subset of configuration.

For the Gateway case, it's to allow Conditions per _Listener_, and in the Route
case, it's to allow Conditions per _implementation_ (since Route objects can
be used in multiple Gateways, and those Gateways can be reconciled by different
implementations).

In all of these cases, there are some relatively-common Condition types that have
similar meanings:
Member

I'm worried that the definitions we have here will become out of sync with the definitions we have in the spec, do we need to define these in both places?

Contributor Author

I don't see these changing any time soon, so I'm not worried about this.

- `Accepted` - the resource or part thereof contains acceptable config that will
produce some configuration in the underlying data plane that the implementation
controls. This does not mean that the _whole_ configuration is valid, just that
_enough_ is valid to produce some effect.
- `Programmed` - this represents a later phase of operation, after `Accepted`,
when the resource or part thereof has been Accepted and programmed into the
underlying dataplane. Users should expect the configuration to be ready for
traffic to flow _at some point in the near future_. This Condition does _not_
say that the dataplane is ready _when it's set_, just that everything is valid
and it _will become ready soon_. "Soon" may have different meanings depending
on the implementation.
- `ResolvedRefs` - this Condition indicates that all references in the resource
or part thereof were valid and pointed to an object that both exists and allows
that reference. If this Condition is set to `status: false`, then _at least one_
reference in the resource or part thereof is invalid for some reason, and the
`message` field should indicate which ones are invalid.

Implementers should check the godoc for each type to see the exact details of
these Conditions on each resource or part thereof.

Additionally, the upstream `Condition` struct contains an optional
`observedGeneration` field. Implementations MUST use this field and set it to
the `metadata.generation` field of the object at the time the status is generated.
This allows users of the API to determine if the status is relevant to the current
version of the object.
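
To make the `observedGeneration` handling concrete, here is a small sketch using
plain dictionaries in place of real API objects. The `new_condition` and
`is_current` helpers are hypothetical; the field names follow the upstream
Condition type, where `status` is serialized as the string `"True"`, `"False"`,
or `"Unknown"`.

```python
from datetime import datetime, timezone


def new_condition(obj: dict, cond_type: str, status: bool,
                  reason: str, message: str) -> dict:
    """Build a Condition stamped with the generation of the object it describes."""
    return {
        "type": cond_type,
        "status": "True" if status else "False",
        # Record which version of the spec this status was computed from.
        "observedGeneration": obj["metadata"]["generation"],
        "lastTransitionTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "reason": reason,
        "message": message,
    }


def is_current(obj: dict, condition: dict) -> bool:
    """True if the Condition was generated for the object's current generation."""
    return condition.get("observedGeneration") == obj["metadata"]["generation"]


# Hypothetical object at generation 3.
gateway = {"metadata": {"name": "example", "generation": 3}}
accepted = new_condition(gateway, "Accepted", True, "Accepted", "Gateway accepted")
```

If the object's spec is later edited and its generation becomes 4, `is_current`
returns false for the old Condition until the implementation reconciles again.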


### Resource details

For each currently available conformance profile, there is a set of resources
that implementations are expected to reconcile.

The following section goes through each Gateway API object, indicates expected
behaviors, and which conformance profiles that object is included in.

#### GatewayClass
Member

How do we decide what belongs in here vs https://gateway-api.sigs.k8s.io/api-types/gatewayclass/ vs API Spec? I'm worried that we're going to end up with subtle differences in each individual source if we're not very careful.

Contributor Author

I've tried to keep this very general, with only the main status being captured here. I don't see us changing the status handling or controllerName behavior - we can't once we go GA.

Maybe the guideline here is that we can only mention stable or core fields?


GatewayClass has one main `spec` field - `controllerName`. Each implementation
is expected to claim a domain-prefixed string value (like
`example.com/example-ingress`) as its `controllerName`.

Implementations MUST watch _all_ GatewayClasses, and reconcile GatewayClasses
that have a matching `controllerName`. The implementation must choose at least
one compatible GatewayClass out of the set of GatewayClasses that have a matching
`controllerName`, and indicate that it accepts processing of that GatewayClass
by setting an `Accepted` Condition to `status: true` in each. Any GatewayClasses
that have a matching `controllerName` but are _not_ Accepted must have the
`Accepted` Condition set to `status: false`.

Implementations MAY choose only one GatewayClass out of the pool of otherwise
acceptable GatewayClasses if they can only reconcile one, or, if they are capable
of reconciling multiple GatewayClasses, they may also choose as many as they like.

If something in the GatewayClass renders it incompatible (at the time of writing,
the only possible reason for this is that its `parametersRef` field points to an
object that is not supported by the implementation), then the implementation
SHOULD mark the incompatible GatewayClass as not `Accepted`.
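
The selection logic described in this section could be sketched like this,
again with plain dictionaries standing in for GatewayClass objects. The
`accepted_statuses` helper and the `no_params` compatibility check are
hypothetical; a real implementation would decide compatibility based on the
parameter kinds it actually supports.

```python
from typing import Callable, Dict, List


def accepted_statuses(gateway_classes: List[dict], controller_name: str,
                      is_compatible: Callable[[dict], bool]) -> Dict[str, str]:
    """Map each GatewayClass with a matching controllerName to its Accepted status."""
    statuses = {}
    for gc in gateway_classes:
        if gc["spec"]["controllerName"] != controller_name:
            # Belongs to another implementation: watched, but never written to.
            continue
        statuses[gc["metadata"]["name"]] = "True" if is_compatible(gc) else "False"
    return statuses


def no_params(gc: dict) -> bool:
    """Hypothetical check for an implementation that supports no parameters at all."""
    return "parametersRef" not in gc["spec"]


classes = [
    {"metadata": {"name": "plain"},
     "spec": {"controllerName": "example.com/example-ingress"}},
    {"metadata": {"name": "with-params"},
     "spec": {"controllerName": "example.com/example-ingress",
              "parametersRef": {"group": "example.com", "kind": "Config", "name": "c"}}},
    {"metadata": {"name": "someone-else"},
     "spec": {"controllerName": "other.example/controller"}},
]

result = accepted_statuses(classes, "example.com/example-ingress", no_params)
```

Here `plain` would be marked `Accepted` with status true, `with-params` with
status false, and `someone-else` is left untouched.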

Watched in profiles:

- HTTP
- TLS

#### Gateway

Gateway objects MUST refer in the `spec.gatewayClassName` field to a GatewayClass
that exists and is `Accepted` by an implementation for that implementation to
reconcile them.

Gateway objects that fall out of scope (for example, because the GatewayClass
they reference was deleted) for reconciliation MAY have their status removed by
the implementation as part of the delete process, but this is not required.

Watched in profiles:

- HTTP
- TLS

#### General Route information

All Route objects share some properties:

- They MUST be attached to an in-scope parent for the implementation to consider
them reconcilable.
- The implementation MUST update the status for each in-scope Route with the
relevant Conditions, using the namespaced `parents` field. See the specific Route
types for details, but this usually includes `Accepted`, `Programmed` and
`ResolvedRefs` Conditions.
- Routes that fall out of scope SHOULD NOT have status updated, since it's possible
that these updates may overwrite any new owners. The `observedGeneration` field
will indicate that any remaining status is out of date.
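
Because several implementations can write to the same Route, one way to avoid
overwriting other owners is to replace only the `status.parents` entries that
carry your own `controllerName`. The sketch below assumes each entry has
`parentRef`, `controllerName`, and `conditions` keys; `merge_parent_statuses` is
a hypothetical helper.

```python
from typing import List


def merge_parent_statuses(existing: List[dict], ours: List[dict],
                          controller_name: str) -> List[dict]:
    """Replace this controller's entries in status.parents and keep all others."""
    kept = [p for p in existing if p.get("controllerName") != controller_name]
    return kept + ours


# Hypothetical current status: one entry from another controller, one stale entry of ours.
current = [
    {"parentRef": {"name": "gw-a"}, "controllerName": "other.example/controller",
     "conditions": [{"type": "Accepted", "status": "True"}]},
    {"parentRef": {"name": "gw-b"}, "controllerName": "example.com/example-ingress",
     "conditions": [{"type": "Accepted", "status": "False"}]},
]
fresh = [
    {"parentRef": {"name": "gw-b"}, "controllerName": "example.com/example-ingress",
     "conditions": [{"type": "Accepted", "status": "True"}]},
]

merged = merge_parent_statuses(current, fresh, "example.com/example-ingress")
```

The entry written by `other.example/controller` survives unchanged, and only the
entry owned by this implementation is refreshed.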


#### HTTPRoute

HTTPRoutes route HTTP traffic that is _unencrypted_ and available for inspection.
This allows the HTTPRoute to use HTTP properties, like path, method, or headers
in its routing directives.

Watched in profiles:

- HTTP
- MESH

#### TLSRoute

TLSRoutes route encrypted TLS traffic using the SNI header, _without decrypting
the traffic stream_, to the relevant backends.

Watched in profiles:

- TLS

#### TCPRoute

TCPRoutes route a TCP stream that arrives at a Listener to one of the given
backends.

Not currently included in any conformance profiles.

#### UDPRoute

UDPRoutes route UDP packets that arrive at a Listener to one of the given
backends.

Not currently included in any conformance profiles.

#### ReferenceGrant

ReferenceGrant is a special resource that is used by resource owners in one
namespace to _selectively_ allow references from Gateway API objects in other
namespaces.

A ReferenceGrant is created in the same namespace as the thing it's granting
reference access to, and allows access from other namespaces, from other Kinds,
or both.

Implementations that support cross-namespace references MUST watch ReferenceGrant
and reconcile any ReferenceGrant that points to an object that's referred to by
an in-scope Gateway API object.
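
As an illustration of how an implementation might evaluate a cross-namespace
reference, the sketch below checks a reference against a list of
ReferenceGrants represented as dictionaries. It assumes the `spec.from` entries
carry `group`, `kind`, and `namespace`, and the `spec.to` entries carry `group`,
`kind`, and an optional `name`. `reference_allowed` is a hypothetical helper,
not part of any library.

```python
from typing import List


def reference_allowed(grants: List[dict], from_ref: dict, to_ref: dict) -> bool:
    """Check whether from_ref (group, kind, namespace) may reference to_ref
    (group, kind, namespace, name)."""
    if from_ref["namespace"] == to_ref["namespace"]:
        return True  # same-namespace references do not need a grant
    for grant in grants:
        # A grant only applies to targets in its own namespace.
        if grant["metadata"]["namespace"] != to_ref["namespace"]:
            continue
        from_ok = any(
            f["group"] == from_ref["group"]
            and f["kind"] == from_ref["kind"]
            and f["namespace"] == from_ref["namespace"]
            for f in grant["spec"]["from"]
        )
        to_ok = any(
            t["group"] == to_ref["group"]
            and t["kind"] == to_ref["kind"]
            # An unset name allows every object of that group and kind.
            and t.get("name") in (None, "", to_ref["name"])
            for t in grant["spec"]["to"]
        )
        if from_ok and to_ok:
            return True
    return False


# Hypothetical grant: HTTPRoutes in "apps" may reference Services in "backend".
grant = {
    "metadata": {"namespace": "backend"},
    "spec": {
        "from": [{"group": "gateway.networking.k8s.io", "kind": "HTTPRoute",
                  "namespace": "apps"}],
        "to": [{"group": "", "kind": "Service"}],
    },
}
route = {"group": "gateway.networking.k8s.io", "kind": "HTTPRoute", "namespace": "apps"}
service = {"group": "", "kind": "Service", "namespace": "backend", "name": "api"}
```

A reference that is not allowed would typically result in the `ResolvedRefs`
Condition being set to false on the referencing object.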

Watched in profiles:

- HTTP
- TLS
- MESH
9 changes: 9 additions & 0 deletions site-src/references/spec.md
@@ -1,3 +1,12 @@
# API Specification

This page contains the API field specification for Gateway API.

However, the [Implementer's Guide][implguide] also contains requirements for implementers
that don't fit cleanly into a single field's documentation here. The API spec
as a whole must be considered to be represented by both documents.

[implguide]: /guides/implementers-guide


REPLACE_WITH_GENERATED_CONTENT