A36: xDS-Enabled Servers

Abstract

Bring xDS-based configuration to Servers. xDS has many features, and this gRFC does not add support for any particular one of them. However, it provides the central plumbing and APIs needed to add support for specific features and have them work without additional user code changes.

There are multiple pieces involved:

  • The API surface exposed to users to enable the new system.
  • The gRPC-internal plumbing to allow injecting behavior into servers, and changing that behavior on-the-fly.
  • The specific xDS protocol behavior gRPC will use to retrieve configuration.

Background

Since gRFC A27: xDS-Based Global Load Balancing, Channels have had growing support for understanding xDS configuration and configuring themselves. However, Servers are also important to connect to xDS in order to power TLS communication, authorization, fault injection, rate limiting, and other miscellaneous features.

On client-side, users are able to opt-in to the behavior via the xds: scheme which enables the xDS Name Resolver. Servers in gRPC do not use Name Resolution and certainly don't have things like Load Balancing APIs, which were leveraged heavily in the client-side design. While server-side will need to use a different approach, users should have a similar experience: they make a trivial code change to opt-in to xDS, and from that point on newly-added xDS-powered gRPC features can be used without code changes.

Related Proposals:

  • gRFC A27: xDS-Based Global Load Balancing
  • gRFC A28: xDS Traffic Splitting and Routing
  • gRFC A29: xDS-Based Security for gRPC Clients and Servers
  • gRFC A39: xDS HTTP Filter Support

Proposal

Each language will create an "XdsServer" API that will wrap the normal "Server" API. This will allow the implementation to see the listening ports, start event, stop event, and also allow it access to the server to change behavior (e.g., via an interceptor). Users will use the XdsServer API to construct the server instead of the existing API. XdsServer will not support ephemeral ports as part of this gRFC but may be enhanced in the future.

Credential configuration will be managed separately by providing XdsServerCredentials, similar to client-side. The user must pass the XdsServerCredentials to the XdsServer API to enable control-plane managed TLS certificates. While the user-facing API is discussed here, the precise implementation details are covered in gRFC A29. The user is free to pass other ServerCredentials types to XdsServer and they will be honored. Note that RBAC is authz, not authn, so it would be enabled by using the XdsServer, even without XdsServerCredentials.

Serving and Not Serving

To serve RPCs, the XdsServer must have its xDS configuration, provided via a Listener resource and potentially other related resources. When its xDS configuration does not exist or has not yet been received, the server must be in a "not serving" mode. This is ideally implemented by not listen()ing on the port. If that is impractical, an implementation may listen() on the port, but it must also accept() and immediately close() connections, making sure not to send any data to the client (e.g., no TLS ServerHello nor HTTP/2 SETTINGS). With either behavior, client connection attempts will quickly fail, and clients would not perform further attempts without a backoff. Load balancing policies like pick_first would naturally attempt connections to any remaining addresses to quickly find an operational backend. However, the accept()+close() approach will be improperly detected as server liveness by TCP health checking.
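As a rough illustration, the accept()-and-close() fallback could look like the following Go sketch; the function name and surrounding server plumbing are assumptions for illustration, not part of any actual gRPC API.

package example

import "net"

// serveNotServing is a sketch of "not serving" mode for implementations
// that cannot avoid listen()ing on the port: accept each connection and
// close it immediately, writing no bytes (no TLS ServerHello, no HTTP/2
// SETTINGS), so client connection attempts fail quickly.
func serveNotServing(lis net.Listener) {
  for {
    conn, err := lis.Accept()
    if err != nil {
      return // listener closed or fatal error; stop accepting
    }
    conn.Close()
  }
}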

If the xDS bootstrap is missing or invalid, implementations would ideally fail XdsServer startup, but it is also acceptable to consider it a lack of xDS configuration and enter a permanent "not serving" mode.

If the server is unable to open serving ports (e.g., because the port is already in use by another process), XdsServer startup may fail or it may enter "not serving" mode. If entering "not serving" mode, opening the port must be retried automatically (e.g., retry every minute).

Communication failures do not impact the XdsServer's xDS configuration; the XdsServer should continue using the most recent configuration until connectivity is restored. However, XdsServer must accept configuration changes provided by the xDS server, including resource deletions. If that causes the XdsServer to lack essential configuration (e.g., the Listener was deleted) the server would need to enter "not serving" mode. "Not serving" requires existing connections to be closed, but already-started RPCs should not fail. The XdsServer is permitted to use the previously-existing configuration to service RPCs during a two-phase GOAWAY to avoid any RPC failures, or it may use a one-phase GOAWAY which will fail racing RPCs but in a way that the client may transparently retry. The XdsServer is free to use a different "not serving" strategy post-startup than for the initial startup.

The XdsServer does not have to wait until server credentials (e.g., TLS certs) are available before accepting connections; since XdsServerCredentials might not be used, the server is free to lazily load credentials. However, the XdsServer should keep the credentials cache fresh and up-to-date after that initial lazy-loading, as it is clear at that point that XdsServerCredentials are being used.

The XdsServer API will allow applications to register a "serving state" callback to be invoked when the server begins serving and when the server encounters errors that force it to be "not serving". If "not serving", the callback must be provided error information, for debugging use by developers. The error information should generally be language-idiomatic, but determining the cause of the error does not need to be machine-friendly. If the application does not register the callback, XdsServer should log any errors and each serving resumption after an error, all at a default-visible log level.

XdsServer's start must not fail due to transient xDS issues, like missing xDS configuration from the xDS server. If XdsServer's start blocks waiting for xDS configuration, an application can use the serving state callback to be notified of issues preventing startup progress.

xDS Protocol

The GRPC_XDS_BOOTSTRAP file will be enhanced to have a new field:

{
  // A template for the name of the Listener resource to subscribe to for a gRPC
  // server. If the token `%s` is present in the string, all instances of the
  // token will be replaced with the server's listening "IP:port" (e.g.,
  // "0.0.0.0:8080", "[::]:8080").
  "server_listener_resource_name_template": "example/resource/%s",
  // ...
}

XdsServer will use the normal XdsClient to communicate with the xDS server. xDS v3 support is required. xDS v2 support is optional and may have lower quality standards when adhering to the specifics of v2 where they differ from v3. There is no default value for server_listener_resource_name_template, so if it is not present in the bootstrap then server creation or start will fail, or the XdsServer will become "not serving". XdsServer will perform the %s replacement in the template (if the token is present) to produce a listener name. The XdsServer will start an XdsClient watch on the listener name for an envoy.config.listener.v3.Listener resource. No special character handling of the template or its replacement is performed. For example, with an address of [::]:80 and a template of grpc/server?xds.resource.listening_address=%s, the resource name would be grpc/server?xds.resource.listening_address=[::]:80.
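The substitution itself is a literal token replacement. A minimal Go sketch, with a hypothetical helper name:

package example

import "strings"

// listenerResourceName performs the literal %s replacement described
// above; the listening address is substituted verbatim, with no
// escaping or URL-encoding.
func listenerResourceName(template, listeningAddr string) string {
  return strings.ReplaceAll(template, "%s", listeningAddr)
}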

To be useful, the xDS-returned Listener must have an address that matches the listening address provided. The Listener's address would be a TCP SocketAddress with matching address and port_value. The XdsServer must be "not serving" if the address does not match.
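For illustration, the matching check might look like the following Go sketch; the helper name and parameter shapes are hypothetical:

package example

import "net"

// listenerAddrMatches reports whether the Listener's TCP SocketAddress
// (addr, port) matches the local listening address; both the IP and
// the port must match exactly.
func listenerAddrMatches(addr string, port uint32, local *net.TCPAddr) bool {
  ip := net.ParseIP(addr)
  return ip != nil && ip.Equal(local.IP) && int(port) == local.Port
}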

The xDS client must NACK the Listener resource if Listener.listener_filters is non-empty. It must also NACK the Listener resource if Listener.use_original_dst is present and true.

The xDS client must NACK the Listener resource if any entry in filter_chains or default_filter_chain is invalid. FilterChains are valid if all of their network filters are supported by the implementation, the network filters' configuration is valid, and the filter names are unique within the filters list. Additionally, the FilterChain is only valid if filters contains exactly one entry for the envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager filter, HttpConnectionManager hereafter, as the last entry in the list.
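A sketch of this validation in Go, using the go-control-plane Listener types; the helper name and error wording are illustrative, and per-filter configuration validation plus the udpa.type.v1.TypedStruct unwrapping described below are elided:

package example

import (
  "fmt"

  listenerpb "github.com/envoyproxy/go-control-plane/envoy/config/listener/v3"
)

const hcmTypeURL = "type.googleapis.com/envoy.extensions.filters.network." +
  "http_connection_manager.v3.HttpConnectionManager"

// validateNetworkFilters enforces unique filter names and exactly one
// HttpConnectionManager, as the final entry in the list.
func validateNetworkFilters(filters []*listenerpb.Filter) error {
  if len(filters) == 0 {
    return fmt.Errorf("missing HttpConnectionManager filter")
  }
  seen := make(map[string]bool)
  for i, f := range filters {
    if seen[f.GetName()] {
      return fmt.Errorf("duplicate filter name %q", f.GetName())
    }
    seen[f.GetName()] = true
    isHCM := f.GetTypedConfig().GetTypeUrl() == hcmTypeURL
    if isHCM != (i == len(filters)-1) {
      return fmt.Errorf("HttpConnectionManager must be the sole, final filter")
    }
  }
  return nil
}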

A filter's type is the type contained in the typed_config Any. If that type is udpa.type.v1.TypedStruct, then its type_url is used instead. There is no equivalent of envoy.config.route.v3.FilterConfig for network filters. Note that this disagrees with the current documentation in listener_components.proto, but that documentation is trailing a change to Envoy's behavior.

HttpConnectionManager support is required. HttpConnectionManager must have valid http_filters, as defined by A39: xDS HTTP Filter Support.

If the envoy.extensions.filters.http.router.v3.Router is not present in http_filters, A39 calls for inserting a special filter that fails all RPCs. If an XdsServer implementation does not use RouteConfiguration (as is expected today) and does not support any HTTP filters other than the hard-coded Router behavior called for in A39, then the special filter that fails all RPCs is not required. This is to allow implementations that only support L4 xDS features to avoid L7 plumbing and implementation. This has no impact on the resource validation and NACKing behavior called for in A39.

If an XdsServer implementation uses RouteConfiguration or supports any HTTP filters other than the hard-coded Router, then HttpConnectionManager.route_config and HttpConnectionManager.rds must be supported and RouteConfigurations must be validated. RouteConfiguration validation logic inherits all previous validations made for client-side usage, as RDS does not distinguish between client-side and server-side. That is predominantly defined in gRFC A28, although note that configuration for all VirtualHosts has been validated on client-side since sharing the XdsClient was introduced, yet this was not documented in a gRFC. The validation must be updated to allow Route.non_forwarding_action as a valid action. The VirtualHost is selected on a per-RPC basis using the RPC's requested :authority. Routes are matched the same as on client-side. Route.non_forwarding_action is expected for all Routes used on server-side, and Route.route continues to be expected for all Routes used on client-side; a Route with an inappropriate action causes RPCs matching that route and reaching the end of the filter chain to fail with UNAVAILABLE. If HttpConnectionManager.rds references a NACKed resource without a previous good version, a resource unavailable because of communication failures with the control plane or a triggered loading timeout, or a non-existent resource, then all RPCs processed by that HttpConnectionManager will fail with UNAVAILABLE.
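As illustration, the server-side action check could be a small guard like the following Go sketch; the helper name is hypothetical, and the Route types come from go-control-plane:

package example

import (
  routepb "github.com/envoyproxy/go-control-plane/envoy/config/route/v3"
  "google.golang.org/grpc/codes"
  "google.golang.org/grpc/status"
)

// checkServerSideRouteAction fails an RPC whose matched Route uses any
// action other than non_forwarding_action; per the text above, the
// failure occurs only once the RPC reaches the end of the filter chain.
func checkServerSideRouteAction(r *routepb.Route) error {
  if _, ok := r.GetAction().(*routepb.Route_NonForwardingAction); !ok {
    return status.Error(codes.Unavailable,
      "matched route has an action unsupported on server-side")
  }
  return nil
}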

There are situations when an XdsServer can clearly tell the configuration will cause errors, yet it still applies the configuration. In these situations the XdsServer should log a warning each time it receives updates for configuration in this state. This is known as "configuration error logging." If an XdsServer logs such a warning, then it should also log a single warning once there are no longer any such errors. Configuration error logging is currently limited to broken RDS resources and an unsupported Route action (i.e., is not non_forwarding_action), both of which cause RPCs to fail with UNAVAILABLE as described above.

Like in Envoy, updates to a Listener cause all older connections on that Listener to be gracefully shut down (i.e., "drained") with a default grace period of 10 minutes for long-lived RPCs, such that clients will reconnect and have the updated configuration apply. This applies equally to an update in a RouteConfiguration provided inline via the route_config field, as it is part of the Listener, but it does not apply to an updated RouteConfiguration provided by reference via the rds field. Draining must not cause the server to spuriously fail RPCs or connections, so the listening port must not be closed as part of the process. Applying updates to a Listener should be delayed until an attempt has been made to load dependent resources (e.g., via RDS). The existing resource loading timeout in XdsClient prevents the update from being delayed indefinitely, and the duplicate resource update detection in XdsClient prevents replacing the Listener when nothing changes. The grace period should be adjustable when building the XdsServer and should be described as the "drain grace time."
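A rough Go sketch of the drain flow, with a hypothetical drainableConn type standing in for transport internals:

package example

import "time"

// drainableConn stands in for a server transport connection that can be
// gracefully shut down.
type drainableConn interface {
  SendGoAway() // begin graceful shutdown; in-flight RPCs continue
  Close()      // hard-close the connection
}

// drainOldConnections is invoked on a Listener update: clients receive a
// GOAWAY so they reconnect under the new configuration, and straggling
// connections are forcibly closed only after the drain grace time
// (default 10 minutes).
func drainOldConnections(conns []drainableConn, graceTime time.Duration) {
  for _, c := range conns {
    c.SendGoAway()
    time.AfterFunc(graceTime, c.Close)
  }
}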

FilterChainMatch

Although FilterChain.filters will not be used as part of this gRFC, the FilterChain contains data that may be used, such as the TLS configuration in transport_socket. When looking for a FilterChain, the standard matching logic must be used: the most specific filter_chain_match of the repeated filter_chains should be found for the specific connection, and if one is not found, the default_filter_chain must be used if it is provided. If there is no default_filter_chain and no chain matches, then the connection should be closed without any bytes being sent.

If the most-specific matching logic might not produce a unique result, then the Listener must be NACKed. This case can only occur if there are duplicate matchers. However, when checking for duplicates the matchers need to be normalized.

For normalization, each matcher should be replaced with the Cartesian product of its present fields, which expands repeated fields and converts the matchers into disjunctive normal form (a code sketch of the expansion follows the example below). That is, the matcher:

prefix_ranges: 192.168.0.0/24, 10.1.0.0/16
source_prefix_ranges: 192.168.1.0/24, 10.2.0.0/16
source_type: EXTERNAL

Should be treated as four matchers:

prefix_ranges: 192.168.0.0/24
source_prefix_ranges: 192.168.1.0/24
source_type: EXTERNAL

prefix_ranges: 192.168.0.0/24
source_prefix_ranges: 10.2.0.0/16
source_type: EXTERNAL

prefix_ranges: 10.1.0.0/16
source_prefix_ranges: 192.168.1.0/24
source_type: EXTERNAL

prefix_ranges: 10.1.0.0/16
source_prefix_ranges: 10.2.0.0/16
source_type: EXTERNAL
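A Go sketch of this expansion over the three fields used in the example; the simplified matcher struct is a hypothetical stand-in for FilterChainMatch:

package example

// matcher is a simplified, hypothetical stand-in for FilterChainMatch
// covering only the three fields in the example above.
type matcher struct {
  prefixRange       string
  sourcePrefixRange string
  sourceType        string
}

// normalize expands the repeated fields into their Cartesian product, so
// every normalized matcher holds at most one value per field
// (disjunctive normal form). An absent repeated field is modeled as a
// single empty entry so the product is never empty.
func normalize(prefixRanges, sourcePrefixRanges []string, sourceType string) []matcher {
  if len(prefixRanges) == 0 {
    prefixRanges = []string{""}
  }
  if len(sourcePrefixRanges) == 0 {
    sourcePrefixRanges = []string{""}
  }
  var out []matcher
  for _, p := range prefixRanges {
    for _, sp := range sourcePrefixRanges {
      out = append(out, matcher{p, sp, sourceType})
    }
  }
  return out
}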

CIDRs need multiple normalizations. The prefix_len should be clamped to the valid range of 0-32 (inclusive) for IPv4 and 0-128 (inclusive) for IPv6. An absent prefix_len should be considered equivalent to 0. Unused low bits of the address_prefix must be ignored, which can be achieved by applying the network mask to the address. Note that a CIDR with a prefix_len of 0 is not the same as an unspecified CIDR, because it still matches a network type (IPv4 vs IPv6).
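A Go sketch of these normalizations using the standard library's net package; the helper name is hypothetical, a nil prefixLen models an absent prefix_len, and a valid IP literal is assumed:

package example

import "net"

// normalizeCIDR clamps prefix_len to the address family's valid range,
// treats an absent prefix_len as 0, and masks off the unused low bits of
// address_prefix so equivalent CIDRs compare equal.
func normalizeCIDR(addressPrefix string, prefixLen *uint32) (maskedAddr string, plen int) {
  ip := net.ParseIP(addressPrefix) // assumed valid; error handling elided
  maxLen := 32                     // IPv4
  if ip.To4() == nil {
    maxLen = 128 // IPv6
  }
  if prefixLen != nil {
    plen = int(*prefixLen)
    if plen > maxLen {
      plen = maxLen
    }
  }
  mask := net.CIDRMask(plen, maxLen)
  return ip.Mask(mask).String(), plen
}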

The following is a snippet of all current fields of FilterChainMatch, and if they will be handled specially during connection matching:

message FilterChainMatch {
  enum ConnectionSourceType {
    ANY = 0;
    SAME_IP_OR_LOOPBACK = 1;
    EXTERNAL = 2;
  }
  google.protobuf.UInt32Value destination_port = 8; // Always fail match
  repeated core.v3.CidrRange prefix_ranges = 3;
  ConnectionSourceType source_type = 12;
  repeated core.v3.CidrRange source_prefix_ranges = 6;
  repeated uint32 source_ports = 7;
  repeated string server_names = 11; // Always fail match
  string transport_protocol = 9; // Only matches "raw_buffer"
  repeated string application_protocols = 10; // Always fail match
}

All fields are "supported," however, we know that some features separate from the match are unsupported. Fields depending on missing features are guaranteed a result that can be hard-coded. This applies to destination_port which relies on use_original_dst. It also applies to server_names, transport_protocol, application_protocols which depend on Listener.listener_filters.

While an always-failing matcher may be pruned and ignored when matching a new connection, it must still be checked when verifying that the most-specific matching logic is guaranteed to produce a unique result during Listener validation.

Language-specifics

The overall "wrapping" server API has many language-specific ramifications. We show each individual language's approach. For wrapped languages we just show a rough sketch of how they would be done.

XdsClients may be shared without impacting this design. If shared, any mentions of "creating" or "shutting down" an XdsClient would simply mean "acquire a reference" and "release a reference" on the shared instance, or similar behavior. Such sharing does not avoid the need of shutting down XdsClients when no longer in use; they are a resource and must not be leaked.

C-Core

C-Core will expose a new opaque type grpc_server_config_fetcher and API to create a server config fetcher for xDS, and register it with a server thereafter. The xDS server config fetcher will also create the XdsClient object needed to communicate with the control plane.

typedef struct {
  grpc_status_code code;
  const char* error_message;
} grpc_serving_status_update;

typedef struct {
  void (*on_serving_status_update)(void* user_data, const char* uri,
                                   grpc_serving_status_update update);
  void* user_data;
} grpc_server_xds_status_notifier;

typedef struct grpc_server_config_fetcher grpc_server_config_fetcher;

/** Creates an xDS config fetcher. */
GRPCAPI grpc_server_config_fetcher* grpc_server_config_fetcher_xds_create(
    grpc_server_xds_status_notifier notifier, const grpc_channel_args* args);

/** Destroys a config fetcher. */
GRPCAPI void grpc_server_config_fetcher_destroy(
    grpc_server_config_fetcher* config_fetcher);

/** Sets the server's config fetcher.  Takes ownership. Must be called before
    adding ports. */
GRPCAPI void grpc_server_set_config_fetcher(
    grpc_server* server, grpc_server_config_fetcher* config_fetcher);

The server needs to be configured with the config fetcher before we add ports to the server. Note that we will initially not support ephemeral ports on an xDS-enabled server, but we might add that later. This arises from implementation details where C-Core currently invokes both bind() and listen() as part of grpc_tcp_server_add_port(), and as detailed earlier, since we would ideally not want to invoke listen() until we have a valid xDS configuration, grpc_tcp_server_add_port() is only invoked when the server is first ready to serve. (Note that any future transitions to the not serving state are still dealt with via the accept()+close() method.) This behavior might change in the future by splitting grpc_tcp_server_add_port so that bind() and listen() are done separately, allowing bind() to be invoked when ports are added to the server and allowing the API to return the bound port for wildcard port inputs. Alternatively, C-Core might choose to follow the accept()+close() from the start.

grpc_server_config_fetcher_xds_create takes a notifier arg of the type grpc_server_xds_status_notifier. If the function pointer on_serving_status_update is not NULL, the xDS server config fetcher will invoke it whenever the serving status of the server changes. A status code of GRPC_STATUS_OK signifies that the server is serving; any other status signifies not serving. The API does not provide any guarantees around duplicate updates.

C++

C++ will expose a new type XdsServerBuilder that mirrors the ServerBuilder API. The XdsServerBuilder will use the C core API described above to configure the server with the xDS server config fetcher. The server created after BuildAndStart() on the XdsServerBuilder will be xDS enabled.

class XdsServerServingStatusNotifierInterface {
 public:
  struct ServingStatusUpdate {
    ::grpc::Status status;
  };

  virtual ~XdsServerServingStatusNotifierInterface() = default;

  // \a uri contains the listening target associated with the notification. Note
  // that a single target provided to XdsServerBuilder can get resolved to
  // multiple listening addresses.
  // The callback is invoked each time there is an update to the serving status.
  // The API does not provide any guarantees around duplicate updates.
  // Status::OK signifies that the server is serving, while a non-OK status
  // signifies that the server is not serving.
  virtual void OnServingStatusUpdate(std::string uri,
                                     ServingStatusUpdate update) = 0;
};

class XdsServerBuilder : public ::grpc::ServerBuilder {
 public:
  // It is the responsibility of the application to make sure that \a notifier
  // outlasts the life of the server. Notifications will start being made
  // asynchronously once `BuildAndStart()` has been called. Note that it is
  // possible for notifications to be made before `BuildAndStart()` returns.
  void set_status_notifier(XdsServerServingStatusNotifierInterface* notifier);
};

The support for ephemeral ports on an xDS-enabled server is based on the support provided by C-Core, and hence is not available at the moment.

Wrapped Languages

This section is a sketch to convey the "feel" of the API. But details may vary.

Each wrapped language will need a custom "xds server" creation API. This will vary per-language. We use Python here just as an example of how it may look.

Python could add a grpc.xds_server(...) that mirrors grpc.server(...). Since ports are added after the server is created, the returned grpc.Server object may be an xDS-aware Server to coordinate with the C API when server.start() is called. However, its implementation should be trivial, as it could mostly delegate to a normal server instance.

It would be possible to manage the lifetime of the XdsClient from the server, based on server.stop(), but given C++ lacks this avenue the C API probably will not need this notification.

Java

Create an XdsServerBuilder that extends ServerBuilder and delegates to a "real" builder. The server returned from builder.build() would be an xDS-aware server delegating to a "real" server instance. The XdsServerBuilder would install an AtomicReference<ServerInterceptor>-backed interceptor in the built server and pass the listening port and interceptor reference to the xDS-aware server when constructed. Since interceptors cannot be removed once installed, the builder may only be used once; build() should throw if called more than once. To allow the XdsServerBuilder to create additional servers if necessary, mutation of the builder after build() should also throw.

The xDS-aware server's start() will create the XdsClient with the passed port and wait for initial configuration before delegating to the real start(). If the xDS bootstrap is missing, it will throw an IOException within start(). The XdsClient will be shut down when the server is terminated (generally noticed via shutdownNow()/awaitTermination()). When new or updated configuration is received, it will create a new interceptor and update the interceptor reference.

XdsServerBuilder will have an xdsServingStatusListener(XdsServingStatusListener) method. XdsServingStatusListener will be defined as:

package io.grpc.xds;

public interface XdsServingStatusListener {
  void onServing();
  void onNotServing(Throwable t);
}

It is an interface instead of an abstract class as additional methods are not expected to be added before Java 8 language features are permitted in the code base. If this proves incorrect, an additional interface can be added.

If not specified, a default implementation of XdsServingStatusListener will be used. It will log the exception and log calls to onServing() following a call to onNotServing() at WARNING level.

In order to allow transport-specific configuration, the XdsServerBuilder will have a @ExperimentalApi ServerBuilder transportBuilder() method whose return value can be cast to NettyServerBuilder for experimental API configuration.

NettyServerBuilder will add an internal method eagAttributes(Attributes). It will plumb those attributes to GrpcHttp2ConnectionHandler.getEagAttributes() as implemented by NettyServerHandler, which currently just returns Attributes.EMPTY. XdsServerBuilder will then specify eagAttributes to inject credential information for the XdsServerCredential, similar to client-side, although with the added step of needing to process the FilterChainMatch for the specific connection.

Go

Create an xds.GRPCServer struct that would internally contain an unexported grpc.Server. It would inject its own StreamServerInterceptor and UnaryServerInterceptor into the grpc.Server. The interceptors would be controlled by an atomic.Value or a mutex-protected field that would be updated with the configuration, as sketched below.
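A Go sketch of that mechanism; the serverConfig type, holder, and failure message are illustrative, not the actual grpc-go implementation:

package example

import (
  "context"
  "sync/atomic"

  "google.golang.org/grpc"
  "google.golang.org/grpc/codes"
  "google.golang.org/grpc/status"
)

// serverConfig is a hypothetical stand-in for the xDS-derived,
// per-address configuration.
type serverConfig struct{}

type configHolder struct{ v atomic.Value }

// update atomically swaps in the latest configuration from xDS.
func (h *configHolder) update(c *serverConfig) { h.v.Store(c) }

// unaryInterceptor consults the current configuration on every RPC.
func (h *configHolder) unaryInterceptor(
  ctx context.Context, req interface{},
  info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (interface{}, error) {
  cfg, _ := h.v.Load().(*serverConfig)
  if cfg == nil {
    return nil, status.Error(codes.Unavailable, "xDS configuration not received")
  }
  // Apply cfg (e.g., look up the per-address configuration) before
  // invoking the handler.
  return handler(ctx, req)
}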

GRPCServer.Serve() takes a net.Listener instead of a host:port string to listen on, so as to be consistent with the Serve() method on the grpc.Server. The implementation expects the Addr() method on the passed-in net.Listener to return an address of type net.TCPAddr. It then creates an XdsClient and uses the Addr() to create a watch. Before Serve(net.Listener) returns, it will cancel the watch registered on the XdsClient. As part of Stop()/GracefulStop(), the XdsClient is shut down. Note that configuration is nominally per-address, but there is only one set of interceptors, so the interceptors will need to look up the per-address configuration for each RPC.

Service registration is done on a method in the generated code which accepts a grpc.ServiceRegistrar interface. Both grpc.Server and xds.GRPCServer implement this interface and therefore the latter can be passed to service registration methods just like the former.

// instead of
s := grpc.NewServer()
pb.RegisterGreeterServer(s, &server{})
// it'd be
s := xds.NewGRPCServer()
pb.RegisterGreeterServer(s, &server{})

Package credentials/xds exports a NewServerCredentials() function which returns a transport credentials implementation that uses security configuration received from the xDS server. Users who want to source their security configuration from the xDS server are expected to provide these credentials as part of the grpc.ServerOptions passed to xds.NewGRPCServer(). It is the responsibility of the xds.GRPCServer implementation to forward the most recently received security configuration to the credentials implementation.
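A sketch of that wiring, assuming the credentials/insecure package for the fallback; exact package paths and option names follow grpc-go at the time of writing and may differ:

package example

import (
  "log"

  "google.golang.org/grpc"
  "google.golang.org/grpc/credentials/insecure"
  xdscreds "google.golang.org/grpc/credentials/xds"
  "google.golang.org/grpc/xds"
)

// newXdsServer wires the xDS-aware transport credentials into an
// xDS-enabled server. FallbackCreds is used for connections whose
// FilterChain provides no security configuration; plaintext is only an
// example choice.
func newXdsServer() *xds.GRPCServer {
  creds, err := xdscreds.NewServerCredentials(xdscreds.ServerOptions{
    FallbackCreds: insecure.NewCredentials(),
  })
  if err != nil {
    log.Fatalf("creating xDS server credentials: %v", err)
  }
  return xds.NewGRPCServer(grpc.Creds(creds))
}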

Rationale

XdsServer startup does not fail due to problems with configuration from the xDS server, to avoid applications getting hung in the rare event of an issue during startup. If the start API were one-shot, then users would need to loop themselves, and many might introduce bugs in the rarely-run code path or simply not handle the case at all.

We chose the XdsServer approach over an "xDS Interceptor." In such a design, the user would just construct an XdsInterceptor and add it to their server. Unfortunately the interceptor would not be informed of the server lifecycle and so could not manage the connection to the xDS control plane. It also is not informed of listening ports until RPCs actually arrive on those ports and is not able to learn when a port is no longer in use (as is possible in grpc-go). It would also be difficult to share the XdsClient configuration between the interceptor and credentials. It simply cannot be used in its current state.

We chose the XdsServer approach over creating a new server-side plugin API. In such a design we could create a new plugin API where the user could construct the Xds module and add it to the server. The events needing to be exposed are: server start, server stop, listening socket addition, and listening socket removal. Those events are basically exactly what are on the server API today, making this functionally very similar to wrapping, but in an awkward and repetitive way. It would also need some method of plumbing the XdsClient for the Xds ServerCredential to use, and it is unclear how that would be accomplished.

Implementation

All prototyping work occurred in Java so it is furthest along, with only smaller changes necessary to converge with the design presented here. Go is expected to be complete very soon after Java. C++/wrapped languages are expected to lag waiting on the C core implementation.

  • C++/wrapped languages. The surface APIs should be relatively easy compared to the C core changes. The C core changes are being investigated by @markdroth and the plan is to flesh them out later when we have a better idea of the appropriate structure. So the gRFC just tries to show that the API design would work for C++/wrapped languages but there is currently no associated implementation work.

    • This design changes the client-side behavior for an inappropriate action and requires that the RPC be processed by the filters before being failed. C-Core will initially fail the client-side RPCs without filter processing. Implementing the full behavior will be follow-up work, because the behavior difference isn't important for the currently-supported filters and the change is more invasive in C-Core than in other languages.
  • Java. Implementation work primarily by @sanjaypujare, along with gRFC A29. Added classes are in the grpc-xds artifact and io.grpc.xds package.

  • Go. Implementation work by @easwars