GatewayAPI topology (DAG 1.0) #530

Closed
9 of 12 tasks
eguzki opened this issue Apr 10, 2024 · 4 comments
Labels
kind/epic, size/large

Comments

@eguzki
Contributor

eguzki commented Apr 10, 2024

#447 introduced the GatewayAPI topology (DAG) implementation

This (epic) issue tracks the usage of the GatewayAPI topology (DAG) implementation by the different Kuadrant controllers

The GatewayAPI Topology

The controller builds what is called "The GatewayAPI Topology". It reads ALL the Gateways, HTTPRoutes and policies (only those the controller cares about) in the cluster. With all that info, the controller creates a new instance of the Gateway API topology as well as a set of indexes (another "view" of the topology) that can be queried. It is interesting to note here that field indexers can optionally be added to the manager to build internal index caches. One example being implemented is an index that answers queries like "give me all the HTTPRoutes whose parent is Gateway A".
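
A minimal sketch of such a field indexer with controller-runtime, for the "all the HTTPRoutes whose parent is Gateway A" example (the index key and helper names are hypothetical, not the actual kuadrant-operator code):

```go
package example

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// hypothetical index key for the cache-backed field index
const routeParentGatewayKey = ".spec.parentRefs.gateway"

// addHTTPRouteParentIndex registers a field indexer on the manager so the cached
// client can answer "give me all the HTTPRoutes whose parent is Gateway <ns>/<name>".
func addHTTPRouteParentIndex(ctx context.Context, mgr manager.Manager) error {
	return mgr.GetFieldIndexer().IndexField(ctx, &gatewayapiv1.HTTPRoute{}, routeParentGatewayKey,
		func(obj client.Object) []string {
			route := obj.(*gatewayapiv1.HTTPRoute)
			keys := make([]string, 0, len(route.Spec.ParentRefs))
			for _, parentRef := range route.Spec.ParentRefs {
				// parentRef namespace defaults to the route's own namespace
				ns := route.Namespace
				if parentRef.Namespace != nil {
					ns = string(*parentRef.Namespace)
				}
				keys = append(keys, fmt.Sprintf("%s/%s", ns, parentRef.Name))
			}
			return keys
		})
}

// routesForGateway queries the index: it lists the HTTPRoutes parented by the given gateway.
func routesForGateway(ctx context.Context, cl client.Client, gatewayKey string) (*gatewayapiv1.HTTPRouteList, error) {
	routes := &gatewayapiv1.HTTPRouteList{}
	err := cl.List(ctx, routes, client.MatchingFields{routeParentGatewayKey: gatewayKey})
	return routes, err
}
```

The index is maintained by the cached client, so the query above does not hit the API server directly.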

The current implementation of the "GatewayAPI Topology" does not use a k8s client to query Gateway API resources and Kuadrant policies. Instead, fetching the resources must be done externally, and the GatewayAPI topology then receives the lists of resources. It could be done otherwise, with a generic function that uses a k8s client to build the topology for a given policy kind. The main reason for the current implementation is to decouple the GatewayAPI topology from the cluster, meaning that a GatewayAPI topology can be built from any set of resources filtered with some criteria.
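
For illustration, a rough sketch of that split, assuming a hypothetical NewGatewayAPITopology constructor and With* options (not the actual API): the caller does the fetching (and any filtering), and the topology only ever sees slices of resources.

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1"

	kuadrantv1beta2 "github.com/kuadrant/kuadrant-operator/api/v1beta2"
)

// buildTopology lists the resources (possibly with filters) and hands the slices to
// the topology constructor. NewGatewayAPITopology and its options are hypothetical
// names; the point is that the topology itself never queries the cluster.
func buildTopology(ctx context.Context, cl client.Client) (*GatewayAPITopology, error) {
	gateways := &gatewayapiv1.GatewayList{}
	if err := cl.List(ctx, gateways); err != nil {
		return nil, err
	}

	routes := &gatewayapiv1.HTTPRouteList{}
	if err := cl.List(ctx, routes); err != nil {
		return nil, err
	}

	policies := &kuadrantv1beta2.RateLimitPolicyList{}
	if err := cl.List(ctx, policies); err != nil {
		return nil, err
	}

	// fetching done: from here on, the topology is built purely in memory
	return NewGatewayAPITopology(
		WithGateways(gateways.Items),
		WithHTTPRoutes(routes.Items),
		WithPolicies(policies.Items),
	), nil
}
```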

The implementation assumes that reading ALL the gateways, routes and policies is a "cheap" operation. That assumption rests on the controller-runtime k8s client, which holds a cache, at least for list and get operations. That cache is kept up to date in the background when objects are updated or deleted, whether triggered by Kuadrant controllers, third-party controllers, or any other k8s API server user. Check this comment for a verification of the "cheap" operation assumption. Indeed, the resource lists are cached across reconciliation events and controllers.

The topology implementation is based on a pure Directed Acyclic Graph (DAG) data structure that is totally decoupled from the GatewayAPI relations. On top of that, there is an implementation of the GatewayAPI topology which builds a graph from Gateways and HTTPRoutes; in the future, it can be extended to additional Gateway API types. On top of the GatewayAPI topology, a number of indexes are created which are tailored to the needs of the Kuadrant controllers. Thus:

Kuadrant controller 
       |
       |
       V
Custom topology indexes (views)
       |
       |
       V
GatewayAPI topology (focused on one gateway)
       |
       |
       V
     DAG
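
To make the layering a bit more concrete, here is a rough sketch with illustrative type names (not the actual implementation): a generic DAG at the bottom that knows nothing about Gateway API, a GatewayAPI-aware layer that populates it, and indexes as query helpers for the controllers on top.

```go
package example

// Node is anything that can live in the generic DAG; it knows nothing about Gateway API.
type Node interface {
	ID() string
}

// DAG is the pure directed-acyclic-graph layer.
type DAG struct {
	nodes map[string]Node
	edges map[string][]string // parent node ID -> child node IDs
}

func NewDAG() *DAG {
	return &DAG{nodes: map[string]Node{}, edges: map[string][]string{}}
}

func (d *DAG) AddNode(n Node) { d.nodes[n.ID()] = n }

func (d *DAG) AddEdge(parent, child string) {
	d.edges[parent] = append(d.edges[parent], child)
}

func (d *DAG) Children(id string) []string { return d.edges[id] }

// GatewayAPITopology is the Gateway API-aware layer: it populates the DAG with
// Gateway and HTTPRoute nodes and gateway->route edges (population omitted here).
type GatewayAPITopology struct {
	graph *DAG
}

// RoutesForGateway is an example of an index ("view") tailored to a controller's
// needs: the HTTPRoutes hanging from a given Gateway node.
func (t *GatewayAPITopology) RoutesForGateway(gatewayID string) []string {
	return t.graph.Children(gatewayID)
}
```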
@eguzki eguzki moved this to In Progress in Kuadrant Apr 10, 2024
@eguzki eguzki added the kind/epic and size/large labels Apr 10, 2024
@guicassolato
Contributor

guicassolato commented Apr 11, 2024

I really like the direction this is going. In fact, I now wonder if the approach adopted for the WasmPlugin (#447) and (ongoing) for the Limitador CR (#527) shouldn't be a model for all internal resources managed by the Kuadrant Operator, and not only for the resources that map 1:1 to the gateway objects.

What I mean by that is no longer having any controller that reconciles a policy. Instead, policies would be treated as just another source of events that triggers reconciliation through mapping. All events would map to the affected gateways, and therefore the input to the reconciliation function would always be the gateway object.
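
A hedged sketch of that wiring with controller-runtime (assuming a recent controller-runtime release, since the Watches signature has changed over time; the mapping function and reconciler names are illustrative): policy events are mapped to Gateway requests, so the reconciler only ever receives Gateways.

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1"

	kuadrantv1beta2 "github.com/kuadrant/kuadrant-operator/api/v1beta2"
)

// setupGatewayController wires policies as "just another source of events": their
// events are mapped to Gateway requests, so the reconcile input is always a Gateway.
func setupGatewayController(mgr ctrl.Manager, gatewayReconciler reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&gatewayapiv1.Gateway{}).
		Watches(&kuadrantv1beta2.RateLimitPolicy{},
			handler.EnqueueRequestsFromMapFunc(mapPolicyToGateways)).
		Complete(gatewayReconciler)
}

// mapPolicyToGateways resolves which Gateways are affected by a policy event, e.g.
// by following the policy's targetRef through the topology (resolution omitted).
func mapPolicyToGateways(ctx context.Context, obj client.Object) []reconcile.Request {
	// placeholder: in practice, query the topology/indexes for the affected gateways
	return []reconcile.Request{
		{NamespacedName: types.NamespacedName{Namespace: obj.GetNamespace(), Name: "gw-a"}},
	}
}
```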

The kinds of resources related to policies to actually reconcile are:

  • WasmPlugin
  • EnvoyFilter
  • Limitador
  • AuthConfig
  • AuthorizationPolicy
  • DNSRecord
  • Certificate

The reconciliation of each of these resources can happen as events independent from each other, by being handled by dedicated reconcilers. Within each reconciliation process, we could leverage concurrency in Go to compute the desired states of resources known upfront to be truly independent from each other (if there are any, given by the number of branches down from a gateway node until that kind of resource); see the sketch below.
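
For instance, a hedged sketch of that concurrency inside one Gateway reconciliation, using golang.org/x/sync/errgroup (the desiredWasmPlugin/desiredEnvoyFilter helpers and the topology type are hypothetical names):

```go
package example

import (
	"context"

	"golang.org/x/sync/errgroup"
	gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// computeDesiredStates computes, concurrently, desired states that are known upfront
// to be independent of each other. Helper names and the topology type are illustrative.
func computeDesiredStates(ctx context.Context, gw *gatewayapiv1.Gateway, topology *GatewayAPITopology) error {
	g, ctx := errgroup.WithContext(ctx)

	g.Go(func() error {
		_, err := desiredWasmPlugin(ctx, gw, topology) // independent branch
		return err
	})

	g.Go(func() error {
		_, err := desiredEnvoyFilter(ctx, gw, topology) // independent branch
		return err
	})

	// the first error cancels the shared context and is returned here
	return g.Wait()
}
```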

In the future, if we decide to grow the hierarchy upwards and, say, make the GatewayClass the new root (thus replacing the Gateway), it should be easy to adapt the mappers and wrap every reconciliation function within another loop. Arguably, this could be done in a generic way from day 1.

I realise this approach increases the computation load, compared to the current attempt to limit the blast radius by focusing on each individual policy object. On the other hand, I believe it's more consistent with the expectation of a "state of the world" kind of reconciliation. Plus, computing a bigger desired state does not necessarily mean more requests to the API server.

The bigger challenge IMO is status updating, but I believe even on this one we can win with this approach. The status of the reconciled objects (when available) is straightforward, as it can be set within each reconciliation function. The status of the policy objects, on the other hand, depends on gathering the final states of all reconciliation functions before it can be set. If every step (i.e. every internal resource kind, every reconciliation function) has a corresponding status condition added to the policy, computing the final one should be a no-brainer. Finally, the status of the gateway objects and routes can wait for the policy and/or be partially and incrementally set by the reconciliation functions.
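
A hedged sketch of the per-step condition idea, using apimachinery's condition helpers (the condition type name is illustrative, and it assumes the policy status exposes a []metav1.Condition list): each reconciliation function records its own condition, and the aggregate condition is later derived from all of them.

```go
package example

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	kuadrantv1beta2 "github.com/kuadrant/kuadrant-operator/api/v1beta2"
)

// setWasmPluginCondition records the outcome of one reconciliation step
// ("WasmPluginReady" is a hypothetical condition type) on the policy; the aggregate
// condition can later be computed from all per-step conditions.
func setWasmPluginCondition(policy *kuadrantv1beta2.RateLimitPolicy, reconcileErr error) {
	cond := metav1.Condition{
		Type:    "WasmPluginReady",
		Status:  metav1.ConditionTrue,
		Reason:  "WasmPluginReconciled",
		Message: "wasm plugin is up to date",
	}
	if reconcileErr != nil {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "ReconciliationError"
		cond.Message = reconcileErr.Error()
	}
	// assumes the policy status exposes Conditions []metav1.Condition
	meta.SetStatusCondition(&policy.Status.Conditions, cond)
}
```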

At the level of the Kuadrant instance, the kinds of resource to reconcile are:

  • IstioOperator
  • (Istio) ConfigMap
  • ServiceMeshControlPlane

I'd like to propose modifying the list of controllers in the description to the list of resource kinds above. I.e., no more "AuthPolicy controller" or "RateLimitPolicy controller", but "AuthConfigReconciler", "AuthorizationPolicyReconciler", "WasmPluginReconciler", "LimitadorReconciler", etc.

@eguzki
Contributor Author

eguzki commented Apr 11, 2024

The main motivation for moving logic from a "policy controller" to specific resource controllers is that each controller can define its own mappers and predicates to filter events. This is a powerful capability that totally shapes the controller's logic. Additionally, the reconciliation of some elements is not coupled to that of others; responsibility is decoupled by design. But that also comes with a price, and that brings me to "policy controllers". They still need to exist, as far as I can see now. The policy controllers' main responsibility would basically be to maintain an updated status, plus other features centered on the policy itself, for instance the constraint that a network resource can only be targeted by a single Kuadrant policy (per type). That functionality is owned, IMO, by the "policy controller".
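
To illustrate the kind of per-controller filtering meant here, a hedged sketch of a resource-specific controller with its own mapper and predicates (the mapper and reconciler are hypothetical; predicate.GenerationChangedPredicate is a controller-runtime builtin that skips status-only updates):

```go
package example

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1"

	istioclientgoextensionv1alpha1 "istio.io/client-go/pkg/apis/extensions/v1alpha1"
)

// setupWasmPluginController shows a controller that owns one internal resource kind
// and defines its own event filtering: HTTPRoute events only trigger reconciliation
// on spec (generation) changes, and are mapped to the affected WasmPlugins.
func setupWasmPluginController(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&istioclientgoextensionv1alpha1.WasmPlugin{}).
		Watches(&gatewayapiv1.HTTPRoute{},
			handler.EnqueueRequestsFromMapFunc(mapRouteToWasmPlugins), // hypothetical mapper
			builder.WithPredicates(predicate.GenerationChangedPredicate{})).
		Complete(r)
}

// mapRouteToWasmPlugins is a hypothetical mapper; in practice it would use the
// topology indexes to find the WasmPlugins affected by the route change.
func mapRouteToWasmPlugins(ctx context.Context, obj client.Object) []reconcile.Request {
	return nil // placeholder
}
```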

The controller reconciliation function input cannot always be a Gateway (or a GatewayClass). One example is Limitador's limits controller: there is one Limitador object to be reconciled per Kuadrant instance (currently only one is allowed per cluster). So for this particular case, the controller reconciliation function input is the Kuadrant CR.
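
Expressed as code (a hedged sketch; the namespace and name are placeholders), this is just a different mapper: for the Limitador limits case, every relevant event funnels into the single Kuadrant CR request rather than into a Gateway request.

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// mapToKuadrantCR maps any relevant event to the single Kuadrant CR (only one per
// cluster is currently allowed), which becomes the reconciliation function input.
// The namespace and name below are placeholders.
func mapToKuadrantCR(ctx context.Context, _ client.Object) []reconcile.Request {
	return []reconcile.Request{
		{NamespacedName: types.NamespacedName{Namespace: "kuadrant-system", Name: "kuadrant"}},
	}
}
```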

I'd like to propose modifying the list of controllers in the description to the list of resource kinds above. I.e., no more "AuthPolicy controller" or "RateLimitPolicy controller", but "AuthConfigReconciler", "AuthorizationPolicyReconciler", "WasmPluginReconciler", "LimitadorReconciler", etc.

Sure 👍

@guicassolato
Contributor

The controller reconciliation function input cannot always be a Gateway (or a GatewayClass). One example is Limitador's limits controller: there is one Limitador object to be reconciled per Kuadrant instance (currently only one is allowed per cluster). So for this particular case, the controller reconciliation function input is the Kuadrant CR.

This makes sense and it's actually a more balanced view than mine from before. Depending on the kind of object, we should pick the correct root of the tree from which to reconcile. And I guess the "correct" one is the first common node that requires no jumping to sibling nodes to compute the desired state – as you exemplified for the Limitador CR, whose root cannot be the Gateway.

IOW, if you can create a goroutine that reduces the problem to the size of the subtree rooted at the object passed as input to the reconciliation function, then you're good. But you shouldn't have to query anything outside this subtree.

I could argue that today the Kuadrant CR is the tip and true root of the entire tree.


The main motivation for moving logic from a "policy controller" to specific resource controllers is that each controller can define its own mappers and predicates to filter events. This is a powerful capability that totally shapes the controller's logic. Additionally, the reconciliation of some elements is not coupled to that of others; responsibility is decoupled by design. But that also comes with a price, and that brings me to "policy controllers". They still need to exist, as far as I can see now. The policy controllers' main responsibility would basically be to maintain an updated status, plus other features centered on the policy itself, for instance the constraint that a network resource can only be targeted by a single Kuadrant policy (per type). That functionality is owned, IMO, by the "policy controller".

Here I actually see almost no conflict with what I said. What you call a "policy controller" is what I implied is needed for the status reconciliation. I would call those "xPolicyStatusReconciler" and stick with the pattern of a reconciler being named after the resource kind it reconciles, not the resource kind it reconciles from.

As for the validation, I argue it does not belong in this controller. If you deem a policy to be invalid at this stage, it's already too late, as the other reconcilers may have already created resources as if the policy were valid. All validation should be performed at the API level IMO, which I know is not the case today.

@eguzki eguzki mentioned this issue Jul 23, 2024
@guicassolato guicassolato changed the title GatewayAPI topology (DAG) GatewayAPI topology (DAG 1.0) Aug 16, 2024
@eguzki
Contributor Author

eguzki commented Sep 10, 2024

We can close this; already working on DAG 2.0.

Follow-up in Kuadrant/architecture#29 and Kuadrant/architecture#95.

@eguzki eguzki closed this as completed Sep 10, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Kuadrant Sep 10, 2024