feat(controller): decouple A2A handler registration from controller reconciliation #1138
Conversation
Pull request overview
This PR decouples A2A handler registration from the main Agent controller reconciliation loop by introducing a dedicated A2ARegistrar that watches Agent resources independently. This architectural change solves horizontal scaling issues where non-leader replicas lacked A2A handler registrations, causing API requests to fail.
Key Changes:
- Removed A2A registration logic from the controller reconciliation loop
- Introduced `A2ARegistrar` as a manager runnable that maintains A2A routing tables on all replicas using Kubernetes informers (sketched below)
- Refactored agent card building logic into a reusable method in the translator
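For orientation, here is a minimal sketch of what a registrar like this looks like as a controller-runtime runnable. The type layout, callback bodies, and the `v1alpha2` import path are assumptions for illustration, not the PR's exact code:

```go
package a2a

import (
	"context"

	toolscache "k8s.io/client-go/tools/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"

	"github.com/kagent-dev/kagent/go/api/v1alpha2" // assumed import path for the Agent CRD type
)

// A2ARegistrar keeps the in-memory A2A routing table in sync with Agent
// resources on every replica (sketch only; the real implementation lives in
// go/internal/a2a/a2a_registrar.go).
type A2ARegistrar struct {
	mgr manager.Manager
}

// NeedLeaderElection returns false so the manager starts this runnable on all
// replicas, not only on the elected leader.
func (r *A2ARegistrar) NeedLeaderElection() bool { return false }

// Start attaches event handlers to the shared Agent informer and blocks until
// the manager's context is cancelled.
func (r *A2ARegistrar) Start(ctx context.Context) error {
	informer, err := r.mgr.GetCache().GetInformer(ctx, &v1alpha2.Agent{})
	if err != nil {
		return err
	}
	if _, err := informer.AddEventHandler(toolscache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { /* register the agent's A2A handler */ },
		UpdateFunc: func(_, newObj interface{}) { /* re-register with the updated agent card */ },
		DeleteFunc: func(obj interface{}) { /* deregister the handler */ },
	}); err != nil {
		return err
	}
	<-ctx.Done()
	return nil
}

// Compile-time checks that the registrar satisfies the manager interfaces.
var (
	_ manager.Runnable               = (*A2ARegistrar)(nil)
	_ manager.LeaderElectionRunnable = (*A2ARegistrar)(nil)
)
```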
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| go/pkg/app/app.go | Replaced a2a reconciler initialization with A2ARegistrar setup, moving it outside leader election scope |
| go/internal/a2a/a2a_registrar.go | New component that watches Agent resources via informers and manages A2A handler registration on all replicas |
| go/internal/controller/a2a/a2a_reconciler.go | Removed - functionality moved to a2a_registrar.go |
| go/internal/controller/reconciler/reconciler.go | Removed A2A reconciliation logic and dependencies from the main reconciler |
| go/internal/controller/translator/agent/adk_api_translator.go | Extracted agent card building logic into reusable methods; added TranslateAgentCard interface method |
```go
	return ownedResources
}

func (a *adkApiTranslator) buildAgentCard(agent *v1alpha2.Agent) *server.AgentCard {
```
Why is this an interface method and not just a public function somewhere? It doesn't seem to use any items on the struct.
Good point. This method specifically could just be a private helper function. However, as for "a public function somewhere" - I would say this functionality does belong in this file (as it already does today) and should be exposed via the `AdkApiTranslator`, since it translates from `v1alpha2.Agent` to `server.AgentCard`.
Happy to refactor - I can easily make this a private helper function if that's what you were getting at. Otherwise, what would you suggest?
In my opinion, methods are important when you want to either hide the implementation of an interface or use some long-lived item in a struct. In this case it's neither, so really it should just be a public utility function.
Refactored in 8f51f11
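For reference, a rough sketch of the shape this discussion points at - the exported `TranslateAgentCard` delegating to a plain helper within `adk_api_translator.go`. Hypothetical fragment; the actual commit may differ in details:

```go
// TranslateAgentCard stays on the translator so callers keep going through the
// AdkApiTranslator interface.
func (a *adkApiTranslator) TranslateAgentCard(agent *v1alpha2.Agent) *server.AgentCard {
	return buildAgentCard(agent)
}

// buildAgentCard is a plain helper: it needs nothing from the struct and only
// maps the Agent resource onto an A2A agent card.
func buildAgentCard(agent *v1alpha2.Agent) *server.AgentCard {
	card := &server.AgentCard{}
	// ...populate name, description, URL, skills, etc. from the agent spec (elided)...
	return card
}
```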
```go
		return fmt.Errorf("failed to get cache informer: %w", err)
	}

	if _, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
```
Is there a reason you used the cache directly here instead of creating a Controller like the rest of the k8s watchers? I usually prefer consistency across the various watchers so the codebase is easier to grok, but definitely open to this if there's a good reason.
Yeah, I debated this a bit as well. I ended up just using an informer implementation for a couple of reasons:

- `Controller`s come with a bunch of overhead that I think is unnecessary for this implementation. We don't need the reconciliation semantics that controllers are designed for. The registrar doesn't need to loop and update to converge on desired state; it just needs to react to add/update/delete events to maintain an in-memory routing table. Additionally, we don't need all the other machinery that comes with a controller, including predicates, owning/watching relationships, work queues, rate limiting, retries, etc.
- I wanted to explicitly break from the existing controllers. This code needs to run on all controller replicas, hence the `A2ARegistrar` deliberately, and explicitly, returns `false` from `NeedLeaderElection()` - rather than making this configurable when using a `Controller`.

Finally, looking at it another way... if we were in the future going to try to extract a kagent-api component, then it would feel very weird to me to be running a "Controller" within that component - but maybe that's just me 😆
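Roughly, the wiring difference looks like this in `app.go` (a sketch with a hypothetical constructor name, not the exact code in this PR):

```go
// Instead of ctrl.NewControllerManagedBy(mgr)...Complete(r), the registrar is
// added directly as a runnable. Because it reports NeedLeaderElection() == false,
// the manager starts it on every replica.
registrar := a2a.NewA2ARegistrar(mgr) // hypothetical constructor name
if err := mgr.Add(registrar); err != nil {
	return fmt.Errorf("failed to add A2A registrar: %w", err)
}
```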
Also... having two `AgentController`s in the same pod feels really strange (and wrong).
I think these are all very good reasons!
This PR enables leader election on the controller if it is configured with more than 1 replica, to ensure that only 1 replica is actively reconciling watched manifests. It also ensures that the necessary RBAC manifests are created. Final part of #1133 (excluding #1138). --------- Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Removed A2A "reconciler" and replaced with a registrar that is responsible for registering/deregistering mux handlers. Main controller still manages cluster resources and DB/status. Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Decided to split this out of #1133 to try to make review a little easier, as it's a chunky commit that can live in isolation from the rest of the changes in that PR.
This change separates A2A handler registration from the main `Agent` controller reconciliation loop by introducing a dedicated `A2ARegistrar` that manages the A2A routing table independently from the main controller.

Currently, A2A handler registration is tightly coupled to the `Agent` controller's reconciliation loop, which performs the following operations:

1. Reconcile Kubernetes resources (Deployment, Service, etc.)
2. Store agent metadata in the database
3. Register the A2A handler in the routing table
4. Update resource status

This coupling is problematic for a number of reasons:

1. It breaks horizontal scaling - with leader election enabled (required to prevent duplicate reconciliation), only the leader pod performs reconciliation and registers A2A handlers. When API requests hit non-leader replicas, they fail because those replicas lack the necessary handler registrations.
2. It could be argued that this violates separation of concerns - the controller handles both cluster resource management (its core responsibility) and API routing configuration (an orthogonal concern).
3. It makes future architectural changes (e.g., splitting API and control plane) unnecessarily complex.
This PR attempts to address those concerns, ensuring that all controller replicas, when scaled, maintain consistent A2A routing tables, enabling transparent load balancing across replicas. A2A logic is also consolidated into a dedicated package rather than scattered across controller code, ensuring a clean separation of API and control plane such that these could be split into independent deployments without significant refactoring in the future.
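For a concrete picture of what "maintaining the routing table" means here, a minimal sketch of the register/deregister surface that the informer callbacks would drive. The structure and names are hypothetical; the real registrar registers handlers on the shared A2A mux:

```go
package a2a

import (
	"net/http"
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// routingTable is a hypothetical stand-in for the mux the registrar mutates:
// one A2A handler per Agent, keyed by namespace/name.
type routingTable struct {
	mu       sync.RWMutex
	handlers map[types.NamespacedName]http.Handler
}

func newRoutingTable() *routingTable {
	return &routingTable{handlers: make(map[types.NamespacedName]http.Handler)}
}

// register is called from the informer's add/update callbacks.
func (t *routingTable) register(key types.NamespacedName, h http.Handler) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.handlers[key] = h
}

// deregister is called from the delete callback.
func (t *routingTable) deregister(key types.NamespacedName) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.handlers, key)
}
```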