feat(controller): decouple A2A handler registration from controller reconcilation #1138

onematchfox · 2025-11-26T13:37:51Z

Decided to split this out of #1133 to try to make review a little easier, as it's a chunky commit that can live in isolation from the rest of the changes in that PR.

This change separates A2A handler registration from the main Agent controller reconciliation loop by introducing a dedicated A2ARegistrar that manages the A2A routing table independently from the main controller.

Currently, A2A handler registration is tightly coupled to the Agent controller's reconciliation loop, which performs the following operations:

1. Reconcile Kubernetes resources (Deployment, Service, etc.)
2. Store agent metadata in database
3. Register A2A handler in routing table
4. Update resource status

This coupling is problematic for a number of reasons:

1. Breaks horizontal scaling - with leader election enabled (required to prevent duplicate reconciliation), only the leader pod performs reconciliation and registers A2A handlers. When API requests hit non-leader replicas, they fail because those replicas lack the necessary handler registrations.
2. Could be argued that this violates separation of concerns - the controller handles both cluster resource management (its core responsibility) and API routing configuration (an orthogonal concern).
3. Makes future architectural changes (e.g., splitting API and control plane) unnecessarily complex.

This PR attempts to address those concerns by ensuring that all controller replicas, when scaled, maintain consistent A2A routing tables, enabling transparent load balancing across replicas. A2A logic is also consolidated into a dedicated package rather than scattered across controller code, giving a clean separation of API and control plane so that the two could be split into independent deployments in future without significant refactoring.
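The scaling failure described above can be reproduced with a small standard-library sketch. This is not the kagent code: the `replica` type, the `status` helper, and the `/api/a2a/default/my-agent` path are hypothetical stand-ins, and each replica's routing table is modelled as a plain map of HTTP handlers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// replica models one controller pod with its own in-memory A2A routing table.
// Hypothetical type, for illustration only.
type replica struct {
	name   string
	routes map[string]http.Handler
}

func newReplica(name string) *replica {
	return &replica{name: name, routes: map[string]http.Handler{}}
}

// register adds a handler for an agent path on this replica only.
func (r *replica) register(path string) {
	r.routes[path] = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintf(w, "handled by %s", r.name)
	})
}

// ServeHTTP dispatches to the registered handler, or answers 404 if none exists.
func (r *replica) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	h, ok := r.routes[req.URL.Path]
	if !ok {
		http.NotFound(w, req)
		return
	}
	h.ServeHTTP(w, req)
}

// status sends a GET for path to the replica and returns the HTTP status code.
func status(r *replica, path string) int {
	rec := httptest.NewRecorder()
	r.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	const path = "/api/a2a/default/my-agent" // hypothetical route
	leader, follower := newReplica("leader"), newReplica("follower")

	// Before: registration happens inside reconciliation, which only the leader runs.
	leader.register(path)
	fmt.Println(status(leader, path), status(follower, path)) // 200 404

	// After: every replica reacts to the same Agent event and registers locally.
	follower.register(path)
	fmt.Println(status(leader, path), status(follower, path)) // 200 200
}
```

With a load balancer in front of both replicas, the first state fails for any request routed to the follower, which is the behaviour this PR sets out to remove.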

Copilot

Pull request overview

This PR decouples A2A handler registration from the main Agent controller reconciliation loop by introducing a dedicated A2ARegistrar that watches Agent resources independently. This architectural change solves horizontal scaling issues where non-leader replicas lacked A2A handler registrations, causing API requests to fail.

Key Changes:

- Removed A2A registration logic from the controller reconciliation loop
- Introduced A2ARegistrar as a manager runnable that maintains A2A routing tables on all replicas using Kubernetes informers
- Refactored agent card building logic into a reusable method in the translator

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| go/pkg/app/app.go | Replaced a2a reconciler initialization with A2ARegistrar setup, moving it outside leader election scope |
| go/internal/a2a/a2a_registrar.go | New component that watches Agent resources via informers and manages A2A handler registration on all replicas |
| go/internal/controller/a2a/a2a_reconciler.go | Removed - functionality moved to a2a_registrar.go |
| go/internal/controller/reconciler/reconciler.go | Removed A2A reconciliation logic and dependencies from the main reconciler |
| go/internal/controller/translator/agent/adk_api_translator.go | Extracted agent card building logic into reusable methods; added TranslateAgentCard interface method |


go/internal/a2a/a2a_registrar.go

EItanya · 2025-12-02T15:04:49Z

go/internal/controller/translator/agent/adk_api_translator.go

 	return ownedResources
 }

+func (a *adkApiTranslator) buildAgentCard(agent *v1alpha2.Agent) *server.AgentCard {


Why is this an interface method and not just a public function somewhere? It doesn't seem to use any items on the struct.

Good point. This method specifically could just be a private helper function. However, as for "a public function somewhere": I would say this functionality does belong in this file (as it already does today) and should be exposed via the AdkApiTranslator, since it translates from v1alpha2.Agent to server.AgentCard.

Happy to refactor - I can easily make this a private helper function if that's what you were getting at. Otherwise, what would you suggest?

In my opinion, methods are important when you want to either hide the implementation of an interface or use some long-lived item in a struct. In this case it's neither, so really it should just be a public utility function.

Refactored in 8f51f11
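As a rough illustration of the distinction drawn in this thread, here is a minimal sketch of a card builder written as a package-level function rather than a method. The `Agent` and `AgentCard` types are simplified stand-ins for `v1alpha2.Agent` and `server.AgentCard`, and the function name, the `baseURL` parameter, and the URL layout are all assumptions rather than what commit 8f51f11 actually contains.

```go
package main

import "fmt"

// Agent is a simplified stand-in for v1alpha2.Agent (hypothetical fields).
type Agent struct {
	Namespace, Name, Description string
}

// AgentCard is a simplified stand-in for server.AgentCard (hypothetical fields).
type AgentCard struct {
	Name, Description, URL string
}

// BuildAgentCard derives a card purely from its arguments. It reads no
// receiver state and hides no interface implementation, so a plain exported
// function is enough and callers do not need a translator instance.
func BuildAgentCard(agent Agent, baseURL string) AgentCard {
	return AgentCard{
		Name:        agent.Namespace + "/" + agent.Name,
		Description: agent.Description,
		// Assumed URL layout, for illustration only.
		URL: fmt.Sprintf("%s/%s/%s", baseURL, agent.Namespace, agent.Name),
	}
}

func main() {
	card := BuildAgentCard(
		Agent{Namespace: "default", Name: "my-agent", Description: "demo agent"},
		"http://example.invalid/api/a2a",
	)
	fmt.Printf("%+v\n", card)
}
```

Because the function depends only on its inputs, any package that needs a card (such as a registrar) can call it directly without constructing or mocking a translator.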

EItanya · 2025-12-02T15:07:02Z

go/internal/a2a/a2a_registrar.go

+		return fmt.Errorf("failed to get cache informer: %w", err)
+	}
+
+	if _, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{


Is there a reason you used the cache directly here instead of creating a Controller like the rest of the k8s watchers? I usually prefer consistency across the various watchers so the codebase is easier to grok, but definitely open to this if there's a good reason.

Yeah, I debated this a bit as well. I ended up with just an informer implementation for a couple of reasons.

Controllers come with a bunch of overhead that I think is unnecessary for this implementation. We don't need the reconciliation semantics that controllers are designed for. The registrar doesn't need to loop and update to converge on desired state; it just needs to react to add/update/delete events to maintain an in-memory routing table. Additionally, we don't need all the other overhead that comes with a controller, including predicates, owning/watching relationships, work queues, rate limiting, retries, etc.

I wanted to explicitly break from the existing controllers. This code needs to run on all controller replicas, hence the A2ARegistrar deliberately, and explicitly, returns false from NeedLeaderElection() - rather than making this configurable when using a Controller.

Finally, looking at it another way... if we were in future going to try and extract a kagent-api component, then it would feel very weird to me to be running a "Controller" within that component - but maybe that's just me 😆

Also.... having 2 AgentControllers in the same pod also feels really strange (and wrong).

I think these are all very good reasons!
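The event-driven shape discussed in this thread can be sketched with the standard library alone. This is not the contents of a2a_registrar.go: the `Registrar` type, its `OnAdd`/`OnUpdate`/`OnDelete` methods, and the target URL format are hypothetical, and a string stands in for the real handler. In the actual code these callbacks would be wired to an informer via `cache.ResourceEventHandlerFuncs`, as the diff excerpt above shows.

```go
package main

import (
	"fmt"
	"sync"
)

// Agent is a stand-in for the Agent custom resource.
type Agent struct {
	Namespace, Name string
}

// Registrar keeps an in-memory routing table in sync with Agent events.
// There is no work queue, retry, or desired-state loop: each event is
// applied directly to the table.
type Registrar struct {
	mu     sync.RWMutex
	routes map[string]string // agent key -> target (stand-in for an http.Handler)
}

func NewRegistrar() *Registrar {
	return &Registrar{routes: map[string]string{}}
}

func key(a Agent) string { return a.Namespace + "/" + a.Name }

// OnAdd and OnUpdate both upsert: the latest event overwrites the entry.
func (r *Registrar) OnAdd(a Agent)              { r.upsert(a) }
func (r *Registrar) OnUpdate(_, newAgent Agent) { r.upsert(newAgent) }

// OnDelete removes the route for a deleted Agent.
func (r *Registrar) OnDelete(a Agent) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.routes, key(a))
}

func (r *Registrar) upsert(a Agent) {
	r.mu.Lock()
	defer r.mu.Unlock()
	// Assumed target format, for illustration only.
	r.routes[key(a)] = "http://" + a.Name + "." + a.Namespace + ":8080"
}

// Lookup returns the target registered for an agent key, if any.
func (r *Registrar) Lookup(k string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	target, ok := r.routes[k]
	return target, ok
}

// NeedLeaderElection reports false so that the component would run on every
// replica rather than only on the elected leader.
func (r *Registrar) NeedLeaderElection() bool { return false }

func main() {
	r := NewRegistrar()
	a := Agent{Namespace: "default", Name: "my-agent"}

	r.OnAdd(a)
	_, ok := r.Lookup("default/my-agent")
	fmt.Println("registered:", ok)

	r.OnDelete(a)
	_, ok = r.Lookup("default/my-agent")
	fmt.Println("registered:", ok)
}
```

The read-write mutex matters because request handling reads the table concurrently with informer callbacks writing to it.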

This PR enables leader election on the controller if it is configured with more than 1 replica, to ensure that only 1 replica is actively reconciling watched manifests. It also ensures that the necessary RBAC manifests are created. Final part of #1133 (excluding #1138). --------- Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

Removed A2A "reconciler" and replaced with a registrar that is responsible for registering/deregistering mux handlers. Main controller still manages cluster resources and DB/status. Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>


Copilot AI review requested due to automatic review settings November 26, 2025 13:37

onematchfox requested review from EItanya, ilackarms and yuval-k as code owners November 26, 2025 13:37

Copilot started reviewing on behalf of onematchfox November 26, 2025 13:38 View session

Copilot finished reviewing on behalf of onematchfox November 26, 2025 13:41

Copilot AI reviewed Nov 26, 2025


onematchfox force-pushed the decouple-a2a-mux branch 2 times, most recently from 65d486f to e118dfc Compare December 2, 2025 08:57

EItanya reviewed Dec 2, 2025


onematchfox mentioned this pull request Dec 2, 2025

Enable leader election on controller when scaled #1146

Merged

onematchfox added 2 commits December 3, 2025 11:11

refactor: extract get agent card into public utility function

8f51f11

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

onematchfox force-pushed the decouple-a2a-mux branch from 610929d to 8f51f11 Compare December 3, 2025 10:11

EItanya approved these changes Dec 3, 2025


EItanya merged commit 420a603 into kagent-dev:main Dec 3, 2025
17 checks passed

onematchfox deleted the decouple-a2a-mux branch December 8, 2025 07:36
