Skip to content

Conversation

@onematchfox
Copy link
Contributor

Decided to split this out of #1133 to try make review a little easier as it's a chunky commit that can live in isolation of the rest of the changes in that PR

This change separates A2A handler registration from the main Agent controller reconciliation loop by introducing a dedicated A2ARegistrar that manages the A2A routing table independently from the main controller.

Currently, A2A handler registration is tightly coupled to the Agent controller's reconciliation loop, which performs the following operations:

  1. Reconcile Kubernetes resources (Deployment, Service, etc.)
  2. Store agent metadata in database
  3. Register A2A handler in routing table
  4. Update resource status

This coupling is problematic for a number of reasons:

  1. Breaks horizontal scaling - with leader election enabled (required to prevent duplicate reconciliation), only the leader pod performs reconciliation and registers A2A handlers. When API requests hit non-leader replicas, they fail because those replicas lack the necessary handler registrations.
  2. Could be argued that this violates separation of concerns - the controller handles both cluster resource management (its core responsibility) and API routing configuration (an orthogonal concern).
  3. Makes future architectural changes (e.g., splitting API and control plane) unnecessarily complex.

This PR attempts to address those concerns ensuring that all controller replicas, when scaled, will maintain consistent A2A routing tables enabling transparent load balancing across replicas. A2A logic is also consolidated into a dedicated package rather than scattered across controller code ensuring a clean separation of API and control plane such that these could be split into independent deployments without significant refactoring in future.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR decouples A2A handler registration from the main Agent controller reconciliation loop by introducing a dedicated A2ARegistrar that watches Agent resources independently. This architectural change solves horizontal scaling issues where non-leader replicas lacked A2A handler registrations, causing API requests to fail.

Key Changes:

  • Removed A2A registration logic from the controller reconciliation loop
  • Introduced A2ARegistrar as a manager runnable that maintains A2A routing tables on all replicas using Kubernetes informers
  • Refactored agent card building logic into a reusable method in the translator

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
go/pkg/app/app.go Replaced a2a reconciler initialization with A2ARegistrar setup, moving it outside leader election scope
go/internal/a2a/a2a_registrar.go New component that watches Agent resources via informers and manages A2A handler registration on all replicas
go/internal/controller/a2a/a2a_reconciler.go Removed - functionality moved to a2a_registrar.go
go/internal/controller/reconciler/reconciler.go Removed A2A reconciliation logic and dependencies from the main reconciler
go/internal/controller/translator/agent/adk_api_translator.go Extracted agent card building logic into reusable methods; added TranslateAgentCard interface method

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@onematchfox onematchfox force-pushed the decouple-a2a-mux branch 2 times, most recently from 65d486f to e118dfc Compare December 2, 2025 08:57
return ownedResources
}

func (a *adkApiTranslator) buildAgentCard(agent *v1alpha2.Agent) *server.AgentCard {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an interface method and not just a public function somewhere? It doesn't seem to use any items on the struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This method specifically could just be a private helper function. However as for "a public function somewhere" - I would say this functionality does belong in this file (as it does so already today) and should be exposed via the AdkApiTranslator since it translates from v1alpha2.Agent to server.AgentCard.

Happy to refactor - can easily make this a private helper function if thats what you were getting at. Otherwise what would you suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, methods are important when you want to either hide the implementation of an interface, or you need to use some long lived item in a struct. In this case it's neither so really it should just be a public utility function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored in 8f51f11

return fmt.Errorf("failed to get cache informer: %w", err)
}

if _, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you used the cache directly here instead of creating a Controller like the rest of the k8s watchers? I usually prefer consistency across the various watchers so the codebase is easier to grok, but definitely open to this if there's a good reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I debated this a bit as well. I ended up just an informer implementation for a couple reasons.

  1. Controllers come with a bunch of overhead that I think is unnecessary for this implementation. We don't need the reconciliation semantics that controllers are designed for. The registrar doesn't need to loop and update to conmverge on desired state; it just needs to react to add/update/delete events to maintain an in-memory routing table. Additionally we don't need all the other overhead that comes with a controller including predicates, owning/watching relationships, woirk queues, rate limiting, retries, etc.
  2. I wanted to explicitly break from the existing controllers. This code needs to run on all controller replicas, hence the A2ARegistrar deliberately, and explicilty, returns false from NeedLeaderElection() - rather than making this configurable when using a Controller.

Finally, looking at it another way... if we were in future going to try and extract a kagent-api component, then it would feel very weird to me to be running a "Controller" within that component - but maybe that's just me 😆

Copy link
Contributor Author

@onematchfox onematchfox Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also.... having 2 AgentControllers in the same pod also feels really strange (and wrong).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are all very good reasons!

EItanya pushed a commit that referenced this pull request Dec 2, 2025
This PR enables leader election on the controller if it is configured
with one than 1 replica to ensure that only 1 replica is actively
reconciling watched manifests. It also ensures that the necessary RBAC
manifests are created.

Final part of #1133 (excluding #1138).

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Removed A2A "reconciler" and replaced with a registrar that is responsible for registering/deregistering mux handlers. Main controller still manages cluster resources and DB/status.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@EItanya EItanya merged commit 420a603 into kagent-dev:main Dec 3, 2025
17 checks passed
GTRekter pushed a commit to GTRekter/kagent that referenced this pull request Dec 4, 2025
This PR enables leader election on the controller if it is configured
with one than 1 replica to ensure that only 1 replica is actively
reconciling watched manifests. It also ensures that the necessary RBAC
manifests are created.

Final part of kagent-dev#1133 (excluding kagent-dev#1138).

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
GTRekter pushed a commit to GTRekter/kagent that referenced this pull request Dec 4, 2025
…econcilation (kagent-dev#1138)

**Decided to split this out of
kagent-dev#1133 to try make review a
little easier as it's a chunky commit that can live in isolation of the
rest of the changes in that PR**

This change separates A2A handler registration from the main `Agent`
controller reconciliation loop by introducing a dedicated `A2ARegistrar`
that manages the A2A routing table independently from the main
controller.

Currently, A2A handler registration is tightly coupled to the `Agent`
controller's reconciliation loop, which performs the following
operations:
1. Reconcile Kubernetes resources (Deployment, Service, etc.)
2. Store agent metadata in database
3. Register A2A handler in routing table
4. Update resource status

This coupling is problematic for a number of reasons:
1. Breaks horizontal scaling - with leader election enabled (required to
prevent duplicate reconciliation), only the leader pod performs
reconciliation and registers A2A handlers. When API requests hit
non-leader replicas, they fail because those replicas lack the necessary
handler registrations.
2. Could be argued that this violates separation of concerns - the
controller handles both cluster resource management (its core
responsibility) and API routing configuration (an orthogonal concern).
3. Makes future architectural changes (e.g., splitting API and control
plane) unnecessarily complex.

This PR attempts to address those concerns ensuring that all controller
replicas, when scaled, will maintain consistent A2A routing tables
enabling transparent load balancing across replicas. A2A logic is also
consolidated into a dedicated package rather than scattered across
controller code ensuring a clean separation of API and control plane
such that these could be split into independent deployments without
significant refactoring in future.

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
@onematchfox onematchfox deleted the decouple-a2a-mux branch December 8, 2025 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants