Agent unable to handoff from realtime model back to text-based LLM #4691

@HubertChen

Description

Bug Description

I have two agents:

  1. Manager (stt -> llm -> tts)
  2. Worker (realtime model)

The Manager starts the call, then hands off to the Worker via a tool call. The Worker completes its work, then attempts to hand off back to the Manager via another tool call. The bug is that the handoff back to the Manager never completes, and the call continues with the Worker.

Expected Behavior

The Worker should be able to hand off back to the Manager, and the Manager should continue the conversation.
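To make the expected control flow concrete, here is a small, library-free sketch of the tool-call handoff pattern (all class and method names here are illustrative, not the livekit-agents API): when a tool returns an Agent, that agent should become the session's active agent, regardless of whether the current agent is realtime or pipeline-based.

```python
# Library-free sketch of the expected handoff semantics.
# Names are illustrative only; this is not the livekit-agents API.
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str

    def tool(self):
        """Return the next agent to hand off to, or None to stay active."""
        return None


@dataclass
class Session:
    active: Agent
    history: list = field(default_factory=list)

    def run_tool(self):
        self.history.append(self.active.name)
        nxt = self.active.tool()
        if nxt is not None:  # a tool returning an Agent triggers a handoff
            self.active = nxt


class Manager(Agent):
    def tool(self):
        # delegate(): hand control to a new Worker that remembers its manager
        return Worker("Jane", manager=self)


@dataclass
class Worker(Agent):
    manager: Agent = None

    def tool(self):
        # done(): hand control back to the original manager
        return self.manager


session = Session(active=Manager("John"))
session.run_tool()  # Manager -> Worker
session.run_tool()  # Worker -> Manager: the step that never completes in the bug
print(session.active.name)  # -> John
```

In the buggy behavior, the second `run_tool()` step is where things diverge: the Worker's `done` tool returns the Manager, but the session keeps the Worker active.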

Reproduction Steps

import logging

from livekit.agents import Agent, AgentSession, JobContext, cli, function_tool, inference
from livekit.plugins import elevenlabs, google

logger = logging.getLogger(__name__)

# `server` is the app's agent server instance; its setup was omitted from the report.

class ManagerAgent(Agent):
    def __init__(
        self,
    ) -> None:
        super().__init__(
            instructions="You are John, a manager at a restaurant.",
        )

    async def on_enter(self) -> None:
        logger.info("ManagerAgent on_enter called")
        await self.session.generate_reply(instructions="Say hi")
        logger.info("ManagerAgent done")

    @function_tool()
    async def delegate(self) -> Agent:
        logger.info("ManagerAgent delegate called")
        return WorkerAgent(self)


class WorkerAgent(Agent):
    def __init__(
        self,
        manager_agent: ManagerAgent,
    ) -> None:
        self._manager_agent = manager_agent
        super().__init__(
            instructions="You are Jane, a worker at a restaurant.",
            llm=google.realtime.RealtimeModel(
                model="gemini-live-2.5-flash-native-audio",
                voice="Erinome",
                # proactivity=True,
                vertexai=True,
            ),
        )

    async def on_enter(self) -> None:
        logger.info("WorkerAgent on_enter called")
        await self.session.generate_reply(instructions="Say hi")
        logger.info("WorkerAgent done")

    @function_tool()
    async def done(self) -> Agent:
        logger.info("WorkerAgent done called")
        return self._manager_agent

@server.rtc_session()
async def main(ctx: JobContext):
    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
        llm=google.LLM(
            model="gemini-3-flash-preview", location="global", vertexai=True
        ),
        tts=elevenlabs.TTS(
            model="eleven_multilingual_v2",
            voice_id="UgBBYS2sOqTuMpoF3BR0"
        ),
    )

    await session.start(
        agent=ManagerAgent(),
        room=ctx.room,
    )

    await ctx.connect()


if __name__ == "__main__":
    cli.run_app(server)

Operating System

macOS Sequoia

Models Used

Deepgram Nova 3, Gemini 3 Flash, Gemini 2.5 Flash Native Audio, and ElevenLabs TTS

Package Versions

google-auth = v2.48.0
google-cloud-speech = v2.36.0
google-cloud-texttospeech = v2.34.0
google-genai = v1.61.0
livekit-agents = v1.3.12
livekit-plugins-elevenlabs = v1.3.12
livekit-plugins-google = v1.3.12
livekit-plugins-silero = v1.3.12
livekit-plugins-turn-detector = v1.3.12
livekit-protocol = v1.1.2

Session/Room/Call IDs

Room Session ID: RM_gsGqCGz9iCzf

The issue can also be reproduced by running uv run python main.py console.
