1egoman
Contributor

@1egoman 1egoman commented Sep 15, 2025

This change introduces the new client agents SDK, a set of React hooks built to make interacting with the LiveKit agents framework less complex.

This is version 3 - version 1 can be found here, and version 2 can be found here. At each step it has evolved significantly based on comments and perspectives from people who have taken a look!

Single file example

import { useEffect, useState } from "react";
import { Track, TokenSource } from "livekit-client";
import {
  useSession,
  useAgent,
  useSessionMessages,

  VideoTrack,
  StartAudio,
  RoomAudioRenderer,
  useMediaDeviceSelect,
  useTrackToggle,
} from "@livekit/components-react";

// From: https://github.com/livekit/client-sdk-js/pull/1645
const tokenSource = new TokenSource.SandboxTokenServer({ sandboxId: "xxx" });

export default function SinglePageDemo() {
  const session = useSession(tokenSource, { agentName: 'voice ai quickstart' });

  const agent = useAgent(session);

  // FIXME: still using the old local participant related hooks, so this isn't much simpler than in
  // the past. Eventually I think there needs to be something like a `useLocalTrack(session.local.camera)`
  // hook that abstracts over all this...
  const audioDevices = useMediaDeviceSelect({ kind: "audioinput", room: session.room });
  const microphoneTrack = useTrackToggle({ source: Track.Source.Microphone, room: session.room });
  const videoDevices = useMediaDeviceSelect({ kind: "videoinput", room: session.room });
  const cameraTrack = useTrackToggle({ source: Track.Source.Camera, room: session.room });

  const [started, setStarted] = useState(false);
  useEffect(() => {
    if (!started) {
      return;
    }
    session.start();
    return () => {
      session.end();
    };
  }, [started]);

  const { messages, send, isSending } = useSessionMessages(session);
  const [chatMessage, setChatMessage] = useState('');

  return (
    <div className="flex flex-col gap-4 p-4">
      <div className="flex items-center gap-4">
        <Button variant="primary" onClick={() => setStarted(s => !s)} disabled={session.connectionState === 'connecting'}>
          {session.isConnected ? 'Disconnect' : 'Connect'}
        </Button>
        <span>
          <strong className="mr-1">Statuses:</strong>
          {session.connectionState} / {agent.state ?? 'N/A'}
        </span>
      </div>

      {session.isConnected ? (
        <>
          <div className="border rounded bg-muted p-2">
            <Button onClick={() => cameraTrack.toggle()} disabled={cameraTrack.pending}>
              {cameraTrack.enabled ? 'Disable' : 'Enable'} local camera
            </Button>
            <Button onClick={() => microphoneTrack.toggle()} disabled={microphoneTrack.pending}>
              {microphoneTrack.enabled ? 'Mute' : 'Unmute'} local microphone
            </Button>
            <div>
              <p>Local camera sources:</p>
              {videoDevices.devices.map(item => (
                <li
                  key={item.deviceId}
                  onClick={() => videoDevices.setActiveMediaDevice(item.deviceId)}
                  style={{ color: item.deviceId === videoDevices.activeDeviceId ? 'red' : undefined }}
                >
                  {item.label}
                </li>
              ))}
            </div>
            <div>
              <p>Local microphone sources:</p>
              {audioDevices.devices.map(item => (
                <li
                  key={item.deviceId}
                  onClick={() => audioDevices.setActiveMediaDevice(item.deviceId)}
                  style={{ color: item.deviceId === audioDevices.activeDeviceId ? 'red' : undefined }}
                >
                  {item.label}
                </li>
              ))}
            </div>
          </div>

          <div>
            {session.local.camera.publication ? (
              <VideoTrack trackRef={session.local.camera} />
            ) : null}
            {agent.camera ? (
              <VideoTrack trackRef={agent.camera} />
            ) : null}
          </div>

          <ul>
            {messages.map(receivedMessage => (
              <li key={receivedMessage.id}>{receivedMessage.message}</li>
            ))}
            <li className="flex items-center gap-1">
              <input
                type="text"
                value={chatMessage}
                onChange={e => setChatMessage(e.target.value)}
                className="border border-2"
              />
              <Button
                variant="secondary"
                disabled={isSending}
                onClick={() => {
                  send(chatMessage);
                  setChatMessage('');
                }}
              >{isSending ? 'Sending' : 'Send'}</Button>
            </li>
          </ul>
        </>
      ) : null}

      <StartAudio label="Start audio" />
      <RoomAudioRenderer room={session.room} />
    </div>
  );
}

New API surface area

  • useSession(tokenSource: TokenSourceFixed | TokenSourceConfigurable, options: UseSessionConfigurableOptions | UseSessionFixedOptions): UseSessionReturn
    A thin wrapper around a Room which handles connecting to a room and dispatching a given agent into that room (or in the future, maybe multiple agents?). In the future it will probably become thicker as more global agent state is required.
const tokenSource: TokenSourceConfigurable = /* ... */;

const session = useSession(tokenSource, {
  // NOTE: either `room` can be a property here, or if not specified, it reads `room` from `RoomContext`, or if that can't be found, finally falls back to just making a new room
  tokenSource,
  tokenFetchOptions: { agentName: 'agent name to dispatch' },
});

useEffect(() => {
  session.start();
  return () => {
    session.end();
  };
}, [session]);
  • useAgent(session: UseSessionReturn): Agent
    A much more advanced version of the previously existing useVoiceAssistant hook - tracks the agent's state within the conversation, manages agent connection timeouts / other failures, and largely maintains backwards compatibility with existing interfaces.
const agent = useAgent(session);

// Log agent connection errors
useEffect(() => {
  if (agent.state === "failed") {
    console.error(`Error connecting to agent: ${agent.failureReasons.join(", ")}`);
  }
}, [agent]);

// later on, in a component:
<VideoTrack trackRef={agent.camera} /> 
  • useSessionMessages
    A mechanism for interacting with ReceivedMessages across the whole conversation. A ReceivedMessage can be a ReceivedChatMessage (already exists today), or a ReceivedUserTranscriptionMessage / ReceivedAgentTranscriptionMessage (both brand new). This is exposed at the conversation level so in a future world where multiple agents are within a conversation, this hook will return messages from all of them
const { messages, isSending, send } = useSessionMessages(session);
// NOTE: send / isSending are proxies of the existing interface returned by `useChat`

// later on, in a component:
<ul>
  {messages.map(receivedMessage => (
    // `from` is optional on ReceivedMessage, so guard the name lookup
    <li key={receivedMessage.id}>{receivedMessage.from?.name}: {receivedMessage.message}</li>
  ))}
</ul>

Additional refactoring / cleanup

  • Added a new ParticipantAgentAttributes constant and ported all usages of lk.-prefixed attributes (which previously were just magic strings in the code) to refer to this enum.
  • Fixed type error in handleMediaDeviceError callback function in useLiveKitRoom
  • Added support for explicit room parameter to a few hooks and components that didn't support it previously, to make single file example type scenarios easier:
    • RoomAudioRenderer
    • StartAudio
    • useChat
    • useTextStream
    • useTrackToggle
    • useTranscriptions
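
As a rough illustration of the first bullet above, here is a minimal sketch of replacing magic lk.-prefixed strings with a shared constant. The key names and the exact set of attributes are assumptions made for this example and may not match what the PR actually exports; `readAgentAttribute` is a hypothetical helper.

```typescript
// Hypothetical sketch: the property names and the choice of attributes are
// assumptions for illustration, not copied from the PR.
const ParticipantAgentAttributes = {
  AgentState: "lk.agent.state",
  PublishOnBehalf: "lk.publish_on_behalf",
} as const;

// Union of the attribute strings held by the constant above.
type AgentAttributeKey =
  (typeof ParticipantAgentAttributes)[keyof typeof ParticipantAgentAttributes];

// Hypothetical helper: reads one agent attribute from a participant's attribute map.
function readAgentAttribute(
  attributes: Record<string, string>,
  key: AgentAttributeKey,
): string | undefined {
  return attributes[key];
}

const attrs: Record<string, string> = { "lk.agent.state": "listening" };
const agentState = readAgentAttribute(attrs, ParticipantAgentAttributes.AgentState);
```

Routing every lookup through one constant means a typo in an attribute name becomes a compile-time error instead of a silent `undefined` at runtime.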


changeset-bot bot commented Sep 15, 2025

⚠️ No Changeset found

Latest commit: 8c9cba0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@1egoman 1egoman force-pushed the agent-sdk branch 3 times, most recently from ce80b15 to ef0fed7 Compare September 17, 2025 20:44
Comment on lines 8 to 39
type ReceivedMessageWithType<
  Type extends string,
  Metadata extends {} = {},
> = {
  id: string;
  timestamp: number;

  type: Type;

  from?: Participant;
  attributes?: Record<string, string>;
} & Metadata;

/** @public */
export type ReceivedChatMessage = ReceivedMessageWithType<'chatMessage', ChatMessage & {
  from?: Participant;
  attributes?: Record<string, string>;
}>;

export type ReceivedUserTranscriptionMessage = ReceivedMessageWithType<'userTranscript', {
  message: string;
}>;

export type ReceivedAgentTranscriptionMessage = ReceivedMessageWithType<'agentTranscript', {
  message: string;
}>;

/** @public */
export type ReceivedMessage =
  | ReceivedUserTranscriptionMessage
  | ReceivedAgentTranscriptionMessage
  | ReceivedChatMessage
Contributor Author

I ported the existing ReceivedMessage abstraction from here on top of the pre-existing ReceivedChatMessage - this means that ReceivedChatMessage is now a ReceivedMessage subtype.

Note that ReceivedChatMessage gains one new field, type, which acts as the discriminant key in ReceivedMessage, but is otherwise identical. So this should be a fully backwards compatible change even though a lot has been updated behind the scenes.
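
To make the discriminant point concrete, here is a minimal sketch of how a consumer might narrow a ReceivedMessage on its type field. The `Participant` type and the chat message payload are simplified stand-ins (assumptions) rather than the real livekit-client types, and `describeMessage` is a hypothetical helper.

```typescript
// Simplified stand-in for the livekit-client Participant (assumption).
type Participant = { identity: string; name?: string };

type ReceivedMessageWithType<Type extends string, Metadata extends object = object> = {
  id: string;
  timestamp: number;
  type: Type;
  from?: Participant;
  attributes?: Record<string, string>;
} & Metadata;

// The real ReceivedChatMessage also carries the ChatMessage fields; only
// `message` is modelled here.
type ReceivedChatMessage = ReceivedMessageWithType<"chatMessage", { message: string }>;
type ReceivedUserTranscriptionMessage = ReceivedMessageWithType<"userTranscript", { message: string }>;
type ReceivedAgentTranscriptionMessage = ReceivedMessageWithType<"agentTranscript", { message: string }>;

type ReceivedMessage =
  | ReceivedUserTranscriptionMessage
  | ReceivedAgentTranscriptionMessage
  | ReceivedChatMessage;

// Hypothetical helper: switching on `type` narrows the union to one member per case.
function describeMessage(m: ReceivedMessage): string {
  const sender = m.from?.name ?? m.from?.identity ?? "unknown";
  switch (m.type) {
    case "chatMessage":
      return `[chat] ${sender}: ${m.message}`;
    case "userTranscript":
      return `[user transcript] ${m.message}`;
    case "agentTranscript":
      return `[agent transcript] ${m.message}`;
  }
}
```

Existing code that only reads the chat message fields keeps working, since the extra type field is additive.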

@1egoman 1egoman changed the title [WIP] Agent SDK - ported on top of components-js primatives Agent SDK - ported on top of components-js primatives Sep 17, 2025
@1egoman 1egoman marked this pull request as ready for review September 17, 2025 20:53
@1egoman 1egoman requested review from lukasIO and pblazej September 17, 2025 20:53
@lukasIO
Contributor

lukasIO commented Sep 18, 2025

const audioDevices = useMediaDeviceSelect({ kind: "audioinput", room: conversation.subtle.room });
const microphoneTrack = useTrackToggle({ source: Track.Source.Microphone, room: conversation.subtle.room });
const videoDevices = useMediaDeviceSelect({ kind: "videoinput", room: conversation.subtle.room });
const cameraTrack = useTrackToggle({ source: Track.Source.Camera, room: conversation.subtle.room });

how about proxying some of these on the return value of useConversation ?

@1egoman
Contributor Author

1egoman commented Sep 18, 2025

how about proxying some of these on the return value of useConversation ?

I opted to leave that out for now because of the hesitancy from ben/dz around new track abstractions. That being said, I mentioned in the comment above that I had been proposing a new hook, useLocalTrack, which would take in a track reference returned from other abstractions (conversation, agent, or even other non agent related abstractions), but it also could live underneath the conversation too.

It sounds like you are pushing for that to exist now vs deferring it? If so I can add that new hook to this branch or possibly figure out how to fit it into conversation.

Member

@davidzhao davidzhao left a comment

nice, i like the direction where this is going.

1egoman added 14 commits October 1, 2025 11:14
This allows functions to limit whether they just want to take in
TrackReferences from a given source - ie, the VideoTrack could be made
to only accept TrackReference<Track.Source.Camera | Track.Source.Screenshare | Track.Source.Unknown>.
Note that just the return values are changing, not the argument
definitions in other spots, so this shouldn't be a backwards
compatibility issue.
The pre-existing state was broken.
…s going to be important for future multi-agent type use cases
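
The commit message above about limiting which sources a TrackReference may carry can be sketched roughly as follows. This is not the PR's implementation: `Source` is a string-literal stand-in for `Track.Source`, and `TrackReference`, `VideoSource`, and `renderVideo` are simplified, hypothetical shapes.

```typescript
// String-literal stand-in for `Track.Source` from livekit-client (assumption).
type Source = "camera" | "microphone" | "screen_share" | "screen_share_audio" | "unknown";

// Hypothetical simplified TrackReference, parameterized by the sources it may carry.
// The default type argument keeps un-parameterized usages working as before.
type TrackReference<S extends Source = Source> = {
  participantIdentity: string;
  source: S;
};

// The sources a video renderer could reasonably accept.
type VideoSource = "camera" | "screen_share" | "unknown";

// Stand-in for a component such as VideoTrack that only accepts video-capable sources.
function renderVideo(ref: TrackReference<VideoSource>): string {
  return `video:${ref.participantIdentity}:${ref.source}`;
}

const cameraRef: TrackReference<"camera"> = { participantIdentity: "agent-1", source: "camera" };
const rendered = renderVideo(cameraRef);

// The following would be rejected at compile time, because "microphone" is not a VideoSource:
// renderVideo({ participantIdentity: "agent-1", source: "microphone" });
```

Because only return types become more specific while the default type argument stays as the full union, callers that never name the type parameter should be unaffected.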
@1egoman
Contributor Author

1egoman commented Oct 7, 2025

Now that livekit/client-sdk-js#1645 has been merged, this pull request has been updated to build on top of it.

Some other recent decisions worth noting down:

  • We've decided to hold off on integrating TokenSource <=> Room together and put that glue logic at the Session level. (more info here)
  • For now useSession will be the web's Session implementation with a longer term plan to backport that into the core js sdk as a Session class.
  • useSession in the long term is planned to replace LiveKitRoom / useLiveKitRoom, but at least for now they will both exist in parallel and we will revisit this in the future (maybe the next major version).
  • There's going to be an explicit SessionContext now, and a new SessionProvider component to inject it as well as RoomContext into the react tree. For example:
function OtherComponent() {
  const { messages } = useSessionMessages();
  // ... use messages for something
  return null;
}

function Main() {
  const session = useSession(...);
  
  return (
    <SessionProvider session={session}>
      {/* example component that uses "session" from context */}
      <OtherComponent />
    
      {/* example component that uses "room" from context */}
      <RoomAudioRenderer />
    </SessionProvider>
  );
}

Contributor

github-actions bot commented Oct 7, 2025

size-limit report 📦

Path                                Size
LiveKitRoom only                    6 KB (+0.1% 🔺)
LiveKitRoom with VideoConference    32.23 KB (+5.36% 🔺)
All exports                         42.53 KB (+10.19% 🔺)

…rnings

Warning: src/hooks/useAgent.ts:31:18 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:31:31 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:34:18 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:34:34 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:37:18 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:37:44 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:40:20 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:40:34 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
Warning: src/hooks/useAgent.ts:40:50 - (tsdoc-escape-greater-than) The ">" character should be escaped using a backslash to avoid confusion with an HTML tag
I need to include the useSession docstring twice, apparently?

@xianshijing-lk xianshijing-lk left a comment

lgtm, definitely should wait for @lukasIO review.

id: transcription.streamInfo.id,
timestamp: transcription.streamInfo.timestamp,
from: Array.from(room.remoteParticipants.values()).find(
(p) => p.identity === transcription.participantInfo.identity,

what will happen if Array.from(room.remoteParticipants.values()).find() returns undefined here ?
want to make sure we handle this use case in an expected way.

Contributor Author

@1egoman 1egoman Oct 8, 2025

Good question - the from property on ReceivedMessageWithType (this data's type) is optional, so I think this is fine:

type ReceivedMessageWithType<Type extends string, Metadata extends object = object> = {
  id: string;
  timestamp: number;

  type: Type;

  from?: Participant;
  attributes?: Record<string, string>;
} & Metadata;

That being said, the default: case in this switch is off the happy path and hasn't been deeply thought through (see the FIXME comment above it), so it might be a better idea to throw an error or something instead of trying to fail gracefully and (maybe) producing incorrect data.
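
The two behaviours being weighed here (leave from undefined vs. throw) could be sketched as below. `resolveSender` and its `strict` flag are hypothetical names invented for this example, and `Participant` is a minimal stand-in type.

```typescript
// Minimal stand-in for the livekit-client Participant (assumption).
type Participant = { identity: string };

// Hypothetical helper: looks up the sender by identity. With `strict` off it
// returns undefined for unknown identities (fine, since `from` is optional);
// with `strict` on it throws instead of continuing with partial data.
function resolveSender(
  remoteParticipants: Map<string, Participant>,
  identity: string,
  strict = false,
): Participant | undefined {
  const sender = Array.from(remoteParticipants.values()).find(
    (p) => p.identity === identity,
  );
  if (!sender && strict) {
    throw new Error(`No remote participant with identity "${identity}"`);
  }
  return sender;
}

const participants = new Map<string, Participant>([["agent", { identity: "agent" }]]);
const known = resolveSender(participants, "agent");
const missing = resolveSender(participants, "ghost");
```

The lenient path keeps rendering messages whose sender has already left the room, while the strict path surfaces the mismatch early.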

const room = useRoomContext();
const speakerObserver = React.useMemo(() => activeSpeakerObserver(room), [room]);
const activeSpeakers = useObservableState(speakerObserver, room.activeSpeakers);
export function useSpeakingParticipants(room?: Room) {
Contributor

I'd really like to avoid passing an optional room to all hooks.

For components it's not a big deal as the props are objects anyways.

Contributor Author

We chatted about this in slack a bit - this is being done in support of the single file example use case where creating and managing a context is burdensome. Assuming this continues to be something worth prioritizing, I'm not sure exactly how else to accomplish this other than adding an explicit room or session prop / option to both components and hooks.

Maybe a compromise could be introducing an options parameter rather than an explicit new parameter in cases where a brand new parameter would be required?

Contributor

Assuming this continues to be something worth prioritizing,

assuming "single-component" is a goal (single file would still be possible with contexts and multiple components defined within), I definitely prefer an options object over explicitly passing the room, yeah.
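
For reference, the options-object shape discussed here might look roughly like the sketch below. It is React-free so it can run standalone: the `contextRoom` argument stands in for whatever `useRoomContext()` would return, and `UseSpeakingParticipantsOptions` and `resolveRoom` are hypothetical names, not part of the PR.

```typescript
// Minimal stand-in for the livekit-client Room class (assumption).
type Room = { name: string };

// Hypothetical options shape: new fields (for example a session) could be added
// later without changing the hook's positional signature.
type UseSpeakingParticipantsOptions = { room?: Room };

// Hypothetical helper: prefer the explicitly passed room, otherwise fall back
// to the room found in context.
function resolveRoom(
  options: UseSpeakingParticipantsOptions | undefined,
  contextRoom: Room | undefined,
): Room {
  const room = options?.room ?? contextRoom;
  if (!room) {
    throw new Error("No room passed in options and none found in context");
  }
  return room;
}

const fromContext = resolveRoom(undefined, { name: "context-room" });
const fromOptions = resolveRoom({ room: { name: "explicit-room" } }, { name: "context-room" });
```

Existing call sites that rely on context keep passing nothing, and single-component examples pass `{ room }` explicitly.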

Longer term I think it might be worth discussing migrating away from
that pattern since react won't be tree-shaken properly in end projects
This was causing prepareConnection to get run constantly since its
reference depends on restOptions which changes every call.

The way I see it, if a user is changing their options and wants
prepareConnection to be run for something beyond the initial set
(probably unusual), they can call it themselves on the returned
session object.
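
The underlying cause described in this commit message can be reproduced outside React. In the sketch below, `Options`, `timeoutMs`, and `splitOptions` are hypothetical; the point is only that object rest destructuring allocates a fresh object on every call, so anything keyed on the identity of restOptions sees a change each time.

```typescript
// Hypothetical options shape for illustration only.
type Options = { room?: string; agentName?: string; timeoutMs?: number };

// Mirrors the pattern of peeling known fields off and keeping "the rest".
function splitOptions(options: Options) {
  const { room, ...restOptions } = options;
  return { room, restOptions };
}

const options: Options = { agentName: "demo", timeoutMs: 1000 };
const first = splitOptions(options).restOptions;
const second = splitOptions(options).restOptions;

// Identity differs on every call even though the contents are equal.
const sameReference = first === second; // false
const sameContents = JSON.stringify(first) === JSON.stringify(second); // true
```

A callback or effect that lists restOptions as a dependency therefore re-runs on every render, which matches the repeated prepareConnection calls described above.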