AssemblyAI STT Model Name Clarification: LiveKit Inference vs Direct Plugin #4459

@cdutr

Bug Description

There's a model name discrepancy between LiveKit Inference (gateway) and the Direct AssemblyAI Plugin that creates uncertainty about correctness and best practices.

My main concern is that, behind the scenes, the server may be calling a deprecated version of AssemblyAI STT: I noticed worse performance when using "assemblyai/universal-streaming:en" than when using "universal-streaming-english".

LiveKit Inference (string format):

stt="assemblyai/universal-streaming:en"

Model: universal-streaming (no -english suffix)

Direct Plugin:

from livekit.plugins import assemblyai
stt = assemblyai.STT(model="universal-streaming-english")

Model: universal-streaming-english (with -english suffix)

AssemblyAI's actual API expects:

  • Parameter: speech_model
  • Values: "universal-streaming-english" or "universal-streaming-multilingual"

The Problem: Server-Side Black Box

The LiveKit Inference gateway (https://agent-gateway.livekit.cloud/v1) is a closed-source cloud service. When developers use:

stt="assemblyai/universal-streaming:en"

the SDK sends the following payload to the gateway (stt.py:554):

{
  "model": "assemblyai/universal-streaming",
  "settings": {"language": "en"}
}
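
For illustration, the split from the string form into this payload can be sketched as follows. `build_gateway_payload` is a hypothetical helper, not the SDK's actual function; it assumes only the `provider/model:language` shape shown above.

```python
from typing import Any


def build_gateway_payload(model_string: str) -> dict[str, Any]:
    """Split 'provider/model:lang' into the payload shape shown above.

    Hypothetical helper for illustration; not the SDK's actual code.
    """
    # Everything after the first ':' is treated as the language, if present.
    model, sep, language = model_string.partition(":")
    payload: dict[str, Any] = {"model": model, "settings": {}}
    if sep and language:
        payload["settings"]["language"] = language
    return payload


print(build_gateway_payload("assemblyai/universal-streaming:en"))
```

Note that nothing in this payload carries the -english suffix; only the separate language setting does.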

The gateway must then:

  1. Translate "universal-streaming" → "universal-streaming-english"
  2. Map to AssemblyAI's API format: speech_model="universal-streaming-english"

We cannot verify this translation from the SDK code because it happens server-side.
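
To make the question concrete, here is a rough sketch of the translation the gateway would have to perform. Since the gateway is closed-source, `presumed_speech_model` and its rules are my assumptions. In particular, routing non-English languages to "universal-streaming-multilingual" and treating bare "assemblyai" as the default streaming model are guesses, not confirmed behavior.

```python
from typing import Optional


def presumed_speech_model(gateway_model: str, language: Optional[str]) -> str:
    """Guess at the AssemblyAI `speech_model` value for a gateway request.

    Assumption only: the real mapping happens server-side and is not visible.
    """
    provider, _, model = gateway_model.partition("/")
    if provider != "assemblyai":
        raise ValueError(f"not an AssemblyAI model: {gateway_model!r}")
    # Bare "assemblyai" is assumed to mean the default streaming model.
    if model not in ("", "universal-streaming"):
        raise ValueError(f"unknown model: {model!r}")
    # Assumed rule: English (or unspecified) maps to the English model,
    # anything else maps to the multilingual one.
    if language is None or language.lower().startswith("en"):
        return "universal-streaming-english"
    return "universal-streaming-multilingual"


print(presumed_speech_model("assemblyai/universal-streaming", "en"))
```

If the real gateway behaves differently (for example, by calling an older model), that would explain the quality difference I observed.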

Questions

  1. Does the gateway correctly translate "universal-streaming" → "universal-streaming-english"?
  2. Is using the gateway recommended over the direct plugin for production?
  3. Should documentation clarify this translation layer exists?

Issues Found

1. Missing Model in Type Literal

stt.py:44-47:

AssemblyAIModels = Literal[
    "assemblyai",
    "assemblyai/universal-streaming",
    # Missing: "assemblyai/universal-streaming-multilingual"
]
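
A minimal sketch of what the extended literal could look like. Whether the gateway actually accepts "assemblyai/universal-streaming-multilingual" is unverified, so treat the added entry as a proposal rather than a known-valid model name.

```python
from typing import Literal, get_args

AssemblyAIModels = Literal[
    "assemblyai",
    "assemblyai/universal-streaming",
    # Proposed addition; whether the gateway accepts this name is unverified.
    "assemblyai/universal-streaming-multilingual",
]

# get_args exposes the allowed strings, e.g. for a runtime check.
print(get_args(AssemblyAIModels))
```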

2. Hardcoded Language in Direct Plugin

assemblyai/stt.py:389:

stt.SpeechData(
    language="en",  # Hardcoded, ignores language parameter
    text=interim_text,
)

The stream(language=...) parameter is accepted but never used in the response data.
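
A possible shape for the fix, sketched with a stand-in dataclass so it runs without the plugin installed. `SpeechData` here only mimics the two fields shown above, and `make_interim_speech_data` is a hypothetical helper, not the plugin's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechData:
    """Minimal stand-in for the plugin's stt.SpeechData (illustration only)."""

    language: str
    text: str


def make_interim_speech_data(
    interim_text: str, language: Optional[str] = None
) -> SpeechData:
    # Fall back to "en" only when the caller did not pass a language.
    return SpeechData(language=language or "en", text=interim_text)


print(make_interim_speech_data("hola", language="es"))
```

This keeps the current behavior for callers that pass no language while reporting the requested language for everyone else.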

Proposed Solutions

  1. Documentation: Explain how the gateway translates model names and when to use gateway vs direct plugin
  2. Type Hints: Add "assemblyai/universal-streaming-multilingual" to AssemblyAIModels
  3. Direct Plugin: Use the language parameter instead of hardcoding "en"

Environment: LiveKit Agents latest (commit 0cc77447), Python 3.11+

Thanks for building this excellent framework! Would appreciate clarification on the gateway translation behavior and recommended production approach.

Expected Behavior

Clear documentation on how LiveKit Inference gateway translates "assemblyai/universal-streaming" to AssemblyAI's API format "universal-streaming-english", and guidance on when to use gateway vs direct plugin.

Reproduction Steps

1. Use `stt="assemblyai/universal-streaming:en"` in AgentSession
2. Observe it works, but cannot verify server-side translation
3. Compare with direct plugin using `assemblyai.STT(model="universal-streaming-english")`
4. Note the model name mismatch between approaches

Operating System

Linux (WSL2)

Package Versions

- livekit-agents: latest (commit 0cc77447)
- livekit-plugins-assemblyai: latest
- Python: 3.11+

Labels

bug: Something isn't working
