Azure STT: Continuous Language Identification not enabled for multiple languages #4449

@MSameerAbbas

Bug Description

When initializing the Azure STT plugin with multiple languages (e.g., language=["en-US", "fr-FR"]), the plugin creates an AutoDetectSourceLanguageConfig but fails to explicitly set the SpeechServiceConnection_LanguageIdMode property to Continuous.

According to Microsoft Azure documentation, if this property is omitted, the service defaults to AtStart mode. This means the language is only identified within the first few seconds of audio and never updates, even if the speaker switches languages later in the stream.

Expected Behavior

When multiple languages are provided in the configuration, the plugin should enable Continuous language identification. This ensures that the STT engine actively detects language changes throughout the streaming session.

Reproduction Steps

1. Initialize the Azure STT plugin with multiple languages:
   
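   # Assumed import; the alias AzureSTT is not shown in the original report.
   from livekit.plugins.azure import STT as AzureSTT

   stt = AzureSTT(
       speech_key="...",
       speech_region="...",
       language=["en-US", "fr-FR", "de-DE"],
   )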
   
2. Start a streaming recognition session.
3. Speak a sentence in the first language (e.g., English).
4. Wait a moment, then speak a sentence in the second language (e.g., French).
5. **Observation:** The second sentence is transcribed as if it were English (or garbled), and the detected language property does not update.
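
The observation in step 5 can be checked mechanically instead of by eye. Below is a minimal sketch, assuming each final transcript is collected as a (detected_language, text) pair; the helper names languages_seen and language_never_updates are hypothetical and not part of the plugin or the Azure SDK.

```python
from typing import Iterable, List, Tuple

# Each item is (detected_language, transcript_text), collected by the caller
# from final transcripts during one streaming session (assumed shape).
Result = Tuple[str, str]


def languages_seen(results: Iterable[Result]) -> List[str]:
    """Return the detected languages in order of first appearance."""
    seen: List[str] = []
    for language, _text in results:
        if language not in seen:
            seen.append(language)
    return seen


def language_never_updates(results: Iterable[Result]) -> bool:
    """True if there are several results but only one detected language.

    This is the symptom described in step 5: the speaker switched
    languages, yet the reported language stayed on the first one.
    """
    collected = list(results)
    return len(collected) > 1 and len(languages_seen(collected)) == 1
```

If language_never_updates returns True for a session in which the speaker demonstrably switched languages, that is consistent with the service running in AtStart mode.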

Operating System

Windows 11

Models Used

Azure STT

Package Versions

livekit-agents[azure,silero]>=1.3.10

Session/Room/Call IDs

No response

Proposed Solution

Update _create_speech_recognizer in livekit/plugins/azure/stt.py to set the LanguageIdMode property when multiple languages are present.

    if config.language and len(config.language) > 1:
        # Fix: Enable Continuous Language ID
        speech_config.set_property(
            speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous"
        )
        kwargs["auto_detect_source_language_config"] = (
            speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=config.language)
        )
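
To make the change easy to unit test without the Azure SDK installed, the conditional could be pulled into a small helper. This is only a sketch: configure_language_id and FakeSpeechConfig are hypothetical names, and the helper assumes nothing about speech_config beyond a set_property(property_id, value) method, as used in the snippet above.

```python
from typing import Any, Dict, Optional, Sequence

CONTINUOUS_MODE = "Continuous"


def configure_language_id(
    speech_config: Any,
    languages: Optional[Sequence[str]],
    property_id: Any,
) -> bool:
    """Enable Continuous language ID when more than one language is given.

    property_id would be
    speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode in real use;
    it is passed in so this helper does not import the SDK.
    Returns True if the property was set, False otherwise.
    """
    if languages and len(languages) > 1:
        speech_config.set_property(property_id, CONTINUOUS_MODE)
        return True
    return False


class FakeSpeechConfig:
    """Test double that records set_property calls (hypothetical)."""

    def __init__(self) -> None:
        self.properties: Dict[Any, str] = {}

    def set_property(self, property_id: Any, value: str) -> None:
        self.properties[property_id] = value
```

With a single language or none, the helper leaves the config untouched, so single-language sessions keep their current behavior.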

Additional Context

Reference: Azure Speech Service Language Identification Docs

Screenshots and Recordings

No response
