Bug Description
When initializing the Azure STT plugin with multiple languages (e.g., language=["en-US", "fr-FR"]), the plugin creates an AutoDetectSourceLanguageConfig but fails to explicitly set the SpeechServiceConnection_LanguageIdMode property to Continuous.
According to Microsoft Azure documentation, if this property is omitted, the service defaults to AtStart mode. This means the language is only identified within the first few seconds of audio and never updates, even if the speaker switches languages later in the stream.
Expected Behavior
When multiple languages are provided in the configuration, the plugin should enable Continuous language identification. This ensures that the STT engine actively detects language changes throughout the streaming session.
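One way to check this behavior is to record the detected language reported for each final transcript and collapse consecutive repeats. The sketch below is only illustrative: `language_sequence` is a hypothetical helper, and how the per-result language is read from the plugin's events is assumed rather than shown. With AtStart mode the collapsed sequence would be expected to contain a single entry, while with Continuous mode it should reflect each switch.

```python
from itertools import groupby
from typing import Iterable, List


def language_sequence(detected: Iterable[str]) -> List[str]:
    """Collapse consecutive duplicate language codes.

    `detected` is assumed to be the language reported for each final
    transcript, in order. This helper is hypothetical and not part of
    the plugin.
    """
    return [lang for lang, _ in groupby(detected)]


# Languages reported per utterance when identification keeps updating.
continuous_like = ["en-US", "en-US", "fr-FR", "fr-FR"]
# Languages reported per utterance when identification is fixed at the start.
at_start_like = ["en-US", "en-US", "en-US", "en-US"]

print(language_sequence(continuous_like))
print(language_sequence(at_start_like))
```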
Reproduction Steps
1. Initialize the Azure STT plugin with multiple languages:
   stt = AzureSTT(
       speech_key="...",
       speech_region="...",
       language=["en-US", "fr-FR", "de-DE"]
   )
2. Start a streaming recognition session.
3. Speak a sentence in the first language (e.g., English).
4. Wait a moment, then speak a sentence in the second language (e.g., French).
5. **Observation:** The second sentence is transcribed as if it were English (or garbled), and the detected language property does not update.
Operating System
Windows 11
Models Used
Azure STT
Package Versions
livekit-agents[azure,silero]>=1.3.10
Session/Room/Call IDs
No response
Proposed Solution
Update _create_speech_recognizer in livekit/plugins/azure/stt.py to set the LanguageIdMode property when multiple languages are present.
if config.language and len(config.language) > 1:
    # Fix: enable continuous language identification
    speech_config.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous"
    )
    kwargs["auto_detect_source_language_config"] = (
        speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=config.language)
    )
Additional Context
Reference: Azure Speech Service Language Identification Docs
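The condition in the proposed fix could also be isolated in a small pure function so that it can be unit-tested without the Azure SDK. This is a minimal sketch, not the plugin's actual code: `language_id_mode` is a hypothetical name, and the handling of a bare string is an assumption about how a single language might be passed.

```python
from typing import Optional, Sequence, Union


def language_id_mode(languages: Union[str, Sequence[str], None]) -> Optional[str]:
    """Return the LanguageIdMode value to set, or None to leave it unset.

    Hypothetical helper for illustration. "Continuous" is returned only
    when more than one candidate language is configured.
    """
    # Assumption: a bare string means a single language, so len() must not
    # be applied to it (len("en-US") would be 5, not 1).
    if languages is None or isinstance(languages, str):
        return None
    return "Continuous" if len(languages) > 1 else None


print(language_id_mode(["en-US", "fr-FR", "de-DE"]))
print(language_id_mode(["en-US"]))
```

The recognizer setup would then call `speech_config.set_property(...)` with `SpeechServiceConnection_LanguageIdMode` only when this function returns a value.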
Screenshots and Recordings
No response