Multi-Provider Transcription Support (REST/Batch Path) by Amygos · Pull Request #31 · nethesis/satellite

Amygos · 2026-02-16T09:08:57Z

This pull request introduces support for multiple transcription providers (Deepgram and VoxTral) in the /api/get_transcription endpoint, refactors the codebase to use a provider abstraction, and updates documentation and containerization to reflect these changes. The most important changes are grouped below.

Transcription Provider Abstraction & API Changes:

Refactored api.py to use a provider abstraction (get_provider from transcription/), enabling selection between Deepgram and VoxTral via the TRANSCRIPTION_PROVIDER environment variable or a per-request provider parameter. The endpoint is now provider-agnostic and handles errors/messages generically. [1] [2] [3] [4]
Added provider-specific post-processing for channel/speaker label renaming and improved error handling for provider failures and response parsing.

Containerization & Environment Variables:

Updated Containerfile to copy the transcription/ directory into the container and added new environment variables for TRANSCRIPTION_PROVIDER and MISTRAL_API_KEY (for VoxTral support). [1] [2] [3]

Documentation Updates:

Expanded README.md to document the new provider abstraction, configuration options for both Deepgram and VoxTral, and details on using the provider parameter in API requests. Example curl commands and parameter documentation were updated accordingly. [1] [2] [3]
Added a comprehensive .github/copilot-instructions.md describing the architecture, data flows, conventions, and testing practices for the project.

Testing Improvements:

Updated tests to patch the new provider abstraction and ensure correct mocking of provider responses, reflecting the new modular design. [1] [2]

These changes collectively make the transcription service more flexible, maintainable, and ready for multi-provider support.

Add provider abstraction to support multiple transcription backends. Introduce TranscriptionProvider base class and TranscriptionResult dataclass in new transcription package. This enables flexible provider selection and cleaner separation of concerns while maintaining backward compatibility with existing Deepgram integration. The abstraction supports: - Environment-based provider selection (TRANSCRIPTION_PROVIDER) - Per-request provider override via API parameters - Consistent interface across different backends

Move Deepgram-specific transcription logic from api.py into dedicated DeepgramProvider class. Preserve all existing functionality including parameter mapping, response parsing, and timeout handling. This maintains full backward compatibility while enabling other providers to follow the same pattern.

Replace hardcoded Deepgram logic with generic provider interface. Update get_transcription endpoint to support provider selection via environment variable or per-request parameter. Maintain all existing error handling, persistence, and enrichment workflows while enabling seamless provider switching.

Add 12 new unit tests covering provider factory, Deepgram provider, and VoxTral provider implementations. Tests verify correct API calls, response parsing, parameter handling, and error scenarios. Update existing api.py tests to work with new provider abstraction. Achieve 100% coverage for transcription module.

Update configuration documentation to explain provider selection via TRANSCRIPTION_PROVIDER environment variable. Add examples for using VoxTral provider and per-request provider override. Document required API keys for each provider and supported parameters.

Update copilot-instructions.md to reflect new transcription package and multi-provider support in REST/batch path. Note that real-time streaming path remains Deepgram-only for now. Include details about provider-agnostic post-processing like channel name replacement.

Add MISTRAL_API_KEY environment variable to Containerfile with empty default. Add TRANSCRIPTION_PROVIDER env var defaulting to deepgram. Maintain backward compatibility while enabling VoxTral usage in containerized deployments.

Ensure the transcription/ package is copied to the container image in both the builder and final stages of the multi-stage build. This allows the container to properly import the transcription provider abstraction when the application starts. Previously, the package was missing from the container, causing import errors when running the container. - Add COPY transcription /tmp/transcription to builder stage - Add COPY --from=builder /tmp/transcription /app/transcription to final stage - Maintains multi-stage build optimization for minimal image size

Copilot

Pull request overview

Adds a transcription provider abstraction to the REST/batch /api/get_transcription endpoint so callers can use either Deepgram or VoxTral (Mistral) via env var default or per-request override, and updates tests/docs/container packaging accordingly.

Changes:

Introduces transcription/ provider interface + factory (get_provider) with Deepgram and VoxTral REST implementations.
Refactors api.py to route transcription through the provider abstraction and apply provider-agnostic post-processing/error handling.
Updates tests, README, and container build to reflect multi-provider support and new configuration.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`api.py`	Refactors `/api/get_transcription` to select a provider via `get_provider()` and handle provider-agnostic processing/errors.
`transcription/base.py`	Defines the provider interface and `TranscriptionResult` DTO.
`transcription/__init__.py`	Adds the provider factory (`get_provider`) and exports the abstraction.
`transcription/deepgram.py`	Implements Deepgram REST transcription provider.
`transcription/voxtral.py`	Implements VoxTral (Mistral) REST transcription provider + diarization formatting.
`tests/test_api.py`	Updates API endpoint tests to patch the provider abstraction and Deepgram REST calls.
`tests/test_transcription.py`	Adds unit tests for provider factory and both provider implementations.
`README.md`	Documents provider selection (env + per-request) and provider-specific parameters.
`Containerfile`	Copies `transcription/` into the image and adds provider-related env vars.
`.github/copilot-instructions.md`	Documents architecture, conventions, and testing practices for the project.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-16T09:14:46Z

api.py

+        raw_transcription = raw_transcription.replace("Channel 0:", f"{channel0_name}:")
+        raw_transcription = raw_transcription.replace("Speaker 0:", f"{channel0_name}:")
+    if channel1_name:
+        raw_transcription = raw_transcription.replace("Channel 1:", f"{channel1_name}:")
+        raw_transcription = raw_transcription.replace("Speaker 1:", f"{channel1_name}:")


Channel/speaker renaming won’t catch Deepgram’s uppercase labels (e.g., "SPEAKER 0:" / "SPEAKER 1:"), so channel0_name / channel1_name won’t apply for the transcript format used in tests and likely production. Consider handling both "Speaker" and "SPEAKER" (or doing a case-insensitive replace / regex anchored on the label).

Suggested change

raw_transcription = raw_transcription.replace("Channel 0:", f"{channel0_name}:")

raw_transcription = raw_transcription.replace("Speaker 0:", f"{channel0_name}:")

if channel1_name:

raw_transcription = raw_transcription.replace("Channel 1:", f"{channel1_name}:")

raw_transcription = raw_transcription.replace("Speaker 1:", f"{channel1_name}:")

raw_transcription = raw_transcription.replace("Channel 0:", f"{channel0_name}:")

raw_transcription = raw_transcription.replace("CHANNEL 0:", f"{channel0_name}:")

raw_transcription = raw_transcription.replace("Speaker 0:", f"{channel0_name}:")

raw_transcription = raw_transcription.replace("SPEAKER 0:", f"{channel0_name}:")

if channel1_name:

raw_transcription = raw_transcription.replace("Channel 1:", f"{channel1_name}:")

raw_transcription = raw_transcription.replace("CHANNEL 1:", f"{channel1_name}:")

raw_transcription = raw_transcription.replace("Speaker 1:", f"{channel1_name}:")

raw_transcription = raw_transcription.replace("SPEAKER 1:", f"{channel1_name}:")

Copilot · 2026-02-16T09:14:46Z

api.py

            except Exception:
                logger.exception("Failed to update transcript state=failed")
-        raise HTTPException(status_code=500, detail="Failed to parse transcription response.")
+        raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")


The generic exception handler returns the raw exception message to the client (detail=f"An error occurred: {str(e)}"), which can leak internal details. Prefer a generic client-facing message (and keep the full exception only in logs), similar to the earlier behavior.

Suggested change

raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")

raise HTTPException(status_code=500, detail="Unexpected error while processing transcription")

Copilot · 2026-02-16T09:14:46Z

transcription/__init__.py

+def get_provider(name: str | None = None) -> TranscriptionProvider:
+    """
+    Get a transcription provider instance.
+
+    Args:
+        name: Provider name ("deepgram" or "voxtral"). If None, uses TRANSCRIPTION_PROVIDER env var.
+
+    Returns:
+        TranscriptionProvider instance
+
+    Raises:
+        ValueError: If provider name is unknown or required API key is missing
+    """
+    if name is None:
+        name = os.getenv("TRANSCRIPTION_PROVIDER", "deepgram").strip().lower()
+
+    if name == "deepgram":
+        api_key = os.getenv("DEEPGRAM_API_KEY", "").strip()
+        if not api_key:
+            raise ValueError("DEEPGRAM_API_KEY is required for Deepgram provider")
+        return DeepgramProvider(api_key=api_key)
+    elif name == "voxtral":
+        api_key = os.getenv("MISTRAL_API_KEY", "").strip()


get_provider() normalizes the env var value but doesn’t normalize an explicitly provided name, so values like "Deepgram"/" Voxtral " will be treated as unknown. Consider applying .strip().lower() to name before the provider switch for consistent behavior.

Copilot · 2026-02-16T09:14:47Z

transcription/voxtral.py

+
+        # VoxTral parameters
+        data = {
+            "model": params.get("model", "voxtral-mini-latest"),


model is set via params.get("model", "voxtral-mini-latest"); if the request passes model= (empty string), this will send an empty model to the API instead of defaulting. Prefer params.get("model") or "voxtral-mini-latest" (and similarly for other optional string params if applicable).

Suggested change

"model": params.get("model", "voxtral-mini-latest"),

"model": params.get("model") or "voxtral-mini-latest",

Copilot · 2026-02-16T09:14:47Z

transcription/deepgram.py

+
+            response.raise_for_status()
+
+        result = response.json()


response.json() can raise a JSON decode ValueError if the provider returns non-JSON (HTML error page, proxy, etc.). Consider wrapping JSON parsing in a try/except and raising a clearer ValueError (including a short body preview) so api.py returns a more actionable error message.

Suggested change

result = response.json()

try:

result = response.json()

except ValueError as e:

body_preview = response.text[:500] if response.text else ""

raise ValueError(

f"Failed to decode Deepgram JSON response "

f"(status {response.status_code}, "

f"content-type {response.headers.get('Content-Type')}): {e}; "

f"body preview: {body_preview!r}"

)

Copilot · 2026-02-16T09:14:47Z

transcription/voxtral.py

+            response.raise_for_status()
+
+        result = response.json()
+
+        # Parse VoxTral response
+        # Response format: { "text": "...", "language": "...", "segments": [...], "model": "..." }
+        raw_transcription = result.get("text", "").strip()


response.json() can raise a JSON decode ValueError if VoxTral returns non-JSON. Consider catching JSON parse errors here and raising a clearer ValueError (possibly with a short response body preview) so callers get a consistent, actionable parse-failure message.

VoXtral transcripts were missing speaker labels because diarization was opt-in, requiring explicit diarize=true parameter. This made the behavior inconsistent with user expectations and different from Deepgram's multichannel approach. Change diarization from opt-in to opt-out. Now enabled by default unless explicitly disabled with diarize=false. This ensures speaker information is always included in transcripts, matching the common use case for call transcription systems. The API response includes speaker fields in segments only when diarization is enabled. Without it, only plain text is returned with no way to distinguish between speakers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add test case to ensure diarization is enabled by default when no parameters are provided. This prevents regression of the speaker label functionality. The test verifies both that the API request includes diarize=True and that the formatted output contains speaker labels, covering the full code path from parameter handling to output formatting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

VoXtral API returns speaker information in the 'speaker_id' field (e.g., 'speaker_1', 'speaker_2'), not 'speaker'. The code was checking for the wrong field name, causing speaker labels to be omitted from transcripts even when diarization was enabled. Update the speaker detection logic to check for both 'speaker_id' (real API format) and 'speaker' (backward compatibility). Change the output format to display the speaker_id value directly instead of reformatting it, matching the API's naming convention. Discovered during testing with real Mistral API key and audio file. The API response clearly shows speaker_id field in segments when diarization is enabled. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Amygos · 2026-02-17T11:01:17Z

There are some issues with the diarization, the speaker seems not to be labeled correctly.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-17T11:07:10Z

transcription/voxtral.py

+                lines.append(f"\n{speaker}: {text}")
+                last_speaker = speaker
+            else:
+                # Continue current speaker's text
+                if lines:
+                    lines.append(text)
+                else:


_format_diarized_transcript adds a leading \n to each new speaker line and then joins with \n, which creates extra blank lines in the output. Also, when the speaker doesn’t change, the code appends text as a new line rather than continuing the current speaker’s line, which doesn’t match the comment (“Continue current speaker’s text”). Consider removing the leading newline and concatenating/merging same-speaker segments onto the current line instead of adding a new line.

Suggested change

lines.append(f"\n{speaker}: {text}")

last_speaker = speaker

else:

# Continue current speaker's text

if lines:

lines.append(text)

else:

lines.append(f"{speaker}: {text}")

last_speaker = speaker

else:

# Continue current speaker's text on the same line

if lines:

# Merge with the last line instead of adding a new one

lines[-1] = f"{lines[-1]} {text}"

else:

# No existing line yet (e.g., missing speaker info)

Copilot · 2026-02-17T11:07:11Z

api.py

    uniqueid = (input_params.get("uniqueid") or "").strip()
    channel0_name = (input_params.get("channel0_name") or "").strip()
    channel1_name = (input_params.get("channel1_name") or "").strip()
+    provider_name = (input_params.get("provider") or "").strip().lower() or None


The provider request parameter introduces new behavior (per-request provider override), but there’s no API-level test exercising the provider=voxtral path. Adding a test that sets provider=voxtral and mocks transcription.voxtral.httpx.AsyncClient would validate provider selection + response handling end-to-end for the endpoint.

Copilot · 2026-02-17T11:07:11Z

README.md

+- `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:` or `Speaker 0:` / `Speaker 1:`)

-Deepgram parameters:
- Most Deepgram `/v1/listen` parameters may be provided as query/form fields and are passed through to Deepgram.
+Provider-specific parameters:
+- **Deepgram**: Most Deepgram `/v1/listen` parameters may be provided as query/form fields (e.g., `model`, `language`, `diarize`, `punctuate`)
+- **VoxTral**: Supports `model` (default: `voxtral-mini-latest`), `language`, `diarize`, `temperature`, `context_bias`, `timestamp_granularities`


Docs say channel0_name/channel1_name renames diarization labels by replacing Channel 0/1: or Speaker 0/1:. VoxTral currently formats diarization labels as speaker_1:/speaker_2: (lowercase + underscore), so these overrides won’t apply for VoxTral unless labels are normalized or the replacement logic is expanded. Please clarify provider-specific behavior here (and/or adjust the code) so users aren’t misled.

Copilot · 2026-02-17T11:07:11Z

Containerfile

+COPY transcription /tmp/transcription

 # Install dependencies
 RUN pip install --no-cache-dir --no-warn-script-location --user -r /tmp/requirements.txt


Container builds install only requirements.txt, but the new transcription providers (and api.py) import/use httpx directly. httpx is currently only listed in requirements-dev.txt, so the runtime image may rely on a transitive dependency (or fail if that transitive dep changes). Add httpx to requirements.txt to make the container/runtime dependency explicit.

Amygos added 8 commits February 13, 2026 16:47

build(docker): add Mistral API key configuration

492e498

Add MISTRAL_API_KEY environment variable to Containerfile with empty default. Add TRANSCRIPTION_PROVIDER env var defaulting to deepgram. Maintain backward compatibility while enabling VoxTral usage in containerized deployments.

Amygos requested review from Stell0 and Copilot February 16, 2026 09:08

Copilot started reviewing on behalf of Amygos February 16, 2026 09:09 View session

Copilot AI reviewed Feb 16, 2026

View reviewed changes

Amygos and others added 3 commits February 16, 2026 23:21

Amygos requested a review from Copilot February 17, 2026 11:01

Copilot started reviewing on behalf of Amygos February 17, 2026 11:02 View session

Copilot AI reviewed Feb 17, 2026

View reviewed changes

	raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")
	raise HTTPException(status_code=500, detail="Unexpected error while processing transcription")

	"model": params.get("model", "voxtral-mini-latest"),
	"model": params.get("model") or "voxtral-mini-latest",

-        result = response.json()
+        try:
+            result = response.json()
+        except ValueError as e:
+            body_preview = response.text[:500] if response.text else ""
+            raise ValueError(
+                f"Failed to decode Deepgram JSON response "
+                f"(status {response.status_code}, "
+                f"content-type {response.headers.get('Content-Type')}): {e}; "
+                f"body preview: {body_preview!r}"
+            )

Conversation

Amygos commented Feb 16, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Amygos commented Feb 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments