feat: Add Sherpa ONNX backend for ASR and TTS#8523
Draft
richiejp wants to merge 6 commits into mudler:master
Sherpa-ONNX Backend Implementation Plan
Overview
This document outlines the plan to integrate Sherpa-ONNX as a new backend for LocalAI, focusing initially on Text-to-Speech (TTS) capabilities with GPU acceleration support, and later expanding to Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), and other audio processing features.
Sherpa-ONNX is a comprehensive speech processing toolkit that provides text-to-speech, speech recognition, voice activity detection, keyword spotting, and speaker diarization, all running on ONNX Runtime.
Component Requirements
Core Components
1. ONNX Runtime
Use prebuilt packages (following the `silero-vad` backend pattern). The Makefile selects the correct package (CPU, GPU-CUDA, or ROCm) based on `BUILD_TYPE`; the `silero-vad` backend already uses this approach successfully.

- GPU-CUDA: `onnxruntime-linux-x64-gpu-*.tgz`
- ROCm: `onnxruntime-linux-x64-rocm-*.tgz`, if available

2. Sherpa-ONNX C++ Library
- Pinned to commit `7e227a529be6c383134a358c5744d0eb1cb5ae1f`
- Built with `git clone` + `cmake` inside `backend/go/sherpa-onnx/Makefile`, linked against the downloaded ONNX Runtime

3. Sherpa-ONNX Go Bindings
Dependencies Analysis
Dependency Management Strategy
We will:
- Download prebuilt ONNX Runtime (the `silero-vad` pattern)
- Build Sherpa-ONNX with `git clone` + `cmake` from a specific commit/tag, using the downloaded ONNX Runtime

Both steps happen inside `backend/go/sherpa-onnx/Makefile`, executed by the existing `backend/Dockerfile.golang`.

Native C++ Dependencies
Sherpa-ONNX uses CMake FetchContent to download these dependencies at specific tagged versions:
Fetched by Sherpa-ONNX (Specific Tags)
- kaldi-native-fbank v1.22.3: https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.22.3.tar.gz
- kaldi-decoder v0.2.11: https://github.com/k2-fsa/kaldi-decoder/archive/refs/tags/v0.2.11.tar.gz
- simple-sentencepiece v0.7: https://github.com/pkufool/simple-sentencepiece/archive/refs/tags/v0.7.tar.gz
- pybind11 v3.0.0 (if Python enabled, which we won't enable): https://github.com/pybind/pybind11/archive/refs/tags/v3.0.0.tar.gz
- googletest v1.13.0 (if tests enabled, which we won't enable): https://github.com/google/googletest/archive/refs/tags/v1.13.0.tar.gz

TTS-Specific Dependencies (SHERPA_ONNX_ENABLE_TTS=ON)
piper-phonemize
espeak-ng-for-piper
ucd
Other Dependencies (Header-only or minimal)
Build CMake Configuration
```
cmake -DCMAKE_BUILD_TYPE=Release \
      -DSHERPA_ONNX_ENABLE_GPU=ON \
      -DSHERPA_ONNX_ENABLE_TTS=ON \
      -DSHERPA_ONNX_ENABLE_BINARY=OFF \
      -DSHERPA_ONNX_ENABLE_PYTHON=OFF \
      -DSHERPA_ONNX_ENABLE_TESTS=OFF \
      -DSHERPA_ONNX_ENABLE_C_API=ON \
      -DBUILD_SHARED_LIBS=ON \
      -DSHERPA_ONNX_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE=ON \
      -DONNXRUNTIME_DIR=/path/to/our/onnxruntime/install \
      ..
```

Key Point: Sherpa-ONNX's FetchContent already uses specific tagged releases, not arbitrary commits. These are pinned versions, so the builds are reproducible. We don't need to pre-download them unless we want to override versions.
GPU Acceleration Requirements
Overview
GPU acceleration in Sherpa-ONNX is provided through ONNX Runtime execution providers. We download the appropriate prebuilt ONNX Runtime package (CPU, GPU-CUDA, or ROCm) based on `BUILD_TYPE`, then build Sherpa-ONNX against it with `SHERPA_ONNX_ENABLE_GPU=ON`.

NVIDIA CUDA Support
Requirements
Build Process
1. Download prebuilt ONNX Runtime with CUDA (handled by Makefile)
2. Build Sherpa-ONNX with GPU support from the pinned commit
3. Build the Go backend
Runtime Configuration
- Set `provider="cuda"` in the TTS/ASR config
- Select GPUs with the `CUDA_VISIBLE_DEVICES` environment variable

AMD ROCm Support
Requirements
Build Process
ROCm SDK: provided by the base image (`rocm/dev-ubuntu-24.04:6.4.4`) via `backend/Dockerfile.golang`.

Download prebuilt ONNX Runtime with ROCm (handled by Makefile).
Build Sherpa-ONNX: Same as CUDA but using ROCm-enabled ONNX Runtime
Runtime Configuration
- Set `provider="migraphx"` in the config
- Ensure `LD_LIBRARY_PATH` includes the ROCm libraries

DirectML (Windows GPU)
Requirements
Build Process
- Build ONNX Runtime with `--use_dml`

Build Strategy
Makefile-Driven Build (using existing `backend/Dockerfile.golang`)

Instead of a custom multi-stage Dockerfile, we use the existing `backend/Dockerfile.golang`, which runs `make -C /LocalAI/backend/go/${BACKEND} build`. All library acquisition and compilation happens in `backend/go/sherpa-onnx/Makefile`, following the pattern established by `backend/go/silero-vad/Makefile`.

The Dockerfile already provides `git`, `cmake`, `make`, `g++` (build-essential), `curl`, the Go toolchain, protoc, and GPU-specific libraries (CUDA, ROCm, Vulkan) based on `BUILD_TYPE`.

Makefile Workflow
The `backend/go/sherpa-onnx/Makefile` performs these steps in order:

1. Select and download the ONNX Runtime package for the current `BUILD_TYPE`: `onnxruntime-linux-x64-${ONNX_VERSION}.tgz`, `onnxruntime-linux-x64-gpu-${ONNX_VERSION}.tgz`, or `onnxruntime-linux-x64-rocm-${ONNX_VERSION}.tgz` (if available)
2. Extract it into `sources/onnxruntime/`
3. `git clone` Sherpa-ONNX to `sources/sherpa-onnx/`
4. Configure with `cmake`, passing `-DSHERPA_ONNX_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE=ON`
5. Build with `make -j$(nproc)` + `make install`
6. Copy the shared libraries into `backend-assets/lib/`
7. Run `package.sh` (bundles binary + shared libs)

Build Variants
- CPU (`sherpa-onnx-cpu`): default CPU-only build
- CUDA 12 (`sherpa-onnx-cuda12`): NVIDIA GPU support with CUDA 12.x + cuDNN 9
- CUDA 11 (`sherpa-onnx-cuda11`): NVIDIA GPU support with CUDA 11.8 + cuDNN 8 (optional)
- ROCm (`sherpa-onnx-hipblas`): AMD GPU support with ROCm 6.4+

Makefile Integration
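A minimal sketch of the clone-and-build rules the Makefile needs, assuming the clone/cmake workflow described above; the commit is the pin from this plan, while target names and paths are illustrative:

```makefile
# Sketch of the Sherpa-ONNX build steps in backend/go/sherpa-onnx/Makefile.
SHERPA_COMMIT=7e227a529be6c383134a358c5744d0eb1cb5ae1f

sources/sherpa-onnx:
	git clone https://github.com/k2-fsa/sherpa-onnx.git sources/sherpa-onnx
	cd sources/sherpa-onnx && git checkout $(SHERPA_COMMIT)

build: sources/sherpa-onnx
	cd sources/sherpa-onnx && mkdir -p build && cd build && \
	cmake -DSHERPA_ONNX_ENABLE_C_API=ON \
	      -DSHERPA_ONNX_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE=ON \
	      -DONNXRUNTIME_DIR=$(CURDIR)/sources/onnxruntime .. && \
	make -j$$(nproc) && make install
```

The real rules would also carry the full cmake flag set from the Build CMake Configuration section.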
GitHub Actions Workflow
Add to `.github/workflows/backend.yml`.

Backend Index

Add to `backend/index.yaml`.

Implementation Phases
Phase 1: TTS Support (CPU-only)
Goal: Get basic TTS working with CPU inference, establish build system.
Week 1: Backend Structure & Build System
- Create the `backend/go/sherpa-onnx/` directory structure
- Build through `backend/Dockerfile.golang` (CPU-only)

Week 2: TTS Implementation

- Implement `Load(*pb.ModelOptions)` for VITS model loading
- Implement `TTS(*pb.TTSRequest)` for text-to-speech generation

Week 3: Integration & Basic Testing
Deliverables:
- A working CPU TTS backend built through `backend/Dockerfile.golang`

Phase 2: GPU Acceleration (CUDA)
Goal: Add NVIDIA GPU support, validate GPU inference works.
Week 4-5: CUDA Build System
- Handle `BUILD_TYPE=cublas` and download the GPU ONNX Runtime variant
- Build Sherpa-ONNX with `SHERPA_ONNX_ENABLE_GPU=ON`

Week 6: GPU Testing & Validation
Deliverables:
Phase 3: AMD ROCm Support
Goal: Add AMD GPU support via MIGraphX.
Week 7-8: ROCm Build System
- Handle `BUILD_TYPE=hipblas` and download the ROCm ONNX Runtime variant

Deliverables:
Phase 4: ASR Support
Goal: Add speech-to-text functionality.
Week 9-10: ASR Implementation
- Implement the `AudioTranscription(*pb.TranscriptRequest)` method

Week 11: ASR Testing
Deliverables:
Phase 5: Additional Features (Future)
Goal: Implement VAD, keyword spotting, speaker diarization.
Architecture Design
Backend Structure
Backend Implementation
Configuration Schema
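A hypothetical model YAML; `name`, `backend`, and `parameters.model` follow LocalAI's standard model configuration, while any backend-specific options (for example an execution-provider field) are still to be defined:

```yaml
name: vits-en
backend: sherpa-onnx
parameters:
  model: vits-model.onnx
# illustrative backend-specific option, not final:
# provider: cuda   # ONNX Runtime execution provider (cpu, cuda, migraphx)
```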
Testing Strategy
Unit Tests (backend/go/sherpa-onnx/)
Model Loading Test
TTS Functionality Test
Integration Tests
Manual Testing (GPU)
Test Model
We need to identify or create a tiny VITS model (< 50MB) for CI testing.
User Documentation
Users will install the backend and models from the LocalAI gallery. Documentation should focus on:
1. Feature Overview (`docs/content/features/sherpa-onnx.md`)

NVIDIA GPU (CUDA 12):
AMD GPU (ROCm):
Supported Models
Download TTS models from the gallery or from Sherpa-ONNX pretrained models.
Model Types
VITS: General purpose TTS, multi-speaker support
Matcha: High quality, slower inference
Kokoro: Multi-lingual (Chinese/English)
API Usage
That's the complete documentation needed - users just need to know how to configure and use the backend.
Review Observations
1. Build Optimization
Using prebuilt ONNX Runtime from upstream eliminates the 30-60+ minute source build. Sherpa-ONNX still needs to be built from source (cmake), but this is much faster (a few minutes).
Recommendation:
Keep `sources/` as a download cache; Docker layer caching handles the rest.

2. Gallery Integration
While this plan focuses on the binary backend, the end-to-end user experience requires model definitions.
Requirement: add gallery YAML definitions for popular Sherpa-ONNX models (VITS, Matcha, etc.).

3. Runtime Library Resolution
Relying solely on system library paths can be fragile in containerized environments.
Best Practice: in `run.sh` or the entrypoint, explicitly set `LD_LIBRARY_PATH` to include the backend's library directory.

4. CI Model Selection
Using production-quality models for CI will make tests slow and flaky due to download sizes and inference times.
Status: Resolved (Crush review).
`sherpa-onnx-tiny-vits-330k-237m` (45MB) is available at https://k2-fsa.github.io/sherpa/onnx/pretrained_models/.

5. Crush AI Review (2026-02-12)
- C API usage (`SherpaOnnxOfflineTtsGenerate`) validated.

6. Build Strategy Revision (2026-02-12)
- ONNX Runtime: downloaded prebuilt, following the `silero-vad` backend pattern. This eliminates the 30-60+ minute ONNX Runtime compilation.
- Sherpa-ONNX: built with `git clone` + `cmake` inside the Makefile. This is necessary because upstream does not publish prebuilt C/C++ libraries with the exact configuration we need (GPU-enabled, C API, no binaries).
- Docker: the existing `backend/Dockerfile.golang` is used instead of a custom multi-stage Dockerfile. All library acquisition and build logic lives in `backend/go/sherpa-onnx/Makefile`.
- Makefile: follows `backend/go/silero-vad/Makefile` for ONNX Runtime, extended with a Sherpa-ONNX build step.

Summary
Key Points
Pinned Dependencies:
- Sherpa-ONNX (commit `7e227a529be6c383134a358c5744d0eb1cb5ae1f`, built from source)

Build Strategy:

- Prebuilt ONNX Runtime (as in `silero-vad`)
- Sherpa-ONNX built with `git clone` + `cmake` in the Makefile
- All build logic in `backend/go/sherpa-onnx/Makefile`, using the existing `backend/Dockerfile.golang`

GPU Support:
Testing:
Documentation:
Next Steps
- Use `sherpa-onnx-tiny-vits-330k-237m` for CI testing