numq/speech-recognition

Speech recognition


🌟 Support this project
Bitcoin (BTC) bc1qs6qq0fkqqhp4whwq8u8zc5egprakvqxewr5pmx
Ethereum (ETH) 0x3147bEE3179Df0f6a0852044BFe3C59086072e12
USDT (TRC-20) TKznmR65yhPt5qmYCML4tNSWFeeUkgYSEV

JVM library for speech recognition, written in Kotlin and based on the C++ library whisper.cpp and the Silero ML model

Features

  • Recognizes speech in PCM audio data and returns a string with the result
  • Supports any sample rate and any number of channels by resampling and downmixing the input
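
To illustrate what downmixing and resampling involve, here is a minimal Kotlin sketch. It is not the library's implementation: the function names `downmixToMono` and `resampleLinear` are hypothetical, the input is assumed to be 16-bit little-endian PCM, and linear interpolation stands in for whatever resampler the library actually uses. The 16 kHz target reflects the mono 16 kHz input that whisper.cpp expects.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Decodes 16-bit little-endian PCM and averages the channels of each frame into one mono sample. */
fun downmixToMono(pcm: ByteArray, channels: Int): FloatArray {
    require(channels > 0) { "channels must be positive" }
    val samples = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    val frames = samples.remaining() / channels
    return FloatArray(frames) { frame ->
        var sum = 0f
        // Scale each 16-bit sample to [-1, 1) before averaging.
        for (ch in 0 until channels) sum += samples.get(frame * channels + ch) / 32768f
        sum / channels
    }
}

/** Resamples mono audio by linear interpolation between neighbouring input samples. */
fun resampleLinear(input: FloatArray, fromRate: Int, toRate: Int): FloatArray {
    require(fromRate > 0 && toRate > 0) { "sample rates must be positive" }
    if (input.isEmpty() || fromRate == toRate) return input.copyOf()
    val outSize = (input.size.toLong() * toRate / fromRate).toInt()
    val step = fromRate.toDouble() / toRate
    return FloatArray(outSize) { i ->
        val pos = i * step
        val left = pos.toInt().coerceAtMost(input.lastIndex)
        val right = (left + 1).coerceAtMost(input.lastIndex)
        val frac = (pos - left).toFloat()
        input[left] * (1 - frac) + input[right] * frac
    }
}

fun main() {
    // One stereo frame: left = 16384 (0.5), right = -16384 (-0.5), which averages to 0.
    val stereo = byteArrayOf(0x00, 0x40, 0x00, 0xC0.toByte())
    println(downmixToMono(stereo, channels = 2).toList())

    // 48 kHz -> 16 kHz: every third input sample lands exactly on an output position.
    val ramp = floatArrayOf(0f, 1f, 2f, 3f, 4f, 5f)
    println(resampleLinear(ramp, fromRate = 48_000, toRate = 16_000).toList())
}
```

Plain linear interpolation applies no low-pass filter, so downsampling this way can alias high frequencies. A production resampler would normally filter first.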

Installation

  • Download the latest release

  • Add the library as a dependency

    dependencies {
        // files(...) produces a FileCollection, which Gradle accepts as a dependency notation
        implementation(files("/path/to/jar"))
    }

whisper.cpp

  • Unzip the binaries

  • Download one of the models here or use any other compatible model

Silero

  • Add the ONNX Runtime dependency

    dependencies {
        implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0")
    }

  • Download the model

Usage

TL;DR

See the example module for implementation details

  • Call recognize to process the input data and get the recognized string

Step-by-step

  • Load binaries if you are going to use whisper.cpp

    • CPU
      SpeechRecognition.Whisper.loadCPU(
          ggmlBase = "/path/to/ggml-base",
          ggmlCpu = "/path/to/ggml-cpu",
          ggml = "/path/to/ggml",
          speechRecognitionWhisper = "/path/to/speech-recognition-whisper",
      )
    • CUDA
      SpeechRecognition.Whisper.loadCUDA(
          ggmlBase = "/path/to/ggml-base",
          ggmlCpu = "/path/to/ggml-cpu",
          ggmlCuda = "/path/to/ggml-cuda",
          ggml = "/path/to/ggml",
          speechRecognitionWhisper = "/path/to/speech-recognition-whisper",
      )
  • Create an instance

    whisper.cpp

    SpeechRecognition.Whisper.create(modelPath = "/path/to/model")

    Silero

    SpeechRecognition.Silero.create(modelPath = "/path/to/model")
  • Call minimumInputSize to get the audio producer buffer size for real-time recognition

  • Call adjustTemperature to adjust the temperature parameter

  • Call recognize passing the input data, sample rate, and number of channels as arguments

  • Call reset to reset the internal state, for example when the audio source changes

  • Call close to release resources
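
The order of the calls above can be sketched as follows. Everything here is an assumption made for illustration: `Recognizer` is a hypothetical stand-in interface that borrows only the method names from the steps, the parameter lists and return types are guesses rather than the library's real signatures, and `FakeRecognizer` exists only so the sketch runs without native binaries or a model file. See the example module for the actual API.

```kotlin
/** Hypothetical stand-in for a recognizer instance; signatures are assumed, not taken from the library. */
interface Recognizer : AutoCloseable {
    fun minimumInputSize(sampleRate: Int, channels: Int): Int
    fun adjustTemperature(temperature: Float)
    fun recognize(pcmBytes: ByteArray, sampleRate: Int, channels: Int): String
    fun reset()
}

/** Fake implementation so the sketch runs on its own. */
class FakeRecognizer : Recognizer {
    var closed = false
        private set

    // Assumed sizing for the fake only: one second of 16-bit PCM.
    override fun minimumInputSize(sampleRate: Int, channels: Int) = sampleRate * channels * 2
    override fun adjustTemperature(temperature: Float) = Unit
    override fun recognize(pcmBytes: ByteArray, sampleRate: Int, channels: Int) = "<${pcmBytes.size} bytes>"
    override fun reset() = Unit
    override fun close() { closed = true }
}

/** Feeds audio to the recognizer in buffers of minimumInputSize bytes and collects the results. */
fun transcribe(recognizer: Recognizer, pcm: ByteArray, sampleRate: Int, channels: Int): List<String> {
    val chunkSize = recognizer.minimumInputSize(sampleRate, channels)
    val results = mutableListOf<String>()
    var offset = 0
    // Only full buffers are passed on; a trailing partial buffer is dropped in this sketch.
    while (offset + chunkSize <= pcm.size) {
        results += recognizer.recognize(pcm.copyOfRange(offset, offset + chunkSize), sampleRate, channels)
        offset += chunkSize
    }
    return results
}

fun main() {
    val recognizer = FakeRecognizer()
    // `use` calls close() even if recognition throws, so resources are always released.
    val results = recognizer.use {
        it.adjustTemperature(0f)
        val first = transcribe(it, ByteArray(8_000), sampleRate = 1_000, channels = 2)
        it.reset() // the audio source changes here
        first + transcribe(it, ByteArray(4_000), sampleRate = 1_000, channels = 2)
    }
    println(results)
}
```

Wrapping the instance in `use` ties the final close step to scope exit, which avoids leaking native resources when an exception interrupts recognition.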

Requirements

  • JVM version 9 or higher

License

This project is licensed under the Apache License 2.0
