| 🌟 Support this project |
|---|
| bc1qs6qq0fkqqhp4whwq8u8zc5egprakvqxewr5pmx |
| 0x3147bEE3179Df0f6a0852044BFe3C59086072e12 |
| TKznmR65yhPt5qmYCML4tNSWFeeUkgYSEV |

A JVM library for speech recognition, written in Kotlin and based on the C++ library whisper.cpp and the Silero ML model.

- Stretch to change the speed of audio without changing the pitch
- Voice Activity Detection to extract speech from audio
- Speech generation to generate voice audio from text
- Text generation to generate text from prompt
- Noise reduction to remove noise from audio

- Recognizes speech in PCM audio data and returns a string with the result
- Supports any sampling rate and number of channels due to resampling and downmixing
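
To illustrate what resampling and downmixing mean in practice, here is a minimal sketch in plain Kotlin. It is not the library's internal code: the function names `downmixToMono` and `resampleLinear` are hypothetical, the input is assumed to be interleaved float samples, and linear interpolation is used only because it is short (a production resampler would normally add a low-pass filter).

```kotlin
// Illustrative only: hypothetical helpers, not part of the library's API.

/** Averages interleaved channels into a single mono channel. */
fun downmixToMono(interleaved: FloatArray, channels: Int): FloatArray {
    require(channels > 0) { "channels must be positive" }
    val frames = interleaved.size / channels
    return FloatArray(frames) { frame ->
        var sum = 0f
        for (ch in 0 until channels) sum += interleaved[frame * channels + ch]
        sum / channels
    }
}

/** Resamples mono audio with linear interpolation (no anti-aliasing filter). */
fun resampleLinear(mono: FloatArray, fromRate: Int, toRate: Int): FloatArray {
    require(fromRate > 0 && toRate > 0) { "sample rates must be positive" }
    if (mono.isEmpty() || fromRate == toRate) return mono.copyOf()
    val outSize = (mono.size.toLong() * toRate / fromRate).toInt()
    val step = fromRate.toDouble() / toRate
    return FloatArray(outSize) { i ->
        val pos = i * step
        val left = pos.toInt().coerceAtMost(mono.lastIndex)
        val right = (left + 1).coerceAtMost(mono.lastIndex)
        val frac = (pos - left).toFloat()
        // Blend the two nearest input samples according to the fractional position
        mono[left] * (1 - frac) + mono[right] * frac
    }
}
```

For example, `resampleLinear(downmixToMono(stereo, 2), 44100, 16000)` would turn 44.1 kHz stereo into 16 kHz mono, the input format whisper.cpp expects. With this library you do not need to do this yourself, since `recognize` accepts the sample rate and channel count directly.
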
- Download latest release
- Add library dependency: `dependencies { implementation(file("/path/to/jar")) }`
- Unzip binaries
- Download one of the models here or use any other compatible model
- Add ONNX dependency: `dependencies { implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0") }`
- Download

See the example module for implementation details
- Call `recognize` to process the input data and get the recognized string
- Load binaries if you are going to use whisper.cpp
  - CPU: `SpeechRecognition.Whisper.loadCPU(ggmlBase = "/path/to/ggml-base", ggmlCpu = "/path/to/ggml-cpu", ggml = "/path/to/ggml", speechRecognitionWhisper = "/path/to/speech-recognition-whisper")`
  - CUDA: `SpeechRecognition.Whisper.loadCUDA(ggmlBase = "/path/to/ggml-base", ggmlCpu = "/path/to/ggml-cpu", ggmlCuda = "/path/to/ggml-cuda", ggml = "/path/to/ggml", speechRecognitionWhisper = "/path/to/speech-recognition-whisper")`
- Create an instance
  - `SpeechRecognition.Whisper.create(modelPath = "/path/to/model")`
  - `SpeechRecognition.Silero.create(modelPath = "/path/to/model")`
- Call `minimumInputSize` to get the audio producer buffer size for real-time recognition
- Call `adjustTemperature` to adjust the temperature parameter
- Call `recognize`, passing the input data, sample rate, and number of channels as arguments
- Call `reset` to reset the internal state, for example when the audio source changes
- Call `close` to release resources
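
The call order above can be sketched roughly as follows. This README does not show the exact signatures, so the `Recognizer` interface here is a hypothetical stand-in that only mirrors the method names; the parameter and return types (a `ByteArray` of PCM data, an `Int` buffer size, a plain `String` result) are assumptions, and `transcribeSources` is an illustrative helper rather than part of the library. See the example module for the real usage.

```kotlin
// Hypothetical stand-in mirroring the method names listed above.
// Parameter and return types are assumptions, not the library's real API.
interface Recognizer : AutoCloseable {
    fun minimumInputSize(sampleRate: Int, channels: Int): Int
    fun recognize(pcmBytes: ByteArray, sampleRate: Int, channels: Int): String
    fun reset()
}

/**
 * Recognizes several independent audio sources with one recognizer.
 * Each source is cut into buffers of the minimum input size (the last buffer
 * is zero-padded, i.e. padded with silence), the state is reset between
 * sources, and the recognizer is always closed at the end.
 */
fun transcribeSources(
    recognizer: Recognizer,
    sources: List<ByteArray>,
    sampleRate: Int,
    channels: Int,
): List<String> = recognizer.use { r ->
    val chunkSize = r.minimumInputSize(sampleRate, channels).coerceAtLeast(1)
    sources.map { pcm ->
        r.reset() // the audio source changed, so drop state left by the previous one
        pcm.toList()
            .chunked(chunkSize) { chunk -> chunk.toByteArray().copyOf(chunkSize) }
            .map { buffer -> r.recognize(buffer, sampleRate, channels) }
            .filter { it.isNotBlank() }
            .joinToString(" ")
    }
}
```

Wrapping the work in `use` guarantees that `close` runs even if `recognize` throws, which matters because the recognizer holds native resources. In a real-time setup the buffers would come from the audio producer instead of a pre-loaded array.
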
- JVM version 9 or higher
This project is licensed under the Apache License 2.0