Faster Whisper | Runpod Worker

This repository contains a modified version of RunPod's Faster Whisper Worker Worker for RunPod. This worker makes concessions and modifications the Runpod worker would be unable to make for wider adoption.

[!NOTE] This worker only provides users the medium and large-v3 models.

Docker Image

RunPod Endpoint Docs

Model Inputs

Input	Type	Description
`audio`	Path	An Audio file as a path or URL.
`audio_base64`	str	An Audio file as a Base64-encoded string.
`model`	str	A supported Whisper model. Choices: "medium", "large-v3". Default: "medium"
`transcription`	str	A supported format for the resulting transcription. Choices: "plain_text", "formatted_text", "srt", "vtt". Default: "plain_text"
`translate`	bool	If True, translates the text to English. Default: False
`translation`	str	A supported format for the resulting translation. Choices: "plain_text", "formatted_text", "srt", "vtt". Default: "plain_text"
`language`	str	Language spoken in the audio, if known. Otherwise specify None to perform language detection. Default: None
`temperature`	float	Temperature to use for sampling. Default: 0
`best_of`	int	Number of candidates when sampling with non-zero temperature. Default: 5
`beam_size`	int	Number of beams in beam search, only applicable when temperature is zero. Default: 5
`patience`	float	Optional patience value to use in beam decoding. Default: None
`length_penalty`	float	Optional token length penalty coefficient (alpha). Default: None
`initial_prompt`	str	An optional string to provide as a prompt for the first window. Default: None
`condition_on_previous_text`	bool	If True, provide the previous output of the model as a prompt for the next window. Default: True
`temperature_increment_on_fallback`	float	Temperature to increase when falling back when the decoding fails. Default: 0.2
`compression_ratio_threshold`	float	If the gzip compression ratio is higher than this value, treat the decoding as failed. Default: 2.4
`logprob_threshold`	float	If the average log probability is lower than this value, treat the decoding as failed. Default: -1.0
`no_speech_threshold`	float	If the probability of the token is higher than this value, consider the segment as silence. Default: 0.6
`enable_vad`	bool	If True, use the voice activity detection (VAD) to filter out parts of the audio without speech. This step is using the Silero VAD model. Default: False
`word_timestamps`	bool	If True, include word timestamps in the output. Default: False

Test Inputs

The following inputs can be used for testing the model:

{
    "input": {
        "audio": "https://github.com/runpod-workers/sample-inputs/raw/main/audio/gettysburg.wav"
    }
}

Sample output

{
    "segments": [
        {
            "id": 1,
            "seek": 106,
            "start": 0.11,
            "end": 3.11,
            "text": " Hola mundo.",
            "tokens": [
                50364,
                22637,
                7968,
                13,
                50514
            ],
            "temperature": 0.1,
            "avg_logprob": -0.8348079785480325,
            "compression_ratio": 0.5789473684210527,
            "no_speech_prob": 0.1453857421875
        }
    ],
    "detected_language": "es",
    "transcription": "Hola mundo.",
    "translation": null,
    "device": "cuda",
    "model": "large-v2",
    "translation_time": 0.7796223163604736
}

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
.github		.github
builder		builder
src		src
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
locustfile.py		locustfile.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Faster Whisper | Runpod Worker

Model Inputs

Test Inputs

Sample output

About

Releases

Packages

Contributors 6

Languages

License

partyhatgg/runpod-faster-whisper

Folders and files

Latest commit

History

Repository files navigation

Faster Whisper | Runpod Worker

Model Inputs

Test Inputs

Sample output

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages