Merge pull request #8 from mobiusml/fw_pr
Updates to mobius version to comply with SYSTRAN version
Jiltseb authored Apr 11, 2024
2 parents 911c62d + caaa593 commit 538366b
Showing 14 changed files with 860 additions and 306 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -7,7 +7,7 @@ Contributions are welcome! Here are some pointers to help you install the librar
We recommend installing the module in editable mode with the `dev` extra requirements:

```bash
git clone https://github.com/guillaumekln/faster-whisper.git
git clone https://github.com/SYSTRAN/faster-whisper.git
cd faster-whisper/
pip install -e .[dev]
```
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 Guillaume Klein
Copyright (c) 2023 SYSTRAN

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
71 changes: 48 additions & 23 deletions README.md
@@ -1,21 +1,11 @@
[![CI](https://github.com/guillaumekln/faster-whisper/workflows/CI/badge.svg)](https://github.com/guillaumekln/faster-whisper/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/faster-whisper.svg)](https://badge.fury.io/py/faster-whisper)
[![CI](https://github.com/SYSTRAN/faster-whisper/workflows/CI/badge.svg)](https://github.com/SYSTRAN/faster-whisper/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/faster-whisper.svg)](https://badge.fury.io/py/faster-whisper)

# Mobius Faster Whisper transcription with CTranslate2

**faster-whisper** is a reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

Mobius faster-whisper builds on top of faster-whisper v0.10.0 (the latest stable version) and supports the following additional features:

- Handling multilingual videos.
- Seed fixing for consistency across runs.
- Use `log_prob_low_threshold` to skip ambiguous segments from transcription.
- Better language prediction using multiple audio segments.
- Batched inference for faster transcription: around 100x real-time speed.
- Streaming (segment-level) or non-streaming options for Batched inference.
- Option for faster feature extraction with torchaudio.
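
As a rough illustration of the `log_prob_low_threshold` option above, the sketch below drops segments whose average log-probability falls below a cutoff. The `Segment` dataclass, the `filter_low_confidence` helper and the `-1.0` cutoff are hypothetical choices for this example; the mobius implementation may apply its threshold at a different stage of decoding.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    # Hypothetical stand-in for a transcribed segment; faster-whisper's own
    # segments also carry start, end, text and avg_logprob fields.
    start: float
    end: float
    text: str
    avg_logprob: float


def filter_low_confidence(
    segments: Iterable[Segment], log_prob_low_threshold: float = -1.0
) -> List[Segment]:
    """Keep only segments whose average log-probability reaches the threshold.

    Hypothetical helper: it illustrates skipping ambiguous segments and is
    not the actual mobius code path.
    """
    return [s for s in segments if s.avg_logprob >= log_prob_low_threshold]


segments = [
    Segment(0.0, 2.5, "Hello there.", avg_logprob=-0.2),
    Segment(2.5, 4.0, "uh mm", avg_logprob=-1.7),  # low confidence, dropped
    Segment(4.0, 6.0, "How are you?", avg_logprob=-0.4),
]
kept = filter_low_confidence(segments, log_prob_low_threshold=-1.0)
print([s.text for s in kept])  # ['Hello there.', 'How are you?']
```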

## Benchmark

### Whisper
@@ -24,7 +14,7 @@ For reference, here's the time and memory usage that are required to transcribe

* [openai/whisper](https://github.com/openai/whisper)@[6dea21fd](https://github.com/openai/whisper/commit/6dea21fd7f7253bfe450f1e2512a0fe47ee2d258)
* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[3b010f9](https://github.com/ggerganov/whisper.cpp/commit/3b010f9bed9a6068609e9faf52383aea792b0362)
* [faster-whisper](https://github.com/guillaumekln/faster-whisper)@[cce6b53e](https://github.com/guillaumekln/faster-whisper/commit/cce6b53e4554f71172dad188c45f10fb100f6e3e)
* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[cce6b53e](https://github.com/SYSTRAN/faster-whisper/commit/cce6b53e4554f71172dad188c45f10fb100f6e3e)

### Large-v2 model on GPU

@@ -127,13 +117,13 @@ pip install faster-whisper
### Install the master branch

```bash
pip install --force-reinstall "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz"
pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"
```

### Install a specific commit

```bash
pip install --force-reinstall "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"
pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"
```

</details>
@@ -169,18 +159,53 @@ for segment in segments:
segments, _ = model.transcribe("audio.mp3")
segments = list(segments) # The transcription will actually run here.
```
### Faster-distil-whisper
For usage of `faster-distil-whisper`, please refer to: https://github.com/guillaumekln/faster-whisper/issues/533

### Multi-segment language detection

To use the model directly for improved language detection, the following code snippet can be used:

```python
model_size = "distil-large-v2"
# model_size = "distil-medium.en"
from faster_whisper import WhisperModel
model = WhisperModel("medium", device="cuda", compute_type="float16")
language_info = model.detect_language_multi_segment("audio.mp3")
```
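
A minimal sketch of the idea behind multi-segment detection, assuming each audio window yields a dictionary of language probabilities: average the probabilities over all windows and keep the most likely language, so that a single ambiguous window cannot decide the result on its own. `aggregate_language_probs` and its input format are assumptions for illustration; they are not the return format or the exact method of `detect_language_multi_segment`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_language_probs(
    per_segment: List[Dict[str, float]]
) -> Tuple[str, Dict[str, float]]:
    """Average language probabilities over segments and return the best language.

    Hypothetical helper: each dict maps a language code to the probability that
    one audio window is in that language. A language missing from a window's
    dict counts as probability 0 for that window.
    """
    if not per_segment:
        raise ValueError("need at least one segment")
    totals: Dict[str, float] = defaultdict(float)
    for probs in per_segment:
        for lang, p in probs.items():
            totals[lang] += p
    averaged = {lang: total / len(per_segment) for lang, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


windows = [
    {"en": 0.55, "de": 0.45},  # an ambiguous window
    {"en": 0.90, "de": 0.10},
    {"en": 0.80, "fr": 0.20},
]
language, probs = aggregate_language_probs(windows)
print(language)  # en
```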

### Batched faster-whisper

The batched version of faster-whisper is inspired by [whisper-x](https://github.com/m-bain/whisperX) (licensed under the BSD-4-Clause license) and by kaldi-based feature extraction. It improves speed by up to 10x compared to the OpenAI implementation. It works by transcribing semantically meaningful audio chunks in batches, which leads to faster inference.

The following code snippet illustrates how to run inference with the batched version on a specified audio file. Please also refer to the test scripts of batched faster-whisper.

```python
from faster_whisper import BatchedInferencePipeline, WhisperModel

model = WhisperModel("medium", device="cuda", compute_type="float16")
# Wrap the regular model so that audio chunks are decoded in batches of 16.
batched_model = BatchedInferencePipeline(model=model)
result = batched_model.transcribe("audio.mp3", batch_size=16)

for segment, info in result:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
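
The batching idea described above can be sketched without loading a model. Assuming a VAD has already returned speech spans as `(start, end)` pairs in seconds, the hypothetical helpers below merge neighboring spans into chunks no longer than Whisper's 30-second window and then group the chunks into batches of `batch_size`. `merge_into_chunks`, `make_batches` and the greedy merging rule are assumptions and do not reproduce the internals of `BatchedInferencePipeline`.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def merge_into_chunks(speech: List[Span], max_chunk_s: float = 30.0) -> List[Span]:
    """Greedily merge consecutive speech spans into chunks of at most max_chunk_s."""
    # First split any span that is longer than the window on its own.
    pieces: List[Span] = []
    for start, end in speech:
        while end - start > max_chunk_s:
            pieces.append((start, start + max_chunk_s))
            start += max_chunk_s
        pieces.append((start, end))

    # Then extend the current chunk while the merged chunk still fits the window.
    chunks: List[Span] = []
    for start, end in pieces:
        if chunks and end - chunks[-1][0] <= max_chunk_s:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks


def make_batches(chunks: List[Span], batch_size: int) -> List[List[Span]]:
    """Group chunks into lists of at most batch_size, one forward pass each."""
    return [chunks[i : i + batch_size] for i in range(0, len(chunks), batch_size)]


speech = [(0.0, 12.0), (13.0, 25.0), (26.0, 40.0), (41.0, 100.0)]
chunks = merge_into_chunks(speech)
batches = make_batches(chunks, batch_size=2)
print(chunks)  # [(0.0, 25.0), (26.0, 40.0), (41.0, 71.0), (71.0, 100.0)]
print(len(batches))  # 2
```

In a scheme like this, each batch can be decoded in a single forward pass, which keeps the GPU busy; the trade-off is that chunks are transcribed independently, so no previous text is carried across chunk boundaries.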

### Faster Distil-Whisper

The Distil-Whisper checkpoints are compatible with the Faster-Whisper package. In particular, the latest [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3)
checkpoint is intrinsically designed to work with the Faster-Whisper transcription algorithm. The following code snippet
demonstrates how to run inference with distil-large-v3 on a specified audio file:

```python
from faster_whisper import WhisperModel

model_size = "distil-large-v3"

model = WhisperModel(model_size, device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5,
language="en", max_new_tokens=128, condition_on_previous_text=False)
segments, info = model.transcribe("audio.mp3", beam_size=5, language="en", condition_on_previous_text=False)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
NOTE: Empirically, `condition_on_previous_text=True` will degrade the performance of `faster-distil-whisper` for long audio. Degradation on the first chunk was observed with `initial_prompt` too.

For more information about the distil-large-v3 model, refer to the original [model card](https://huggingface.co/distil-whisper/distil-large-v3).

### Word-level timestamps

@@ -200,7 +225,7 @@ The library integrates the [Silero VAD](https://github.com/snakers4/silero-vad)
segments, _ = model.transcribe("audio.mp3", vad_filter=True)
```

The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the [source code](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py). They can be customized with the dictionary argument `vad_parameters`:
The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the [source code](https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/vad.py). They can be customized with the dictionary argument `vad_parameters`:

```python
segments, _ = model.transcribe(
@@ -223,7 +248,7 @@ logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

### Going further

See more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.
See more model and transcription options in the [`WhisperModel`](https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.

## Community integrations

2 changes: 1 addition & 1 deletion faster_whisper/__init__.py
@@ -1,5 +1,5 @@
from faster_whisper.audio import decode_audio
from faster_whisper.transcribe import WhisperModel, BatchedInferencePipeline
from faster_whisper.transcribe import BatchedInferencePipeline, WhisperModel
from faster_whisper.utils import available_models, download_model, format_timestamp
from faster_whisper.version import __version__

Empty file.
28 changes: 14 additions & 14 deletions faster_whisper/feature_extractor.py
@@ -23,7 +23,7 @@ def __init__(
self.mel_filters = self.get_mel_filters(
sampling_rate, n_fft, n_mels=feature_size
)
self.n_mels=feature_size
self.n_mels = feature_size

def get_mel_filters(self, sr, n_fft, n_mels=128, dtype=np.float32):
# Initialize the weights
@@ -145,16 +145,16 @@ def stft(self, frames, window):
data[f] = np.fft.fft(fft_signal, axis=0)[:num_fft_bins]
return data.T

def __call__(self, waveform, enable_ta = False, padding=True, chunk_length=None):
def __call__(self, waveform, enable_ta=False, padding=True, chunk_length=None):
"""
Compute the log-Mel spectrogram of the provided audio; gives results similar to
whisper's original torch implementation with 1e-5 tolerance. Additionally, faster
whisper's original torch implementation with 1e-5 tolerance. Additionally, faster
feature extraction using kaldi fbank features is available if torchaudio is
installed.
"""
if enable_ta:
waveform = waveform.astype(np.float32)

if chunk_length is not None:
self.n_samples = chunk_length * self.sampling_rate
self.nb_max_frames = self.n_samples // self.hop_length
@@ -165,16 +165,16 @@ def __call__(self, waveform, enable_ta = False, padding=True, chunk_length=None)
if enable_ta:
audio = torch.from_numpy(waveform).unsqueeze(0)
fbank = ta_kaldi.fbank(
audio,
sample_frequency=self.sampling_rate,
window_type="hanning",
num_mel_bins=self.n_mels,
)
log_spec = fbank.numpy().T.astype(np.float32) #ctranslate does not take 64
#normalize
#Audioset values as default mean and std for audio
audio,
sample_frequency=self.sampling_rate,
window_type="hanning",
num_mel_bins=self.n_mels,
)
log_spec = fbank.numpy().T.astype(np.float32) # ctranslate does not take 64

# normalize

# Audioset values as default mean and std for audio
mean_val = -4.2677393
std_val = 4.5689974
scaled_features = (log_spec - (mean_val)) / (std_val * 2)