Releases: ggerganov/whisper.cpp

v1.7.3

18 Dec 16:15
3de9dee

Overview

  • Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models
  • Reduce hallucinations during silence by @jkarthic in #2629
  • Implement no_speech_thold by @jkarthic in #2625

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra Metal tiny 1 1 7.90 1.26 0.35 0.01 ed733e8
M2 Ultra Metal tiny-q5_0 1 1 8.44 1.23 0.36 0.01 ed733e8
M2 Ultra Metal tiny-q5_1 1 1 8.26 1.27 0.37 0.01 ed733e8
M2 Ultra Metal tiny-q8_0 1 1 8.03 1.21 0.35 0.01 ed733e8
M2 Ultra Metal base 1 1 13.77 1.80 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_0 1 1 15.02 1.72 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_1 1 1 14.93 1.74 0.42 0.02 ed733e8
M2 Ultra Metal base-q8_0 1 1 14.26 1.68 0.41 0.02 ed733e8
M2 Ultra Metal small 1 1 39.76 3.54 0.85 0.05 ed733e8
M2 Ultra Metal small-q5_0 1 1 45.07 3.47 0.87 0.05 ed733e8
M2 Ultra Metal small-q5_1 1 1 44.82 3.49 0.87 0.05 ed733e8
M2 Ultra Metal small-q8_0 1 1 41.79 3.30 0.84 0.05 ed733e8
M2 Ultra Metal medium 1 1 106.73 7.28 1.78 0.11 ed733e8
M2 Ultra Metal medium-q5_0 1 1 124.43 6.63 1.83 0.12 ed733e8
M2 Ultra Metal medium-q5_1 1 1 124.19 6.70 1.84 0.12 ed733e8
M2 Ultra Metal medium-q8_0 1 1 113.88 6.52 1.75 0.11 ed733e8
M2 Ultra Metal medium-dis 1 1 94.97 0.97 0.22 0.01 ed733e8
M2 Ultra Metal large-v2 1 1 193.33 10.53 2.65 0.20 ed733e8
M2 Ultra Metal large-v2-q5_0 1 1 229.22 9.52 2.72 0.23 ed733e8
M2 Ultra Metal large-v2-q5_1 1 1 229.40 9.62 2.73 0.23 ed733e8
M2 Ultra Metal large-v2-q8_0 1 1 207.30 9.36 2.59 0.21 ed733e8
M2 Ultra Metal large-v2-dis 1 1 171.43 1.09 0.25 0.02 ed733e8
M2 Ultra Metal large-v3-turbo 1 1 173.45 1.73 0.41 0.03 ed733e8
M2 Ultra Metal large-v3-turbo-q5_0 1 1 205.52 1.52 0.42 0.04 ed733e8
M2 Ultra Metal large-v3-turbo-q8_0 1 1 185.90 1.48 0.40 0.03 ed733e8
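
To put a number on the beam-decoding improvement, the Bch5 column of this table can be compared with the same column in the v1.7.2 table further down. The sketch below is illustrative only: it assumes lower values are better (the tables do not state units), and `BenchRow`, `parse_row` and `bch5_speedups` are hypothetical helper names, not part of whisper.cpp. Rows are parsed from the right because the hardware/backend part can contain spaces.

```python
from typing import NamedTuple


class BenchRow(NamedTuple):
    device: str      # hardware plus backend, e.g. "M2 Ultra Metal"
    model: str
    threads: int     # "Th" column
    flash_attn: int  # "FA" column
    enc: float
    dec: float
    bch5: float
    pp: float
    commit: str


def parse_row(line: str) -> BenchRow:
    """Parse one whitespace-separated benchmark row.

    The last seven tokens are Th, FA, Enc., Dec., Bch5, PP and Commit,
    the token before them is the model, and everything else is the device.
    """
    tokens = line.split()
    if len(tokens) < 9:
        raise ValueError(f"not a benchmark row: {line!r}")
    th, fa, enc, dec, bch5, pp, commit = tokens[-7:]
    return BenchRow(" ".join(tokens[:-8]), tokens[-8], int(th), int(fa),
                    float(enc), float(dec), float(bch5), float(pp), commit)


# A few rows copied verbatim from the v1.7.2 and v1.7.3 tables.
V172 = """\
M2 Ultra METAL tiny 1 1 9.51 1.39 0.41 0.01 83ac284
M2 Ultra METAL medium 1 1 109.01 7.59 3.24 0.11 83ac284
M2 Ultra METAL large-v2 1 1 196.99 11.29 5.06 0.20 83ac284
M2 Ultra METAL large-v2-q5_0 1 1 233.88 10.83 5.56 0.24 83ac284
"""
V173 = """\
M2 Ultra Metal tiny 1 1 7.90 1.26 0.35 0.01 ed733e8
M2 Ultra Metal medium 1 1 106.73 7.28 1.78 0.11 ed733e8
M2 Ultra Metal large-v2 1 1 193.33 10.53 2.65 0.20 ed733e8
M2 Ultra Metal large-v2-q5_0 1 1 229.22 9.52 2.72 0.23 ed733e8
"""


def bch5_by_model(text: str) -> dict:
    """Map model name to its Bch5 value."""
    return {row.model: row.bch5 for row in map(parse_row, text.splitlines())}


def bch5_speedups(old_text: str, new_text: str) -> dict:
    """Ratio old/new of the Bch5 column for models present in both tables."""
    old, new = bch5_by_model(old_text), bch5_by_model(new_text)
    return {model: old[model] / new[model] for model in old if model in new}


if __name__ == "__main__":
    for model, ratio in bch5_speedups(V172, V173).items():
        print(f"{model:>14}: {ratio:.2f}x")
```

With these rows the ratio grows with model size, from about 1.2x for tiny to about 1.9x for large-v2 and about 2.0x for large-v2-q5_0. A row whose numeric field is not a number makes `parse_row` raise `ValueError`, so such rows need to be filtered out before parsing.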

What's Changed

New Contributors

Full Changelog: v1.7.2...v1.7.3

v1.7.3-pre

09 Dec 09:34
ed733e8
Pre-release

Overview

Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models.
Marked as a pre-release because the build system has changed substantially (it now uses CMake) and I want to gather feedback on how well the project builds on various platforms. Please leave comments in the discussion to help fix any remaining issues before the official release.

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra Metal tiny 1 1 7.90 1.26 0.35 0.01 ed733e8
M2 Ultra Metal tiny-q5_0 1 1 8.44 1.23 0.36 0.01 ed733e8
M2 Ultra Metal tiny-q5_1 1 1 8.26 1.27 0.37 0.01 ed733e8
M2 Ultra Metal tiny-q8_0 1 1 8.03 1.21 0.35 0.01 ed733e8
M2 Ultra Metal base 1 1 13.77 1.80 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_0 1 1 15.02 1.72 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_1 1 1 14.93 1.74 0.42 0.02 ed733e8
M2 Ultra Metal base-q8_0 1 1 14.26 1.68 0.41 0.02 ed733e8
M2 Ultra Metal small 1 1 39.76 3.54 0.85 0.05 ed733e8
M2 Ultra Metal small-q5_0 1 1 45.07 3.47 0.87 0.05 ed733e8
M2 Ultra Metal small-q5_1 1 1 44.82 3.49 0.87 0.05 ed733e8
M2 Ultra Metal small-q8_0 1 1 41.79 3.30 0.84 0.05 ed733e8
M2 Ultra Metal medium 1 1 106.73 7.28 1.78 0.11 ed733e8
M2 Ultra Metal medium-q5_0 1 1 124.43 6.63 1.83 0.12 ed733e8
M2 Ultra Metal medium-q5_1 1 1 124.19 6.70 1.84 0.12 ed733e8
M2 Ultra Metal medium-q8_0 1 1 113.88 6.52 1.75 0.11 ed733e8
M2 Ultra Metal medium-dis 1 1 94.97 0.97 0.22 0.01 ed733e8
M2 Ultra Metal large-v2 1 1 193.33 10.53 2.65 0.20 ed733e8
M2 Ultra Metal large-v2-q5_0 1 1 229.22 9.52 2.72 0.23 ed733e8
M2 Ultra Metal large-v2-q5_1 1 1 229.40 9.62 2.73 0.23 ed733e8
M2 Ultra Metal large-v2-q8_0 1 1 207.30 9.36 2.59 0.21 ed733e8
M2 Ultra Metal large-v2-dis 1 1 171.43 1.09 0.25 0.02 ed733e8
M2 Ultra Metal large-v3-turbo 1 1 173.45 1.73 0.41 0.03 ed733e8
M2 Ultra Metal large-v3-turbo-q5_0 1 1 205.52 1.52 0.42 0.04 ed733e8
M2 Ultra Metal large-v3-turbo-q8_0 1 1 185.90 1.48 0.40 0.03 ed733e8

What's Changed

Full Changelog: v1.7.2...v1.7.3-pre

v1.7.2

19 Nov 16:55
6266a9f

Overview

  • Various improvements in the Metal backend
  • Fix extra memory usage for large samples
  • Remove limit for ggml_context (i.e. more beams and processors are supported)

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 9.51 1.39 0.41 0.01 83ac284
M2 Ultra METAL tiny-q5_0 1 1 9.57 1.41 0.42 0.01 83ac284
M2 Ultra METAL tiny-q5_1 1 1 8.74 1.39 0.42 0.01 83ac284
M2 Ultra METAL tiny-q8_0 1 1 8.36 1.33 0.41 0.01 83ac284
M2 Ultra METAL base 1 1 14.27 1.90 0.63 0.02 83ac284
M2 Ultra METAL base-q5_0 1 1 15.50 1.90 0.65 0.02 83ac284
M2 Ultra METAL base-q5_1 1 1 15.67 1.88 0.65 0.02 83ac284
M2 Ultra METAL base-q8_0 1 1 14.69 1.81 0.63 0.02 83ac284
M2 Ultra METAL small 1 1 40.85 3.77 1.43 0.05 83ac284
M2 Ultra METAL small-q5_0 1 1 45.99 3.90 1.52 0.05 83ac284
M2 Ultra METAL small-q5_1 1 1 46.19 3.83 1.50 0.06 83ac284
M2 Ultra METAL small-q8_0 1 1 42.90 3.65 1.46 0.05 83ac284
M2 Ultra METAL medium 1 1 109.01 7.59 3.24 0.11 83ac284
M2 Ultra METAL medium-q5_0 1 1 126.78 7.55 3.45 0.13 83ac284
M2 Ultra METAL medium-q5_1 1 1 127.71 7.39 3.43 0.13 83ac284
M2 Ultra METAL medium-q8_0 1 1 115.97 7.21 3.35 0.12 83ac284
M2 Ultra METAL medium-dis 1 1 97.74 1.06 0.36 0.01 83ac284
M2 Ultra METAL large-v2 1 1 196.99 11.29 5.06 0.20 83ac284
M2 Ultra METAL large-v2-q5_0 1 1 233.88 10.83 5.56 0.24 83ac284
M2 Ultra METAL large-v2-q5_1 1 1 234.03 10.73 5.46 0.24 83ac284
M2 Ultra METAL large-v2-q8_0 1 1 210.83 10.29 5.23 0.22 83ac284
M2 Ultra METAL large-v2-dis 1 1 175.37 1.18 0.42 0.02 83ac284
M2 Ultra METAL large-v3-turbo 1 1 177.35 1.85 0.73 0.03 83ac284
M2 Ultra METAL large-v3-turbo-q5_0 1 1 209.31 1.69 0.80 0.04 83ac284
M2 Ultra METAL large-v3-turbo-q8_0 1 1 189.55 1.64 0.75 0.03 83ac284

What's Changed

New Contributors

Full Changelog: v1.7.1...v1.7.2

v1.7.2-pre

15 Nov 14:05
f02b40b
Pre-release

Overview

This is a pre-release because there have been some reports of memory leaks that I haven't had time to investigate and confirm. If they are resolved in the next few days, the fixes will be included in the official 1.7.2 release next week.

  • Various improvements in the Metal backend
  • Fix extra memory usage for large samples
  • Remove limit for ggml_context (i.e. more beams and processors are supported)

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 9.51 1.39 0.41 0.01 83ac284
M2 Ultra METAL tiny-q5_0 1 1 9.57 1.41 0.42 0.01 83ac284
M2 Ultra METAL tiny-q5_1 1 1 8.74 1.39 0.42 0.01 83ac284
M2 Ultra METAL tiny-q8_0 1 1 8.36 1.33 0.41 0.01 83ac284
M2 Ultra METAL base 1 1 14.27 1.90 0.63 0.02 83ac284
M2 Ultra METAL base-q5_0 1 1 15.50 1.90 0.65 0.02 83ac284
M2 Ultra METAL base-q5_1 1 1 15.67 1.88 0.65 0.02 83ac284
M2 Ultra METAL base-q8_0 1 1 14.69 1.81 0.63 0.02 83ac284
M2 Ultra METAL small 1 1 40.85 3.77 1.43 0.05 83ac284
M2 Ultra METAL small-q5_0 1 1 45.99 3.90 1.52 0.05 83ac284
M2 Ultra METAL small-q5_1 1 1 46.19 3.83 1.50 0.06 83ac284
M2 Ultra METAL small-q8_0 1 1 42.90 3.65 1.46 0.05 83ac284
M2 Ultra METAL medium 1 1 109.01 7.59 3.24 0.11 83ac284
M2 Ultra METAL medium-q5_0 1 1 126.78 7.55 3.45 0.13 83ac284
M2 Ultra METAL medium-q5_1 1 1 127.71 7.39 3.43 0.13 83ac284
M2 Ultra METAL medium-q8_0 1 1 115.97 7.21 3.35 0.12 83ac284
M2 Ultra METAL medium-dis 1 1 97.74 1.06 0.36 0.01 83ac284
M2 Ultra METAL large-v2 1 1 196.99 11.29 5.06 0.20 83ac284
M2 Ultra METAL large-v2-q5_0 1 1 233.88 10.83 5.56 0.24 83ac284
M2 Ultra METAL large-v2-q5_1 1 1 234.03 10.73 5.46 0.24 83ac284
M2 Ultra METAL large-v2-q8_0 1 1 210.83 10.29 5.23 0.22 83ac284
M2 Ultra METAL large-v2-dis 1 1 175.37 1.18 0.42 0.02 83ac284
M2 Ultra METAL large-v3-turbo 1 1 177.35 1.85 0.73 0.03 83ac284
M2 Ultra METAL large-v3-turbo-q5_0 1 1 209.31 1.69 0.80 0.04 83ac284
M2 Ultra METAL large-v3-turbo-q8_0 1 1 189.55 1.64 0.75 0.03 83ac284

What's Changed

New Contributors

Full Changelog: v1.7.1...v1.7.2-pre

v1.7.1

07 Oct 10:09
ebca09a

Overview

  • Fix Vulkan crashes
  • Performance stats for Vulkan on RTX 2060

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 VULKAN tiny 1 0 30.38 1.37 1.04 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_0 1 0 20.98 1.38 0.99 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_1 1 0 20.74 1.30 0.96 0.05 9f346d0
RTX 2060 VULKAN base 1 0 44.69 1.59 1.78 0.09 9f346d0
RTX 2060 VULKAN base-q5_0 1 0 39.72 2.11 1.72 0.08 9f346d0
RTX 2060 VULKAN base-q5_1 1 0 39.45 2.01 1.63 0.08 9f346d0
RTX 2060 VULKAN small 1 0 160.02 3.53 4.64 0.23 9f346d0
RTX 2060 VULKAN small-q5_0 1 0 141.52 4.54 4.44 0.20 9f346d0
RTX 2060 VULKAN small-q5_1 1 0 141.03 4.63 4.18 0.20 9f346d0
RTX 2060 VULKAN medium 1 0 472.66 7.55 11.35 0.56 9f346d0
RTX 2060 VULKAN medium-q5_0 1 0 395.55 9.81 10.64 0.49 9f346d0
RTX 2060 VULKAN medium-q5_1 1 0 398.85 10.16 10.15 0.50 9f346d0
RTX 2060 VULKAN medium-dis 1 0 427.26 1.26 1.20 0.08 9f346d0
RTX 2060 VULKAN large-v2 1 0 924.60 12.36 18.56 1.01 9f346d0
RTX 2060 VULKAN large-v2-q5_0 1 0 774.21 17.25 17.17 0.85 9f346d0
RTX 2060 VULKAN large-v2-q5_1 1 0 779.75 17.44 16.27 0.85 9f346d0
RTX 2060 VULKAN large-v2-dis 1 0 833.35 1.38 1.56 0.10 9f346d0
RTX 2060 VULKAN large-v3-turbo 1 0 839.90 2.11 2.70 0.16 9f346d0
RTX 2060 VULKAN large-v3-turbo-q5_0 1 0 705.49 3.22 2.53 0.14 9f346d0

What's Changed

New Contributors

Full Changelog: v1.7.0...v1.7.1

Binaries

https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590

v1.7.0

05 Oct 14:15
6a94163

Overview

  • Fix crashes with high number of beams
  • Reduce overall VRAM usage
  • Optimize Encoder performance

Some performance numbers for this release:

M2 Ultra

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 8.37 1.44 0.48 0.01 6a94163
M2 Ultra METAL tiny-q5_0 1 1 9.81 1.46 0.50 0.01 6a94163
M2 Ultra METAL tiny-q5_1 1 1 8.80 1.47 0.50 0.01 6a94163
M2 Ultra METAL base 1 1 16.11 1.96 0.74 0.02 6a94163
M2 Ultra METAL base-q5_0 1 1 16.38 1.99 0.78 0.02 6a94163
M2 Ultra METAL base-q5_1 1 1 16.72 2.00 0.77 0.02 6a94163
M2 Ultra METAL small 1 1 41.26 3.88 1.66 0.05 6a94163
M2 Ultra METAL small-q5_0 1 1 46.91 4.02 1.76 0.06 6a94163
M2 Ultra METAL small-q5_1 1 1 47.05 4.00 1.73 0.06 6a94163
M2 Ultra METAL medium 1 1 111.29 7.79 3.63 0.11 6a94163
M2 Ultra METAL medium-q5_0 1 1 129.78 7.71 3.85 0.13 6a94163
M2 Ultra METAL medium-q5_1 1 1 129.29 7.71 3.87 0.13 6a94163
M2 Ultra METAL medium-dis 1 1 99.27 1.09 0.43 0.02 6a94163
M2 Ultra METAL large-v2 1 1 198.81 11.54 5.59 0.20 6a94163
M2 Ultra METAL large-v2-q5_0 1 1 236.18 11.12 6.11 0.24 6a94163
M2 Ultra METAL large-v2-q5_1 1 1 235.88 11.14 6.01 0.24 6a94163
M2 Ultra METAL large-v2-dis 1 1 177.41 1.21 0.48 0.02 6a94163
M2 Ultra METAL large-v3-turbo 1 1 178.92 1.89 0.83 0.03 6a94163
M2 Ultra METAL large-v3-turbo-q5_0 1 1 211.44 1.73 0.90 0.04 6a94163

Flash Attention OFF:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 0 10.04 1.37 0.50 0.01 6a94163
M2 Ultra METAL tiny-q5_0 1 0 10.02 1.36 0.53 0.01 6a94163
M2 Ultra METAL tiny-q5_1 1 0 11.08 1.37 0.53 0.01 6a94163
M2 Ultra METAL base 1 0 17.84 1.93 0.77 0.02 6a94163
M2 Ultra METAL base-q5_0 1 0 18.57 1.92 0.81 0.02 6a94163
M2 Ultra METAL base-q5_1 1 0 18.66 1.93 0.82 0.02 6a94163
M2 Ultra METAL small 1 0 48.26 3.95 1.73 0.05 6a94163
M2 Ultra METAL small-q5_0 1 0 53.68 3.99 1.85 0.06 6a94163
M2 Ultra METAL small-q5_1 1 0 53.86 4.00 1.82 0.06 6a94163
M2 Ultra METAL medium 1 0 130.09 8.01 3.82 0.13 6a94163
M2 Ultra METAL medium-q5_0 1 0 148.18 7.92 4.11 0.14 6a94163
M2 Ultra METAL medium-q5_1 1 0 147.95 7.94 4.11 0.14 6a94163
M2 Ultra METAL medium-dis 1 0 116.97 1.11 0.42 0.02 6a94163
M2 Ultra METAL large-v2 1 0 232.43 12.34 5.87 0.22 6a94163
M2 Ultra METAL large-v2-q5_0 1 0 269.72 11.68 6.44 0.26 6a94163
M2 Ultra METAL large-v2-q5_1 1 0 269.71 11.82 6.36 0.26 6a94163
M2 Ultra METAL large-v2-dis 1 0 209.25 1.25 0.48 0.02 6a94163
M2 Ultra METAL large-v3-turbo 1 0 211.09 1.98 0.84 0.03 6a94163
M2 Ultra METAL large-v3-turbo-q5_0 1 0 244.23 1.81 0.92 0.04 6a94163

Ryzen 9 5950X + RTX 2060

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 1 1 7.35 0.78 0.24 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_0 1 1 6.45 0.67 0.14 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_1 1 1 6.39 0.66 0.14 0.01 6a94163
RTX 2060 AVX2 CUDA base 1 1 10.20 0.88 0.30 0.01 6a94163
RTX 2060 AVX2 CUDA base-q5_0 1 1 11.38 0.92 0.21 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_1 1 1 11.76 0.91 0.20 0.02 6a94163
RTX 2060 AVX2 CUDA small 1 1 33.06 2.00 0.56 0.03 6a94163
RTX 2060 AVX2 CUDA small-q5_0 1 1 35.84 1.84 0.43 0.04 6a94163
RTX 2060 AVX2 CUDA small-q5_1 1 1 36.89 1.82 0.42 0.04 6a94163
RTX 2060 AVX2 CUDA medium 1 1 90.65 4.54 1.13 0.08 6a94163
RTX 2060 AVX2 CUDA medium-q5_0 1 1 104.01 3.80 0.91 0.10 6a94163
RTX 2060 AVX2 CUDA medium-q5_1 1 1 107.98 3.72 0.87 0.10 6a94163
RTX 2060 AVX2 CUDA medium-dis 1 1 79.08 0.68 0.17 0.01 6a94163
RTX 2060 AVX2 CUDA large-v2 1 1 162.00 7.52 1.92 0.14 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_0 1 1 184.59 5.64 1.50 0.16 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_1 1 1 193.85 5.55 1.44 0.17 6a94163
RTX 2060 AVX2 CUDA large-v2-dis 1 1 140.75 0.84 0.37 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo 1 1 143.38 1.29 0.36 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo-q5_0 1 1 163.30 0.93 0.28 0.03 6a94163

Flash Attention OFF:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 1 0 12.49 0.87 0.23 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_0 1 0 10.65 0.78 0.19 0.02 6a94163
RTX 2060 AVX2 CUDA tiny-q5_1 1 0 10.82 0.77 0.19 0.02 6a94163
RTX 2060 AVX2 CUDA base 1 0 18.97 1.04 0.34 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_0 1 0 20.22 1.09 0.27 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_1 1 0 20.48 1.07 0.27 0.02 6a94163
RTX 2060 AVX2 CUDA small 1 0 59.52 2.37 0.70 0.05 6a94163
RTX 2060 AVX2 CUDA small-q5_0 1 0 62.98 2.23 0.60 0.06 6a94163
RTX 2060 AVX2 CUDA small-q5_1 1 0 63.64 2.21 0.59 0.06 6a94163
RTX 2060 AVX2 CUDA medium 1 0 161.53 5.36 1.53 0.13 6a94163
RTX 2060 AVX2 CUDA medium-q5_0 1 0 174.96 4.64 1.32 0.15 6a94163
RTX 2060 AVX2 CUDA medium-q5_1 1 0 178.42 4.57 1.29 0.15 6a94163
RTX 2060 AVX2 CUDA medium-dis 1 0 149.65 0.75 0.20 0.02 6a94163
RTX 2060 AVX2 CUDA large-v2 1 0 280.55 8.74 2.51 0.23 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_0 1 0 306.87 6.92 2.08 0.25 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_1 1 0 314.25 6.82 2.02 0.26 6a94163
RTX 2060 AVX2 CUDA large-v2-dis 1 0 259.39 0.91 0.37 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo 1 0 261.83 1.44 0.41 0.04 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo-q5_0 1 0 282.99 1.09 0.33 0.04 6a94163
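
The effect of Flash Attention on the encoder can be read off by dividing the FA-off Enc. value by the FA-on value for the same model. The following is a minimal sketch using a handful of rows copied from the v1.7.0 tables above. It assumes lower Enc. values are better, and `fa_speedup` is a hypothetical helper name used only for illustration.

```python
# (model, Enc. with FA off, Enc. with FA on), copied from the v1.7.0 tables.
M2_ULTRA_METAL = [
    ("tiny", 10.04, 8.37),
    ("medium", 130.09, 111.29),
    ("large-v2", 232.43, 198.81),
]
RTX_2060_CUDA = [
    ("tiny", 12.49, 7.35),
    ("medium", 161.53, 90.65),
    ("large-v2", 280.55, 162.00),
]


def fa_speedup(rows):
    """Return {model: enc_off / enc_on}; values above 1 mean FA is faster."""
    return {model: off / on for model, off, on in rows}


if __name__ == "__main__":
    for name, rows in (("M2 Ultra (Metal)", M2_ULTRA_METAL),
                       ("RTX 2060 (CUDA)", RTX_2060_CUDA)):
        for model, ratio in fa_speedup(rows).items():
            print(f"{name:>16} {model:>9}: {ratio:.2f}x")
```

For large-v2 this gives roughly 1.17x on the M2 Ultra and roughly 1.73x on the RTX 2060, so in these measurements the CUDA backend gains noticeably more from Flash Attention than Metal does.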

Vulkan:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 VULKAN tiny 1 0 30.38 1.37 1.04 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_0 1 0 20.98 1.38 0.99 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_1 1 0 20.74 1.30 0.96 0.05 9f346d0
RTX 2060 VULKAN base 1 0 44.69 1.59 1.78 0.09 9f346d0
RTX 2060 VULKAN base-q5_0 1 0 39.72 2.11 1.72 0.08 9f346d0
RTX 2060 VULKAN base-q5_1 1 0 39.45 2.01 1.63 0.08 9f346d0
RTX 2060 VULKAN small 1 0 160.02 3.53 4.64 0.23 9f346d0
RTX 2060 VULKAN small-q5_0 1 0 141.52 4.54 4.44 0.20 9f346d0
RTX 2060 VULKA...

v1.6.2

27 May 07:36
c7b6988

Overview

Bugfix when using multiple whisper_state in parallel: #2182

What's Changed

New Contributors

Full Changelog: v1.6.1...v1.6.2

v1.6.1

21 May 15:46
c10db6e

Minor release adding initial ffmpeg support in the examples #2133 (thx @WilliamTambellini)

What's Changed

New Contributors

Full Changelog: v1.6.0...v1.6.1

v1.6.0

15 May 07:13
08981d1

Overview

  • Can optionally enable Flash Attention for faster processing on CUDA and Metal devices (#2152)
  • Faster ppc64 performance (40aeeee) (not tested)
  • Fix main slowdown bug (#2070)

Shoutout to @JohannesGaessler for contributing efficient FA CUDA kernels

Some performance numbers for this release:

M1 Pro

Flash Attention OFF:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M1 Pro METAL tiny 1 0 39.21 1.74 0.61 0.04 22c96b4
M1 Pro METAL base 1 0 70.76 2.60 0.93 0.06 22c96b4
M1 Pro METAL small 1 0 217.28 6.42 2.14 0.17 22c96b4
M1 Pro METAL medium 1 0 596.74 14.43 4.75 0.45 22c96b4

Flash Attention ON:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M1 Pro METAL tiny 1 1 30.77 1.59 0.54 0.03 22c96b4
M1 Pro METAL base 1 1 60.42 2.29 0.81 0.05 22c96b4
M1 Pro METAL small 1 1 183.82 5.12 1.81 0.14 22c96b4
M1 Pro METAL medium 1 1 517.92 11.60 4.01 0.38 22c96b4

M2 Ultra

Flash Attention OFF:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 0 12.32 1.35 0.49 0.01 22c96b4
M2 ULTRA METAL tiny-q5_0 1 0 11.65 1.30 0.51 0.01 22c96b4
M2 ULTRA METAL tiny-q5_1 1 0 12.08 1.30 0.51 0.01 22c96b4
M2 ULTRA METAL base 1 0 17.58 1.90 0.76 0.02 22c96b4
M2 ULTRA METAL base-q5_0 1 0 18.89 1.86 0.79 0.02 22c96b4
M2 ULTRA METAL base-q5_1 1 0 20.69 1.88 0.79 0.02 22c96b4
M2 ULTRA METAL small 1 0 49.32 3.85 1.71 0.05 22c96b4
M2 ULTRA METAL small-q5_0 1 0 54.91 3.81 1.82 0.06 22c96b4
M2 ULTRA METAL small-q5_1 1 0 54.92 3.81 1.79 0.06 22c96b4
M2 ULTRA METAL medium 1 0 134.34 8.04 3.82 0.13 22c96b4
M2 ULTRA METAL medium-q5_0 1 0 151.68 7.59 4.07 0.14 22c96b4
M2 ULTRA METAL medium-q5_1 1 0 151.58 7.67 4.07 0.14 22c96b4
M2 ULTRA METAL medium-dis 1 0 120.82 1.07 0.41 0.02 22c96b4
M2 ULTRA METAL large-v2 1 0 235.63 12.27 5.85 0.22 22c96b4
M2 ULTRA METAL large-v2-q5_0 1 0 273.38 11.17 6.40 0.26 22c96b4
M2 ULTRA METAL large-v2-q5_1 1 0 272.44 11.32 6.29 0.26 22c96b4
M2 ULTRA METAL large-v2-dis 1 0 212.51 1.20 0.47 0.02 22c96b4

Flash Attention ON:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 1 9.07 1.33 0.45 0.01 22c96b4
M2 ULTRA METAL tiny-q5_0 1 1 9.74 1.33 0.47 0.01 22c96b4
M2 ULTRA METAL tiny-q5_1 1 1 8.93 1.31 0.46 0.01 22c96b4
M2 ULTRA METAL base 1 1 15.75 1.87 0.71 0.02 22c96b4
M2 ULTRA METAL base-q5_0 1 1 17.04 1.83 0.74 0.02 22c96b4
M2 ULTRA METAL base-q5_1 1 1 17.17 1.83 0.74 0.02 22c96b4
M2 ULTRA METAL small 1 1 42.33 3.64 1.60 0.05 22c96b4
M2 ULTRA METAL small-q5_0 1 1 47.61 3.63 1.70 0.05 22c96b4
M2 ULTRA METAL small-q5_1 1 1 47.70 3.66 1.68 0.05 22c96b4
M2 ULTRA METAL medium 1 1 114.42 7.53 3.55 0.11 22c96b4
M2 ULTRA METAL medium-q5_0 1 1 132.63 7.02 3.77 0.13 22c96b4
M2 ULTRA METAL medium-q5_1 1 1 132.28 7.10 3.76 0.13 22c96b4
M2 ULTRA METAL medium-dis 1 1 102.34 1.01 0.42 0.01 22c96b4
M2 ULTRA METAL large-v2 1 1 203.01 11.03 5.45 0.20 22c96b4
M2 ULTRA METAL large-v2-q5_0 1 1 240.05 10.18 5.98 0.23 22c96b4
M2 ULTRA METAL large-v2-q5_1 1 1 239.22 10.23 5.87 0.23 22c96b4
M2 ULTRA METAL large-v2-dis 1 1 181.14 1.14 0.48 0.02 22c96b4

Ryzen 9 5950X + RTX 2060

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
Ryzen 9 5950X AVX2 tiny 8 0 195.29 1.57 0.51 0.26 22c96b4
Ryzen 9 5950X AVX2 tiny-q5_0 8 0 213.33 1.10 0.50 0.30 22c96b4
Ryzen 9 5950X AVX2 tiny-q5_1 8 0 219.38 1.18 0.53 0.32 22c96b4
Ryzen 9 5950X AVX2 base 8 0 424.85 3.71 1.03 0.46 22c96b4
Ryzen 9 5950X AVX2 base-q5_0 8 0 473.61 1.81 0.82 0.52 22c96b4
Ryzen 9 5950X AVX2 base-q5_1 8 0 484.14 1.92 0.85 0.56 22c96b4
Ryzen 9 5950X AVX2 small 8 0 1458.32 12.66 3.09 1.26 22c96b4
Ryzen 9 5950X AVX2 small-q5_0 8 0 1673.22 6.42 2.18 1.45 22c96b4
Ryzen 9 5950X AVX2 small-q5_1 8 0 1724.78 6.72 2.32 1.52 22c96b4
Ryzen 9 5950X AVX2 medium 8 0 4333.87 36.80 8.56 3.37 22c96b4
Ryzen 9 5950X AVX2 medium-q5_0 8 0 5194.09 19.21 5.71 3.97 22c96b4
Ryzen 9 5950X AVX2 medium-q5_1 8 0 5450.39 20.01 5.99 4.17 22c96b4
Ryzen 9 5950X AVX2 medium-dis 8 0 3995.19 5.08 1.21 0.55 22c96b4
Ryzen 9 5950X AVX2 large-v2 8 0 8056.16 69.74 16.11 6.13 22c96b4
Ryzen 9 5950X AVX2 large-v2-q5_0 8 0 9799.58 35.16 10.49 7.28 22c96b4
Ryzen 9 5950X AVX2 large-v2-q5_1 8 0 ms 36.74 11.02 7.65 22c96b4
Ryzen 9 5950X AVX2 large-v2-dis 8 0 7490.03 7.40 1.70 0.72 22c96b4

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 8 0 12.54 0.93 0.29 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_0 8 0 12.73 0.98 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_1 8 0 12.72 0.99 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA base 8 0 24.14 1.28 0.41 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_0 8 0 24.58 1.38 0.35 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_1 8 0 24.58 1.37 0.35 0.03 22c96b4
RTX 2060 AVX2 CUDA small 8 0 74.70 2.91 0.84 0.07 22c96b4
RTX 2060 AVX2 CUDA small-q5_0 8 0 76.12 2.84 0.77 0.08 22c96b4
RTX 2060 AVX2 CUDA small-q5_1 8 0 76.14 2.84 0.76 0.08 22c96b4
RTX 2060 AVX2 CUDA medium 8 0 200.69 6.46 1.83 0.17 22c96b4
RTX 2060 AVX2 CUDA medium-q5_0 8 0 204.80 5.90 1.65 0.19 22c96b4
RTX 2060 AVX2 CUDA medium-q5_1 8 0 205.61 5.85 1.61 0.19 22c96b4
RTX 2060 AVX2 CUDA medium-dis 8 0 186.17 0.86 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA large-v2 8 0 347.22 10.36 2.82 0.29 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_0 8 0 357.06 8.81 2.58 0.34 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_1 8 0 356.97 8.62 2.49 0.33 22c96b4
RTX 2060 AVX2 CUDA large-v2-dis 8 0 318.05 1.03 0.34 0.04 22c96b4

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 8 1 7.21 0.76 0.29 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_0 8 1 7.42 0.82 0.18 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_1 8 1 7.38 0.82 0.18 0.02 22c96b4
RTX 2060 AVX2 CUDA ...

v1.5.5

16 Apr 11:14
7395c70

Overview

Many small incremental updates, plus token-level timestamps with DTW by @denersc in #1485.
Feedback is welcome!

Full Changelog: v1.5.4...v1.5.5

What's Changed

New Contributors
