whisper : add support for new distilled Whisper models #1424

ggerganov · 2023-11-03T11:30:21Z

Initial support for https://huggingface.co/distil-whisper

Currently, the chunk-based transcription strategy is not implemented, so there can be sub-optimal quality when using the distilled models with whisper.cpp.
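For readers unfamiliar with the term, a chunk-based strategy splits long audio into fixed-length windows, transcribes each one independently, and stitches the results together. The sketch below only computes the window boundaries. The function name `chunk_windows`, the 30-second chunk length, and the 5-second overlap are illustrative assumptions, not values taken from whisper.cpp or Distil-Whisper.

```shell
# Hypothetical helper: print "start end" offsets (in seconds) of
# fixed-length chunks that overlap their neighbours.
# Usage: chunk_windows TOTAL_SECONDS CHUNK_SECONDS OVERLAP_SECONDS
chunk_windows() {
    awk -v total="$1" -v chunk="$2" -v overlap="$3" 'BEGIN {
        step = chunk - overlap            # how far each window advances
        if (step <= 0) {
            print "overlap must be smaller than chunk" > "/dev/stderr"
            exit 1
        }
        for (start = 0; start < total; start += step) {
            end = start + chunk
            if (end > total) end = total  # clamp the last window
            printf "%.1f %.1f\n", start, end
            if (end >= total) break       # audio fully covered
        }
    }'
}

# The 127.4 s sample used in this PR, with assumed 30 s chunks and 5 s overlap
chunk_windows 127.4 30 5
```

With these assumed values the sample is covered by five windows, the last one clamped to the end of the audio. Stitching the overlapping transcripts is the harder part and is not shown here.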

# clone OpenAI whisper and whisper.cpp
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp

# get the models
cd whisper.cpp/models
git clone https://huggingface.co/distil-whisper/distil-medium.en
git clone https://huggingface.co/distil-whisper/distil-large-v2

# convert to ggml
python3 ./convert-h5-to-ggml.py ./distil-medium.en/ ../../whisper .
mv ggml-model.bin ggml-medium.en-distil.bin

python3 ./convert-h5-to-ggml.py ./distil-large-v2/ ../../whisper .
mv ggml-model.bin ggml-large-distil.bin

Run the transcription as usual:

make -j && ./main -m models/ggml-medium.en-distil.bin -f samples/gb0.wav

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing 'samples/gb0.wav' (2037686 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =     3.52 MB, ( 1726.62 / 21845.34)

[00:00:00.000 --> 00:00:30.000]   Good morning. This Tuesday is election day. After months of spirited debate and vigorous campaigning. The time has come for Americans to make important decisions about our nation's future. I encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats and independents can find common ground on at least one point. Our system of
[00:00:30.000 --> 00:01:00.000]   representative democracy is one of America's greatest strengths. The United States was founded on the belief that all men are created equal. Every election day, millions of Americans of all races, religions, and backgrounds step into voting booths throughout the nation. Whether they are rich or poor, old or young, each of them has an equal share in choosing the path that our country will take. And every ballot they cast is a reminder that our founding principles are alive and well. Voting is one of the great privileges of America. Voting
[00:01:00.000 --> 00:01:30.000]   American citizenship. And it has always required brave defenders. As you head to the polls next week, remember the sacrifices that had been made by generations of Americans in uniform to preserve our way of life. From Bunker Hill to Baghdad, the men and women of American armed forces have been devoted guardians of our democracy. All of us owe them and their families a special debt of gratitude on Election Day. Americans should also remember the important example that our elections set throughout the world.
[00:01:30.000 --> 00:02:00.000]   Young democracies from Georgia and Ukraine to Afghanistan and Iraq and look to the united States for proof that self-government can endure and nations that still live under tyranny and oppression can find hope and inspiration in our commitment to liberty. For more than two centuries Americans have demonstrated the ability of free people to choose their own leaders. Our nation has flourished because of its commitment to trusting the wisdom of our citizenry. In this year's election, we will see this tradition continue.
[00:02:00.000 --> 00:02:30.000]   that we are blessed to live in a free nation guided by the will of the people. Thank you for listening.


whisper_print_timings:     load time =   367.80 ms
whisper_print_timings:     fallbacks =   8 p /   0 h
whisper_print_timings:      mel time =    64.09 ms
whisper_print_timings:   sample time =  1023.11 ms /  2007 runs (    0.51 ms per run)
whisper_print_timings:   encode time =  2848.93 ms /     5 runs (  569.79 ms per run)
whisper_print_timings:   decode time =  4086.64 ms /  1986 runs (    2.06 ms per run)
whisper_print_timings:   prompt time =    39.07 ms /    13 runs (    3.01 ms per run)
whisper_print_timings:    total time =  8500.04 ms

make -j && ./main -m models/ggml-large-distil.bin -f samples/gb0.wav

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing 'samples/gb0.wav' (2037686 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =     4.39 MB, ( 3278.36 / 21845.34)

[00:00:01.000 --> 00:00:30.000]   Good today, this Tuesday is election day. After months of spirited debate and vigorous campaigning, the time has come for Americans to make importing decisions about our nation's future. I encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy, but as the campaigns come to a close, Republicans, Democrats, an independents can find common ground on at least one point.
[00:00:30.740 --> 00:01:00.000]   Democratic democracy is one of America's greatest strengths. The United States was founded on the belief that all men are created equal. Every election day, millions of Americans of all races, religions and backgrounds step into voting boosts throughout t nation, whether they are their rich or poor, old or young. Each of them has an equal share in choosing the path that our country will take. And every ballot they cast is a reminder that our founding principles are alive and well. Voting is one of the grepilagu the
[00:01:00.700 --> 00:01:30.000]   American citizenship, and it has always required brave defenders. As you head to the polls next week, remember the sacrifices that have been made by generations of Americans in uniform to preserve our way of life, from Bucker Hill to Baghdad. The men and women of American armed forces have been devoted guardians of our democracy. All of us owe them and their families a special debt of gratitude on Election Day. Americans should also remember the important example that our elections set throughout the world.
[00:01:30.760 --> 00:02:00.000]   Young democracies from Georgia and Ukraine to Afghanistan and Iraq can look to the United States for proof that self-government can endure, and nations that still live under tyranny and oppression can find hope and inspiration in our comitement to liberty. For more than two centuries, Americans have demonstrated the ability of free people to choose their own leaders. Our nation has flourished because of its commitment to trusting the wisdom of our citizenry. In thi year's election, we will see this tradition continue.
[00:02:01.000 --> 00:02:30.000]   that we are blessed to live in a free nation guided by the will of the people. Thank you for listening.


whisper_print_timings:     load time =   628.78 ms
whisper_print_timings:     fallbacks =   8 p /   0 h
whisper_print_timings:      mel time =    61.44 ms
whisper_print_timings:   sample time =  1195.35 ms /  2339 runs (    0.51 ms per run)
whisper_print_timings:   encode time =  4966.54 ms /     5 runs (  993.31 ms per run)
whisper_print_timings:   decode time =  5783.05 ms /  2318 runs (    2.49 ms per run)
whisper_print_timings:   prompt time =    49.82 ms /    13 runs (    3.83 ms per run)
whisper_print_timings:    total time = 12764.79 ms
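To put the two total times in context, the audio length (127.4 sec, from the `main: processing` line) can be divided by the wall-clock total. This is a small sketch; the helper name `speed_vs_realtime` is made up for illustration.

```shell
# Hypothetical helper: how many times faster than real time a run was.
# Usage: speed_vs_realtime AUDIO_SECONDS TOTAL_MS
speed_vs_realtime() {
    awk -v audio_s="$1" -v total_ms="$2" \
        'BEGIN { printf "%.2f\n", audio_s / (total_ms / 1000) }'
}

speed_vs_realtime 127.4 8500.04    # ggml-medium.en-distil.bin run -> 14.99
speed_vs_realtime 127.4 12764.79   # ggml-large-distil.bin run     -> 9.98
```

On this machine that is roughly 15x and 10x real time for the two runs shown above.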

nchudleigh · 2023-11-04T18:21:29Z

Benched on M1 Pro, looks promising

| Commit | Model | Hardware | Recording Length (s) | Threads | Processor Count | Load Time (ms) | Sample Time (ms) | Encode Time (ms) | Decode Time (ms) | Sample Time per Run (ms) | Encode Time per Run (ms) | Decode Time per Run (ms) | Total Time (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `b8c93c5` | tiny.en | Apple M1 Pro | 28.225 | 8 | 1 | 45.83 | 56.11 | 51.28 | 234.15 | 0.43 | 51.28 | 1.8 | 459.06 |
| `b8c93c5` | base.en | Apple M1 Pro | 28.225 | 8 | 1 | 78.47 | 54.31 | 86.47 | 352.43 | 0.4 | 86.47 | 2.61 | 648.84 |
| `b8c93c5` | medium-distil | Apple M1 Pro | 28.225 | 8 | 1 | 366.75 | 49.91 | 607.9 | 254.47 | 0.43 | 607.9 | 2.19 | 1371.29 |
| `b8c93c5` | small.en | Apple M1 Pro | 28.225 | 8 | 1 | 227.93 | 55.84 | 237.69 | 814.98 | 0.4 | 237.69 | 5.91 | 1427.06 |
| `b8c93c5` | medium.en | Apple M1 Pro | 28.225 | 8 | 1 | 582.13 | 56.07 | 663.41 | 1788.39 | 0.42 | 663.41 | 13.45 | 3198.35 |
| `b8c93c5` | medium | Apple M1 Pro | 28.225 | 8 | 1 | 594.81 | 56.27 | 668.03 | 1857.32 | 0.4 | 668.03 | 13.46 | 3331.47 |
| `b8c93c5` | large-distil | Apple M1 Pro | 28.225 | 8 | 1 | 695.16 | 233.35 | 1099.83 | 1259.63 | 0.49 | 1099.83 | 2.65 | 3384.95 |
| `b8c93c5` | large | Apple M1 Pro | 28.225 | 8 | 1 | 1724.06 | 55.01 | 1200.55 | 2870.04 | 0.42 | 1200.55 | 21.91 | 6039.54 |
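A quick way to read the numbers above is to divide each full model's figure by its distilled counterpart. The sketch below does that for total time, decode time per run, and encode time per run; the helper name `ratio` is made up for illustration, and all inputs are copied from the table.

```shell
# Hypothetical helper: print A / B rounded to two decimals.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

# Total time (ms): full model vs. distilled model
echo "medium.en / medium-distil total:  $(ratio 3198.35 1371.29)x"   # 2.33x
echo "large / large-distil total:       $(ratio 6039.54 3384.95)x"   # 1.78x

# Decode time per run (ms)
echo "medium.en / medium-distil decode: $(ratio 13.45 2.19)x"        # 6.14x
echo "large / large-distil decode:      $(ratio 21.91 2.65)x"        # 8.27x

# Encode time per run (ms)
echo "medium.en / medium-distil encode: $(ratio 663.41 607.9)x"      # 1.09x
echo "large / large-distil encode:      $(ratio 1200.55 1099.83)x"   # 1.09x
```

In this benchmark nearly all of the gain comes from the decoder, while encode time is almost unchanged. Note that this is a single 28-second recording on one machine, so the ratios should be read as indicative only.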

royshil · 2023-11-05T02:11:44Z

The Distil-Whisper models on HF now provide prebuilt GGML files (no need to convert?):
e.g.


vgarleanu · 2024-01-05T17:27:51Z

Apologies for resurrecting a merged PR @ggerganov, but what's the reasoning behind disabling timestamps if the model is distilled?

ggerganov · 2024-01-06T13:58:20Z

AFAIK distilled models are not trained with timestamps, so the inference should not try to predict those

vgarleanu · 2024-01-07T19:15:45Z

> AFAIK distilled models are not trained with timestamps, so the inference should not try to predict those

I see. I find that interesting though: when I commented out the line that disables timestamps, distil-whisper generated word-level timestamps, and they were accurate.

NightMachinery · 2024-07-02T20:58:44Z

I downloaded the latest distilled model:

wget https://huggingface.co/distil-whisper/distil-large-v3-ggml/resolve/main/ggml-distil-large-v3.bin -P ./models

But when running this model using:

./stream -m models/ggml-distil-large-v3.bin -t 6 --step 0 --length 30000 -vth 0.6

I don't see the message `using distilled model - forcing no_timestamps`. Is this expected behavior? Is it using the so-called chunked algorithm?
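One way to check for that message without reading the whole log is to filter the program output for it. This is only a sketch: the helper name `has_distil_notice` is made up, and it assumes the notice is written to stdout or stderr with exactly the wording quoted in this thread.

```shell
# Hypothetical helper: succeed if the log on stdin contains the
# distilled-model notice quoted in this thread.
has_distil_notice() {
    grep -qF "using distilled model - forcing no_timestamps"
}

# Example use (commented out because it needs a built binary and a model):
# ./main -m models/ggml-distil-large-v3.bin -f samples/gb0.wav 2>&1 \
#     | has_distil_notice && echo "timestamps were forced off"
```

If the helper reports no match, the binary did not print the notice for that model; it does not by itself say which transcription strategy was used.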


whisper : add support for new distilled Whisper models

b8c93c5

ggerganov mentioned this pull request Nov 3, 2023

[Distil-Whisper] Add support for Distil-Whisper #1423

Open

whisper : print log when using distilled models

673c55c

ggerganov merged commit 39cfad0 into master Nov 5, 2023
67 of 68 checks passed

emcodem mentioned this pull request Nov 7, 2023

distil-large support request. Const-me/Whisper#187

Open

vonstring pushed a commit to vonstring/whisper.cpp that referenced this pull request Nov 7, 2023

whisper : add support for new distilled Whisper models (ggerganov#1424)

ba08fd6

* whisper : add support for new distilled Whisper models * whisper : print log when using distilled models

edmundmiller mentioned this pull request Nov 13, 2023

Support distilled model natrys/whisper.el#15

Open

felrock pushed a commit to felrock/whisper.cpp that referenced this pull request Nov 18, 2023

whisper : add support for new distilled Whisper models (ggerganov#1424)

8d138b9

* whisper : add support for new distilled Whisper models * whisper : print log when using distilled models

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023

whisper : add support for new distilled Whisper models (ggerganov#1424)

77f8354

* whisper : add support for new distilled Whisper models * whisper : print log when using distilled models

hlevring mentioned this pull request Feb 2, 2024

Webspher often produces hallucinations #1824

Open

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024

whisper : add support for new distilled Whisper models (ggerganov#1424)

0e7a8e3

* whisper : add support for new distilled Whisper models * whisper : print log when using distilled models
