Keeping model lists in sync, clarify `make` behavior (docs) #1807

mrienstra · 2024-01-24T22:52:12Z

While working on #1806, I came across the following:

1. The list of models in `./models/README.md` is currently a subset of the list of models in `./models/download-ggml-model.sh`, which is itself a subset of what is available from https://huggingface.co/ggerganov/whisper.cpp/tree/main (bulk of options), https://ggml.ggerganov.com/ (duplicates of first only? Didn't check carefully), and https://huggingface.co/akashmjn/tinydiarize-whisper.cpp/tree/main (just one). Is there some logic as to which models are included / not included? I was tempted to add more to both lists, but wanted to ask first about inclusion criteria.
2. In `./README.md#more-audio-samples`, there's a list of `make` commands starting with `make tiny.en`. Perhaps this used to be in a different section higher up? Are they still current / helpful? Oh, huh, I probably should've also changed this bit (in the "Quick start" section):

   Now build the main example and transcribe an audio file like this:
   ```
   # build the main example
   make

   # transcribe an audio file
   ./main -f samples/jfk.wav
   ```
   ... to clarify that `make` alone defaults to the base model, and also downloads the model if necessary (mentioned lower down: "The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples."), meaning there's no need to run `bash ./models/download-ggml-model.sh base.en` if you just want to try the base model. Maybe I'll take another pass at improving the "Quick start" section after #1806 merges (or is closed). For someone who just wants to dive in, it seems a little confusing to have basic useful info about `make` after a bunch of console output.
3. The list showing memory usage (`./README.md#memory-usage`) only shows 5 models. Should there be an issue about bringing this up to date? Or at least adding `-v1` to `large`, if that's the case?
4. The list of models in `./models/README.md` gives SHA-1 file hashes, which is nice because they aren't long (keeps the table tidy), but also a smidge inconvenient, since Hugging Face provides SHA-256 file hashes (e.g. for large-v3). I suppose this could be considered a bonus, as one could, after downloading a model, compute both hashes locally and compare them against both sources (`./models/README.md` & Hugging Face) for additional peace of mind / security.
5. `./Makefile` doesn't seem to support all models, only those it is explicitly aware of, e.g. it doesn't know about `large-v3-q5_0` or `small.en-tdrz`, so it fails with `` make: *** No rule to make target `large-v3-q5_0'. Stop. `` So the list of models in `./Makefile` should be kept somewhat in sync with e.g. `./models/download-ggml-model.sh` (see point 1)? Maybe a single source of truth would be nice?
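
For the hash comparison mentioned above, here's a rough sketch of what checking both values locally might look like. The function names `hash_file` and `verify_model` are made up for illustration (they are not in the repo), and it assumes either `sha1sum` / `sha256sum` (GNU coreutils) or `shasum` is on the `PATH`. The expected hashes would be copied by hand from `./models/README.md` (SHA-1) and Hugging Face (SHA-256).

```shell
# Hypothetical helpers, not part of the repo.

# Print the hex digest of a file. $1 = 1 or 256 (SHA variant), $2 = path.
hash_file() {
  if command -v "sha${1}sum" >/dev/null 2>&1; then
    "sha${1}sum" "$2" | cut -d' ' -f1
  else
    # Fallback for systems that ship shasum instead (assumption).
    shasum -a "$1" "$2" | cut -d' ' -f1
  fi
}

# Succeed only if both digests match. $1 = path, $2 = SHA-1, $3 = SHA-256.
verify_model() {
  [ "$(hash_file 1 "$1")" = "$2" ] && [ "$(hash_file 256 "$1")" = "$3" ]
}

# Example (paste the real hashes in place of the placeholders):
# verify_model path/to/model.bin "<sha1>" "<sha256>" && echo OK
```

Requiring both digests to match is a deliberate choice here: a mismatch in either source shows up as a failure.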
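
On the single-source-of-truth idea: until something like that exists, a quick way to spot drift might be to diff the lists. This is a minimal sketch, assuming each list has already been pulled out into a plain text file with one model name per line. How to extract the names from `./Makefile` or the download script is left out, since it depends on how those files are laid out. `missing_from` is a made-up name.

```shell
# Hypothetical helper, not part of the repo.
# Print names that appear in list file $1 but not in list file $2.
missing_from() {
  a=$(mktemp)
  b=$(mktemp)
  # comm needs sorted input; pin the locale so sort and comm agree.
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  # -2 drops lines unique to $2, -3 drops lines common to both.
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Example (both file names are placeholders):
# missing_from script-models.txt makefile-models.txt
```

An empty result would mean every name in the first list also appears in the second; running it in both directions covers drift either way.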

mrienstra mentioned this issue Jan 24, 2024

Docs: try to make model options / model install methods clearer #1806

Merged

mrienstra changed the title ~~Keeping model lists in sync, etc.~~ Keeping model lists in sync, clarify make behavior (docs) Jan 24, 2024
