Keeping model lists in sync, clarify make behavior (docs) #1807

Open
mrienstra opened this issue Jan 24, 2024 · 0 comments

@mrienstra
Contributor

mrienstra commented Jan 24, 2024

While working on #1806, I came across the following:

  1. The list of models in ./models/README.md is currently a subset of the list of models in ./models/download-ggml-model.sh, which is itself a subset of what is available from https://huggingface.co/ggerganov/whisper.cpp/tree/main (the bulk of the options), https://ggml.ggerganov.com/ (only duplicates of the first? I didn't check carefully), and https://huggingface.co/akashmjn/tinydiarize-whisper.cpp/tree/main (just one). Is there some logic as to which models are included / not included? I was tempted to add more to both lists, but wanted to ask about inclusion criteria first.

  2. In ./README.md#more-audio-samples, there's a list of make commands starting with make tiny.en; perhaps this used to be in a different section higher up? Are these commands still current / helpful? Oh, huh, I probably should have also changed this bit (in the "Quick start" section):

    Now build the main example and transcribe an audio file like this:

    # build the main example
    make
    
    # transcribe an audio file
    ./main -f samples/jfk.wav

    ... to clarify that make alone defaults to the base model and also downloads the model if necessary (mentioned lower down: "The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples."), meaning there's no need to run bash ./models/download-ggml-model.sh base.en if you just want to try the base model. Maybe I'll take another pass at improving the "Quick start" section after #1806 merges (or is closed). For someone who just wants to dive in, it seems a little confusing to have basic, useful info about make only after a bunch of console output.

  3. The list showing memory usage (./README.md#memory-usage) only shows 5 models. Should there be an issue about bringing it up to date? Or at least about adding -v1 to large, if that's the case?

  4. The list of models in ./models/README.md gives SHA-1 file hashes, which is nice because they are short (keeps the table tidy), but also a smidge inconvenient, since Hugging Face provides SHA-256 file hashes (e.g. for large-v3). I suppose this could be considered a bonus: after downloading a model, one could compute both hashes locally and compare them against both sources (./models/README.md & Hugging Face) for additional peace of mind / security.

  5. ./Makefile doesn't seem to support all models, only those it is explicitly aware of; e.g. it doesn't know about large-v3-q5_0 or small.en-tdrz, so it fails with "make: *** No rule to make target 'large-v3-q5_0'.  Stop." So should the list of models in ./Makefile be kept somewhat in sync with e.g. ./models/download-ggml-model.sh (see point 1)? Maybe a single source of truth would be nice?
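
For point 4, checking a downloaded model against both sources might look something like the sketch below. It assumes bash and GNU coreutils (`sha1sum`, `sha256sum`); `verify_model` is a hypothetical helper, not part of the repo, and the expected hashes would be copied by hand from ./models/README.md (SHA-1) and the model's Hugging Face file page (SHA-256). The model path in the commented example assumes the download script saves files as ./models/ggml-<name>.bin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of whisper.cpp): verify a downloaded model
# against both published hashes. Assumes GNU coreutils sha1sum/sha256sum.
#   $1 = path to the model file
#   $2 = expected SHA-1   (e.g. copied from ./models/README.md)
#   $3 = expected SHA-256 (e.g. copied from the Hugging Face file page)
# Returns 0 if both match, 1 on any mismatch, 2 if the file is missing.
verify_model() {
  local file="$1" want_sha1="$2" want_sha256="$3"
  local got_sha1 got_sha256 status=0

  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 2
  fi

  # Hash from stdin so the output is "<hash>  -" regardless of the file name,
  # then keep only the hash field.
  got_sha1=$(sha1sum < "$file" | cut -d' ' -f1)
  got_sha256=$(sha256sum < "$file" | cut -d' ' -f1)

  if [ "$got_sha1" != "$want_sha1" ]; then
    echo "SHA-1 mismatch: got $got_sha1, expected $want_sha1" >&2
    status=1
  fi
  if [ "$got_sha256" != "$want_sha256" ]; then
    echo "SHA-256 mismatch: got $got_sha256, expected $want_sha256" >&2
    status=1
  fi
  return "$status"
}

# Example (placeholders, not real hashes):
# verify_model ./models/ggml-base.en.bin "<SHA-1 from ./models/README.md>" "<SHA-256 from Hugging Face>"
```

Checking both values is redundant in the strict sense (either hash alone detects accidental corruption), but it does confirm that the two published sources agree with each other and with the local file.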
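
For points 1 and 5, until there is a single source of truth, a cheap way to catch drift could be a check like the following. It is only a sketch: it assumes each model list has already been extracted into a plain text file with one model name per line (how to pull the names out of ./Makefile, ./models/download-ggml-model.sh and ./models/README.md is left open, since their formats differ), and `missing_models` plus the file names in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print model names that appear in list A but not in list B.
# Assumes both inputs are plain text files with one model name per line.
#   $1 = reference list (e.g. models known to download-ggml-model.sh)
#   $2 = list to check  (e.g. models the Makefile has rules for)
missing_models() {
  # comm needs sorted input; -23 hides lines unique to B and lines common
  # to both, leaving only the lines unique to A.
  comm -23 <(sort -u "$1") <(sort -u "$2")
}

# Example with illustrative lists:
# printf '%s\n' tiny.en base.en large-v3-q5_0 small.en-tdrz > script-models.txt
# printf '%s\n' tiny.en base.en > makefile-models.txt
# missing_models script-models.txt makefile-models.txt
#   -> large-v3-q5_0
#      small.en-tdrz
# In CI, fail if anything is missing:
# [ -z "$(missing_models script-models.txt makefile-models.txt)" ]
```

Running it in both directions (A vs B and B vs A) shows names that exist only on one side, which covers both the "subset" observation in point 1 and the Makefile gap in point 5.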

mrienstra changed the title from "Keeping model lists in sync, etc." to "Keeping model lists in sync, clarify make behavior (docs)" on Jan 24, 2024