Python bindings (C-style API) #9
Some workaround:

Building:

```makefile
main: ggml.o main.o
	g++ -L ggml.o -c -fPIC main.cpp -o main.o
	g++ -L ggml.o -shared -Wl,-soname,main.so -o main.so main.o ggml.o
	g++ -pthread -o main ggml.o main.o
	./main -h

ggml.o: ggml.c ggml.h
	gcc -O3 -mavx -mavx2 -mfma -mf16c -c -fPIC ggml.c -o ggml.o
	gcc -shared -Wl,-soname,ggml.so -o ggml.so ggml.o

main.o: main.cpp ggml.h
	g++ -pthread -O3 -std=c++11 -c main.cpp
```

Run main:

```python
import ctypes
import pathlib

if __name__ == "__main__":
    # Load the shared library into ctypes
    libname = pathlib.Path().absolute() / "main.so"
    whisper = ctypes.CDLL(str(libname))

    # int main(int argc, char **argv)
    whisper.main.restype = ctypes.c_int
    whisper.main.argtypes = ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)

    # argv[0] is the program name; main() parses options from argv[1] onwards
    args = (ctypes.c_char_p * 10)(
        b"main",
        b"-nt",
        b"--language", b"ru",
        b"-t", b"8",
        b"-m", b"../models/ggml-model-tiny.bin",
        b"-f", b"../audio/cuker1.wav"
    )
    whisper.main(len(args), args)
```

And it works!
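When calling a C `main` through ctypes, `argv[0]` is conventionally the program name and option parsing starts at index 1, so a placeholder has to come first. A minimal sketch of a helper that builds `argc`/`argv` from a Python list; the name `make_argv` and the default placeholder `"main"` are illustrative, not part of whisper.cpp:

```python
import ctypes
from typing import List, Tuple


def make_argv(args: List[str], prog: str = "main") -> Tuple[int, ctypes.Array]:
    """Build (argc, argv) for a C `int main(int argc, char **argv)`.

    argv[0] is the program name; the real options start at argv[1].
    """
    encoded = [prog.encode("utf-8")] + [a.encode("utf-8") for a in args]
    # The ctypes array keeps references to the bytes objects, so they stay alive
    argv = (ctypes.c_char_p * len(encoded))(*encoded)
    return len(encoded), argv
```

With the library loaded as above, the call becomes `argc, argv = make_argv(["-nt", "--language", "ru"])` followed by `whisper.main(argc, argv)`.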
But with specific functions it is already more difficult:
It might be worth considering running Python and C++ in different threads/processes and sharing information between them when it's needed.
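The separate-process variant can be sketched with the standard `subprocess` module: run the whisper.cpp command-line example as a child process and read its stdout. A crash in the child then surfaces as an exception instead of taking down the Python interpreter. The binary and model paths are assumptions (newer builds may name the binary differently); the flags are the ones used in the ctypes example earlier in this thread, and `build_command`/`transcribe` are hypothetical helper names:

```python
import subprocess
from typing import List


def build_command(binary: str, model: str, wav: str,
                  threads: int = 4, language: str = "en") -> List[str]:
    """Assemble the CLI argument list (same flags as the ctypes example)."""
    return [
        binary,
        "-nt",
        "--language", language,
        "-t", str(threads),
        "-m", model,
        "-f", wav,
    ]


def transcribe(binary: str, model: str, wav: str, **kwargs) -> str:
    """Run the CLI in a child process and return its stdout as text."""
    cmd = build_command(binary, model, wav, **kwargs)
    # check=True raises CalledProcessError on a non-zero exit code
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return proc.stdout.strip()
```

For example, `transcribe("./main", "models/ggml-tiny.bin", "samples/jfk.wav", threads=8)`, with paths adjusted to your build.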
Thank you very much for your interest in the project! I think we first need a proper C-style wrapper of the model loading, the encode and decode functionality, and the sampling strategies. After that, we can easily create Python and other language bindings. I've done similar work in my 'ggwave' project. I agree that the encode and decode functionality should be exposed through the API as you suggested. It would give more flexibility to the users of the library/bindings.
@ArtyomZemlyak First you reinvent the PyTorch functions in C, then you want Python bindings around them. Isn't the end result the same as what we have in PyTorch?
The initial API is now available in https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h. The first part allows more fine-grained control over the inference and also lets the user implement their own sampling strategy using the predicted probabilities for each token. The second part of the API includes methods for full inference: you simply provide the audio samples and choose the sampling parameters. Most likely the API will change over time, but this is a good starting point.
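To illustrate what "your own sampling strategy using the predicted probabilities" could look like, here is a pure-Python sketch that picks the next token ID from a probability vector, either greedily or by temperature sampling. How the probabilities are read out of whisper.cpp is not shown, and the function names are hypothetical, not part of `whisper.h`:

```python
import math
import random
from typing import Optional, Sequence


def sample_greedy(probs: Sequence[float]) -> int:
    """Return the token id with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample_temperature(probs: Sequence[float], temperature: float,
                       rng: Optional[random.Random] = None) -> int:
    """Sample a token id from probabilities rescaled as p_i ** (1 / T).

    T < 1 sharpens the distribution, T > 1 flattens it, T <= 0 means greedy.
    """
    if temperature <= 0:
        return sample_greedy(probs)
    rng = rng or random.Random()
    # Work in log space; zero-probability tokens keep weight 0
    logits = [math.log(p) / temperature if p > 0 else -math.inf for p in probs]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]
```

Greedy decoding is the deterministic special case; passing a seeded `random.Random` makes the sampled variant reproducible.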
This is as far as I got trying to get the API working in Python. It loads the model successfully, but gets a segmentation fault on `whisper_full`. Any ideas?

```python
import ctypes
import pathlib

if __name__ == "__main__":
    libname = pathlib.Path().absolute() / "whisper.so"
    whisper = ctypes.CDLL(libname)
    modelpath = b"models/ggml-medium.bin"
    model = whisper.whisper_init(modelpath)
    params = whisper.whisper_full_default_params(b"WHISPER_DECODE_GREEDY")
    w = open('samples/jfk.wav', "rb").read()
    result = whisper.whisper_full(model, params, w, b"16000")
    # Segmentation fault
```

Edit: Got some debugging info from gdb, but it didn't help much:
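One difference from the working example further down is the audio argument: the call above passes the raw bytes of the WAV file (header included) and the sample rate as a byte string, whereas that example passes a pointer to 32-bit float samples plus the number of samples. Here is a stdlib-only sketch of the decoding step, assuming 16-bit mono PCM input and the same 1/32768 scaling as the scipy-based example; `read_wav_float32` and `as_float_pointer` are hypothetical helper names. The returned rate should be checked against 16000, the sample rate whisper.cpp works with:

```python
import array
import ctypes
import sys
import wave
from typing import Tuple


def read_wav_float32(path: str) -> Tuple[int, array.array]:
    """Decode a 16-bit PCM mono WAV file into float32 samples in [-1, 1).

    Returns (sample_rate, samples). Resampling is out of scope here.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    pcm = array.array("h", frames)      # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()                  # WAV sample data is little-endian
    # typecode 'f' is a 32-bit C float
    return rate, array.array("f", (s / 32768.0 for s in pcm))


def as_float_pointer(samples: array.array):
    """Return (float *, n_samples) sharing memory with `samples` (no copy).

    Keep `samples` alive, and do not resize it, while the pointer is in use.
    """
    buf = (ctypes.c_float * len(samples)).from_buffer(samples)
    return ctypes.cast(buf, ctypes.POINTER(ctypes.c_float)), len(samples)
```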
Here is one way to achieve this:

```bash
# build shared libwhisper.so
gcc -O3 -std=c11 -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so
```

Use it from Python like this:

```python
import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname = "libwhisper.so"
fname_model = "models/ggml-tiny.en.bin"
fname_wav = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        ("language", ctypes.c_char_p),
        ("greedy", ctypes.c_int * 1),
    ]

if __name__ == "__main__":
    # load library and model
    libname = pathlib.Path().absolute() / libname
    whisper = ctypes.CDLL(libname)

    # tell Python what are the return types of the functions
    whisper.whisper_init.restype = ctypes.c_void_p
    whisper.whisper_full_default_params.restype = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params(0)
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype('float32')/32768.0

    # run the inference
    result = whisper.whisper_full(ctypes.c_void_p(ctx), params, data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), len(data))
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0 = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1 = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))
```
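Instead of wrapping `ctx` in `ctypes.c_void_p(...)` at every call, the signatures can be declared once. Without a declared `restype`, ctypes assumes the C function returns `int`, which on 64-bit platforms can truncate a returned pointer. A sketch under the same assumptions as the script above (function names and `WhisperFullParams` layout copied from it; both must match the `whisper.h` you built against, and the C API has changed since). The `int64_t` return type for the segment timestamps is an assumption to verify against your header; `declare_signatures` is a hypothetical helper:

```python
import ctypes


class WhisperFullParams(ctypes.Structure):
    # Must mirror the C struct in the whisper.h you compiled against
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        ("language", ctypes.c_char_p),
        ("greedy", ctypes.c_int * 1),
    ]


def declare_signatures(lib) -> None:
    """Attach argtypes/restype to the whisper functions used in the script."""
    ctx_p = ctypes.c_void_p                      # opaque whisper_context *
    lib.whisper_init.argtypes = [ctypes.c_char_p]
    lib.whisper_init.restype = ctx_p
    lib.whisper_full_default_params.argtypes = [ctypes.c_int]
    lib.whisper_full_default_params.restype = WhisperFullParams
    lib.whisper_full.argtypes = [ctx_p, WhisperFullParams,
                                 ctypes.POINTER(ctypes.c_float), ctypes.c_int]
    lib.whisper_full.restype = ctypes.c_int
    lib.whisper_full_n_segments.argtypes = [ctx_p]
    lib.whisper_full_n_segments.restype = ctypes.c_int
    for name in ("whisper_full_get_segment_t0", "whisper_full_get_segment_t1"):
        fn = getattr(lib, name)
        fn.argtypes = [ctx_p, ctypes.c_int]
        fn.restype = ctypes.c_int64              # assumed int64_t in whisper.h
    lib.whisper_full_get_segment_text.argtypes = [ctx_p, ctypes.c_int]
    lib.whisper_full_get_segment_text.restype = ctypes.c_char_p
    lib.whisper_free.argtypes = [ctx_p]
    lib.whisper_free.restype = None
```

After `declare_signatures(whisper)`, calls such as `whisper.whisper_full_n_segments(ctx)` can take `ctx` directly, and ctypes checks the argument types.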
Thank you @ggerganov, really appreciate your work! Still getting a segfault with your code, but I'll assume it's a me problem:
Got a segfault in the same place on an Intel 12th gen CPU and an M1 MacBook with no changes to the above Python script. Has anyone else tried it? Were you using the same codebase as master, @ggerganov?
Yeah, the
Could you possibly make a binding to the stream program as well? It would be super cool to be able to register a callback once user speech is done and silence/non-speech is detected, so the final text can be processed within Python. This would allow for some really cool speech-assistant-like hacks.
You can easily modify this script to use Whisper.cpp instead of DeepSpeech.
@pachacamac I made a hacked-together fork of Buzz that uses whisper.cpp. It's buggy and thrown together, but it works. Just make sure you build the shared library as libwhisper.so and put it in the project directory. There's no install package, so you'll need to run main.py directly. Edit: I also made a simple stand-alone script using Whisper.cpp + Auditok (to detect voices).
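For the "callback once speech is followed by silence" idea, here is a deliberately naive sketch: split incoming float samples into frames, compute each frame's RMS energy, and fire a callback when a run of speech is followed by enough consecutive quiet frames. This is a plain energy threshold, not whisper.cpp's own voice detection and not Auditok; the class name and the default frame size, threshold and silence length are hypothetical values to tune:

```python
import math
from typing import Callable, List, Sequence


class EndOfSpeechDetector:
    """Naive energy-threshold detector for 'speech, then silence' events."""

    def __init__(self, on_utterance: Callable[[List[float]], None],
                 frame_len: int = 160, threshold: float = 0.02,
                 silence_frames: int = 30) -> None:
        self.on_utterance = on_utterance
        self.frame_len = frame_len            # 160 samples = 10 ms at 16 kHz
        self.threshold = threshold            # RMS level treated as speech
        self.silence_frames = silence_frames  # quiet frames that end an utterance
        self._pending: List[float] = []       # samples not yet framed
        self._utterance: List[float] = []     # samples of the current utterance
        self._quiet = 0

    def feed(self, samples: Sequence[float]) -> None:
        """Append new audio; complete frames are processed immediately."""
        self._pending.extend(samples)
        while len(self._pending) >= self.frame_len:
            frame = self._pending[:self.frame_len]
            del self._pending[:self.frame_len]
            self._process(frame)

    def _process(self, frame: List[float]) -> None:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= self.threshold:
            self._utterance.extend(frame)
            self._quiet = 0
        elif self._utterance:                 # silence after speech
            self._utterance.extend(frame)
            self._quiet += 1
            if self._quiet >= self.silence_frames:
                self.on_utterance(self._utterance)
                self._utterance = []
                self._quiet = 0
```

Leading silence is ignored; in practice `on_utterance` would hand the collected samples to whichever transcription path is in use.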
Breaking changes in the C API in the last commit: e30cf83
I seem to be having some trouble making a shared lib on Windows (#9 (comment) works great on UNIX). Using:

And calling from Python as:

```python
import ctypes

whisper_cpp = ctypes.CDLL("libwhisper.so")
# Calling any one of the functions errors
whisper_cpp.whisper_init('path/to/model.bin'.encode('utf-8'))
whisper_cpp.whisper_lang_id('en'.encode('utf-8'))
```

I get:
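Two things worth checking in the snippet above: Windows builds normally produce a `.dll` rather than a `.so`, and since Python 3.8 `ctypes` on Windows no longer searches `PATH` or the current working directory for a DLL's dependencies. A sketch that loads the library by absolute path and registers its folder with `os.add_dll_directory`; the file names `whisper.dll`, `libwhisper.dylib` and `libwhisper.so` are assumptions about what your build produced, and the helper names are hypothetical:

```python
import ctypes
import os
import sys
from pathlib import Path

# Hypothetical file names: use whatever your build actually produced.
LIB_NAMES = {"win32": "whisper.dll", "darwin": "libwhisper.dylib"}
DEFAULT_LIB_NAME = "libwhisper.so"
_dll_dirs = []  # keep add_dll_directory handles alive


def library_path(directory: str, platform: str = sys.platform) -> Path:
    """Absolute path of the whisper shared library for `platform`."""
    return Path(directory).resolve() / LIB_NAMES.get(platform, DEFAULT_LIB_NAME)


def load_whisper(directory: str) -> ctypes.CDLL:
    """Load the library by absolute path rather than by bare file name."""
    path = library_path(directory)
    if sys.platform == "win32":
        # Python 3.8+ no longer searches PATH or the working directory for
        # dependent DLLs (e.g. MinGW runtime DLLs); register the folder.
        _dll_dirs.append(os.add_dll_directory(str(path.parent)))
    return ctypes.CDLL(str(path))
```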
@ggerganov thanks for all your help so far. I seem to be having an issue with the Python binding (similar to the one you posted, not Windows).

```python
import ctypes
import pathlib

from scipy.io import wavfile


class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        ("language", ctypes.c_char_p),
        ("greedy", ctypes.c_int * 1),
    ]


model_path = 'ggml-model-whisper-tiny.bin'
audio_path = './whisper.cpp/samples/jfk.wav'
libname = './whisper.cpp/libwhisper.dylib'
whisper_cpp = ctypes.CDLL(
    str(pathlib.Path().absolute() / libname))

whisper_cpp.whisper_init.restype = ctypes.c_void_p
whisper_cpp.whisper_full_default_params.restype = WhisperFullParams
whisper_cpp.whisper_full_get_segment_text.restype = ctypes.c_char_p

ctx = whisper_cpp.whisper_init(model_path.encode('utf-8'))

params = whisper_cpp.whisper_full_default_params(0)
params.print_realtime = True
params.print_progress = True

samplerate, audio = wavfile.read(audio_path)
audio = audio.astype('float32')/32768.0

result = whisper_cpp.whisper_full(
    ctypes.c_void_p(ctx), params, audio.ctypes.data_as(
        ctypes.POINTER(ctypes.c_float)), len(audio))
if result != 0:
    raise Exception(f'Error from whisper.cpp: {result}')

n_segments = whisper_cpp.whisper_full_n_segments(
    ctypes.c_void_p(ctx))
print(f'n_segments: {n_segments}')
```

Prints:

```
whisper_model_load: loading model from 'ggml-model-whisper-tiny.bin'
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem_required  = 476.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size = 73.58 MB
whisper_model_load: memory size   = 11.41 MB
whisper_model_load: model size    = 73.54 MB
176000, length of samples
log_mel_spectrogram: n_samples = 176000, n_len = 1100
log_mel_spectrogram: recording length: 11.000000 s
length of spectrogram is less than 1s
n_segments: 0
```

I added an extra log line to show that
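A mismatch between the Python `Structure` and the C struct (for example after the breaking C API change mentioned earlier in this thread) raises no error; fields are silently read at the wrong offsets. One way to check is to print the Python-side layout and compare it with `offsetof`/`sizeof` for the same struct compiled in C. A sketch using the struct from the snippet above; `describe_layout` is a hypothetical helper:

```python
import ctypes
from typing import List, Tuple


class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        ("language", ctypes.c_char_p),
        ("greedy", ctypes.c_int * 1),
    ]


def describe_layout(struct_cls) -> List[Tuple[str, int, int]]:
    """Return (field name, byte offset, byte size) for each field."""
    return [(name, getattr(struct_cls, name).offset, getattr(struct_cls, name).size)
            for name, *_ in struct_cls._fields_]


if __name__ == "__main__":
    for name, offset, size in describe_layout(WhisperFullParams):
        print(f"{name:22s} offset={offset:3d} size={size}")
    print("sizeof =", ctypes.sizeof(WhisperFullParams))
```

On the C side, printing `offsetof(struct whisper_full_params, language)` and `sizeof(struct whisper_full_params)` for the header you compiled gives the numbers to compare against.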
The
Of course. Thanks a lot!
@chidiwilliams
@thakurudit Yes, it did. I use ctypesgen to generate bindings for Buzz.
Most Python bindings I found in the last week were outdated or broken with the current API, so I made a project (https://github.com/carloscdias/whisper-cpp-python) following the same pattern as ggerganov's original answer, and also followed his suggestion of providing a way to automatically generate the Python bindings from
Thank you!! Checking it out!
That's great! This should be added to https://github.com/ggerganov/whisper.cpp#bindings.
Looking around at the available Python bindings, none currently seem to support the latest branch of whisper.cpp with GPU acceleration for CUDA or Metal. Does anyone have a working version? A lot has changed in whisper.cpp, and it seems most of the Python bindings are based on an older version that lacks many of the more recent functions.
I'm in the same boat. I can run whisper.cpp with ROCm on the CLI, but I keep getting segmentation faults or other kinds of crashes with all the wrappers I've seen. It's not whisper.cpp's fault. Let's hope someone comes up with help.
Just to give some feedback: I wanted to try whisper.cpp because I'm using an AMD RX 5700 XT with 8 GB of VRAM, and I wanted to use the Whisper large model. I ended up using Hugging Face Transformers, and I could fit this model on the GPU.
It's disappointing that there are so many unmaintained Python bindings. Update: As I needed this on CUDA, I've tried to fix it myself. It seems to work OK on Ubuntu + Nvidia GPU, but not yet on Mac. Please test on Windows and report back!
Unmaintained is putting it mildly. Even the ones that work don't work well. Every single Python binding of the C++ implementation I've tested is significantly slower than the pure-Python version, which is mind-boggling. Terrible implementations like whisper-cpp-python, which doesn't even publish its code anywhere, take 5 minutes to transcribe a 10-second file that the pure Python implementation can handle in a few seconds using the same large model...
@chrisspen Did you try my PR? I'm using it for a real-time LLM chatbot. Using distil-whisper, I can get voice->text and then text->voice in a few hundred ms.
@dnhkng Yes. The problem seems to be the C++ code. It might work fine with a GPU, but on CPU it runs slower than pure Python on a 10-year-old machine. And if C++ needs an expensive GPU to be faster than Python on a CPU, it's not good code. I'm finding faster_whisper much more usable from Python and far more cost-effective.
Is there a streaming function in the original Python/PyTorch implementation?
Can I use faster_whisper for real-time transcription tasks?
Probably not. faster_whisper is a lot faster than the pure Python implementation, but a lot slower than this C++ version. I'd only recommend faster_whisper when you want good performance but don't have the GPU needed to run whisper.cpp.
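Since this part of the thread compares speeds across bindings, a small harness that reports the real-time factor (processing time divided by audio duration; below 1.0 means faster than real time) makes such comparisons easier to reproduce. `transcribe_fn` is any hypothetical zero-argument callable wrapping the binding under test:

```python
import statistics
import time
from typing import Callable


def real_time_factor(transcribe_fn: Callable[[], object],
                     audio_seconds: float, repeats: int = 3) -> float:
    """Median wall-clock time of transcribe_fn divided by the audio length."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        transcribe_fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings) / audio_seconds
```

Loading the model should happen outside `transcribe_fn` so that only inference is timed, and each binding should be measured with the same model, audio file and thread count.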
After some struggle with the Python bindings documented in the README, and also trying whisper-cpp-python without success, I landed on pywhispercpp. It might be worth adding to the list in the README, @ggerganov.
I agree. I tested it out and it works all right, but it doesn't have GPU acceleration yet. The maintainer said it's just a matter of time commitment, which I can understand. I'd love to get some Python bindings from somewhere that also support GPU so I can do some more benchmarking.
@BBC-Esq I have gotten pywhispercpp to run with GPU support. You can clone it from source and build it with CUDA support enabled, just like you do with whisper.cpp itself. I'll warn you up front that there are some issues with installing directly from source, as you can read in my issue over there. Here is how I currently do it:

```bash
# Clone from source
git clone --branch v1.2.0 --recurse-submodules https://github.com/abdeladim-s/pywhispercpp.git

# Build a python wheel with CUDA support
# FYI: The submodule whisper.cpp in pywhispercpp is currently pinned at version 1.5.4,
# which still uses the old cmake flag for CUDA support, later versions use -DWHISPER_CUDA=1
cd pywhispercpp
CMAKE_ARGS="-DWHISPER_CUBLAS=1" python3 -m build --wheel

# Install the wheel into your python environment
python3 -m pip install dist/pywhispercpp-*.whl
```
* Update README.md: fix broken C-style API link
* Update whisper_processor.py: update examples/python/whisper_processor.py to remove the nonexistent flag "-np" from the subprocess.Popen call
* Add pywhispercpp to the Pybind11 Python wrapper list: abdeladim-s/pywhispercpp wasn't added to the list / was removed at some point (?). It was referenced in issue #9, so I feel like it's worthy of being added, as it's the first, if not one of the first, Python wrappers for whisper.cpp
Good day everyone!
I'm thinking about bindings for Python.
So far, I'm interested in 4 functionalities:
Perhaps in the near future I will try to take up this task, but I have no experience with Python bindings. So if there are people with the skills to do it quickly (if it can be done quickly... 😃), that would be cool!