Hello there. Looking at the signature of the transcribe() method, I can see that it supports ndarrays.
Base: I get the audio as bytes (audio_bytes).
What have I tried?
2 was described in this thread.
Error 1: RuntimeError: "reflection_pad1d" not implemented for 'Byte'
Would really appreciate a helping hand.
Replies: 2 comments 10 replies
The comment failed to take into account the preprocessing that ffmpeg does in load_audio():

from typing import Union

import ffmpeg
import numpy as np
import whisper


def load_audio(file: Union[str, bytes], sr: int = 16000):
    """
    Open an audio file and read as mono waveform, resampling as necessary

    Parameters
    ----------
    file: (str, bytes)
        The audio file to open or bytes of audio file
    sr: int
        The sample rate to resample the audio if necessary

    Returns
    -------
    A NumPy array containing the audio waveform, in float32 dtype.
    """
    if isinstance(file, bytes):
        # Feed the encoded file to ffmpeg through stdin instead of a path.
        inp = file
        file = "pipe:"
    else:
        inp = None

    try:
        # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
        # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
        out, _ = (
            ffmpeg.input(file, threads=0)
            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
            .run(cmd="ffmpeg", capture_stdout=True, capture_stderr=True, input=inp)
        )
    except ffmpeg.Error as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

    # ffmpeg returns signed 16-bit PCM; scale it to float32 in [-1.0, 1.0).
    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0

To use it:

# audio_bytes are the bytes of the audio file
mel = whisper.log_mel_spectrogram(load_audio(audio_bytes))
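
In case it helps to see the last step of load_audio in isolation, here is a minimal sketch of the conversion it applies to ffmpeg's s16le output. The helper name pcm16_to_float32 is made up for this example and is not part of whisper; it only assumes NumPy is installed.

```python
import numpy as np


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Hypothetical helper: raw signed 16-bit little-endian PCM -> float32 in [-1.0, 1.0)."""
    # "<i2" pins little-endian int16 regardless of the host byte order.
    samples = np.frombuffer(raw, dtype="<i2")
    # Dividing by 32768 maps the int16 range onto [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0


# Four hand-made samples: silence, largest positive, largest negative, -1.
raw = np.array([0, 32767, -32768, -1], dtype="<i2").tobytes()
audio = pcm16_to_float32(raw)
print(audio.dtype, audio.shape)
```

This only works on bytes that are already decoded PCM. The bytes of an encoded file (mp3, wav with a header, and so on) still need to go through ffmpeg first, as load_audio does.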
How can we do this from just an ndarray? I have the same issue, but with a NumPy array of the audio signal rather than a bytes object.
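
One possible approach, offered as a rough sketch rather than a confirmed answer: going by load_audio above, the array should end up as mono float32 at 16000 Hz, scaled to [-1.0, 1.0). The helper prepare_array below is hypothetical (not a whisper function). It assumes signed integer PCM or float input, a (n_samples, n_channels) layout for multi-channel audio, and it uses crude linear interpolation for resampling where a proper resampler would give better quality.

```python
import numpy as np

TARGET_SR = 16000  # sample rate that load_audio resamples to by default


def prepare_array(samples: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: make an ndarray look like load_audio's output."""
    audio = np.asarray(samples)
    if audio.dtype.kind == "i":
        # Signed integer PCM: divide by full scale, e.g. 32768 for int16.
        # Other integer layouts (such as unsigned 8-bit) are not handled here.
        full_scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / full_scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        # Assumes shape (n_samples, n_channels); average channels down to mono.
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        # Linear-interpolation resampling, for illustration only.
        n_out = int(round(len(audio) * TARGET_SR / sr))
        old_t = np.arange(len(audio)) / sr
        new_t = np.arange(n_out) / TARGET_SR
        audio = np.interp(new_t, old_t, audio)
    return np.ascontiguousarray(audio, dtype=np.float32)


# One second of constant-valued stereo int16 audio at 48 kHz.
stereo = np.full((48000, 2), 16384, dtype=np.int16)
prepared = prepare_array(stereo, 48000)
print(prepared.dtype, prepared.shape)
```

The result could then be passed wherever load_audio's output is used, for example whisper.log_mel_spectrogram(prepared).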
The comment failed to take into account that there is preprocessing done by ffmpeg in load_audio(). So it shouldn't be the bytes of the audio file but the bytes from the output of ffmpeg. Here's a modified version of load_audio that should work with the bytes of the audio file directly.