Using ndarray as input to transcribe method #380

Answered by jianfch
ColeDrain asked this question in Q&A

The comment failed to take into account that load_audio() preprocesses the audio with ffmpeg. So the input shouldn't be the bytes of the audio file itself but the bytes that ffmpeg outputs.
Here's a modified version of load_audio that should work with the bytes of the audio file directly.

def load_audio(file: (str, bytes), sr: int = 16000):
    """
    Open an audio file and read as mono waveform, resampling as necessary

    Parameters
    ----------
    file: (str, bytes)
        The audio file to open or bytes of audio file

    sr: int
        The sample rate to resample the audio if necessary

    Returns
    -------
    A NumPy array containing the audio waveform, in float32 dtype.
    """

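The snippet above stops at the docstring. As a rough sketch of how such a function could handle both a path and in-memory bytes, the version below calls the ffmpeg CLI through `subprocess` and pipes the bytes to ffmpeg's stdin. The names `build_ffmpeg_cmd` and `load_audio_sketch` are hypothetical, and this is not necessarily how the original answer implemented it; it assumes `ffmpeg` is on the PATH and NumPy is installed.

```python
import subprocess
from typing import List, Union


def build_ffmpeg_cmd(file: Union[str, bytes], sr: int = 16000) -> List[str]:
    """Return an ffmpeg command that decodes `file` to mono 16-bit PCM on stdout."""
    cmd = ["ffmpeg"]
    if isinstance(file, bytes):
        # In-memory audio is fed to ffmpeg through stdin.
        source = "pipe:0"
    else:
        # A str is treated as a path; stop ffmpeg from reading the terminal.
        cmd.append("-nostdin")
        source = file
    cmd += [
        "-threads", "0", "-i", source,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr), "-",
    ]
    return cmd


def load_audio_sketch(file: Union[str, bytes], sr: int = 16000):
    """Decode a path or the bytes of an audio file to a float32 waveform."""
    # Imported lazily so build_ffmpeg_cmd stays usable without NumPy.
    import numpy as np

    stdin_data = file if isinstance(file, bytes) else None
    try:
        proc = subprocess.run(
            build_ffmpeg_cmd(file, sr),
            input=stdin_data,
            capture_output=True,
            check=True,
        )
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

    # ffmpeg wrote signed 16-bit samples; scale them to the range [-1.0, 1.0).
    return np.frombuffer(proc.stdout, np.int16).astype(np.float32) / 32768.0
```

The returned array could then be passed to `model.transcribe(...)` in place of a file path, since Whisper's `transcribe` accepts a NumPy array as well as a path. One caveat: some container formats need a seekable input and may not decode when read from a pipe, in which case writing the bytes to a temporary file first is the safer route.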
Answer selected by ColeDrain