
suppress or remove annoying print statement #40

Closed
BBC-Esq opened this issue Feb 25, 2024 · 6 comments

Comments


BBC-Esq commented Feb 25, 2024

Can we please have a way to remove this message? Every time I run the program from my Python script it checks for ffmpeg, which is fine, but I wish there were a way to remove or temporarily suppress the output. I have important messages printed to the command prompt when my program runs, and this clutters it up.

Also, is there a way to remove the FFMPEG requirement entirely? For example, the pyav library bundles ffmpeg when you pip install it.

https://pypi.org/project/av/

This is why the faster-whisper library uses it. See here:
[screenshot from the faster-whisper repository]

https://github.com/SYSTRAN/faster-whisper

Anyway, here is the printout that's annoying me:

ffmpeg version 6.1.1-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil      58. 29.100 / 58. 29.100
libavcodec     60. 31.102 / 60. 31.102
libavformat    60. 16.100 / 60. 16.100
libavdevice    60.  3.100 / 60.  3.100
libavfilter     9. 12.100 /  9. 12.100
libswscale      7.  5.100 /  7.  5.100
libswresample   4. 12.100 /  4. 12.100
libpostproc    57.  3.100 / 57.  3.100
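Incidentally, ffmpeg itself can be told to suppress that banner: the `-hide_banner` and `-loglevel error` flags silence the version/configuration dump. A minimal sketch of invoking it that way (illustrative only, assuming ffmpeg is on the PATH; not WhisperS2T's actual code):

```python
import os
import shutil
import subprocess

def quiet_ffmpeg_cmd(src, dst):
    """Build an ffmpeg resample command with the version banner and
    configuration dump suppressed."""
    return [
        "ffmpeg",
        "-hide_banner",        # drop the version/configuration banner
        "-loglevel", "error",  # print only real errors
        "-y",                  # overwrite output without asking
        "-i", src,
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        dst,
    ]

cmd = quiet_ffmpeg_cmd("input.mp3", "input_16k.wav")
# Only run if ffmpeg is installed and the input file actually exists.
if shutil.which("ffmpeg") and os.path.exists("input.mp3"):
    subprocess.run(cmd, check=True)
```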
shashikg (Owner) commented:

In the future, all print statements will be replaced by a logger (so that users can set the logging level as needed).

I will suppress the above FFmpeg log in the next PR. There is no plan to remove FFmpeg; direct calls to FFmpeg are cleaner and faster than most wrappers like ffmpeg-python (openai-whisper uses it) or PyAV. Direct calls also make it easy to run the resampling command in the background in a separate thread.
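To illustrate the point about background resampling (a hypothetical sketch, not the library's actual implementation), a direct ffmpeg call can be handed to a worker thread so the caller stays free to transcribe files that are already converted:

```python
import os
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor

def resample(src, dst):
    """Resample src to 16 kHz mono WAV with a direct ffmpeg call."""
    subprocess.run(
        ["ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
         "-i", src, "-ac", "1", "-ar", "16000", dst],
        check=True,
    )
    return dst

executor = ThreadPoolExecutor(max_workers=2)
# Guard: only submit real work if ffmpeg and the input file exist.
if shutil.which("ffmpeg") and os.path.exists("talk.mp3"):
    # submit() returns immediately; the resample runs in the background
    # while the main thread transcribes earlier files.
    future = executor.submit(resample, "talk.mp3", "talk_16k.wav")
    # ...later, when the pipeline needs it: wav_path = future.result()
executor.shutdown(wait=True)
```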

BBC-Esq (Author) commented Feb 25, 2024

Just to play devil's advocate, don't you think that introduces unneeded complexity for users on various platforms? When I started learning about programming I didn't even know what a system "path" was, let alone how to install ffmpeg and add it to the PATH. Plus, there are a lot of different platforms out there, each with its own installation procedure. With something like pyav you simply pip install the library. How much of a slowdown are we talking about versus directly calling ffmpeg?

Anyways, if you're curious, here's how I did the resampling with pyav in my script:

    # Requires: import av; from pathlib import Path
    def convert_to_wav(self, audio_file):
        output_file = Path(audio_file).stem + "_converted.wav"
        output_path = Path(__file__).parent / output_file
        
        container = av.open(audio_file)
        stream = next(s for s in container.streams if s.type == 'audio')
        
        resampler = av.AudioResampler(
            format='s16',
            layout='mono',
            rate=16000,
        )
        
        output_container = av.open(str(output_path), mode='w')
        output_stream = output_container.add_stream('pcm_s16le', rate=16000)
        output_stream.layout = 'mono'
        
        for frame in container.decode(audio=0):
            frame.pts = None
            resampled_frames = resampler.resample(frame)
            if resampled_frames is not None:
                for resampled_frame in resampled_frames:
                    for packet in output_stream.encode(resampled_frame):
                        output_container.mux(packet)
        
        for packet in output_stream.encode(None):
            output_container.mux(packet)
        
        output_container.close()
        
        return str(output_path)

I couldn't get the resampling to work automatically using whisperS2T, so that's why I had to add pyav. Not sure if I did it wrong, though.

In another script of mine, resampling is avoided by recording directly at 16000 Hz mono from the beginning. That pertains to a voice-recorder feature, though, not to an audio file whose original sample rate could be anything.

import os
import gc
import torch
import pyaudio
import wave
import tempfile
from pathlib import Path
import whisper_s2t
from PySide6.QtCore import QThread, Signal
from utilities import my_cprint

class TranscriptionThread(QThread):
    transcription_complete = Signal(str)

    def __init__(self, audio_file, voice_recorder):
        super().__init__()
        self.audio_file = audio_file
        self.voice_recorder = voice_recorder

    def run(self):
        device = "cpu"
        compute_type = "float32"
        model_identifier = "ctranslate2-4you/whisper-small.en-ct2-float32"
        cpu_threads = max(4, os.cpu_count() - 4)
        model_kwargs = {
            'compute_type': compute_type,
            'model_identifier': model_identifier,
            'backend': 'CTranslate2',
            "device": device,
            "cpu_threads": cpu_threads,
        }
        # Expose the model on the recorder so ReleaseTranscriber() can find it
        self.model = whisper_s2t.load_model(**model_kwargs)
        self.voice_recorder.model = self.model

        out = self.model.transcribe_with_vad([self.audio_file],
                                             lang_codes=['en'],
                                             tasks=['transcribe'],
                                             initial_prompts=[None],
                                             batch_size=16)

        transcription_text = " ".join([_['text'] for _ in out[0]]).strip()

        my_cprint("Transcription completed.", 'white')
        self.transcription_complete.emit(transcription_text)
        Path(self.audio_file).unlink()
        self.voice_recorder.ReleaseTranscriber()

class RecordingThread(QThread):
    def __init__(self, voice_recorder):
        super().__init__()
        self.voice_recorder = voice_recorder

    def run(self):
        self.voice_recorder.record_audio()

class VoiceRecorder:
    def __init__(self, gui_instance, format=pyaudio.paInt16, channels=1, rate=16000, chunk=1024):
        self.gui_instance = gui_instance
        self.format, self.channels, self.rate, self.chunk = format, channels, rate, chunk
        self.is_recording, self.frames = False, []
        self.recording_thread = None
        self.transcription_thread = None

    def record_audio(self):
        p = pyaudio.PyAudio()
        stream = p.open(format=self.format, channels=self.channels, rate=self.rate, input=True, frames_per_buffer=self.chunk)
        self.frames = []
        while self.is_recording:
            data = stream.read(self.chunk, exception_on_overflow=False)
            self.frames.append(data)
        stream.stop_stream()
        stream.close()
        p.terminate()

    def save_audio(self):
        self.is_recording = False
        # Note: tempfile.mktemp is deprecated; NamedTemporaryFile(delete=False) is safer
        temp_file = Path(tempfile.mktemp(suffix=".wav"))
        with wave.open(str(temp_file), "wb") as wf:
            wf.setnchannels(self.channels)
            wf.setsampwidth(pyaudio.PyAudio().get_sample_size(self.format))
            wf.setframerate(self.rate)
            wf.writeframes(b"".join(self.frames))
        self.frames.clear()

        self.transcription_thread = TranscriptionThread(str(temp_file), self)
        self.transcription_thread.transcription_complete.connect(self.gui_instance.update_transcription)
        self.transcription_thread.start()

    def start_recording(self):
        if not self.is_recording:
            self.is_recording = True
            self.recording_thread = RecordingThread(self)
            self.recording_thread.start()

    def stop_recording(self):
        self.is_recording = False
        if self.recording_thread is not None:
            self.recording_thread.wait()
            self.save_audio()

    def ReleaseTranscriber(self):
        if hasattr(self, 'model'):
            if hasattr(self.model, 'model'):
                del self.model.model
            if hasattr(self.model, 'feature_extractor'):
                del self.model.feature_extractor
            if hasattr(self.model, 'hf_tokenizer'):
                del self.model.hf_tokenizer
            del self.model
        
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()
        my_cprint("Whisper model removed from memory.", 'red')

shashikg (Owner) commented Feb 25, 2024

How much of a slowdown are we discussing versus directly calling ffmpeg?

It depends on the file size. Say resampling takes 5 seconds per file for some 1-hour files, and you have 20 such files in a request: the overall reduction will be ~5*19 seconds, because WhisperS2T runs the ffmpeg command in a separate thread. It resamples the first audio file and sends it for transcription, and in parallel the remaining audio files are resampled in the background. The same thing can be done with PyAV (not sure, though -- it depends on whether the PyAV interface is blocking or non-blocking).
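The overlap described above can be sketched with stand-in functions (the fake_resample/fake_transcribe names are invented for illustration; the real steps would be the ffmpeg call and the Whisper model):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_resample(name):
    # stand-in for the background ffmpeg resampling step
    return name.rsplit(".", 1)[0] + "_16k.wav"

def fake_transcribe(wav):
    # stand-in for sending the resampled file to the model
    return "transcript of " + wav

files = ["file%d.mp3" % i for i in range(5)]
results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    # Queue every resample up front; the worker thread churns through
    # them while the main thread transcribes whichever file is ready.
    futures = [pool.submit(fake_resample, f) for f in files]
    for fut in futures:
        results.append(fake_transcribe(fut.result()))
```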

Plus, there's a lot of different platforms out there and different installation procedures for each one.

I couldn't get the resampling to work automatically using whisperS2T

Weird... What system are you using? What's the exact issue?

BBC-Esq (Author) commented Feb 25, 2024

Thanks for the explanation; it makes sense and is interesting to know. Every little bit helps, I suppose, when you're talking about improving overall speed. If I have time today I'll try to revert my script to what it was when I encountered the error, i.e. before I implemented pyav, but unfortunately that's difficult because I don't use any versioning workflow. I'm just using Notepad++... don't laugh. ;-) I do know that this script wasn't automatically resampling and was giving an error that it couldn't process the audio file without it being resampled first. Also, this may have been with a different file than the Sam Altman one. I'm testing it on .mp3, .wma, .flac, and .wav. Hope that helps.

shashikg (Owner) commented:

@BBC-Esq removed redundant ffmpeg logs: 33e305f

2 participants