This repository contains example code for extracting text from video or audio files.
Start by cloning the Git repository:
git clone https://github.com/lena-will/video-to-text.git
The input to the model is a WAV file. To transcribe text from a video, first extract the audio into a WAV file.
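One possible way to do the audio extraction is to call `ffmpeg` from Python. The sketch below is not taken from this repository: it assumes `ffmpeg` is installed and on the `PATH`, and the function names `build_ffmpeg_command` and `extract_audio` are hypothetical. The 16 kHz mono settings are chosen because Whisper works on 16 kHz audio; other settings may work as well.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build the ffmpeg argument list (hypothetical helper, not part of the repo)."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video (or audio) file
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (assumed default: 16 kHz)
        "-acodec", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None, sample_rate=16000):
    """Write the audio track of video_path to a WAV file and return its path."""
    video_path = Path(video_path)
    # Default: same name as the input, with a .wav extension
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate), check=True)
    return wav_path
```

For example, `extract_audio("talk.mp4")` would write `talk.wav` next to the input file, where `talk.mp4` is a placeholder name.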
Transcription uses OpenAI's Whisper model. A version of the model is publicly available on Hugging Face: https://huggingface.co/openai/whisper-large-v3.
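As a rough illustration, the model can be loaded through the `pipeline` API of the Hugging Face `transformers` library. This is a minimal sketch and may differ from the code in this repository: it assumes `transformers` and `torch` are installed, that `ffmpeg` is available for decoding the audio file, and the function name `transcribe` is hypothetical.

```python
MODEL_ID = "openai/whisper-large-v3"


def transcribe(wav_path, model_id=MODEL_ID):
    """Return the transcript of a WAV file as a string (hypothetical helper)."""
    # Imported lazily so this module can be loaded without transformers installed
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30-second chunks
    )
    # return_timestamps=True lets Whisper handle audio longer than 30 seconds
    result = asr(str(wav_path), return_timestamps=True)
    return result["text"].strip()
```

Calling `transcribe("talk.wav")` downloads the model weights on first use, which takes several gigabytes of disk space. A GPU is not required but speeds up transcription considerably.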