This research aims to enhance video subtitle alignment and segmentation for better accessibility and viewing experiences. Key objectives include:
- Using fine-tuned Whisper models to transcribe speech to text.
- Applying text segmentation techniques with state-of-the-art language models to generate refined subtitles of reasonable length.
- Creating a robust methodology to validate caption quality across content types.
- Aligning subtitles accurately to speech without altering the original video timestamps.
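The segmentation objective above can be sketched in a few lines. This is a minimal illustration, not the actual logic in app_transcribe.py: it greedily packs words into caption lines no longer than a character limit (42 characters is a common subtitling convention; the function name and limit are assumptions for illustration).

```python
def segment_caption(text: str, max_chars: int = 42) -> list[str]:
    """Greedily pack words into caption lines no longer than max_chars.

    Illustrative sketch only -- the app's real segmentation uses
    language models rather than a fixed character budget.
    """
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(segment_caption(
    "Custom segmentation helps to refine the captions, "
    "ensuring accurate and well-structured subtitles."
))
```

A language-model-based segmenter would additionally respect clause boundaries, but the greedy word-packing above captures the basic "reasonable length" constraint.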
The app_transcribe.py file contains the code implementing the GUI.
Use the following command to run the app:

```shell
streamlit run app_transcribe.py
```
Here is a snapshot of the interface, which downloads the audio and video from a YouTube link given as input and generates an improved SRT subtitle file:
Custom segmentation refines the captions, ensuring accurate and well-structured subtitles. The refined SRT files are embedded into the video using FFmpeg, producing a captioned video output. With options to preview the video and download the SRT file, the pipeline offers a complete solution for showcasing the workflow and results in an interactive demo.
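The final steps of the pipeline can be sketched as follows. This is a hedged illustration, not the app's actual code: the timestamp formatter follows the SRT spec (HH:MM:SS,mmm), and the FFmpeg invocation uses the real `subtitles` video filter, which burns captions into the frames without altering stream timestamps. File names are placeholders.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_embed_command(video: str, srt: str, out: str) -> list[str]:
    """Build the FFmpeg command that burns an SRT file into a video.

    The subtitles filter renders captions onto the frames; the original
    video timestamps are left untouched.
    """
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", out]


print(srt_timestamp(3.5))  # 00:00:03,500
print(build_embed_command("input.mp4", "captions.srt", "captioned.mp4"))
```

The command list can be executed with `subprocess.run(...)`; alternatively, FFmpeg can attach the SRT as a soft subtitle track (e.g. with `-c:s mov_text` for MP4) if burned-in captions are not desired.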