Refactor main.py, add utils.py, and update README

abhirooptalasila · Jan 11, 2022 · 542eda0 · 542eda0
1 parent 33fe2bc
commit 542eda0
Show file tree

Hide file tree

Showing 7 changed files with 190 additions and 231 deletions.
diff --git a/README.md b/README.md
@@ -1,124 +1,107 @@
 # AutoSub
 
-- [About](#about)
-- [Motivation](#motivation)
-- [Installation](#installation)
-- [Docker](#docker)
-- [How-to example](#how-to-example)
-- [How it works](#how-it-works)
-- [TO-DO](#to-do)
-- [Contributing](#contributing)
-- [References](#references)
+- [AutoSub](#autosub)
+ - [About](#about)
+ - [Installation](#installation)
+ - [Docker](#docker)
+ - [How-to example](#how-to-example)
+ - [How it works](#how-it-works)
+ - [Motivation](#motivation)
+ - [Contributing](#contributing)
+ - [References](#references)
 
 ## About
 
 AutoSub is a CLI application to generate subtitle files (.srt, .vtt, and .txt transcript) for any video file using [Mozilla DeepSpeech](https://github.com/mozilla/DeepSpeech). I use the DeepSpeech Python API to run inference on audio segments and [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) to split the initial audio on silent segments, producing multiple small files.
 
 ⭐ Featured in [DeepSpeech Examples](https://github.com/mozilla/DeepSpeech-examples) by Mozilla
 
-## Motivation
-
-In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them and on one such occasion, I couldn't find the subtitle file for a particular movie I had downloaded. Then the idea for AutoSub struck me and since I had worked with DeepSpeech previously, I decided to use it. 
-
-
 ## Installation
 
-* Clone the repo. All further steps should be performed while in the `AutoSub/` directory
+* Clone the repo
  ```bash
  $ git clone https://github.com/abhirooptalasila/AutoSub
  $ cd AutoSub
  ```
-* Create a pip virtual environment to install the required packages
+* Create a virtual environment to install the required packages. All further steps should be performed while in the `AutoSub/` directory
  ```bash
- $ python3 -m venv sub
+ $ python3 -m pip install --user virtualenv
+ $ virtualenv sub
  $ source sub/bin/activate
- $ pip3 install -r requirements.txt
  ```
-* Download the model and scorer files from DeepSpeech repo. The scorer file is optional, but it greatly improves inference results.
+* Use the corresponding requirements file depending on whether you have a GPU or not. Make sure you have the appropriate [CUDA](https://deepspeech.readthedocs.io/en/v0.9.3/USING.html#cuda-dependency-inference) version
  ```bash
- # Model file (~190 MB)
- $ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
- # Scorer file (~950 MB)
- $ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
+ $ pip3 install -r requirements.txt
+ OR
+ $ pip3 install -r requirements-gpu.txt
  ```
-* Create two folders `audio/` and `output/` to store audio segments and final SRT and VTT file
+* Use `getmodels.sh` to download the model and scorer files with the version number as argument
  ```bash
- $ mkdir audio output
+ $ ./getmodels.sh 0.9.3
  ```
-* Install FFMPEG. If you're running Ubuntu, this should work fine.
+* Install FFMPEG. If you're on Ubuntu, this should work fine
  ```bash
  $ sudo apt-get install ffmpeg
  $ ffmpeg -version # I'm running 4.1.4
  ```
-
-* [OPTIONAL] If you would like the subtitles to be generated faster, you can use the GPU package instead. Make sure to install the appropriate [CUDA](https://deepspeech.readthedocs.io/en/v0.9.3/USING.html#cuda-dependency-inference) version. 
- ```bash
- $ source sub/bin/activate
- $ pip3 install deepspeech-gpu
- ```
 
 
 ## Docker
 
-* Installation using Docker is pretty straight-forward. 
- * First start by downloading training models by specifying which version you want:
- * if you have your own, then skip this step and just ensure they are placed in project directory with .pbmm and .scorer extensions
+* If you don't have the model files, get them
  ```bash
- $ ./getmodel.sh 0.9.3
+ $ ./getmodels.sh 0.9.3
  ```
-
- * Then for a CPU build, run: 
+* For a CPU build
  ```bash
  $ docker build -t autosub .
  $ docker run --volume=`pwd`/input:/input --name autosub autosub --file /input/video.mp4
  $ docker cp autosub:/output/ .
  ```
-
- * For a GPU build that is reusable (saving time on instantiating the program):
+* For a GPU build that is reusable (saving time on instantiating the program)
  ```bash
  $ docker build --build-arg BASEIMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --build-arg DEPSLIST=requirements-gpu.txt -t autosub-base . && \
  docker run --gpus all --name autosub-base autosub-base --dry-run || \
  docker commit --change 'CMD []' autosub-base autosub-instance
  ```
- * Then
+* Finally
  ```bash
- $ docker run --volume=`pwd`/input:/input --name autosub autosub-instance --file video.mp4
+ $ docker run --volume=`pwd`/input:/input --name autosub autosub-instance --file ~/video.mp4
  $ docker cp autosub:/output/ .
  ```
 
 ## How-to example
 
-* Make sure the model and scorer files are in the root directory. They are automatically loaded
-* After following the installation instructions, you can run `autosub/main.py` as given below. The `--file` argument is the video file for which SRT file is to be generated
+* The model files should be in the repo root directory and will be loaded automatically. But incase you have multiple versions, use the `--model` and `--scorer` args while executing
+* After following the installation instructions, you can run `autosub/main.py` as given below. The `--file` argument is the video file for which subtitles are to be generated
  ```bash
  $ python3 autosub/main.py --file ~/movie.mp4
  ```
 * After the script finishes, the SRT file is saved in `output/`
-* Open the video file and add this SRT file as a subtitle, or you can just drag and drop in VLC.
 * The optional `--split-duration` argument allows customization of the maximum number of seconds any given subtitle is displayed for. The default is 5 seconds
  ```bash
  $ python3 autosub/main.py --file ~/movie.mp4 --split-duration 8
  ```
-* By default, AutoSub outputs in a number of formats. To only produce the file formats you want use the `--format` argument:
+* By default, AutoSub outputs SRT, VTT and TXT files. To only produce the file formats you want, use the `--format` argument
  ```bash
  $ python3 autosub/main.py --file ~/movie.mp4 --format srt txt
  ```
+* Open the video file and add this SRT file as a subtitle. You can just drag and drop in VLC.
+
 
 
 ## How it works
 
-Mozilla DeepSpeech is an amazing open-source speech-to-text engine with support for fine-tuning using custom datasets, external language models, exporting memory-mapped models and a lot more. You should definitely check it out for STT tasks. So, when you first run the script, I use FFMPEG to **extract the audio** from the video and save it in `audio/`. By default DeepSpeech is configured to accept 16kHz audio samples for inference, hence while extracting I make FFMPEG use 16kHz sampling rate. 
+Mozilla DeepSpeech is an open-source speech-to-text engine with support for fine-tuning using custom datasets, external language models, exporting memory-mapped models and a lot more. You should definitely check it out for STT tasks. So, when you run the script, I use FFMPEG to **extract the audio** from the video and save it in `audio/`. By default DeepSpeech is configured to accept 16kHz audio samples for inference, hence while extracting I make FFMPEG use 16kHz sampling rate. 
 
-Then, I use [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) for silence removal - which basically takes the large audio file initially extracted, and splits it wherever silent regions are encountered, resulting in smaller audio segments which are much easier to process. I haven't used the whole library, instead I've integrated parts of it in `autosub/featureExtraction.py` and `autosub/trainAudio.py` All these audio files are stored in `audio/`. Then for each audio segment, I perform DeepSpeech inference on it, and write the inferred text in a SRT file. After all files are processed, the final SRT file is stored in `output/`.
+Then, I use [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) for silence removal - which basically takes the large audio file initially extracted, and splits it wherever silent regions are encountered, resulting in smaller audio segments which are much easier to process. I haven't used the whole library, instead I've integrated parts of it in `autosub/featureExtraction.py` and `autosub/trainAudio.py`. All these audio files are stored in `audio/`. Then for each audio segment, I perform DeepSpeech inference on it, and write the inferred text in a SRT file. After all files are processed, the final SRT file is stored in `output/`.
 
-When I tested the script on my laptop, it took about **40 minutes to generate the SRT file for a 70 minutes video file**. My config is an i5 dual-core @ 2.5 Ghz and 8 gigs of RAM. Ideally, the whole process shouldn't take more than 60% of the duration of original video file. 
+When I tested the script on my laptop, it took about **40 minutes to generate the SRT file for a 70 minutes video file**. My config is an i5 dual-core @ 2.5 Ghz and 8GB RAM. Ideally, the whole process shouldn't take more than 60% of the duration of original video file. 
 
 
-## TO-DO
+## Motivation
 
-* Pre-process inferred text before writing to file (prettify)
-* Add progress bar to `extract_audio()`
-* GUI support (?)
+In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them and on one such occasion, I couldn't find the subtitle file for a particular movie I had downloaded. Then the idea for AutoSub struck me and since I had worked with DeepSpeech previously, I decided to use it. 
 
 
 ## Contributing