
Video Transcribing Using Lip Reading

Problem Statement

The goal of this project is to predict sentences through lip reading, utilizing the GRID dataset.

Demo video: demo.mp4

Project Overview

1. Frame Cropping to Focus on the Mouth Region

  • Given that our dataset comprises stationary, front-facing camera videos with minimal head movement, we opted for manual frame cropping with fixed coordinates.

  • We also verified our results against a pretrained dlib model that dynamically crops the mouth region and observed similar results. A sketch of both cropping strategies appears after this list.

  • Example of cropped frames:

    Cropped Frames
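Below is a minimal sketch of the two cropping strategies, assuming OpenCV and dlib's 68-point face landmark model (points 48-67 outline the mouth). The fixed crop coordinates, padding, and model file name are illustrative assumptions, not values taken from this repository:

    import cv2
    import dlib

    # Fixed crop: GRID videos are front-facing and nearly stationary,
    # so a static window around the mouth is sufficient.
    Y1, X1, Y2, X2 = 190, 80, 250, 170  # assumed coordinates

    def crop_fixed(frame):
        return frame[Y1:Y2, X1:X2]

    # Dynamic crop: dlib's 68-point landmark model (points 48-67 outline
    # the mouth). Requires shape_predictor_68_face_landmarks.dat.
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def crop_dlib(frame, pad=10):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray)
        if not faces:
            return crop_fixed(frame)  # fall back to the static window
        pts = predictor(gray, faces[0])
        xs = [pts.part(i).x for i in range(48, 68)]
        ys = [pts.part(i).y for i in range(48, 68)]
        return frame[min(ys) - pad:max(ys) + pad,
                     min(xs) - pad:max(xs) + pad]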

2. Model Training

  • The model architecture is inspired by LipNet, focusing on sentence prediction via lip reading; a sketch of this style of network follows the list below.

  • To train the model, run the following command in your terminal:

    python lipreading.py
  • After training, the model is saved in /results as checkpoint.pth, and convergence plots are saved in the root folder as convergence_plots.png.

    Convergence Plot
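For reference, here is a minimal PyTorch sketch of a LipNet-style network: stacked 3D convolutions over the frame sequence, bidirectional GRUs, and a linear layer producing per-frame character logits for CTC. The layer sizes, vocabulary, and 40x64 input resolution are assumptions for illustration; lipreading.py defines the actual architecture.

    import torch
    import torch.nn as nn

    class LipReader(nn.Module):
        def __init__(self, vocab_size=28):  # 26 letters + space + CTC blank
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv3d(3, 32, (3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
                nn.Conv3d(32, 64, (3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
                nn.Conv3d(64, 96, (3, 3, 3), padding=1), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
            )
            # 96 channels * 5 * 8 spatial cells, assuming 40x64 input frames
            self.gru = nn.GRU(96 * 5 * 8, 256, num_layers=2,
                              bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * 256, vocab_size)

        def forward(self, x):            # x: (batch, 3, frames, 40, 64)
            x = self.conv(x)             # (batch, 96, frames, 5, 8)
            b, c, t, h, w = x.shape
            x = x.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
            x, _ = self.gru(x)           # (batch, frames, 512)
            return self.fc(x)            # per-frame character logits

Training such a network minimizes nn.CTCLoss over the log-softmaxed per-frame logits, which aligns variable-length character transcripts to the frame sequence without requiring frame-level labels.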

3. Inference

  • Predictions on five videos, followed by a sketch of decoding at inference time:

    Prediction 1
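At inference time, the per-frame logits are decoded into a sentence. The sketch below uses greedy (best-path) CTC decoding with the hypothetical LipReader class and character set from the training sketch above; the checkpoint format is likewise an assumption.

    import torch

    CHARS = " abcdefghijklmnopqrstuvwxyz"  # indices 0-26; 27 is the CTC blank
    BLANK = 27

    def greedy_decode(logits):
        """Best-path CTC decode: collapse repeats, then drop blanks."""
        ids = logits.argmax(dim=-1).tolist()  # one id per frame
        out, prev = [], BLANK
        for i in ids:
            if i != prev and i != BLANK:
                out.append(CHARS[i])
            prev = i
        return "".join(out)

    model = LipReader()  # the sketch class defined above
    state = torch.load("results/checkpoint.pth", map_location="cpu")
    model.load_state_dict(state)  # assumes the checkpoint is a plain state_dict
    model.eval()

    with torch.no_grad():
        clip = torch.randn(1, 3, 75, 40, 64)  # stand-in for a cropped 75-frame clip
        print(greedy_decode(model(clip)[0]))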
