Visual speech recognition, also known as lip-reading, recognises speech from lip movements without relying on the audio stream. This model was created in a hackathon. Lip-reading is particularly useful in noisy environments where the audio signal is corrupted; automatic lip reading aims to recognise speech content by watching videos. It has many potential applications in both noisy and silent environments, and it can help deaf and hard-of-hearing people communicate. The system works both ways: it also includes a module that converts sign language into text, with an accuracy of 96.5% in the visual model.
*Demo: 0. Input (mp4 or live camera) | 1. Output*
This system predicts what a person is saying from their lip movements, giving an idea of what is being said, and it can also convert sign language into text.
In a world full of noise, songs, speeches, and slogans, there are some who cannot hear any of it. This is my small contribution to make their lives easier, so that they can communicate more freely in this world.
This system is built in two parts.
We take all the videos and preprocess them in the four steps shown below. All videos come from the LRW dataset, which has more than 50,000 videos split into train, validation, and test sets, covering 500 words from various speakers. After completing these four steps, each video is converted into an .npz file of NumPy arrays to make computation faster (see the preprocessing sketch after the step captions below):
*Preprocessing steps: 0. Original | 1. Detection | 2. Transformation | 3. Mouth ROIs*
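A minimal sketch of what such a preprocessing step might look like, assuming dlib's 68-point face landmarks and OpenCV for video I/O. The file names, crop size, and helper logic here are illustrative assumptions, not necessarily what this repository does:

```python
import cv2
import dlib
import numpy as np

# Illustrative assumptions: dlib's stock detector/predictor and a 96x96 mouth crop.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def video_to_mouth_rois(path, size=96):
    """Detect the face in each frame, locate the mouth landmarks
    (points 48-67 in the 68-point scheme), and crop a fixed-size ROI."""
    cap = cv2.VideoCapture(path)
    rois = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray)
        if not faces:
            continue  # a real pipeline would interpolate missed detections
        shape = predictor(gray, faces[0])
        pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
        cx, cy = pts.mean(axis=0).astype(int)  # mouth centre
        half = size // 2
        roi = gray[cy - half:cy + half, cx - half:cx + half]
        if roi.shape == (size, size):
            rois.append(roi)
    cap.release()
    return np.stack(rois)

# Save the mouth ROIs as a compressed NumPy archive for fast loading.
np.savez_compressed("sample.npz", data=video_to_mouth_rois("sample.mp4"))
```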
To train this model we use an architecture in which each video is first passed through a 3D-CNN front-end, then through an 18-layer ResNet, and finally through a temporal convolutional network (TCN) before the softmax layer, as sketched below.
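A minimal PyTorch sketch of that 3D-CNN → ResNet-18 → TCN pipeline, assuming grayscale mouth-ROI clips. The layer sizes and the simple two-layer TCN here are illustrative assumptions rather than the exact configuration in the repository:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class LipReader(nn.Module):
    def __init__(self, num_classes=500):
        super().__init__()
        # 3D-CNN front-end over (B, 1, T, H, W) grayscale mouth crops
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2),
                      padding=(2, 3, 3), bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        # 18-layer ResNet applied frame by frame; its first conv is
        # adapted to accept the 64-channel front-end output.
        resnet = models.resnet18()
        resnet.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.trunk = nn.Sequential(*list(resnet.children())[:-1])  # drop the fc head
        # Simple temporal convolutional block over the per-frame features
        self.tcn = nn.Sequential(
            nn.Conv1d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):                        # x: (B, 1, T, H, W)
        x = self.frontend(x)                     # (B, 64, T, H', W')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        x = self.trunk(x).view(b, t, -1)         # (B, T, 512) per-frame features
        x = self.tcn(x.transpose(1, 2))          # (B, 256, T)
        x = x.mean(dim=2)                        # pool over time
        return self.fc(x)                        # class logits
```

During training, cross-entropy over the 500 LRW word classes would apply the softmax implicitly (e.g. via `nn.CrossEntropyLoss`).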
Download the file from this link and place it in the training folder - GoogleDrive
- Clone the repository into a directory
git clone https://github.com/chiragarora01/Visual-Speech-To-Text-Team_Voyage.git
- Install all required packages.
pip install -r requirements.txt
To run this script you must have CUDA and cuDNN installed, a GPU, and a minimum of 8 GB of RAM.
CUDA_VISIBLE_DEVICES=0 python main.py
- It takes a video as input, places landmarks on it, and predicts what the speaker is saying.
- It can do the same from a live camera feed, as in the sketch below.
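A minimal sketch of a live-feed inference loop, assuming the `LipReader` model sketched above; the checkpoint path, the `extract_mouth_roi` helper, and the sliding-window logic are illustrative assumptions:

```python
import cv2
import torch
import numpy as np

model = LipReader(num_classes=500)               # model sketched above
model.load_state_dict(torch.load("training/model.pth", map_location="cpu"))  # hypothetical checkpoint
model.eval()

cap = cv2.VideoCapture(0)                        # 0 = default webcam
buffer = []                                      # rolling window of mouth ROIs
WINDOW = 29                                      # LRW clips are 29 frames long

with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        roi = extract_mouth_roi(frame)           # hypothetical per-frame helper
        if roi is None:
            continue
        buffer.append(roi.astype(np.float32) / 255.0)
        if len(buffer) == WINDOW:
            clip = torch.from_numpy(np.stack(buffer))[None, None]  # (1, 1, T, H, W)
            probs = torch.softmax(model(clip), dim=1)
            word = int(probs.argmax(dim=1))
            print("predicted word index:", word, "confidence:", float(probs.max()))
            buffer.pop(0)                        # slide the window by one frame
cap.release()
```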
To make this a commercially viable product, we would need either some sort of IoT device or a system that takes the feed from a user's device, passes it to the cloud, does the computation there, and returns the desired output, since this software needs high computation power.
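A minimal sketch of that client-to-cloud flow, assuming a hypothetical HTTP endpoint; the URL, route, and response format are illustrative assumptions, not an existing service:

```python
import requests

# Hypothetical endpoint; a real deployment would define its own API.
ENDPOINT = "https://example.com/api/lipread"

def transcribe_remotely(video_path):
    """Upload a clip, let the cloud GPU do the heavy computation,
    and return the predicted text."""
    with open(video_path, "rb") as f:
        resp = requests.post(ENDPOINT, files={"video": f}, timeout=60)
    resp.raise_for_status()
    return resp.json()["text"]   # assumed response shape

print(transcribe_remotely("sample.mp4"))
```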