Lip2Speech [PDF]

A pipeline that lip-reads a silent speaking face in a video and generates speech for the lip-read content, i.e., lip-to-speech synthesis.

Video Input	Processed Input	Speech Output

Architecture Overview

LRW

Alignment Plot	Mel-spectrogram Output

Usage

Demo

The pretrained model is available here [265.12 MB]

Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:

python3 demo.py
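
Before running the demo, it can help to confirm that the checkpoint is where demo.py looks for it by default. The following is a minimal sketch, assuming the default path savedmodels/lip2speech_final.pth; the helper name `checkpoint_ready` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Default checkpoint location, taken from the README's default arguments.
DEFAULT_MODEL_PATH = Path("savedmodels/lip2speech_final.pth")


def checkpoint_ready(path=DEFAULT_MODEL_PATH):
    """Return True if the checkpoint file exists and is not empty.

    Hypothetical helper for illustration; not part of the repository.
    """
    path = Path(path)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    if checkpoint_ready():
        print(f"Found checkpoint at {DEFAULT_MODEL_PATH}")
    else:
        print(f"Missing checkpoint: place the download at {DEFAULT_MODEL_PATH}")
```

The check only verifies that a non-empty file exists; it does not validate that the file is a loadable model.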

Default arguments

dataset: LRW (10 Samples)
root: Datasets/SAMPLE_LRW
model_path: savedmodels/lip2speech_final.pth
encoding: voice
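
As an illustration of how these defaults could be expressed, here is a small argparse sketch. It is not the repository's arg_parser.py, which may define the options differently. The `--dataset`, `--root`, and `--model_path` flags appear in the evaluate command in this README, while `--encoding` as a flag name is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented demo defaults."""
    parser = argparse.ArgumentParser(description="Lip2Speech demo (sketch)")
    parser.add_argument("--dataset", default="LRW")
    parser.add_argument("--root", default="Datasets/SAMPLE_LRW")
    parser.add_argument("--model_path", default="savedmodels/lip2speech_final.pth")
    # Flag name assumed; the README only lists "encoding: voice".
    parser.add_argument("--encoding", default="voice")
    return parser


# Parsing an empty argument list yields the documented defaults.
args = build_parser().parse_args([])
print(args.dataset, args.root, args.model_path, args.encoding)
```

Passing a flag explicitly, for example `--root Datasets/LRW`, overrides only that value and leaves the other defaults in place.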

Evaluate

Computes the ESTOI (Extended Short-Time Objective Intelligibility) score for the given Lip2Speech model. Higher is better.

python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth

Train

To train the model, run train.py:

python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth

finetune_model_path (optional): path to a checkpoint used as the base model when fine-tuning on the dataset.
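
Because the fine-tuning checkpoint is optional, a script that launches training can add the flag only when a path is given. A minimal sketch, using the flags shown in the training command above; the function name `build_train_command` is hypothetical and not part of the repository.

```python
def build_train_command(dataset, root, finetune_model_path=None):
    """Assemble the train.py command line as an argument list.

    Hypothetical helper for illustration; only the flags shown in the
    README's training command are used.
    """
    cmd = ["python3", "train.py", "--dataset", dataset, "--root", root]
    if finetune_model_path is not None:
        # Optional: start from an existing checkpoint instead of from scratch.
        cmd += ["--finetune_model_path", finetune_model_path]
    return cmd


# Training from scratch versus fine-tuning from the pretrained model.
print(build_train_command("LRW", "Datasets/LRW"))
print(build_train_command("LRW", "Datasets/LRW", "savedmodels/lip2speech_final.pth"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.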

Acknowledgement

tacotron2

Citation

If you use this software in your work, please cite it using the following metadata.

@software{Millerdurai_Lip2Speech_2021,
author = {Millerdurai, Christen and Abdel Khaliq, Lotfy and Ulrich, Timon},
month = {8},
title = {{Lip2Speech}},
url = {https://github.com/Chris10M/Lip2Speech},
version = {1.0.0},
year = {2021}
}
