A real-time Indian Sign Language (ISL) to Speech system that combines YOLO, MediaPipe, and a custom CNN+LSTM model to translate sign language videos into spoken words.
This project enables real-time sign language recognition from webcam video and converts the recognized signs into spoken words in the browser.
Key Highlights:
- Real-time webcam capture in the browser.
- Landmark extraction with MediaPipe Holistic.
- YOLOv11 for robust person detection and frame cropping during training.
- Sequence modeling with a hybrid CNN + LSTM PyTorch model.
- Interactive web interface with WebSocket-based streaming.
- Automatic speech output using browser Text-to-Speech (TTS).
How the model was built
- Collect Videos:
  - Raw videos of sign language gestures.
- Extract Frames:
  - Videos are split into frame sequences for processing.
- YOLOv11 Detection:
  - Each frame is passed through YOLOv11 to detect and crop the region containing the signer.
  - This improves landmark extraction accuracy by focusing only on the signer.
- Extract Landmarks:
  - MediaPipe Holistic is used to extract:
    - Pose landmarks
    - Face landmarks
    - Left & right hand landmarks
- Masking:
  - Binary masks track which landmarks are present or missing in each frame.
  - This helps the model learn variable-length, partially visible features robustly (a preprocessing sketch covering the detection, landmark, and masking steps follows this list).
- CNN + LSTM Model:
  - CNN layers learn spatial features from the landmark sequences.
  - LSTM layers capture temporal dependencies across frames.
  - The final model classifies the sign gesture into one of the predefined sign classes (see the model sketch after this list).
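
The detection, landmark-extraction, and masking steps above could look roughly like the sketch below. It assumes the Ultralytics YOLO API and MediaPipe Holistic's Python solution; the crop heuristic, landmark groups, and feature layout actually used in training may differ.

```python
# Sketch only: crop the signer with YOLO, then extract Holistic landmarks plus a
# per-group presence mask. Feature layout and crop heuristic are assumptions.
import numpy as np
import mediapipe as mp
from ultralytics import YOLO

yolo = YOLO("models/yolo11n.pt")
holistic = mp.solutions.holistic.Holistic(static_image_mode=True)

def crop_signer(frame_bgr):
    """Return the largest detected person region, or the full frame if none is found."""
    result = yolo(frame_bgr, classes=[0], verbose=False)[0]  # class 0 = person (COCO)
    if len(result.boxes) == 0:
        return frame_bgr
    xyxy = result.boxes.xyxy.cpu().numpy()
    areas = (xyxy[:, 2] - xyxy[:, 0]) * (xyxy[:, 3] - xyxy[:, 1])
    x1, y1, x2, y2 = xyxy[areas.argmax()].astype(int)
    return frame_bgr[y1:y2, x1:x2]

def extract_landmarks(frame_rgb):
    """Flatten pose/face/hand landmarks into one vector plus a binary presence mask."""
    res = holistic.process(frame_rgb)
    groups = [
        (res.pose_landmarks, 33),        # pose: 33 landmarks
        (res.face_landmarks, 468),       # face mesh: 468 landmarks
        (res.left_hand_landmarks, 21),   # left hand: 21 landmarks
        (res.right_hand_landmarks, 21),  # right hand: 21 landmarks
    ]
    features, mask = [], []
    for landmarks, count in groups:
        if landmarks is not None:
            features.append(np.array([[p.x, p.y, p.z] for p in landmarks.landmark]).flatten())
            mask.append(1.0)
        else:
            features.append(np.zeros(count * 3))  # missing group -> zero-filled features
            mask.append(0.0)
    return np.concatenate(features), np.array(mask)
```

With this layout, each frame yields a 1629-dimensional feature vector ((33 + 468 + 21 + 21) landmarks × x, y, z) plus a 4-element presence mask.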
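
The CNN + LSTM classifier might then be structured along these lines; the layer sizes, the 1629-feature input (matching the sketch above), and the number of classes are illustrative assumptions rather than the values in `model_def.py`.

```python
# Illustrative CNN + LSTM sequence classifier; layer sizes and num_classes are
# assumptions, not the values used in model_def.py.
import torch
import torch.nn as nn

class SignCNNLSTM(nn.Module):
    def __init__(self, feature_dim=1629, num_classes=20, hidden_size=128):
        super().__init__()
        # Conv1d treats the landmark features as channels and mixes them
        # over a short temporal window of frames.
        self.cnn = nn.Sequential(
            nn.Conv1d(feature_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # LSTM models how the per-frame features evolve across the sequence.
        self.lstm = nn.LSTM(128, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, time, feature_dim)
        x = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # -> (batch, time, 128)
        _, (h_n, _) = self.lstm(x)                       # h_n: (1, batch, hidden_size)
        return self.fc(h_n[-1])                          # logits: (batch, num_classes)

# Example: a batch of 2 sequences of 30 frames each.
logits = SignCNNLSTM()(torch.randn(2, 30, 1629))
print(logits.shape)  # torch.Size([2, 20])
```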
How real-time recognition works
1. The user starts the camera from the browser.
2. Frames are processed client-side with MediaPipe Holistic to draw pose, face, and hand landmarks for live feedback.
3. The raw frame is also sent over WebSocket to the FastAPI backend.
4. The server:
   - Optionally runs a YOLO crop (this step can be skipped).
   - Runs MediaPipe Holistic again on the cropped/received frame.
   - Maintains a rolling buffer (`deque`) of landmark sequences (a simplified server sketch follows this list).
5. When the user presses `s`:
   - The buffer is reset and inference is started.
   - When enough frames are collected, the server feeds the landmark sequence through the CNN + LSTM model.
6. The predicted sign word is sent back to the browser in real time.
7. The browser displays the word and can speak it aloud using the Web Speech API.
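
A much-simplified sketch of that server loop is shown below, reusing the hypothetical `crop_signer`, `extract_landmarks`, and `SignCNNLSTM` sketches above; the WebSocket path, message format, window length, and class list are assumptions, and the real `app_ws.py` will differ in detail.

```python
# Simplified sketch of the WebSocket loop: keep a rolling deque of landmark vectors
# and run the CNN + LSTM model once enough frames are buffered. The "/ws" path,
# message format, SEQUENCE_LEN, and CLASS_NAMES are assumptions.
import base64
from collections import deque

import cv2
import numpy as np
import torch
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
SEQUENCE_LEN = 30                                # assumed frames per inference window
CLASS_NAMES = [f"sign_{i}" for i in range(20)]   # placeholder class labels
model = SignCNNLSTM()                            # placeholder; load best_web_model.pth in practice
model.eval()

@app.websocket("/ws")
async def recognize(ws: WebSocket):
    await ws.accept()
    buffer = deque(maxlen=SEQUENCE_LEN)
    running = False
    try:
        while True:
            msg = await ws.receive_json()
            if msg["type"] == "start":      # browser `s` key: reset buffer and start inference
                buffer.clear()
                running = True
            elif msg["type"] == "reset":    # browser `r` key: reset buffer only
                buffer.clear()
                running = False
            elif msg["type"] == "frame":
                # Decode the JPEG frame, crop, and extract landmarks (helpers sketched above).
                jpg = base64.b64decode(msg["frame"])
                frame = cv2.imdecode(np.frombuffer(jpg, np.uint8), cv2.IMREAD_COLOR)
                features, _ = extract_landmarks(cv2.cvtColor(crop_signer(frame), cv2.COLOR_BGR2RGB))
                buffer.append(features)
                if running and len(buffer) == SEQUENCE_LEN:
                    seq = torch.tensor(np.stack(buffer), dtype=torch.float32).unsqueeze(0)
                    with torch.no_grad():
                        word = CLASS_NAMES[model(seq).argmax(dim=1).item()]
                    await ws.send_json({"word": word})  # browser displays and speaks the word
    except WebSocketDisconnect:
        pass
```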
- Install Python dependencies

  ```bash
  conda create -n isl-speech python=3.9
  conda activate isl-speech
  pip install -r requirements.txt
  ```

- Place models
  - YOLO weights (`yolo11n.pt`) in `models/`
  - Trained PyTorch model (`best_web_model.pth`) in `models/`
- Start the FastAPI server

  ```bash
  uvicorn web_app.app_ws:app --reload
  ```

- Open the Web App
  - Navigate to `http://localhost:8000`
  - Click Start Camera
  - Use `s` to start inference, `r` to reset the buffer
| Key | Action |
|---|---|
| `s` | Reset buffer and start inference |
| `r` | Reset buffer only |
| Play Translation | Click the Speak Translation button to hear the translated sign |
```
web_app/
├── static/
│   └── index.html       # Frontend HTML
├── app_ws.py            # FastAPI server with WebSocket
├── model_def.py         # PyTorch CNN + LSTM model definition
└── models/
    ├── yolo11n.pt
    └── best_web_model.pth
```
- Sainava Modak
- Kartik Rajput