The goal is to build a neural network that recognises five hand gestures, captured by a webcam, so that a smart TV can be controlled without a remote. The gestures are as follows:
- Thumbs up: Increase the volume
- Thumbs down: Decrease the volume
- Left swipe: 'Jump' backward 10 seconds
- Right swipe: 'Jump' forward 10 seconds
- Stop: Pause the movie
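The gesture-to-command mapping above can be sketched as a small lookup applied to a classifier's output. This is only an illustrative sketch: the class order in `GESTURES`, the command tuples, the function name `action_for_prediction`, and the confidence threshold are all assumptions made for the example, not details taken from the project.

```python
# Assumed class order; the real label encoding of the dataset may differ.
GESTURES = ["thumbs_up", "thumbs_down", "left_swipe", "right_swipe", "stop"]

# Hypothetical command encoding: (command name, argument).
GESTURE_TO_ACTION = {
    "thumbs_up": ("volume", +1),     # increase the volume
    "thumbs_down": ("volume", -1),   # decrease the volume
    "left_swipe": ("seek", -10),     # jump backward 10 seconds
    "right_swipe": ("seek", +10),    # jump forward 10 seconds
    "stop": ("pause", None),         # pause the movie
}


def action_for_prediction(probabilities, threshold=0.5):
    """Map per-class probabilities to a TV command.

    Returns None when the most likely gesture is below `threshold`,
    so that uncertain predictions do not trigger a command.
    """
    if len(probabilities) != len(GESTURES):
        raise ValueError("expected one probability per gesture class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None
    return GESTURE_TO_ACTION[GESTURES[best]]
```

For example, under the assumed class order, `action_for_prediction([0.1, 0.1, 0.7, 0.05, 0.05])` selects the left-swipe command. Ignoring low-confidence predictions is a design choice for the sketch; a deployed system might instead smooth predictions over several clips.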
The training data consists of several hundred videos, each categorised into one of the five classes. Each video (typically 2-3 seconds long) is divided into a sequence of 30 frames. The videos were recorded by various people performing one of the five gestures in front of a webcam, similar to the one the smart TV will use.
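Because every video is a fixed-length sequence of 30 frames, a model can take either all frames or an evenly spaced subset as input. The sketch below shows one way to pick such a subset. Subsampling is not described in this project, so the function name `sample_frame_indices` and the choice of 15 frames in the usage note are assumptions for illustration only.

```python
def sample_frame_indices(total_frames=30, num_samples=30):
    """Return `num_samples` evenly spaced frame indices in [0, total_frames - 1].

    Integer arithmetic is used so the result is deterministic and always
    includes the first and last frame (when num_samples >= 2).
    """
    if not 1 <= num_samples <= total_frames:
        raise ValueError("num_samples must be between 1 and total_frames")
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]
```

Calling `sample_frame_indices(30, 15)` yields 15 indices spanning the whole clip, which roughly halves the input size while keeping the start and end of the gesture. With the defaults, every one of the 30 frames is kept.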
- TensorFlow 2.11.0
- OpenCV 4.7.0
- Matplotlib 2.5.3
NVIDIA A100 Tensor Core GPU:
- CUDA Version: 12.0
- Driver version (as reported by NVIDIA-SMI): 525.85.12
- GPU RAM: 40 GB
We would like to express our deepest appreciation to Rui Hou, Chen Chen and Mubarak Shah for their research paper, "An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos".
- Sachin Shekhar
- Ashish Kulkarni
- Tejashwini Junjoor