
Real-time facial expression recognition for webcam applications

Real-time demo (using one RGB camera)

[Demo GIF: real-time recognition from a single RGB webcam]

Frameworks

  • Face detection is performed by MediaPipe, developed by Google (a minimal usage sketch follows below the diagram).

  • Facial expression recognition uses a DCNN (deep convolutional neural network) trained on the FER+ dataset, which is maintained by Microsoft.

[Diagram: network architecture]

(Thanks to @zc for the image.)
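Below is a minimal sketch of face detection with stock MediaPipe's Python solution API. It illustrates the framework only and is not this repo's exact pipeline (the demo relies on a slightly modified drawing_utils.py; see Details).

```python
# Minimal face-detection sketch using stock MediaPipe (illustrative only;
# this repo's demo uses a slightly modified drawing_utils.py).
import cv2
import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # webcam 0, as in `python camdemo.py --camera 0`
with mp_face_detection.FaceDetection(min_detection_confidence=0.5) as detector:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.detections:
            for detection in results.detections:
                mp_drawing.draw_detection(frame, detection)
        cv2.imshow("face detection", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```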

Language & Dependencies

  • Language: Python 3.6

  • Dependencies:

    • pytorch
    • opencv-python
    • mediapipe (modified)
    • CUDA 10.1 (optional)
    • ...
  • You can install all the dependencies with python -m pip install -r requirements.txt

Details

  • Usage:

    • Run from the prebuilt exe: see release-windows-v0.1 (recommended)
    • Run from source:
      • Replace drawing_utils.py in mediapipe with src/drawing_utils.py, which I slightly modified.
      • Contact me by email to get the trained model.
      • Run python camdemo.py --camera 0
  • Performance:

    • Absolutely real time! The model averages above 60 FPS on an ordinary PC. If possible, use a GPU for even better performance.
    • My modest machine: Intel i7-7700K CPU (4.2 GHz) with an NVIDIA Quadro P2000 (5 GB memory)
  • Model structure: the model is quite simple. It uses ResNet50 as a backbone for feature extraction, followed by two fully connected layers. The output is a 10-dimensional vector corresponding to the 10 emotion classes (a minimal sketch follows after the class table below).

  • Accuracy: the model reaches 79.8% accuracy on the FER+ validation subset after 14 epochs of training with the softCE loss.

    Validation accuracy by epoch for each loss function ('*' = not recorded):

    epoch | KLdiv | softCE | weightedSoftCE
    ------|-------|--------|---------------
      0   | 0.005 | 0.005  | 0.005
      1   | 0.55  | 0.598  | 0.56
      2   | 0.58  | 0.652  | 0.668
      3   |   *   | 0.695  | 0.697
      4   |   *   | 0.726  | 0.71
      5   |   *   | 0.753  | 0.68
      6   |   *   | 0.76   | 0.665
     ...  |  ...  |  ...   |  ...
     14   |   *   | 0.798  | 0.742
  • Loss function:

    • Unlike the original FER, each image in FER+ was labeled by 10 crowd-sourced taggers, but PyTorch's default cross-entropy takes a single hard label and so discards the information in those 10 soft labels. I therefore implemented a soft cross-entropy (softCE) that fits the probability distribution over emotion classes, which gave good results (a sketch follows after the class table below).

    • Another reason to use the softCE loss is that some human emotions, such as happiness and surprise, cannot be distinguished sharply in the first place.

    • Since FER+ is a very imbalanced dataset (see the distribution below), I also tried a weightedSoftCE (in the spirit of focal loss), but it did not help, and I don't yet understand why. If you happen to know, tell me! I also found the weightedSoftCE loss oscillated a lot during training, which suggests it is not very numerically stable.

[Chart: FER+ class distribution]

Expression | neutral | happiness | surprise | sadness | anger | disgust | fear | contempt | unknown | NF
Index      | 0       | 1         | 2        | 3       | 4     | 5       | 6    | 7        | 8       | 9
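A minimal PyTorch sketch of the architecture described above. The 10-way output matches the class table; the hidden-layer width (256) and the use of ImageNet pretraining are illustrative assumptions, not details taken from the released model.

```python
# Sketch of the described model: ResNet50 backbone + two FC layers.
# Hidden width (256) and pretrained=True are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models

class FERNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        backbone = models.resnet50(pretrained=True)
        # Keep everything up to (and including) global average pooling,
        # dropping ResNet50's own 1000-way classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc1 = nn.Linear(2048, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)  # (batch, 2048)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)               # raw logits over the 10 classes
```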
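And a minimal sketch of the softCE and weightedSoftCE losses discussed above, assuming each target row is the image's 10 tagger votes normalized into a probability distribution; the repo's exact vote preprocessing and weighting scheme may differ.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target):
    # logits: (batch, 10) raw scores; target: (batch, 10) soft labels,
    # i.e. each image's tagger votes normalized to sum to 1.
    log_probs = F.log_softmax(logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()

def weighted_soft_cross_entropy(logits, target, class_weights):
    # class_weights: (10,) per-class weights (e.g. inverse class frequency)
    # to counter FER+ imbalance; the actual weighting scheme is an assumption.
    log_probs = F.log_softmax(logits, dim=1)
    return -(class_weights * target * log_probs).sum(dim=1).mean()
```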

Potential applications

  • Online education for children: identifying whether children are listening carefully. In on-site meetings or classrooms: judging how well a speech holds the audience.

[Image: online-education use case]

  • On-site Human–Machine Interaction.
