libfaceid, a Face Recognition library for everybody

Face Recognition Made Easy. libfaceid is a Python library for facial recognition that seamlessly integrates multiple face detection and face recognition models.

From Zero to Hero. Learn the basics of Face Recognition and experiment with different models. libfaceid enables beginners to learn various models and simplifies prototyping of facial recognition solutions by providing a comprehensive list of models to choose from. Multiple models for detection, encoding/embedding and classification are supported, from the basic models (Haar Cascades + LBPH) to the more advanced models (MTCNN + FaceNet). The models are seamlessly integrated so that users can mix and match them: each detector model has been made compatible with each embedding model, abstracting you from their differences. Each model differs in speed, accuracy, memory requirements and third-party library dependencies, so users can easily experiment with the solutions appropriate for their specific use cases and system requirements.

Awesome Design. The library is designed to be easy to use, modular and robust. Model selection is done via the constructors, while the exposed functions are simply detect() or estimate(), making usage very easy. The files are organized into modules, so the code is intuitive to understand and debug, and the design makes supporting new models in the future straightforward.

Have Some Fun. The library contains models for predicting your age, gender, emotion and facial landmarks. It also contains a text-to-speech synthesizer that generates an audio file for each person in the image dataset, so the system can play the generated audio to greet you after recognizing your face. Voice-enabled face recognition. How cool is that? Some test applications are available as Flask web apps, so you can view the video capture remotely from another computer on the same network via a web browser.

News:

Date          Milestone
2018, Dec 19  Integrated Google's Tacotron text-to-speech synthesizer (implementation by keithito)
2018, Dec 13  Integrated Google's FaceNet face embedding (implementation by David Sandberg)
2018, Nov 30  Committed libfaceid to Github

Background:

With Apple incorporating face recognition technology in the iPhone X in 2017, and with China implementing widespread nationwide surveillance for its social credit system, Face Recognition has become one of the most popular technologies where Deep Learning is used. Face recognition is used for identity authentication, access control, passport verification in airports, law enforcement, forensic investigations, social media platforms, disease diagnosis, police surveillance, casino watchlists and many more.

Modern state-of-the-art Face Recognition solutions leverage graphics processing units (GPUs), which have improved dramatically over the decades. (In particular, Nvidia released the CUDA framework, which allows C and C++ applications to utilize the GPU for massively parallel computing.) These solutions use Deep Learning (aka Neural Networks), which requires GPU power to perform massive compute operations in parallel. Deep Learning is one approach to Artificial Intelligence that simulates how the brain functions by teaching software through many examples (big data) instead of hardcoding logic rules and decision trees into the software. (One important contribution to Deep Learning is the creation of the ImageNet dataset, a big-data collection of millions of labelled and classified images for teaching computers image classification.) Neural networks are basically layers of nodes, where each node is connected to nodes in the next layer, feeding information forward. Deepnets are very deep neural networks with several layers, made practical by GPU compute power. Many neural network topologies exist, such as the Convolutional Neural Network (CNN) architecture, which particularly applies to Computer Vision, from image classification to face recognition.

Introduction:

A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. At a minimum, a simple real-time facial recognition system is composed of the following pipeline:

  1. Face Enrollment. Registering faces to a database which includes pre-computing the face embeddings and training a classifier on top of the face embeddings of registered individuals.
  2. Face Capture. Reading a frame image from a camera source.
  3. Face Detection. Detecting faces in a frame image.
  4. Face Encoding/Embedding. Generating a mathematical representation of each face (coined as embedding) in the frame image.
  5. Face Identification. Matching each face embedding in the frame image against the face embeddings of known people in a database.
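
In libfaceid, pipeline steps 2 to 5 map directly to a few library calls. Below is a minimal sketch using the library's default models (step 1, Face Enrollment, corresponds to the training step shown later; see the Usage section for complete, runnable examples):

    import cv2
    from libfaceid.detector import FaceDetectorModels, FaceDetector
    from libfaceid.encoder  import FaceEncoderModels, FaceEncoder

    face_detector = FaceDetector(model=FaceDetectorModels.DEFAULT, path="models/detection/")
    face_encoder  = FaceEncoder(model=FaceEncoderModels.DEFAULT, path="models/encoding/", path_training="models/training/", training=False)

    camera = cv2.VideoCapture(0)             # 2. Face Capture
    ret, frame = camera.read()
    faces = face_detector.detect(frame)      # 3. Face Detection
    for (x, y, w, h) in faces:
        # 4. Face Encoding/Embedding and 5. Face Identification
        face_id, confidence = face_encoder.identify(frame, (x, y, w, h))
        print(face_id, confidence)
    camera.release()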

More complex systems include features such as Face Liveness Detection (to counter spoofing attacks via photos, videos or 3D masks), face alignment, face augmentation (to increase the number of dataset images) and face verification to improve accuracy.

Problem:

libfaceid democratizes learning Face Recognition. Popular models such as FaceNet and OpenFace are not straightforward to use and don't provide easy-to-follow guidelines on how to install and set them up. So far, dlib has been the best in terms of documentation and usage, but it is slow on CPU and has too many abstractions (it abstracts OpenCV as well). Simpler options such as OpenCV are good but too basic, and lack documentation of the parameter settings, the classification algorithms and the end-to-end pipeline. Pyimagesearch has been great, with several tutorials and easy-to-understand explanations, but there is not much emphasis on model comparisons, and it seems to aim at selling books, so the intentions to help the community are not so pure after all (I hate that you need to wait for 2 marketing emails to arrive just to download the source code for the tutorials, but I love that he replies to all questions in the threads). With all this said, I've learned a lot from all these resources, so I'm sure you will learn a lot too.

libfaceid was created to address these problems and fill in the gaps left by these resources. It seamlessly integrates multiple models for each step of the pipeline, enabling anybody, especially beginners in Computer Vision and Deep Learning, to easily learn and experiment with a comprehensive end-to-end face recognition pipeline. No strings attached. Once you have experimented with all the models and have chosen specific models for your use case and system requirements, you can explore the more advanced models like FaceNet.

Design:

libfaceid is designed to be easy to use, modular and robust. Model selection is done via the constructors, while the exposed functions are simply detect() or estimate(), making usage very easy. The files are organized into modules, so the code is intuitive to understand and debug, and the design makes supporting new models in the future straightforward.

Only pretrained models are supported. Transfer learning is the practice of applying a model pretrained on a very large dataset to a new dataset. A model trained on a large enough dataset is 'experienced' enough to generalize what it has learned to new environments and new datasets. Transfer learning is one of the major factors in the explosion of popularity of Computer Vision, not only for face recognition but most especially for object detection. More recently, in mid-2018, transfer learning has been making good advances in Natural Language Processing (BERT by Google and ELMo by the Allen Institute). Transfer learning is really useful, and it is a main goal that the community working on Reinforcement Learning wants to achieve for robotics.

Features:

Having a large dataset of images per person is not possible for some use cases of Face Recognition, so finding the appropriate model that balances accuracy and speed on the target hardware platform (CPU, GPU or embedded system) is necessary. The trinity of AI is Data, Algorithms and Compute; libfaceid lets you select each model/algorithm in the pipeline.

The libfaceid library supports several models for each step of the Face Recognition pipeline. Some models are faster while others are more accurate. You can mix and match the models for your specific use case, hardware platform and system requirements.

Face Detection models for detecting face locations

Face Encoding models for generating face embeddings on detected faces

Classification algorithms for Face Identification using face embeddings (a training sketch follows the list below)

  • Naïve Bayes
  • Linear SVM
  • RBF SVM
  • Nearest Neighbors
  • Decision Tree
  • Random Forest
  • Neural Net
  • Adaboost
  • QDA
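
These names match the classifiers in scikit-learn (one of this library's dependencies). For illustration, here is a minimal sketch of what training the Linear SVM option on face embeddings might look like under the hood; the arrays and the file name are illustrative, not part of the libfaceid API:

    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from sklearn.svm import SVC

    embeddings = np.load("embeddings.npy")  # hypothetical array of 128-d face embeddings
    names = ["Person1", "Person1", "Person2", "Person2"]  # one label per embedding

    le = LabelEncoder()
    labels = le.fit_transform(names)
    clf = SVC(kernel="linear", probability=True)  # the "Linear SVM" option
    clf.fit(embeddings, labels)

    probs = clf.predict_proba(embeddings[:1])[0]  # confidence per known person
    best = int(np.argmax(probs))
    print(le.classes_[best], probs[best])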

Text To Speech synthesizer models for generating audio given some text

Additional models (bonus features for PR):

  • Face Pose estimator models for predicting face landmarks (face landmark detection)
  • Face Age estimator models for predicting age (age detection)
  • Face Gender estimator models for predicting gender (gender detection)
  • Face Emotion estimator models for predicting facial expression (emotion detection)

Compatibility:

The library and example applications have been tested on Raspberry Pi 3B+ (Python 3.5.3) and Windows 7 (Python 3.6.6) using OpenCV 3.4.3.18, Tensorflow 1.8.0 and Keras 2.0.8. For complete dependencies, refer to requirements.txt. Tested with built-in laptop camera and with a Logitech C922 Full-HD USB webcam.

I encountered a DLL issue with OpenCV 3.4.3.18 on my Windows 7 laptop. If you encounter such an issue, use OpenCV 3.4.1.15 or 3.3.1.11 instead. Also note that opencv-python and opencv-contrib-python must always have the same version.
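
For example, to switch to a matching pair of older packages:

    pip uninstall opencv-python opencv-contrib-python
    pip install opencv-python==3.4.1.15 opencv-contrib-python==3.4.1.15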

Usage:

Installation:

    1. Install Python 3 and Python PIP
       Use Python 3.5.3 for Raspberry Pi 3B+ and Python 3.6.6 for Windows
    2. Install the required Python PIP package dependencies using requirements.txt
       pip install -r requirements.txt

       This will install the following dependencies:
       opencv-python==3.4.3.18
       opencv-contrib-python==3.4.3.18
       numpy==1.15.4
       imutils==0.5.1
       dlib==19.16.0
       scipy==1.1.0
       scikit-learn==0.20.0
       mtcnn==0.0.8
       tensorflow==1.8.0
       keras==2.0.8
       h5py==2.8.0
       facenet==1.0.3
       flask==1.0.2

    3. Optional: Install the additional Python PIP package dependencies for the text-to-speech synthesizer (voice capability)
       pip install -r requirements_with_synthesizer.txt

       This will install the following additional dependencies:
       playsound==1.2.2
       inflect==0.2.5
       librosa==0.5.1
       unidecode==0.4.20
       pyttsx3==2.7
       pypiwin32==223

Quickstart (Dummy Guide):

    1. Add your dataset
       ex. datasets/person1/1.jpg, datasets/person2/1.jpg
    2. Train your model with your dataset
       Update facial_recognition_training.bat to specify your chosen models
       Run facial_recognition_training.bat
    3. Test your model
       Update facial_recognition_testing_image.bat to specify your chosen models
       Run facial_recognition_testing_image.bat

Folder structure:

    libfaceid
    |
    |   facial_estimation_poseagegenderemotion_webcam.py
    |   facial_recognition.py
    |   facial_recognition_testing_image.py
    |   facial_recognition_testing_webcam.py
    |   facial_recognition_testing_webcam_voiceenabled.py
    |   facial_recognition_training.py
    |   requirements.txt
    |   requirements_with_synthesizer.txt
    |   
    +---libfaceid
    |   |   age.py
    |   |   classifier.py
    |   |   detector.py
    |   |   emotion.py
    |   |   encoder.py
    |   |   gender.py
    |   |   liveness.py
    |   |   pose.py
    |   |   synthesizer.py
    |   |   __init__.py
    |   |   
    |   \---tacotron
    |           
    +---models
    |   +---detection
    |   |       deploy.prototxt
    |   |       haarcascade_frontalface_default.xml
    |   |       mmod_human_face_detector.dat
    |   |       res10_300x300_ssd_iter_140000.caffemodel
    |   |       
    |   +---encoding
    |   |       dlib_face_recognition_resnet_model_v1.dat
    |   |       facenet_20180402-114759.pb
    |   |       openface_nn4.small2.v1.t7
    |   |       shape_predictor_5_face_landmarks.dat
    |   |           
    |   +---estimation
    |   |       age_deploy.prototxt
    |   |       age_net.caffemodel
    |   |       emotion_deploy.json
    |   |       emotion_net.h5
    |   |       gender_deploy.prototxt
    |   |       gender_net.caffemodel
    |   |       shape_predictor_68_face_landmarks.dat
    |   |       shape_predictor_68_face_landmarks.jpg
    |   |               
    |   +---synthesis
    |   |   \---tacotron-20180906
    |   |           model.ckpt.data-00000-of-00001
    |   |           model.ckpt.index
    |   |           
    |   \---training // This is generated during training (ex. facial_recognition_training.py)
    |           dlib_le.pickle
    |           dlib_re.pickle
    |           facenet_le.pickle
    |           facenet_re.pickle
    |           lbph.yml
    |           lbph_le.pickle
    |           openface_le.pickle
    |           openface_re.pickle
    |
    +---audiosets // This is generated during training (ex. facial_recognition_training.py)
    |       Person1.wav
    |       Person2.wav
    |       Person3.wav
    |       
    +---datasets // This is generated by user
    |   +---Person1
    |   |       1.jpg
    |   |       2.jpg
    |   |       ...
    |   |       X.jpg
    |   |       
    |   +---Person2
    |   |       1.jpg
    |   |       2.jpg
    |   |       ...
    |   |       X.jpg
    |   |       
    |   \---Person3
    |           1.jpg
    |           2.jpg
    |           ...
    |           X.jpg
    |           
    \---templates

Pre-requisites:

    1. Add the dataset of images under the datasets directory
       The datasets folder should be in the same location as the test applications.
       Having more images per person improves accuracy considerably.
       If only 1 image is possible, then perform data augmentation (see the sketch after this list).
         Example:
         datasets/Person1 - contains images of the person named Person1
         datasets/Person2 - contains images of the person named Person2
         ...
         datasets/PersonX - contains images of the person named PersonX
    2. Train the model using the datasets.
       Can use facial_recognition_training.py
       Make sure the models used for training are the same ones used in actual testing, for better accuracy.
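
A minimal data-augmentation sketch using OpenCV and imutils (both are dependencies of this library); the file names are illustrative. It derives a few extra training images from a single photo:

    import cv2
    import imutils

    image = cv2.imread("datasets/Person1/1.jpg")  # the single available image

    # Horizontal flip - faces are roughly symmetric, so this is a safe augmentation
    cv2.imwrite("datasets/Person1/2.jpg", cv2.flip(image, 1))

    # Small rotations to simulate slight head tilt
    for i, angle in enumerate((-10, 10), start=3):
        cv2.imwrite("datasets/Person1/%d.jpg" % i, imutils.rotate(image, angle))

    # Brightness variation to simulate different lighting
    cv2.imwrite("datasets/Person1/5.jpg", cv2.convertScaleAbs(image, alpha=1.0, beta=40))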

Examples:

    detector models:       0-HAARCASCADE, 1-DLIBHOG, 2-DLIBCNN, 3-SSDRESNET, 4-MTCNN, 5-FACENET
    encoder models:        0-LBPH, 1-OPENFACE, 2-DLIBRESNET, 3-FACENET
    classifier algorithms: 0-NAIVE_BAYES, 1-LINEAR_SVM, 2-RBF_SVM, 3-NEAREST_NEIGHBORS, 4-DECISION_TREE, 
                           5-RANDOM_FOREST, 6-NEURAL_NET, 7-ADABOOST, 8-QDA
    camera resolution:     0-QVGA, 1-VGA, 2-HD, 3-FULLHD
    synthesizer models:    0-TACOTRON

    1. facial_recognition_training.py
        Usage: python facial_recognition_training.py --detector 0 --encoder 0 --classifier 0
        Usage: python facial_recognition_training.py --detector 0 --encoder 3 --classifier 1 --setsynthesizer True --synthesizer 0

    2. facial_recognition_testing_image.py
        Usage: python facial_recognition_testing_image.py --detector 0 --encoder 0 --image datasets/rico/1.jpg

    3. facial_recognition_testing_webcam.py
        Usage: python facial_recognition_testing_webcam.py --detector 0 --encoder 0 --webcam 0 --resolution 0

    4. facial_recognition_testing_webcam_flask.py
        Usage: python facial_recognition_testing_webcam_flask.py
               Then open browser and type http://127.0.0.1:5000 or http://ip_address:5000

    5. facial_estimation_poseagegenderemotion_webcam.py
        Usage: python facial_estimation_poseagegenderemotion_webcam.py --detector 0 --webcam 0 --resolution 0

    6. facial_estimation_poseagegenderemotion_webcam_flask.py
        Usage: python facial_estimation_poseagegenderemotion_webcam_flask.py
               Then open browser and type http://127.0.0.1:5000 or http://ip_address:5000

Training models with dataset of images:

    from libfaceid.detector import FaceDetectorModels, FaceDetector
    from libfaceid.encoder  import FaceEncoderModels, FaceEncoder
    from libfaceid.classifier  import FaceClassifierModels

    INPUT_DIR_DATASET         = "datasets"
    INPUT_DIR_MODEL_DETECTION = "models/detection/"
    INPUT_DIR_MODEL_ENCODING  = "models/encoding/"
    INPUT_DIR_MODEL_TRAINING  = "models/training/"

    verify = False  # flag passed through to train(); see facial_recognition_training.py

    face_detector = FaceDetector(model=FaceDetectorModels.DEFAULT, path=INPUT_DIR_MODEL_DETECTION)
    face_encoder = FaceEncoder(model=FaceEncoderModels.DEFAULT, path=INPUT_DIR_MODEL_ENCODING, path_training=INPUT_DIR_MODEL_TRAINING, training=True)
    face_encoder.train(face_detector, path_dataset=INPUT_DIR_DATASET, verify=verify, classifier=FaceClassifierModels.NAIVE_BAYES)

    # Generate audio samples for the image datasets using the text-to-speech synthesizer
    OUTPUT_DIR_AUDIOSET       = "audiosets/"
    INPUT_DIR_MODEL_SYNTHESIS = "models/synthesis/"
    from libfaceid.synthesizer import TextToSpeechSynthesizerModels, TextToSpeechSynthesizer
    synthesizer = TextToSpeechSynthesizer(model=TextToSpeechSynthesizerModels.DEFAULT, path=INPUT_DIR_MODEL_SYNTHESIS, path_output=OUTPUT_DIR_AUDIOSET)
    synthesizer.synthesize_datasets(INPUT_DIR_DATASET)

Face Recognition on images:

    import cv2
    from libfaceid.detector import FaceDetectorModels, FaceDetector
    from libfaceid.encoder  import FaceEncoderModels, FaceEncoder

    INPUT_DIR_MODEL_DETECTION = "models/detection/"
    INPUT_DIR_MODEL_ENCODING  = "models/encoding/"
    INPUT_DIR_MODEL_TRAINING  = "models/training/"

    imagePath   = "datasets/Person1/1.jpg"  # path of the image to recognize
    window_name = "Face Recognition"

    image = cv2.VideoCapture(imagePath)
    face_detector = FaceDetector(model=FaceDetectorModels.DEFAULT, path=INPUT_DIR_MODEL_DETECTION)
    face_encoder = FaceEncoder(model=FaceEncoderModels.DEFAULT, path=INPUT_DIR_MODEL_ENCODING, path_training=INPUT_DIR_MODEL_TRAINING, training=False)

    ret, frame = image.read()  # read() returns a (status, frame) tuple
    faces = face_detector.detect(frame)
    for (index, face) in enumerate(faces):
        (x, y, w, h) = face
        face_id, confidence = face_encoder.identify(frame, (x, y, w, h))
        label_face(frame, (x, y, w, h), face_id, confidence)  # drawing helper from the example apps
    cv2.imshow(window_name, frame)
    cv2.waitKey(5000)

    image.release()
    cv2.destroyAllWindows()

Real-Time Face Recognition (w/a webcam):

    import cv2
    from libfaceid.detector import FaceDetectorModels, FaceDetector
    from libfaceid.encoder  import FaceEncoderModels, FaceEncoder

    INPUT_DIR_MODEL_DETECTION = "models/detection/"
    INPUT_DIR_MODEL_ENCODING  = "models/encoding/"
    INPUT_DIR_MODEL_TRAINING  = "models/training/"

    webcam_index = 0  # index of the camera to use
    window_name  = "Face Recognition"

    camera = cv2.VideoCapture(webcam_index)
    face_detector = FaceDetector(model=FaceDetectorModels.DEFAULT, path=INPUT_DIR_MODEL_DETECTION)
    face_encoder = FaceEncoder(model=FaceEncoderModels.DEFAULT, path=INPUT_DIR_MODEL_ENCODING, path_training=INPUT_DIR_MODEL_TRAINING, training=False)

    while True:
        ret, frame = camera.read()  # read() returns a (status, frame) tuple
        faces = face_detector.detect(frame)
        for (index, face) in enumerate(faces):
            (x, y, w, h) = face
            face_id, confidence = face_encoder.identify(frame, (x, y, w, h))
            label_face(frame, (x, y, w, h), face_id, confidence)  # drawing helper from the example apps
        cv2.imshow(window_name, frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
            break

    camera.release()
    cv2.destroyAllWindows()

Voice-Enabled Real-Time Face Recognition (w/a webcam):

    import cv2
    from libfaceid.detector import FaceDetectorModels, FaceDetector
    from libfaceid.encoder  import FaceEncoderModels, FaceEncoder
    from libfaceid.synthesizer import TextToSpeechSynthesizerModels, TextToSpeechSynthesizer

    INPUT_DIR_MODEL_DETECTION = "models/detection/"
    INPUT_DIR_MODEL_ENCODING  = "models/encoding/"
    INPUT_DIR_MODEL_TRAINING  = "models/training/"
    INPUT_DIR_AUDIOSET        = "audiosets"

    webcam_index = 0
    window_name  = "Face Recognition"

    camera = cv2.VideoCapture(webcam_index)
    face_detector = FaceDetector(model=FaceDetectorModels.DEFAULT, path=INPUT_DIR_MODEL_DETECTION)
    face_encoder = FaceEncoder(model=FaceEncoderModels.DEFAULT, path=INPUT_DIR_MODEL_ENCODING, path_training=INPUT_DIR_MODEL_TRAINING, training=False)
    tts_synthesizer = TextToSpeechSynthesizer(model=TextToSpeechSynthesizerModels.DEFAULT, path=None, path_output=None, training=False)

    frame_count = 0
    while True:
        ret, frame = camera.read()  # read() returns a (status, frame) tuple
        faces = face_detector.detect(frame)
        for (index, face) in enumerate(faces):
            (x, y, w, h) = face
            face_id, confidence = face_encoder.identify(frame, (x, y, w, h))
            label_face(frame, (x, y, w, h), face_id, confidence)  # drawing helper from the example apps
            if (frame_count % 120 == 0):  # greet at most once every 120 frames
                tts_synthesizer.playaudio(INPUT_DIR_AUDIOSET, face_id, block=False)
        cv2.imshow(window_name, frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
        frame_count += 1

    camera.release()
    cv2.destroyAllWindows()

Real-Time Face Pose/Age/Gender/Emotion Estimation (w/a webcam):

    import cv2
    from libfaceid.detector import FaceDetectorModels, FaceDetector
    from libfaceid.pose import FacePoseEstimatorModels, FacePoseEstimator
    from libfaceid.age import FaceAgeEstimatorModels, FaceAgeEstimator
    from libfaceid.gender import FaceGenderEstimatorModels, FaceGenderEstimator
    from libfaceid.emotion import FaceEmotionEstimatorModels, FaceEmotionEstimator

    INPUT_DIR_MODEL_DETECTION  = "models/detection/"
    INPUT_DIR_MODEL_ESTIMATION = "models/estimation/"

    webcam_index = 0
    window_name  = "Face Estimation"

    camera = cv2.VideoCapture(webcam_index)
    face_detector = FaceDetector(model=FaceDetectorModels.DEFAULT, path=INPUT_DIR_MODEL_DETECTION)
    face_pose_estimator = FacePoseEstimator(model=FacePoseEstimatorModels.DEFAULT, path=INPUT_DIR_MODEL_ESTIMATION)
    face_age_estimator = FaceAgeEstimator(model=FaceAgeEstimatorModels.DEFAULT, path=INPUT_DIR_MODEL_ESTIMATION)
    face_gender_estimator = FaceGenderEstimator(model=FaceGenderEstimatorModels.DEFAULT, path=INPUT_DIR_MODEL_ESTIMATION)
    face_emotion_estimator = FaceEmotionEstimator(model=FaceEmotionEstimatorModels.DEFAULT, path=INPUT_DIR_MODEL_ESTIMATION)

    while True:
        ret, frame = camera.read()  # read() returns a (status, frame) tuple
        faces = face_detector.detect(frame)
        for (index, face) in enumerate(faces):
            (x, y, w, h) = face
            face_image = frame[y:y+h, x:x+w]  # crop the detected face region
            age = face_age_estimator.estimate(frame, face_image)
            gender = face_gender_estimator.estimate(frame, face_image)
            emotion = face_emotion_estimator.estimate(frame, face_image)
            shape = face_pose_estimator.detect(frame, face)
            face_pose_estimator.add_overlay(frame, shape)
            label_face(age, gender, emotion)  # drawing helper from the example apps
        cv2.imshow(window_name, frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    camera.release()
    cv2.destroyAllWindows()

Case Study - Face Recognition for Identity Authentication:

One of the use cases of face recognition is security identity authentication. This is a convenience feature for authenticating with a system using one's face instead of inputting a passcode or scanning a fingerprint. Passcodes are often limited by the maximum number of digits allowed, while fingerprint scanning often has problems with wet fingers or dry skin. Face authentication offers a more reliable and secure way to authenticate.

When used for identity authentication, face recognition specifications differ a lot from those of general face recognition systems like Facebook's automated tagging and Google's image search; the result is closer to Apple's Face ID on the iPhone X. Below are guidelines for drafting specifications for your face recognition solution. Note that Apple's Face ID technology is used as the primary baseline in this case study. Refer to Apple's Face ID white paper for more information.

Face Enrollment

  • Should support dynamic enrollment of faces. Tie this to the maximum number of users the existing system supports.
  • Should ask the user to move/rotate the face (in a circular motion) in order to capture different angles of the face. This gives the system enough flexibility to recognize you at different face angles.
  • iPhone X Face ID enrollment is done twice for some reason. It is possible that the first scan is for liveness detection only.
  • How many images should be captured? We can store as many images as possible for better accuracy, but memory footprint is the limiting factor. Estimate based on the size of 1 picture and the maximum number of users.
  • For security and memory efficiency, images used during enrollment should not be saved. Only the mathematical representations (128-dimensional vectors) of the faces should be stored.

Face Capture

  • The camera will be about 1 foot away from the user (Apple Face ID: 10-20 inches).
  • Camera resolution will depend on the display panel size and display resolution. QVGA size is acceptable for embedded solutions.
  • Take into consideration bad lighting and extremely dark situations. Should the camera have a good flash/LED to emit some light? The iPhone X uses an infrared light to perform better in dark settings.

Face Detection

  • Only 1 face per frame is detected.
  • The face is expected to be within a certain location (inside a fixed box or circular region).
  • Detection of faces will be triggered by a user action, such as clicking a button (not automatic detection).
  • Face alignment may not be necessary, as users can be directed to place their face inside a fixed box or circular region, so the face is already expected to be aligned in most cases. But if adding this feature does not affect speed, then face alignment should be added if possible.
  • Should verify that the face is live via anti-spoofing techniques against picture-based attacks, video-based attacks and 3D mask attacks. Two popular examples of liveness detection are counting eye blinks and detecting smiles.

Face Encoding/Embedding

  • Speed is not a big factor. Face embedding and face identification can take 3-5 seconds.
  • Accuracy is critically important. The false match rate should be as low as possible.
  • Can do multiple predictions and take the most frequent result, or apply different models for predictions as a double check.

Face Identification

  • The classification model should consider the maximum number of users to support. For example, SVM is known to be good for fewer than 100 classes/persons only.
  • Should support unknown identification by setting a threshold on the best prediction: if the best prediction is too low, classify the face as Unknown (see the sketch after this list).
  • Set the number of consecutive failed attempts allowed before disabling the face recognition feature, and fall back to passcode authentication if identification has trouble recognizing people.
  • Images from successful scans should be added to the enrollment dataset, making the system adaptive, so that a person can be recognized with better accuracy in the future even with natural changes in facial appearance (hairstyle, mustache, pimples, etc.).
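
A minimal sketch of such unknown-thresholding on top of the library's identify() call; the threshold value is illustrative, and the direction of the comparison assumes that higher confidence means a better match, which must be verified per encoder model:

    UNKNOWN_THRESHOLD = 80  # illustrative; tune per detector/encoder combination

    face_id, confidence = face_encoder.identify(frame, (x, y, w, h))
    if confidence < UNKNOWN_THRESHOLD:
        face_id = "Unknown"  # best prediction is too weak; treat as an unenrolled person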

In addition to these guidelines, the face recognition solution should provide a way to enable/disable this feature, as well as to reset the datasets stored during face enrollment.

Case Study - Face Recognition for Home/Office/Hotel Greeting System:

One of the use cases of face recognition is a greeting system for smart homes, offices and hotels. To enable the voice capability feature, we use text-to-speech synthesis to dynamically create audio files from input text.

Speech Synthesis

Speech synthesis is the artificial production of human speech by a computer. It is mostly used to translate text into audio, making a system voice-enabled. Products such as Apple's Siri, Microsoft's Cortana, Amazon Echo and Google Assistant use speech synthesis. A good speech synthesizer is one that produces accurate output that sounds like a real human in near real-time. State-of-the-art speech synthesis includes DeepMind's WaveNet and Google's Tacotron.

Speech Synthesis can be used in some use cases of Face Recognition to enable the voice capability feature. One example is to greet users as they approach the terminal or kiosk system. Given some input text, the speech synthesizer can generate audio to be played upon recognizing a face. For example, upon detecting a person's arrival, it can be set to say 'Hello PersonX, welcome back...'; upon departure, it can be set to say 'Goodbye PersonX, see you again soon...'. It can be used in smart homes, office lobbies, luxury hotel rooms and modern airports.

Face Enrollment

  • For each person who registers/enrolls to the system, create an audio file "PersonX.wav" for some input text such as "Hello PersonX".

Face Identification

  • When a person is identified to be part of the database, we play the corresponding audio file "PersonX.wav".

Performance Optimizations:

Speed and accuracy are often a trade-off. Performance can be optimized for your specific use case and system requirements. Some models are optimized for speed, others for accuracy. Be sure to test all the provided models to determine the appropriate one for your specific use case, target platform (CPU, GPU or embedded) and requirements. Below are additional suggestions to optimize performance.

Speed

  • Reduce the frame size for face detection.
  • Perform face recognition every X frames only (see the sketch after this list).
  • Use threading for reading camera frames or for processing them.
  • Update the library and configure the parameters directly.
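
A minimal sketch of the first two suggestions, reusing the camera, face_detector, face_encoder and label_face objects from the earlier examples; imutils is a dependency of this library and the numbers are illustrative:

    import cv2
    import imutils

    RECOGNIZE_EVERY_N_FRAMES = 5  # illustrative; tune for your hardware
    frame_count = 0
    results = []

    while True:
        ret, frame = camera.read()
        small = imutils.resize(frame, width=320)  # smaller frames speed up detection
        if frame_count % RECOGNIZE_EVERY_N_FRAMES == 0:  # recognize every N frames only
            results = []
            for (x, y, w, h) in face_detector.detect(small):
                face_id, confidence = face_encoder.identify(small, (x, y, w, h))
                results.append(((x, y, w, h), face_id, confidence))
        for (box, face_id, confidence) in results:  # redraw cached results on skipped frames
            label_face(small, box, face_id, confidence)
        cv2.imshow("Face Recognition", small)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
        frame_count += 1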

Accuracy

  • Add more images to the datasets if possible (e.g., via data augmentation). More images per person will often result in higher accuracy.
  • Add face alignment if the faces in the datasets are not aligned or when faces may be unaligned in actual deployment (see the sketch after this list).
  • Update the library and configure the parameters directly.
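
A minimal face-alignment sketch using dlib and imutils.face_utils (both are dependencies of this library); it assumes the 68-point landmark model that already ships under models/estimation/:

    import cv2
    import dlib
    from imutils.face_utils import FaceAligner

    predictor = dlib.shape_predictor("models/estimation/shape_predictor_68_face_landmarks.dat")
    aligner = FaceAligner(predictor, desiredFaceWidth=256)
    detector = dlib.get_frontal_face_detector()

    image = cv2.imread("datasets/Person1/1.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    for rect in detector(gray, 1):  # upsample once to catch smaller faces
        aligned = aligner.align(image, gray, rect)  # rotate/scale so the eyes are level
        cv2.imwrite("datasets/Person1/1_aligned.jpg", aligned)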

References:

Below are links to valuable resources. Special thanks to all of these people for sharing their work on Face Recognition; without them, learning Face Recognition would be much more difficult.

Codes

Google and Facebook have access to massive databases of pictures, being the leading search engine and social media platform, respectively. Below are the face recognition models they designed for their own systems. Be sure to take time to read these papers for a better understanding of high-quality face recognition models.

Papers

Contribute:

Have a good idea for improving libfaceid? Please message me on Twitter. If libfaceid has helped you in learning or prototyping a face recognition system, please be kind enough to give this repository a 'Star'.
