Participation in the Intel Edge AI Udacity Scholarship Showcase Group Project
- Follow Udacity Git Commit Message Style Guide: https://udacity.github.io/git-styleguide/
Collaborators:
Name | Slack Name |
---|---|
Sarah Majors | Sarah Majors |
Harkirat Singh | Harkirat |
Hsin Wen Chang | Bearbear |
Halwai Aftab Hasan | aftab |
Anshu Trivedi | Anshu Trivedi |
Frida Rode | Frida |
Our story starts from trying to mitigate traffic jams in cities around the globe. In the early stage, we perform statistical inference on objects detected in the video stream to report traffic status in real time. In this Intel® Edge AI Group showcase project, our goal is to extend our learning experience from the Intel® Edge AI Scholarship Foundation Course Nanodegree into a hands-on implementation, as follows:
- Import the OpenVINO Toolkit and build and compile it successfully on Google Colab.
- Load pre-trained models.
- Perform model optimization.
- Integrate with the TensorFlow Object Counting (TOC) API; the TOC API is an object detection implementation that runs inference using TensorFlow models at the backend, and is optimized to run at the edge using Intel's OpenVINO Toolkit.
- Perform statistical inference on detected objects in the video stream.
- You can use public or pre-trained models. To download the pre-trained models, use the TensorFlow Model Downloader or go to PreTrain Model Download (see the download sketch below).
- Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer.
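As a quick illustration, here is a minimal sketch of fetching a frozen model from the TensorFlow Detection Model Zoo; the model name and download URL below are assumptions, so swap in whichever pre-trained model you actually choose.

```python
import tarfile
import urllib.request

# Assumed model archive from the TensorFlow Detection Model Zoo;
# replace with the pre-trained model the project actually uses.
MODEL = "ssd_mobilenet_v2_coco_2018_03_29"
URL = f"http://download.tensorflow.org/models/object_detection/{MODEL}.tar.gz"

# Download the archive next to this script.
archive, _ = urllib.request.urlretrieve(URL, f"{MODEL}.tar.gz")

# Extract the frozen graph and pipeline.config needed later by the Model Optimizer.
with tarfile.open(archive) as tar:
    tar.extractall(".")

print(f"Frozen graph: {MODEL}/frozen_inference_graph.pb")
```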
In this project, we feed the model into the Model Optimizer and get the Intermediate Representation. Frozen models will need TensorFlow-specific parameters like `--tensorflow_use_custom_operations_config` and `--tensorflow_object_detection_api_pipeline_config`. Also, `--reverse_input_channels` is usually needed, as TF Model Zoo models are trained on RGB images, while OpenCV usually loads frames as BGR. Certain models, like YOLO, DeepSpeech, and more, have their own separate pages.
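Below is a minimal sketch of that conversion, assuming a 2020-era OpenVINO install where `mo_tf.py` and the SSD support config live under the default `/opt/intel/openvino` paths, and assuming the model downloaded above; adjust the paths to your own setup.

```python
import subprocess

# Assumed locations; adjust to your OpenVINO install and downloaded model.
MO_TF = "/opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py"
MODEL_DIR = "ssd_mobilenet_v2_coco_2018_03_29"
SSD_SUPPORT = ("/opt/intel/openvino/deployment_tools/model_optimizer/"
               "extensions/front/tf/ssd_v2_support.json")

# Convert the frozen TF graph to IR (.xml + .bin) with the flags described above.
subprocess.run(
    [
        "python", MO_TF,
        "--input_model", f"{MODEL_DIR}/frozen_inference_graph.pb",
        "--tensorflow_object_detection_api_pipeline_config", f"{MODEL_DIR}/pipeline.config",
        "--tensorflow_use_custom_operations_config", SSD_SUPPORT,
        "--reverse_input_channels",  # TF Model Zoo models expect RGB; OpenCV gives BGR
        "--output_dir", "ir_model",
    ],
    check=True,
)
```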
- Quantization is the process of reducing the precision of a model. In the deep learning research field, the predominant numerical format used for research and for deployment has so far been 32-bit floating point, or FP32. However, the desire for reduced bandwidth and computational energy consumption of deep learning models has driven research into using lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented using INT8 without incurring significant loss in accuracy. The use of even lower bit-widths, such as 4/2/1-bits, is an active field of research that has also shown great progress.
- In the OpenVINO™ Toolkit, models usually default to FP32, or 32-bit floating point values, while FP16 and INT8, for 16-bit floating point and 8-bit integer values, are also available (INT8 is currently only available in the Pre-Trained Models; the Model Optimizer does not currently support that level of precision). FP16 and INT8 lose some accuracy, but the model will be smaller in memory and faster to compute. Therefore, quantization is a common method for running models at the edge (a toy illustration follows the table below).
INT8 Operation | Energy Saving vs FP32 | Area Saving vs FP32 |
---|---|---|
Add | 30x | 116x |
Multiply | 18.5x | 27x |
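As a toy illustration of why lower precision shrinks models, the snippet below (plain NumPy, not OpenVINO-specific) stores the same weights in FP32 and FP16 and compares their memory footprint and rounding error:

```python
import numpy as np

# Toy illustration: the same weights stored in FP32 vs FP16.
weights_fp32 = np.random.randn(1_000_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes / 1e6, "MB as FP32")  # ~4.0 MB
print(weights_fp16.nbytes / 1e6, "MB as FP16")  # ~2.0 MB
print("max abs rounding error:", np.abs(weights_fp32 - weights_fp16).max())
```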
The following Intel® hardware devices are supported for optimal performance with the OpenVINO™ Toolkit’s Inference Engine:
Device Types |
---|
CPUs |
GPUs |
VPUs |
FPGAs |
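Here is a minimal sketch of loading the converted IR onto one of these devices with the 2020-era Inference Engine Python API; the IR path and the chosen device name are assumptions.

```python
from openvino.inference_engine import IECore

# Assumed IR produced by the Model Optimizer step above.
MODEL_XML = "ir_model/frozen_inference_graph.xml"
MODEL_BIN = "ir_model/frozen_inference_graph.bin"

ie = IECore()
print("Available devices:", ie.available_devices)  # e.g. ['CPU', 'GPU', 'MYRIAD']

# Read the IR and load it onto the target device:
# "CPU", "GPU", "MYRIAD" (VPU) or "HETERO:FPGA,CPU".
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=1)

input_blob = next(iter(net.input_info))
print("Input shape:", net.input_info[input_blob].input_data.shape)
```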
The TensorFlow Object Counting API is used as a base for object counting in this project; more info can be found in this repo, along with a modified version of the repo for running on Google Colab.
In this section, we further enhance real-time object counting by using the TensorFlow Object Counting API to process the input video stream and perform real-time object detection, tracking, and counting.
In this section, we use what we learned from the Intel® Edge AI Scholarship Foundation Course Nanodegree, Lesson 5 (Deploying an Edge App), Section 4 (Handling Input Streams), to implement the `cv2.VideoCapture` lifecycle and use `cv2.VideoWriter` to complete the workflow of tracking and counting detected objects, as sketched below.
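The sketch below shows that input-stream lifecycle, assuming an `input.mp4` source; the pass-through processing step is a placeholder where detection, tracking, and counting would run.

```python
import cv2

def run_pipeline(source="input.mp4", output="output.mp4"):
    # Open the input stream (a file path here; 0 would open a webcam).
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open input stream: {source}")

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30

    # Writer for the annotated output video.
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(output, fourcc, fps, (width, height))

    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        # Detection, tracking, and counting would annotate the frame here;
        # this placeholder just passes it through unchanged.
        annotated = frame
        writer.write(annotated)

    # Release resources to finish the lifecycle cleanly.
    cap.release()
    writer.release()

if __name__ == "__main__":
    run_pipeline()
```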
Here, we use the Single Shot Detector (SSD) with MobileNet from the TensorFlow Detection Model Zoo. SSD is designed for object detection in real time. SSD speeds up the process by eliminating the need for a region proposal network, and applies multi-scale features and default boxes to recover the resulting drop in accuracy. These improvements allow SSD to match Faster R-CNN's accuracy using lower-resolution images, thus pushing the speed higher.
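To tie the pieces together, here is a hedged sketch of running the SSD IR on a single frame and counting detections above a confidence threshold. It assumes the standard SSD output layout `[1, 1, N, 7]`, where each row is `[image_id, label, confidence, x_min, y_min, x_max, y_max]`, and reuses the `net` and `exec_net` objects from the device sketch above.

```python
import cv2

def count_objects(exec_net, net, frame, threshold=0.5):
    """Run one frame through the SSD IR and count confident detections."""
    input_blob = next(iter(net.input_info))
    output_blob = next(iter(net.outputs))

    # Resize and transpose the BGR frame to the network's NCHW input layout.
    n, c, h, w = net.input_info[input_blob].input_data.shape
    blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

    # SSD output shape is [1, 1, N, 7]; each row is
    # [image_id, label, confidence, x_min, y_min, x_max, y_max].
    result = exec_net.infer({input_blob: blob})[output_blob]

    return sum(1 for det in result[0][0] if det[2] > threshold)
```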
@ONLINE{tfocapi,
author = "Ahmet Özlü",
title = "TensorFlow Object Counting API",
year = "2018",
url = "https://github.com/ahmetozlu/tensorflow_object_counting_api"
}
This system is available under the GNU GPL v3.0 license. See the LICENSE file for more info.
The following is our current progress in performing statistical inference on the video stream.
General Count | Multi-class Count |
---|---|
Barcelona | |
![]() | ![]() |
Taipei City | |
![]() | ![]() |
India | |
![]() | ![]() |
In this project, we successfully perform statistical inference on objects detected in the video stream to inform traffic status in real time. As a next step, we will implement car accident detection through surveillance camera systems and raise real-time alerts about the need to reroute traffic, an approach applicable in every city globally.