From 0deb503d92d252ec3d5f3eddab215e646613256e Mon Sep 17 00:00:00 2001 From: Marcelo Rovai Date: Mon, 16 Sep 2024 12:36:16 -0300 Subject: [PATCH 1/3] Add files via upload Updating the use of specific model sections and correcting typos --- contents/labs/raspi/setup/setup.qmd | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/contents/labs/raspi/setup/setup.qmd b/contents/labs/raspi/setup/setup.qmd index 2e092fa5..4110320b 100644 --- a/contents/labs/raspi/setup/setup.qmd +++ b/contents/labs/raspi/setup/setup.qmd @@ -4,7 +4,7 @@ This chapter will guide you through setting up Raspberry Pi Zero 2 W (*Raspi-Zero*) and Raspberry Pi 5 (*Raspi-5*) models. We'll cover hardware setup, operating system installation, initial configuration, and tests. -> The general instructions for the *Rasp-5* also apply to the older Raspberry Pi versions, such as the Rasp-3 and Raspi-4. +> The general instructions for the *Raspi-5* also apply to the older Raspberry Pi versions, such as the Raspi-3 and Raspi-4. ## Introduction @@ -78,6 +78,8 @@ This tutorial will guide you through setting up the most common Raspberry Pi mod - **Ports**: 2 × micro HDMI ports, 2 × USB 3.0 ports, 2 × USB 2.0 ports, CSI camera port, DSI display port - **Power**: 5V DC via USB-C connector (3A) +> In the labs, we will use different names to address the Raspberry: `Raspi`, `Raspi-5`, `Raspi-Zero`, etc. Usually, `Raspi` is used when the instructions or comments apply to every model. + ## Installing the Operating System ### The Operating System (OS) @@ -127,7 +129,7 @@ Follow the steps to install the OS in your Raspi. ![img](images/png/zero-burn.png) - > Due to its reduced SDRAM (512MB), the recommended OS for the Rasp Zero is the 32-bit version. However, to run some machine learning models, such as the YOLOv8 from Ultralitics, we should use the 64-bit version. Although Raspi-Zero can run a *desktop*, we will choose the LITE version (no Desktop) to reduce the RAM needed for regular operation. + > Due to its reduced SDRAM (512MB), the recommended OS for the Raspi-Zero is the 32-bit version. However, to run some machine learning models, such as the YOLOv8 from Ultralitics, we should use the 64-bit version. Although Raspi-Zero can run a *desktop*, we will choose the LITE version (no Desktop) to reduce the RAM needed for regular operation. - For **Raspi-5**: We can select the full 64-bit version, which includes a desktop: `Raspberry Pi OS (64-bit)` @@ -141,7 +143,7 @@ Follow the steps to install the OS in your Raspi. 8. Write the image to the microSD card. -> In the examples here, we will use different hostnames: raspi, raspi-5, raspi-Zero, etc. You should replace by the one that you are using. +> In the examples here, we will use different hostnames depending on the device used: raspi, raspi-5, raspi-Zero, etc. It would help if you replaced it with the one you are using. ### Initial Configuration @@ -155,7 +157,7 @@ Follow the steps to install the OS in your Raspi. ### SSH Access -The easiest way to interact with the Rasp-Zero is via SSH ("Headless"). You can use a Terminal (MAC/Linux), [PuTTy (](https://www.putty.org/)Windows), or any other. +The easiest way to interact with the Raspi-Zero is via SSH ("Headless"). You can use a Terminal (MAC/Linux), [PuTTy (](https://www.putty.org/)Windows), or any other. 1. Find your Raspberry Pi's IP address (for example, check your router). @@ -310,7 +312,7 @@ CONF_SWAPSIZE=2000 And save the file. 
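After the reboot in the next step, you can double-check that the new swap size took effect. Besides the usual system tools, a quick pure-Python check works too; the minimal sketch below only reads `/proc/meminfo`, so no extra packages are needed:

```python
# Minimal sketch (assumes Raspberry Pi OS / Linux): read /proc/meminfo to
# confirm the available RAM and the new swap size. No extra packages needed.
def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.strip().split()[0])  # values are reported in kB
    return info

mem = read_meminfo()
print(f"Total RAM : {mem['MemTotal'] / 1024:.0f} MB")
print(f"Total swap: {mem['SwapTotal'] / 1024:.0f} MB")  # expected around 2000 MB after the change
```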
-Next, turn on the swapfile again and reboot the Rasp-zero: +Next, turn on the swapfile again and reboot the Raspi-zero: ```bash sudo dphys-swapfile setup @@ -324,7 +326,7 @@ When your device is rebooted (you should enter with the SSH again), you will rea ## Installing a Camera -The Raspi is an excellent device for computer vision applications; a camera is needed for it. We can install a standard USB webcam on the micro-USB port using a USB OTG adapter (Raspi-Zero and Rasp-5) or a camera module connected to the Raspi CSI (Camera Serial Interface) port. +The Raspi is an excellent device for computer vision applications; a camera is needed for it. We can install a standard USB webcam on the micro-USB port using a USB OTG adapter (Raspi-Zero and Raspi-5) or a camera module connected to the Raspi CSI (Camera Serial Interface) port. > USB Webcams generally have inferior quality to the camera modules that connect to the CSI port. They can also not be controlled using the `raspistill` and `rasivid` commands in the terminal or the `picamera` recording package in Python. Nevertheless, there may be reasons why you want to connect a USB camera to your Raspberry Pi, such as because of the benefit that it is much easier to set up multiple cameras with a single Raspberry Pi, long cables, or simply because you have such a camera on hand. @@ -564,16 +566,19 @@ While we've primarily interacted with the Raspberry Pi using terminal commands v ## Model-Specific Considerations -### Raspberry Pi Zero +### Raspberry Pi Zero (Raspi-Zero) + - Limited processing power, best for lightweight projects -- Use headless setup (SSH) to conserve resources. +- It is better to use a headless setup (SSH) to conserve resources. - Consider increasing swap space for memory-intensive tasks. +- It can be used for Image Classification and Object Detection Labs but not for the LLM (SLM). -### Raspberry Pi 4 or 5 +### Raspberry Pi 4 or 5 (Raspi-4 or Raspi-5) - Suitable for more demanding projects, including AI and machine learning. -- Can run full desktop environment smoothly. -- For Pi 5, consider using an active cooler for temperature management during intensive tasks. +- It can run the whole desktop environment smoothly. +- Raspi-4 can be used for Image Classification and Object Detection Labs but will not work well with LLMs (SLM). +- For Raspi-5, consider using an active cooler for temperature management during intensive tasks, as in the LLMs (SLMs) lab. -Remember to adjust your project requirements based on the specific Raspberry Pi model you're using. The Pi Zero is great for low-power, space-constrained projects, while the Pi 4/5 models are better suited for more computationally intensive tasks. +Remember to adjust your project requirements based on the specific Raspberry Pi model you're using. The Raspi-Zero is great for low-power, space-constrained projects, while the Raspi-4 or 5 models are better suited for more computationally intensive tasks. 
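Since the labs keep referring to what each model can or cannot handle, a script can also branch on the board it is running on. The sketch below reads the device-tree model string that Raspberry Pi OS exposes; the decision logic at the end is only an example to adapt to your project:

```python
# Sketch: detect which Raspberry Pi model a script is running on, so that
# heavier workloads (e.g., SLMs) can be skipped on the Raspi-Zero.
from pathlib import Path

def raspi_model() -> str:
    model_file = Path("/proc/device-tree/model")
    if model_file.exists():
        return model_file.read_text().rstrip("\x00")  # string ends with a null byte
    return "unknown"

model = raspi_model()
print(model)  # e.g., "Raspberry Pi Zero 2 W Rev 1.0" or "Raspberry Pi 5 Model B Rev 1.0"
if "Zero" in model:
    print("Stick to lightweight models (Image Classification / Object Detection).")
else:
    print("OK for heavier workloads, such as the SLMs explored on the Raspi-5.")
```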
From 304310210379445a38e11707abe8fa36004fe383 Mon Sep 17 00:00:00 2001 From: Marcelo Rovai Date: Mon, 16 Sep 2024 12:37:33 -0300 Subject: [PATCH 2/3] correcting Typos correcting Typos --- .../raspi/image_classification/image_classification.qmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/contents/labs/raspi/image_classification/image_classification.qmd b/contents/labs/raspi/image_classification/image_classification.qmd index e8f916b3..0acf4203 100644 --- a/contents/labs/raspi/image_classification/image_classification.qmd +++ b/contents/labs/raspi/image_classification/image_classification.qmd @@ -943,7 +943,7 @@ The final dense layer of our model will have 0 neurons with a 10% dropout for ov ![](images/png/result-train.png) -The result is excellent, with a reasonable 35ms of latency (for a Rasp-4), which should result in around 30 fps (frames per second) during inference. A Raspi-Zero should be slower, and the Rasp-5, faster. +The result is excellent, with a reasonable 35ms of latency (for a Raspi-4), which should result in around 30 fps (frames per second) during inference. A Raspi-Zero should be slower, and the Raspi-5, faster. ### Trading off: Accuracy versus speed @@ -1213,7 +1213,7 @@ Let's download a smaller model, such as the one trained for the [Nicla Vision La ![](images/png/infer-int8-96.png) -The model lost some accuracy, but it is still OK once our model does not look for many details. Regarding latency, we are around **ten times faster** on the Rasp-Zero. +The model lost some accuracy, but it is still OK once our model does not look for many details. Regarding latency, we are around **ten times faster** on the Raspi-Zero. ## Live Image Classification @@ -1505,7 +1505,7 @@ The code creates a web application for real-time image classification using a Ra ## Conclusion: -Image classification has emerged as a powerful and versatile application of machine learning, with significant implications for various fields, from healthcare to environmental monitoring. This chapter has demonstrated how to implement a robust image classification system on edge devices like the Raspi-Zero and Rasp-5, showcasing the potential for real-time, on-device intelligence. +Image classification has emerged as a powerful and versatile application of machine learning, with significant implications for various fields, from healthcare to environmental monitoring. This chapter has demonstrated how to implement a robust image classification system on edge devices like the Raspi-Zero and Raspi-5, showcasing the potential for real-time, on-device intelligence. We've explored the entire pipeline of an image classification project, from data collection and model training using Edge Impulse Studio to deploying and running inferences on a Raspi. 
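Since inference latency was the recurring figure of merit in this lab (about 35 ms, or roughly 30 fps, on the Raspi-4, and around a 10x speed-up with the int8 model on the Raspi-Zero), it is worth being able to reproduce those numbers on your own device. Below is a model-agnostic timing sketch for any TFLite model used here; the model path is a placeholder.

```python
# Hedged, model-agnostic timing sketch using the TFLite runtime; the model
# path is a placeholder - point it to any .tflite file used in this lab.
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter  # or tf.lite.Interpreter

interpreter = Interpreter(model_path="./models/your_model_int8.tflite")  # placeholder path
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Dummy input with the model's expected shape and dtype (content is irrelevant for timing)
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

runs = 50
start = time.time()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
latency = (time.time() - start) / runs
print(f"Average latency: {latency * 1000:.1f} ms (~{1 / latency:.1f} fps)")
```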
The process highlighted several key points: From 98b51bbb9719e27f7b47169620b8d57dfcbf82ae Mon Sep 17 00:00:00 2001 From: Marcelo Rovai Date: Mon, 16 Sep 2024 12:38:32 -0300 Subject: [PATCH 3/3] Correcting Typos Correcting Typos --- .../object_detection/object_detection.qmd | 120 ++++++++++-------- 1 file changed, 67 insertions(+), 53 deletions(-) diff --git a/contents/labs/raspi/object_detection/object_detection.qmd b/contents/labs/raspi/object_detection/object_detection.qmd index f314854e..1bd4dac1 100644 --- a/contents/labs/raspi/object_detection/object_detection.qmd +++ b/contents/labs/raspi/object_detection/object_detection.qmd @@ -26,17 +26,19 @@ As we put our hands into object detection, we'll build upon the concepts and tec - FOMO (Faster Objects, More Objects), and - YOLO (You Only Look Once). -We will explore those object detection models, using +> To learn more about object detection models, follow the tutorial [A Gentle Introduction to Object Recognition With Deep Learning](https://machinelearningmastery.com/object-recognition-with-deep-learning/). -- TensorFlow Lite Runtime, -- Edge Impulse Linux Python SDK, and +We will explore those object detection models using + +- TensorFlow Lite Runtime (now changed to [LiteRT](https://ai.google.dev/edge/litert)), +- Edge Impulse Linux Python SDK and - Ultralitics ![](images/png/block.png) -In general, throughout this lab, we'll cover the fundamentals of object detection and how it differs from image classification and learn how to train, fine-tune, test, optimize, and deploy those popular object detection architectures using a dataset created from scratch. +Throughout this lab, we’ll cover the fundamentals of object detection and how it differs from image classification. We'll also learn how to train, fine-tune, test, optimize, and deploy popular object detection architectures using a dataset created from scratch. -### Object detection Fundamentals +### Object Detection Fundamentals Object detection builds upon the foundations of image classification but extends its capabilities significantly. To understand object detection, it's crucial first to recognize its key differences from image classification: @@ -45,7 +47,7 @@ Object detection builds upon the foundations of image classification but extends **Image Classification:** - Assigns a single label to an entire image -- Answers the question: "What is the primary object or scene in this image?" +- Answers the question: “What is this image's primary object or scene?” - Outputs a single class prediction for the whole image **Object Detection:** @@ -82,15 +84,15 @@ There are two main approaches to object detection: 1. Two-stage detectors: These first propose regions of interest and then classify each region. Examples include R-CNN and its variants (Fast R-CNN, Faster R-CNN). -2. Single-stage detectors: These predict bounding boxes and class probabilities in one forward pass of the network. Examples include YOLO (You Only Look Once), EfficientDet, SSD (Single Shot Detector), and FOMO (Faster Objects, More Objects). These are often faster and more suitable for edge devices like Raspberry Pi. +2. Single-stage detectors: These predict bounding boxes (or centroids) and class probabilities in one forward pass of the network. Examples include YOLO (You Only Look Once), EfficientDet, SSD (Single Shot Detector), and FOMO (Faster Objects, More Objects). These are often faster and more suitable for edge devices like Raspberry Pi. 
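To make the contrast with image classification concrete, the sketch below shows the kind of structured output a single-stage detector produces for one image, together with the usual confidence-threshold filtering; the labels, scores, and boxes are purely illustrative.

```python
# Illustrative only (labels, scores, and boxes are made up): the structured
# output a single-stage detector returns for one image, plus the usual
# confidence-threshold filtering applied before drawing boxes.
detections = [
    # (class_label, confidence, [ymin, xmin, ymax, xmax] in normalized coordinates)
    ("wheel", 0.91, [0.42, 0.10, 0.78, 0.36]),
    ("box",   0.76, [0.05, 0.40, 0.60, 0.95]),
    ("wheel", 0.21, [0.70, 0.70, 0.80, 0.82]),  # low confidence, likely a false positive
]

CONF_THRESHOLD = 0.5
kept = [d for d in detections if d[1] >= CONF_THRESHOLD]
for label, score, box in kept:
    print(f"{label}: {score:.2f} at {box}")
```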
#### Evaluation Metrics Object detection uses different metrics compared to image classification: -- Intersection over Union (IoU): Measures the overlap between predicted and ground truth bounding boxes. -- Mean Average Precision (mAP): Combines precision and recall across all classes and IoU thresholds. -- Frames Per Second (FPS): Measures detection speed, crucial for real-time applications on edge devices. +- **Intersection over Union (IoU)**: Measures the overlap between predicted and ground truth bounding boxes. +- **Mean Average Precision (mAP)**: Combines precision and recall across all classes and IoU thresholds. +- **Frames Per Second (FPS)**: Measures detection speed, crucial for real-time applications on edge devices. ## Pre-Trained Object Detection Models Overview @@ -98,13 +100,13 @@ As we saw in the introduction, given an image or a video stream, an object detec > You can test some common models online by visiting [Object Detection - MediaPipe Studio](https://mediapipe-studio.webapps.google.com/studio/demo/object_detector) -On [Kaggle](https://www.kaggle.com/models?id=298,130,299) we can find the most common pre-trained tflite models to use with the Raspi, [ssd_mobilenet_v1,](https://www.kaggle.com/models/tensorflow/ssd-mobilenet-v1/tfLite) and [efficiendet](https://www.kaggle.com/models/tensorflow/efficientdet/tfLite). Those models were trained on the COCO (Common Objects in Context) dataset, with over 200,000 labeled images in 91 categories. Go, download the models, and upload them to the `./models` folder in the Raspi. +On [Kaggle](https://www.kaggle.com/models?id=298,130,299), we can find the most common pre-trained tflite models to use with the Raspi, [ssd_mobilenet_v1,](https://www.kaggle.com/models/tensorflow/ssd-mobilenet-v1/tfLite) and [efficiendet](https://www.kaggle.com/models/tensorflow/efficientdet/tfLite). Those models were trained on the COCO (Common Objects in Context) dataset, with over 200,000 labeled images in 91 categories. Go, download the models, and upload them to the `./models` folder in the Raspi. -> Alternatively, on [GitHub](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/tree/main/OBJ_DETEC/models), you can find the models and the COCO labels. +> Alternatively[,](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/tree/main/OBJ_DETEC/models) you can find the models and the COCO labels on [GitHub](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/tree/main/OBJ_DETEC/models). -For the first part of this lab, we will focus on a pre-trained 300x300 SSD-Mobilenet V1 model, comparing it with the 320x320 EfficientDet-lite0, also trained using the COCO 2017 dataset. Both models were converted to a TensorFlow Lite format (4.2MB for the SSD Mobilenet and 4.6MB for the EfficienDet). +For the first part of this lab, we will focus on a pre-trained 300x300 SSD-Mobilenet V1 model and compare it with the 320x320 EfficientDet-lite0, also trained using the COCO 2017 dataset. Both models were converted to a TensorFlow Lite format (4.2MB for the SSD Mobilenet and 4.6MB for the EfficienDet). -> For transfer learning projects, is recomanded to use SSD-Mobilenet V2 or V3, but once the V1 TFLite model is publicly available, we will use it for this analysis. +> SSD-Mobilenet V2 or V3 is recommended for transfer learning projects, but once the V1 TFLite model is publicly available, we will use it for this overview. 
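Because IoU appears both as an evaluation metric above and as a non-maximum-suppression threshold in the inference scripts later in this lab, here is a minimal pure-Python sketch of it, using the same `[ymin, xmin, ymax, xmax]` box convention the TFLite models output:

```python
# Minimal sketch of Intersection over Union (IoU) for two boxes given as
# [ymin, xmin, ymax, xmax].
def iou(box_a, box_b):
    ya1, xa1, ya2, xa2 = box_a
    yb1, xb1, yb2, xb2 = box_b
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))   # overlap in y
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))   # overlap in x
    inter = inter_h * inter_w
    area_a = (ya2 - ya1) * (xa2 - xa1)
    area_b = (yb2 - yb1) * (xb2 - xb1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou([0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0]))  # -> 0.25
```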
![](images/png/model-deploy.png) @@ -126,7 +128,7 @@ source ~/tflite/bin/activate - Installing Additional Python Libraries (inside the environment) -### Creating a working directory: +### Creating a Working Directory: Considering that we have created the `Documents/TFLITE` folder in the last Lab, let's now create the specific folders for this object detection lab: @@ -174,9 +176,9 @@ The **output details** include not only the labels ("classes") and probabilities ![](images/png/inference result.png) -So, for the above example, using the same cat image used with the *Image Classification Lab*, looking for the output, we have a **76% probability** of having found an object with a **class ID of 16** on an area delimited by a **bounding box of [0.028011084, 0.020121813, 0.9886069, 0.802299]**. Those four numbers are related to `ymin`, `xmin`, `ymax` and `xmax`, the box coordinates. +So, for the above example, using the same cat image used with the *Image Classification Lab* looking for the output, we have a **76% probability** of having found an object with a **class ID of 16** on an area delimited by a **bounding box of [0.028011084, 0.020121813, 0.9886069, 0.802299]**. Those four numbers are related to `ymin`, `xmin`, `ymax` and `xmax`, the box coordinates. -Taking into consideration that **y** goes from the top `(ymin`) to the bottom (`ymax`) and **x** goes from left (`xmin`) to the right (`xmax`), we have, in fact, the coordinates of the top/left corner and the bottom/right one. With both edges and knowing the shape of the picture, it is possible to draw a rectangle around the object as shown in the figure below: +Taking into consideration that **y** goes from the top `(ymin`) to the bottom (`ymax`) and **x** goes from left (`xmin`) to the right (`xmax`), we have, in fact, the coordinates of the top/left corner and the bottom/right one. With both edges and knowing the shape of the picture, it is possible to draw a rectangle around the object, as shown in the figure below: ![](images/png/boulding-boxes.png) @@ -278,7 +280,7 @@ EfficientDet is not technically an SSD (Single Shot Detector) model, but it shar 2. Similarities to SSD: - Both are single-stage detectors, meaning they perform object localization and classification in a single forward pass. - - Both use multi-scale feature maps for detecting objects at different scales. + - Both use multi-scale feature maps to detect objects at different scales. 3. Key differences: - Backbone: SSD typically uses VGG or MobileNet, while EfficientDet uses EfficientNet. @@ -291,7 +293,8 @@ EfficientDet is not technically an SSD (Single Shot Detector) model, but it shar While EfficientDet is not an SSD model, it can be seen as an evolution of single-stage detection architectures, incorporating more advanced techniques to improve efficiency and accuracy. When using EfficientDet, we can expect similar output structures to SSD (e.g., bounding boxes and class scores). -On GitHub, you can find another [notebook](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/blob/main/OBJ_DETEC/notebooks/SSD_EfficientDet.ipynb) exploring the EfficientDet model that we did with SSD MobileNet. +> On GitHub, you can find another [notebook](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/blob/main/OBJ_DETEC/notebooks/SSD_EfficientDet.ipynb) exploring the EfficientDet model that we did with SSD MobileNet. +> ## Object Detection Project @@ -346,11 +349,13 @@ When we have enough images, we can press `Stop Capture`. The captured images are > Get around 60 images. 
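If you prefer a plain command-line capture loop over the capture app used above, a minimal Picamera2 sketch could look like the one below (the folder and file names are arbitrary):

```python
# Hedged sketch of a command-line dataset capture loop with Picamera2:
# press Enter to save a shot, type 'q' to stop. Folder/file names are arbitrary.
import os
import time
from picamera2 import Picamera2

os.makedirs("./dataset/raw", exist_ok=True)
picam2 = Picamera2()
picam2.configure(picam2.create_still_configuration())
picam2.start()
time.sleep(2)  # let exposure and white balance settle

count = 0
while True:
    key = input("Enter = capture, q = quit: ").strip().lower()
    if key == "q":
        break
    picam2.capture_file(f"./dataset/raw/img_{count:03d}.jpg")
    count += 1
    print(f"Saved {count} image(s)")
picam2.stop()
```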
Try to capture different angles, backgrounds, and light conditions. Filezilla can transfer the created raw dataset to your main computer. -### Labeling data +### Labeling Data + +The next step in an Object Detect project is to create a labeled dataset. We should label the raw dataset images, creating bounding boxes around each picture's objects (box and wheel). We can use labeling tools like [LabelImg,](https://pypi.org/project/labelImg/) [CVAT,](https://www.cvat.ai/) [Roboflow,](https://roboflow.com/annotate) or even the [Edge Impulse Studio.](https://edgeimpulse.com/) Once we have explored the Edge Impulse tool in other labs, let’s use Roboflow here. -The next step in an Object Detect project is to create a labeled dataset. We should label the raw dataset images, creating bounding boxes around the objects (box and wheel) in each picture. We can use labeling tools like [LabelImg](https://pypi.org/project/labelImg/), [CVAT](https://www.cvat.ai/), [Roboflow](https://roboflow.com/annotate), or even the [Edge Impulse Studio](https://edgeimpulse.com/). Let's use Roboflow. +> We are using Roboflow (free version) here for two main reasons. 1) We can have auto-labeler, and 2) The annotated dataset is available in several formats and can be used both on Edge Impulse Studio (we will use it for MobileNet V2 and FOMO train) and on CoLab (YOLOv8 train), for example. Having the annotated dataset on Edge Impulse (Free account), it is not possible to use it for training on other platforms. -We should upload the raw dataset to [Roboflow](https://roboflow.com/). There, we should create a free account and start a new project, for example, ("box-versus-wheel"). +We should upload the raw dataset to [Roboflow.](https://roboflow.com/) Create a free account there and start a new project, for example, (“box-versus-wheel”). ![](images/png/create-project-rf.png) @@ -382,7 +387,7 @@ Now, you should export the annotated dataset in a format that Edge Impulse, Ultr ![](images/png/download-dataset.png) -Here is possible to review how the dataset was structured +Here, it is possible to review how the dataset was structured ![](images/png/dataset-struct.png) @@ -422,7 +427,7 @@ Repeat the process for the test data (upload both folders, test, and validation) ### The Impulse Design -The first thing to define when we enter the `Create impulse` step is to describe the target device for deployment. A pop-up window will appear. We will select Raspberry 4, an intermediary device between the Rasp-Zero and the Rasp-5. +The first thing to define when we enter the `Create impulse` step is to describe the target device for deployment. A pop-up window will appear. We will select Raspberry 4, an intermediary device between the Raspi-Zero and the Raspi-5. > This choice will not interfere with the training; it will only give us an idea about the latency of the model on that specific target. @@ -759,13 +764,13 @@ detect_objects(img_path, conf=0.3,iou=0.05) ## Training a FOMO Model at Edge Impulse Studio -The inference with the SSD MobileNet model worked well, but the latency was significantly high. On a Rasp-Zero, the inference varied from 0.5 to 1.3 seconds, which means around or less than 1 FPS (1 frame per second). One alternative to speed up the process is to use FOMO (Faster Objects, More Objects). +The inference with the SSD MobileNet model worked well, but the latency was significantly high. The inference varied from 0.5 to 1.3 seconds on a Raspi-Zero, which means around or less than 1 FPS (1 frame per second). 
One alternative to speed up the process is to use FOMO (Faster Objects, More Objects). -This novel machine learning algorithm lets us count multiple objects and find their location in an image in real-time using up to 30x less processing power and memory than MobileNet SSD or YOLO. The main reason this is possible is that while other models calculate the object's size by drawing a square around it (bounding box), FOMO ignores the size of the image, providing only the information about where the object is located in the image through its centroid coordinates. +This novel machine learning algorithm lets us count multiple objects and find their location in an image in real-time using up to 30x less processing power and memory than MobileNet SSD or YOLO. The main reason this is possible is that while other models calculate the object’s size by drawing a square around it (bounding box), FOMO ignores the size of the image, providing only the information about where the object is located in the image through its centroid coordinates. ### How FOMO works? -In a typical object detection pipeline, the first stage is to extract features from the input image. **FOMO leverages MobileNetV2 to perform this task**. MobileNetV2 processes the input image to produce a feature map that captures essential characteristics, such as textures, shapes, and object edges, in a computationally efficient way. +In a typical object detection pipeline, the first stage is extracting features from the input image. **FOMO leverages MobileNetV2 to perform this task**. MobileNetV2 processes the input image to produce a feature map that captures essential characteristics, such as textures, shapes, and object edges, in a computationally efficient way. ![](images/png/fomo-1.png) @@ -779,18 +784,18 @@ FOMO divides the image into blocks of pixels using a factor of 8. For the input **Trade-off Between Speed and Precision**: -- **Grid Resolution**: FOMO uses a grid of fixed resolution, meaning each cell of the grid can detect if an object is present in that part of the image. While it doesn't provide high localization accuracy, it makes a trade-off by being fast and computationally light, which is crucial for edge devices. +- **Grid Resolution**: FOMO uses a grid of fixed resolution, meaning each cell can detect if an object is present in that part of the image. While it doesn’t provide high localization accuracy, it makes a trade-off by being fast and computationally light, which is crucial for edge devices. - **Multi-Object Detection**: Since each cell is independent, FOMO can detect multiple objects simultaneously in an image by identifying multiple centers. ### Impulse Design, new Training and Testing -Return to Edge Impulse Studio, and in the `Experiments` tab, create another impulse, where now the input images should be 160x160 (this is the expected input size for MobilenetV2). +Return to Edge Impulse Studio, and in the `Experiments` tab, create another impulse. Now, the input images should be 160x160 (this is the expected input size for MobilenetV2). ![](images/png/impulse-2.png) -On the `Image` tab generate the features and go to `Object detection` tab. +On the `Image` tab, generate the features and go to the `Object detection` tab. -For training, we should select a pre-trained model. Let's use the **FOMO (Faster Objects, More Objects) MobileNetV2 0.35.** +We should select a pre-trained model for training. 
Let’s use the **FOMO (Faster Objects, More Objects) MobileNetV2 0.35.** ![](images/png/model-choice.png) @@ -800,15 +805,15 @@ Regarding the training hyper-parameters, the model will be trained with: - Batch size: 32 - Learning Rate: 0.001. -For validation during training, 20% of the dataset (*validation_dataset*) will be spared. For the remaining 80% (*train_dataset*), we will not apply Data Augmentation because our dataset was already augmented during the labeling phase at Roboflow. +For validation during training, 20% of the dataset (*validation_dataset*) will be spared. We will not apply Data Augmentation for the remaining 80% (*train_dataset*) because our dataset was already augmented during the labeling phase at Roboflow. As a result, the model ends with an overall F1 score of 93.3% with an impressive latency of 8ms (Raspi-4), around 60X less than we got with the SSD MovileNetV2. ![](images/png/fomo-train-result.png) -> Note that FOMO automatically added a 3rd label background to the two previously defined, *box* (0) and *wheel* (1). +> Note that FOMO automatically added a third label background to the two previously defined *boxes* (0) and *wheels* (1). -On the `Model testing` tab, we can see that the accuracy was 94%. Here one of the test sample result: +On the `Model testing` tab, we can see that the accuracy was 94%. Here is one of the test sample results: ![](images/png/fomo-test.png) @@ -885,13 +890,13 @@ Run a notebook locally (on the Raspi-4 or 5 with desktop) jupyter notebook ``` -or on browser in your computer: +or on the browser on your computer: ```bash jupyter notebook --ip=192.168.4.210 --no-browser ``` -Let's start a new [notebook](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/blob/main/OBJ_DETEC/notebooks/EI-Linux-FOMO.ipynb) to follow all the steps to detect cubes and wheels on an image using the FOMO model and the Edge Impulse Linux Python SDK. +Let's start a new [notebook](https://github.com/Mjrovai/EdgeML-with-Raspberry-Pi/blob/main/OBJ_DETEC/notebooks/EI-Linux-FOMO.ipynb) by following all the steps to detect cubes and wheels on an image using the FOMO model and the Edge Impulse Linux Python SDK. Import the needed libraries: @@ -925,7 +930,7 @@ runner = ImageImpulseRunner(model_path) model_info = runner.init() ``` -The `model_info` will contain the critical information of our model. Still, different from what we did with the TFLite interpreter, now the EI Linux Python SDK library will be responsible for preparing the model for inference. +The `model_info` will contain critical information about our model. However, unlike the TFLite interpreter, the EI Linux Python SDK library will now prepare the model for inference. So, let's open the image and show it (Now, for compatibility, we will use OpenCV, the CV Library used internally by EI. OpenCV reads the image as BGR, so we will need to convert it to RGB : @@ -973,7 +978,7 @@ Found 2 bounding boxes (29 ms.) 0 (0.75): x=48 y=56 w=8 h=8 ``` -From the results, we can see that 2 objects were detected: one with class ID 0 (`box`) and one with class ID 1 (`wheel`), which is correct! +The results show that two objects were detected: one with class ID 0 (`box`) and one with class ID 1 (`wheel`), which is correct! Let's visualize the result (The ` threshold` is 0.5, the default value set during the model testing on the Edge Impulse Studio). @@ -1009,7 +1014,7 @@ plt.show() ## Exploring a YOLO Model using Ultralitics -For this lab, we will explore the YOLOv8. 
[Ultralytics](https://ultralytics.com/) [YOLOv8](https://github.com/ultralytics/ultralytics) is a version of the acclaimed real-time object detection and image segmentation model, YOLO. YOLOv8 is built on cutting-edge advancements in deep learning and computer vision, offering unparalleled performance in terms of speed and accuracy. Its streamlined design makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs. +For this lab, we will explore YOLOv8. [Ultralytics](https://ultralytics.com/) [YOLOv8](https://github.com/ultralytics/ultralytics) is a version of the acclaimed real-time object detection and image segmentation model, YOLO. YOLOv8 is built on cutting-edge advancements in deep learning and computer vision, offering unparalleled performance in terms of speed and accuracy. Its streamlined design makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs. ### Talking about the YOLO Model @@ -1092,7 +1097,14 @@ sudo reboot ### Testing the YOLO -After the Raspi-Zero booting, let's go to the working directory and run inference on an image that will be downloaded from the Ultralytics website, using the YOLOV8n model (the smallest in the family) at the Terminal (CLI): +After the Raspi-Zero booting, let's activate the `yolo` env, go to the working directory, + +```bash +source ~/yolo/bin/activate +cd /Documents/YOLO +``` + +and run inference on an image that will be downloaded from the Ultralytics website, using the YOLOV8n model (the smallest in the family) at the Terminal (CLI): ```bash yolo predict model='yolov8n' source='https://ultralytics.com/images/bus.jpg' @@ -1112,7 +1124,7 @@ So, the Ultrayitics YOLO is correctly installed on our Raspi. But, on the Raspi- ### Export Model to NCNN format -The latency issue is a reality of deploying computer vision models on edge devices with limited computational power, such as the Rasp-Zero. One alternative is to use a format optimized for optimal performance. This ensures that even devices with limited processing power can handle advanced computer vision tasks well. +Deploying computer vision models on edge devices with limited computational power, such as the Raspi-Zero, can cause latency issues. One alternative is to use a format optimized for optimal performance. This ensures that even devices with limited processing power can handle advanced computer vision tasks well. Of all the model export formats supported by Ultralytics, the [NCNN](https://docs.ultralytics.com/integrations/ncnn) is a high-performance neural network inference computing framework optimized for mobile platforms. From the beginning of the design, NCNN was deeply considerate about deployment and use on mobile phones and did not have third-party dependencies. It is cross-platform and runs faster than all known open-source frameworks (such as TFLite). @@ -1132,6 +1144,8 @@ yolo export model=yolov8n.pt format=ncnn yolo predict model='./yolov8n_ncnn_model' source='bus.jpg' ``` +> The first inference, when the model is loaded, usually has a high latency (around 17s), but from the 2nd, it is possible to note that the inference goes down to around 2s. 
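By the way, the NCNN export can also be scripted with the Ultralytics Python API, which is handy when automating the workflow:

```python
# The same export, scripted with the Ultralytics Python API (equivalent to the
# CLI call above). The output folder name is the Ultralytics default.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # downloads the weights if they are not cached yet
model.export(format="ncnn")  # creates ./yolov8n_ncnn_model/
```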
+ ### Exploring YOLO with Python To start, let's call the Python Interpreter so we can explore how the YOLO model works, line by line: @@ -1156,7 +1170,7 @@ result = model.predict(img, save=True, imgsz=640, conf=0.5, iou=0.3) ![](images/png/python-infer-bus.png) -We can verify that the result is almost the same as the one we get running the inference at the terminal level (CLI), except that the bus-stop was not detected with the reduced model. Note that the latency was reduced. +We can verify that the result is almost identical to the one we get running the inference at the terminal level (CLI), except that the bus stop was not detected with the reduced NCNN model. Note that the latency was reduced. Let's analyze the "result" content. @@ -1224,7 +1238,7 @@ Return to our "Boxe versus Wheel" dataset, labeled on [Roboflow](https://univers ![](images/png/dataset_code.png) -For training, let's adapt one of the public examples available from Ultralitytics and run it on Google Colab. Bellow, you can find mine to be adapted in your project: +For training, let's adapt one of the public examples available from Ultralitytics and run it on Google Colab. Below, you can find mine to be adapted in your project: - YOLOv8 Box versus Wheel Dataset Training [[Open In Colab]](https://colab.research.google.com/github.com/Mjrovai/EdgeML-with-Raspberry-Pi/blob/main/OBJ_DETEC/notebooks/yolov8_box_vs_wheel.ipynb) @@ -1350,11 +1364,11 @@ We can see that the inference result is excellent! The model was trained based o ## Object Detection on a live stream -All the models explored in this lab can be used to detect objects in real time using a camera, where the captured image should be the input for the trained and converted model. For the Raspi-4 or 5 with a desktop, OpenCV can be used to capture the frames and display the inference result. +All the models explored in this lab can detect objects in real-time using a camera. The captured image should be the input for the trained and converted model. For the Raspi-4 or 5 with a desktop, OpenCV can capture the frames and display the inference result. -But it is also possible to create a live stream with a webcam to detect objects in real time. For example, let's start with the script developed for the Image Classification app and adapt it for a *Real-Time Object Detection Web Application Using TensorFlow Lite and Flask*. +However, creating a live stream with a webcam to detect objects in real-time is also possible. For example, let’s start with the script developed for the Image Classification app and adapt it for a *Real-Time Object Detection Web Application Using TensorFlow Lite and Flask*. -This app version will work for all TFLite models. Verify if the model is in its correct folder:, for example: +This app version will work for all TFLite models. Verify if the model is in its correct folder, for example: ```python model_path = "./models/ssd-mobilenet-v1-tflite-default-v1.tflite" @@ -1393,16 +1407,16 @@ Let's see a technical description of the key modules used in the object detectio - Key functions: `create_preview_configuration()` for setting up the camera, `capture_file()` for capturing frames. 4. **PIL (Python Imaging Library):** - Purpose: Image processing and manipulation. - - Why: PIL provides a wide range of image processing capabilities. It's used here for resizing images, drawing bounding boxes, and converting between image formats. + - Why: PIL provides a wide range of image processing capabilities. 
It’s used here to resize images, draw bounding boxes, and convert between image formats. - Key classes: `Image` for loading and manipulating images, `ImageDraw` for drawing shapes and text on images. 5. **NumPy:** - Purpose: Efficient array operations and numerical computing. - - Why: NumPy's array operations are much faster than pure Python lists, which is crucial for processing image data and model inputs/outputs efficiently. + - Why: NumPy’s array operations are much faster than pure Python lists, which is crucial for efficiently processing image data and model inputs/outputs. - Key functions: `array()` for creating arrays, `expand_dims()` for adding dimensions to arrays. 6. **Threading:** - Purpose: Concurrent execution of tasks. - - Why: Threading allows simultaneous frame capture, object detection, and web server operation, which is crucial for maintaining real-time performance. - - Key components: `Thread` class for creating separate execution threads, `Lock` for thread synchronization. + - Why: Threading allows simultaneous frame capture, object detection, and web server operation, crucial for maintaining real-time performance. + - Key components: `Thread` class creates separate execution threads, and Lock is used for thread synchronization. 7. **io.BytesIO:** - Purpose: In-memory binary streams. - Why: Allows efficient handling of image data in memory without needing temporary files, improving speed and reducing I/O operations. @@ -1419,7 +1433,7 @@ Regarding the main app system architecture: 1. **Main Thread**: Runs the Flask server, handling HTTP requests and serving the web interface. 2. **Camera Thread**: Continuously captures frames from the camera. 3. **Detection Thread**: Processes frames through the TFLite model for object detection. -4. **Frame Buffer**: Shared memory space (protected by locks) where the latest frame and detection results are stored. +4. **Frame Buffer**: Shared memory space (protected by locks) storing the latest frame and detection results. And the app data flow, we can describe in short: @@ -1428,7 +1442,7 @@ And the app data flow, we can describe in short: 3. Flask routes access Frame Buffer to serve the latest frame and detection results 4. Web client receives updates via AJAX and updates UI -This architecture allows for efficient, real-time object detection while maintaining a responsive web interface, all running on a resource-constrained edge device like a Raspberry Pi. The use of threading and efficient libraries like TFLite and PIL enables the system to process video frames in real-time, while Flask and jQuery provide a user-friendly way to interact with the system. +This architecture allows for efficient, real-time object detection while maintaining a responsive web interface running on a resource-constrained edge device like a Raspberry Pi. Threading and efficient libraries like TFLite and PIL enable the system to process video frames in real-time, while Flask and jQuery provide a user-friendly way to interact with them. You can test the app with another pre-processed model, such as the EfficientDet, changing the app line: @@ -1446,7 +1460,7 @@ This lab has explored the implementation of object detection on edge devices lik 2. **Training and Deployment**: Using a custom dataset of boxes and wheels (labeled on Roboflow), we walked through the process of training models using Edge Impulse Studio and Ultralytics and deploying them on Raspberry Pi. -3. 
**Optimization Techniques**: We explored various optimization methods, such as model quantization (TFLite int8) and format conversion (e.g., to NCNN), to improve inference speed on edge devices. +3. **Optimization Techniques**: To improve inference speed on edge devices, we explored various optimization methods, such as model quantization (TFLite int8) and format conversion (e.g., to NCNN). 4. **Real-time Applications**: The lab exemplified a real-time object detection web application, demonstrating how these models can be integrated into practical, interactive systems.
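As a closing illustration of the web-application architecture summarized above, the thread-safe frame buffer at its core can be as small as the sketch below (class and method names are invented for illustration):

```python
# Illustrative sketch (class and method names are invented): the lock-protected
# frame buffer pattern from the architecture above. Camera and detection
# threads write the latest state; the Flask route only reads it.
import threading

class FrameBuffer:
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None        # latest JPEG bytes written by the camera thread
        self._detections = []     # latest results written by the detection thread

    def update(self, frame=None, detections=None):
        with self._lock:
            if frame is not None:
                self._frame = frame
            if detections is not None:
                self._detections = detections

    def snapshot(self):
        with self._lock:
            return self._frame, list(self._detections)

buffer = FrameBuffer()
buffer.update(detections=[("wheel", 0.9, [0.1, 0.1, 0.5, 0.5])])
print(buffer.snapshot())
```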