Add RGB gesture recognition (opendr-eu#436)
* init commit

* uncomment debug

* demo fixes

* demo fixes

* ros2 node

* learner update

* added test

* added docs and dependencies

* update demo readme

* added readme

* minor learner fix

* linter fixes

* linter fixes

* linter fixes

* license fix

* demo fix

* docs fix

* Add test to tests_suite.yml

* Add test to tests_suite_develop.yml

* Add test to test_packages.yml

* Update docs/reference/gesture-recognition-learner.md

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>

* Update docs/reference/gesture-recognition-learner.md

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>

* Update docs/reference/index.md

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>

* Update projects/python/perception/gesture_recognition/README.md

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>

* Update tests/sources/tools/perception/gesture_recognition/test_gesture_recognition.py

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/scripts/gesture_recognition_node.py

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>

* added performance profiling to ROS1 node

* style fix

* Fixed argument type

* Already exists check before re-downloading

* Renamed demo to webcam_demo

* Renamed demo to webcam_demo in readme

* added ROS/ROS2 node documentation and changed output topic names

* ROS2 README fix

* Added backwards compatible float conversion in ROS2 bridge to_ros_boxes

* Some minor fixes in ROS1 gesture topic names

* Some minor fixes in ROS1 gesture topic names and learner link in doc

* Some minor fixes in ROS2 gesture topic names

* Some minor fixes in ROS2 gesture learner link, run command, topic names in doc

---------

Co-authored-by: Stefania Pedrazzi <stefaniapedrazzi@users.noreply.github.com>
Co-authored-by: tsampazk <27914645+tsampazk@users.noreply.github.com>
3 people authored and Luca Marchionni committed Jun 29, 2023
1 parent 5184997 commit 993123b
Showing 23 changed files with 1,642 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/test_packages.yml
@@ -33,6 +33,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -93,6 +94,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
4 changes: 4 additions & 0 deletions .github/workflows/tests_suite.yml
@@ -61,6 +61,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -172,6 +173,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -258,6 +260,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -362,6 +365,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
4 changes: 4 additions & 0 deletions .github/workflows/tests_suite_develop.yml
@@ -62,6 +62,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -176,6 +177,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -266,6 +268,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -376,6 +379,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
217 changes: 217 additions & 0 deletions docs/reference/gesture-recognition-learner.md
@@ -0,0 +1,217 @@
## gesture_recognition module

The *gesture_recognition* module contains the *GestureRecognitionLearner* class, which can be used to recognize and localize 18 hand gestures.
The module relies on the Nanodet object detection module.
We provide data processing scripts and a pre-trained model for the [Hagrid dataset](https://github.com/hukenovs/hagrid/tree/master).

### Class GestureRecognitionLearner
Bases: `object_detection_2d.nanodet.NanodetLearner`

The learner has the following public methods:

#### `GestureRecognitionLearner` constructor
```python
GestureRecognitionLearner(self, model_to_use, iters, lr, batch_size, checkpoint_after_iter, checkpoint_load_iter, temp_path, device,
weight_decay, warmup_steps, warmup_ratio, lr_schedule_T_max, lr_schedule_eta_min, grad_clip)
```

Constructor parameters:

- **model_to_use**: *{"plus_m_1.5x_416"}, default=plus_m_1.5x_416*\
Specifies the model to use and the corresponding config file. Currently only *plus_m_1.5x_416* is supported; other models can be added by following the structure of the provided config file.
- **iters**: *int, default=None*\
Specifies the number of epochs the training should run for.
- **lr**: *float, default=None*\
Specifies the initial learning rate to be used during training.
- **batch_size**: *int, default=None*\
Specifies the number of images bundled up in a batch during training.
This heavily affects memory usage; adjust according to your system.
- **checkpoint_after_iter**: *int, default=None*\
Specifies after how many training iterations a checkpoint should be saved.
If it is set to 0, no checkpoints will be saved.
- **checkpoint_load_iter**: *int, default=None*\
Specifies which checkpoint should be loaded.
If it is set to 0, no checkpoints will be loaded.
- **temp_path**: *str, default=''*\
Specifies the path where the algorithm saves checkpoints and logging files. If *''*, `cfg.save_dir` is used instead.
- **device**: *{'cpu', 'cuda'}, default='cuda'*\
Specifies the device to be used.
- **weight_decay**: *float, default=None*
- **warmup_steps**: *int, default=None*
- **warmup_ratio**: *float, default=None*
- **lr_schedule_T_max**: *int, default=None*
- **lr_schedule_eta_min**: *float, default=None*
- **grad_clip**: *int, default=None*
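
A minimal construction sketch; all hyperparameter values and paths below are illustrative assumptions, not tuned defaults:

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

# Illustrative hyperparameters; adjust iters, lr and batch_size to your hardware and dataset
learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', iters=50, lr=1e-3, batch_size=16,
                                    checkpoint_after_iter=1, checkpoint_load_iter=0,
                                    temp_path='./save_dir', device='cuda')
```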

#### `GestureRecognitionLearner.preprocess_data`
```python
GestureRecognitionLearner.preprocess_data(self, preprocess, download, verbose, save_path)
```

This method is used for downloading the [gesture recognition dataset](https://github.com/hukenovs/hagrid/tree/master) and preprocessing it to COCO format; it returns the training, validation and test dataset objects.

Parameters:

- **preprocess**: *bool, default=True*\
Indicates whether to preprocess data located in save_path to COCO format.
- **download** : *bool, default=False*\
Indicates whether to download data to save_path.
- **verbose** : *bool, default=True*\
Enables verbosity.
- **save_path** : *str, default='./data'*\
Path where the data is saved, or where already-downloaded data that needs to be preprocessed is located.
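
A minimal usage sketch; the `./data` path and the `device` choice are assumptions:

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', device='cpu')

# Download the Hagrid data into ./data and convert it to COCO format;
# the training, validation and test dataset objects are returned.
dataset, val_dataset, test_dataset = learner.preprocess_data(preprocess=True, download=True,
                                                             verbose=True, save_path='./data')
```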

#### `GestureRecognitionLearner.fit`
```python
GestureRecognitionLearner.fit(self, dataset, val_dataset, logging_path, verbose, logging, seed, local_rank)
```

This method is used for training the algorithm on a training dataset and validating on a validation dataset.

Parameters:

- **dataset**: *object*\
Object that holds the training dataset of `ExternalDataset` type.
- **val_dataset** : *object, default=None*\
Object that holds the validation dataset of `ExternalDataset` type.
- **logging_path** : *str, default=''*\
Subdirectory of *temp_path* in which log files and TensorBoard events are saved.
- **verbose** : *bool, default=True*\
Enables verbosity.
- **logging** : *bool, default=False*\
Enables the maximum verbosity and the logger.
- **seed** : *int, default=123*\
Seed for repeatability.
- **local_rank** : *int, default=1*\
Needed if training on multiple machines.
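
A minimal training sketch using the dataset objects returned by `preprocess_data`; paths and hyperparameter values are assumptions (see also the full training example at the end of this document):

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', iters=50, lr=1e-3, batch_size=16,
                                    checkpoint_after_iter=1, temp_path='./save_dir', device='cuda')
dataset, val_dataset, test_dataset = learner.preprocess_data(preprocess=True, download=True, save_path='./data')

# Train on the training split while validating on the validation split, logging under ./logs
learner.fit(dataset, val_dataset, logging_path='./logs', verbose=True, logging=True, seed=123)
```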

#### `GestureRecognitionLearner.eval`
```python
GestureRecognitionLearner.eval(self, dataset, verbose, logging, local_rank)
```

This method is used to evaluate a trained model on an evaluation dataset.
It also saves a .txt log file containing the evaluation statistics.

Parameters:

- **dataset** : *object*\
Object that holds the evaluation dataset of type `ExternalDataset`.
- **verbose**: *bool, default=True*\
Enables verbosity.
- **logging**: *bool, default=False*\
Enables the maximum verbosity and logger.
- **local_rank** : *int, default=1*\
Needed if evaluating on multiple machines.
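
A minimal evaluation sketch; the paths, the use of the pretrained model and the choice of split are assumptions:

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', temp_path='./save_dir', device='cuda')

# Use the pretrained model (a model trained with fit() can be loaded the same way)
learner.download("./")
learner.load("./nanodet_plus_m_1.5x_416", verbose=True)

# Obtain the dataset splits; set download=False if the data is already present under ./data
dataset, val_dataset, test_dataset = learner.preprocess_data(preprocess=True, download=True,
                                                             verbose=True, save_path='./data')

# Evaluate on the validation split; a .txt log file with the evaluation statistics is also written
learner.eval(val_dataset, verbose=True, logging=False)
```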

#### `GestureRecognitionLearner.infer`
```python
GestureRecognitionLearner.infer(self, input, conf_threshold, iou_threshold, nms_max_num)
```

This method is used to perform gesture recognition (detection) on an image.
It returns an `engine.target.BoundingBoxList` object containing bounding boxes described by their top-left corner and their width and height, or an empty list if no gestures are detected in the input image.

Parameters:
- **input** : *object*\
Object of type `engine.data.Image` to perform inference on.
- **conf_threshold**: *float, default=0.35*\
Specifies the confidence threshold for gesture detection.
A gesture is detected only if the confidence of the output is higher than the specified threshold.
- **iou_threshold**: *float, default=0.6*\
Specifies the IOU threshold for NMS in inference.
- **nms_max_num**: *int, default=100*\
Determines the maximum number of bounding boxes retained after non-maximum suppression (NMS).
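
A minimal sketch of consuming the returned list, assuming the standard `engine.target.BoundingBox` fields (`name`, `left`, `top`, `width`, `height`, `confidence`); the model paths and test image are assumptions:

```python
from opendr.engine.data import Image
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416')
learner.download("./")
learner.load("./nanodet_plus_m_1.5x_416", verbose=True)

img = Image.open("./test_image.jpg")
boxes = learner.infer(input=img, conf_threshold=0.35, iou_threshold=0.6, nms_max_num=100)

# Assumed BoundingBox layout: `name` indexes into learner.classes, plus confidence and box geometry
for box in boxes:
    print(learner.classes[int(box.name)], box.confidence, box.left, box.top, box.width, box.height)
```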

#### `GestureRecognitionLearner.save`
```python
GestureRecognitionLearner.save(self, path, verbose)
```

This method is used to save a trained model with its metadata.
Given a *path*, it creates the directory if it does not already exist.
Inside this directory, the model is saved as *nanodet_{model_name}.pth* together with a metadata file *nanodet_{model_name}.json*.
If the directory already exists, the *nanodet_{model_name}.pth* and *nanodet_{model_name}.json* files are overwritten.
If optimization is performed, the optimized model is saved instead.

Parameters:

- **path**: *str, default=None*\
Path in which to save the model; if *None*, the learner's temp folder (*temp_path*) or `cfg.save_dir` is used instead.
- **verbose**: *bool, default=True*\
Enables the maximum verbosity and logger.

#### `GestureRecognitionLearner.load`
```python
GestureRecognitionLearner.load(self, path, verbose)
```

This method is used to load a previously saved model from its saved folder.
The model is loaded from the directory given by *path*, using the metadata .json file it contains.
If optimization is performed, the optimized model is loaded instead.

Parameters:

- **path**: *str, default=None*\
Path of the model to be loaded.
- **verbose**: *bool, default=True*\
Enables the maximum verbosity.
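
A minimal save/load round-trip sketch; the `./gesture_model` directory and the use of the pretrained model are assumptions:

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', device='cpu')
learner.download("./")
learner.load("./nanodet_plus_m_1.5x_416", verbose=True)

# Save the loaded model and its metadata, then load it back into a fresh learner
learner.save('./gesture_model', verbose=True)

new_learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', device='cpu')
new_learner.load('./gesture_model', verbose=True)
```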

#### `GestureRecognitionLearner.download`
```python
GestureRecognitionLearner.download(self, path, model, verbose, url)
```

Downloads the provided pretrained model.

Parameters:

- **path**: *str, default=None*\
Specifies the folder where data will be downloaded. If *None*, the *self.temp_path* directory is used instead.
- **verbose**: *bool, default=True*\
Enables the maximum verbosity.
- **url**: *str, default=OpenDR FTP URL*\
URL of the FTP server.

#### Examples

* **Training example**

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner


if __name__ == '__main__':
    model_save_dir = './save_dir/'
    data_save_dir = './data/'

    gesture_model = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', iters=100, lr=1e-3, batch_size=32,
                                              checkpoint_after_iter=1, checkpoint_load_iter=0, device="cuda",
                                              temp_path=model_save_dir)

    # Download the Hagrid dataset and preprocess it to COCO format
    dataset, val_dataset, test_dataset = gesture_model.preprocess_data(preprocess=True, download=True, verbose=True,
                                                                       save_path=data_save_dir)

    # Train with validation and save the resulting model
    gesture_model.fit(dataset, val_dataset, logging_path='./logs', logging=True)
    gesture_model.save()

```

* **Inference and result drawing example on a test image**

This example shows how to perform inference on an image and draw the resulting bounding boxes.

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner
from opendr.engine.data import Image
from opendr.perception.object_detection_2d import draw_bounding_boxes

if __name__ == '__main__':
    # Download and load the pretrained gesture recognition model
    gesture_model = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416')
    gesture_model.download("./")
    gesture_model.load("./nanodet_plus_m_1.5x_416", verbose=True)

    # Run inference on a test image and draw the detected gestures
    img = Image.open("./test_image.jpg")
    boxes = gesture_model.infer(input=img)
    draw_bounding_boxes(img.opencv(), boxes, class_names=gesture_model.classes, show=True)
```

2 changes: 2 additions & 0 deletions docs/reference/index.md
@@ -32,6 +32,8 @@ Neither the copyright holder nor any applicable licensor will be liable for any
- pose estimation:
- [lightweight_open_pose Module](lightweight-open-pose.md)
- [high_resolution_pose_estimation Module](high-resolution-pose-estimation.md)
- gesture recognition:
- [gesture_recognition Module](gesture-recognition-learner.md)
- activity recognition:
- [skeleton-based action recognition](skeleton-based-action-recognition.md)
- [continual skeleton-based action recognition Module](skeleton-based-action-recognition.md#class-costgcnlearner)
1 change: 1 addition & 0 deletions projects/opendr_ws/README.md
@@ -82,6 +82,7 @@ Currently, apart from tools, opendr_ws contains the following ROS nodes (categor
14. [Landmark-based Facial Expression Recognition](src/opendr_perception/README.md#landmark-based-facial-expression-recognition-ros-node)
15. [Skeleton-based Human Action Recognition](src/opendr_perception/README.md#skeleton-based-human-action-recognition-ros-nodes)
16. [Video Human Activity Recognition](src/opendr_perception/README.md#video-human-activity-recognition-ros-node)
17. [RGB Hand Gesture Recognition](src/opendr_perception/README.md#rgb-gesture-recognition-ros-node)

## RGB + Infrared input
1. [End-to-End Multi-Modal Object Detection (GEM)](src/opendr_perception/README.md#2d-object-detection-gem-ros-node)
25 changes: 25 additions & 0 deletions projects/opendr_ws/src/opendr_perception/README.md
@@ -746,6 +746,31 @@ The node makes use of the toolkit's video human activity recognition tools which

You can find the corresponding IDs regarding activity recognition [here](https://github.com/opendr-eu/opendr/blob/master/src/opendr/perception/activity_recognition/datasets/kinetics400_classes.csv).

### RGB Gesture Recognition ROS Node

For gesture recognition, the ROS [node](./scripts/gesture_recognition_node.py) is based on the gesture recognition learner defined [here](../../../../src/opendr/perception/gesture_recognition/gesture_recognition_learner.py), and the documentation of the learner can be found [here](../../../../docs/reference/gesture-recognition-learner.md).

#### Instructions for basic usage:

1. Start the node responsible for publishing images. If you have a USB camera, then you can use the `usb_cam_node` as explained in the [prerequisites above](#prerequisites).

2. Start the gesture recognition node:
```shell
rosrun opendr_perception gesture_recognition_node.py
```
The following arguments are available:
- `-i or --input_rgb_image_topic INPUT_RGB_IMAGE_TOPIC`: topic name for input RGB image (default=`/usb_cam/image_raw`)
- `-o or --output_rgb_image_topic OUTPUT_RGB_IMAGE_TOPIC`: topic name for output annotated RGB image (default=`/opendr/image_gesture_annotated`)
- `-d or --detections_topic DETECTIONS_TOPIC`: topic name for detection messages (default=`/opendr/gestures`)
- `--performance_topic PERFORMANCE_TOPIC`: topic name for performance messages (default=`None`, disabled)
- `--device DEVICE`: Device to use, either `cpu` or `cuda`, falls back to `cpu` if GPU or CUDA is not found (default=`cuda`)
- `--threshold THRESHOLD`: Confidence threshold for predictions (default=0.5)
- `--model MODEL`: Config file name of the model that will be used (default=`plus_m_1.5x_416`)

3. Default output topics:
- Output images: `/opendr/image_gesture_annotated`
- Detection messages: `/opendr/gestures`

## RGB + Infrared input

### 2D Object Detection GEM ROS Node