Add RGB gesture recognition #436

Merged: 41 commits, Jun 29, 2023

Commits (41)
5f902e0
init commit
katerynaCh May 16, 2023
dbe02c7
uncomment debug
katerynaCh May 16, 2023
7253458
demo fixes
katerynaCh May 16, 2023
018dac1
demo fixes
katerynaCh May 16, 2023
fc3fbc0
ros2 node
katerynaCh May 17, 2023
276137b
learner update
katerynaCh May 17, 2023
2e57c06
Merge branch 'gesture' of https://github.com/opendr-eu/opendr into ge…
katerynaCh May 17, 2023
616a0d3
added test
katerynaCh May 17, 2023
5d8c64f
added docs and dependencies
katerynaCh May 17, 2023
2c3c47d
update demo readme
katerynaCh May 17, 2023
7a54395
added readme
katerynaCh May 18, 2023
309a7c8
minor learner fix
katerynaCh May 18, 2023
bb44c94
linter fixes
katerynaCh May 18, 2023
fd92133
linter fixes
katerynaCh May 18, 2023
14f9c2c
linter fixes
katerynaCh May 18, 2023
e3d0874
license fix
katerynaCh May 18, 2023
580eca2
demo fix
katerynaCh May 26, 2023
6fa1943
docs fix
katerynaCh May 28, 2023
b0aeb92
Merge branch 'develop' into gesture
stefaniapedrazzi Jun 6, 2023
22569ab
Add test to tests_suite.yml
stefaniapedrazzi Jun 7, 2023
d05ece1
Add test to tests_suite_develop.yml
stefaniapedrazzi Jun 7, 2023
d50c991
Add test to test_packages.yml
stefaniapedrazzi Jun 7, 2023
df5470d
Update docs/reference/gesture-recognition-learner.md
katerynaCh Jun 8, 2023
1fa6d6f
Update docs/reference/gesture-recognition-learner.md
katerynaCh Jun 8, 2023
8ba2fbd
Update docs/reference/index.md
katerynaCh Jun 8, 2023
bdb8a25
Update projects/python/perception/gesture_recognition/README.md
katerynaCh Jun 8, 2023
79c09c7
Update tests/sources/tools/perception/gesture_recognition/test_gestur…
katerynaCh Jun 8, 2023
ba2bcd9
Update projects/opendr_ws/src/opendr_perception/scripts/gesture_recog…
katerynaCh Jun 8, 2023
7c7500f
added performance profiling to ROS1 node
katerynaCh Jun 8, 2023
018baa9
style fix
katerynaCh Jun 8, 2023
9532f49
Fixed argument type
tsampazk Jun 28, 2023
cdbcc19
Already exists check before re-downloading
tsampazk Jun 28, 2023
e770167
Renamed demo to webcam_demo
tsampazk Jun 28, 2023
91c783e
Renamed demo to webcam_demo in readme
tsampazk Jun 28, 2023
0de78b7
added ROS/ROS2 node documentation and changed output topic names
katerynaCh Jun 28, 2023
5c8bbde
ROS2 README fix
katerynaCh Jun 28, 2023
e80a50b
Added backwards compatible float conversion in ROS2 bridge to_ros_boxes
tsampazk Jun 28, 2023
e01b25b
Some minor fixes in ROS1 gesture topic names
tsampazk Jun 28, 2023
ed4550e
Some minor fixes in ROS1 gesture topic names and learner link in doc
tsampazk Jun 28, 2023
abe5f5b
Some minor fixes in ROS2 gesture topic names
tsampazk Jun 28, 2023
8643fc2
Some minor fixes in ROS2 gesture learner link, run command, topic nam…
tsampazk Jun 28, 2023
2 changes: 2 additions & 0 deletions .github/workflows/test_packages.yml
@@ -33,6 +33,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -93,6 +94,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
4 changes: 4 additions & 0 deletions .github/workflows/tests_suite.yml
@@ -61,6 +61,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -172,6 +173,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -258,6 +260,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -362,6 +365,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
4 changes: 4 additions & 0 deletions .github/workflows/tests_suite_develop.yml
@@ -62,6 +62,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -176,6 +177,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -266,6 +268,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
@@ -376,6 +379,7 @@ jobs:
- perception/multimodal_human_centric
- perception/pose_estimation
- perception/fall_detection
- perception/gesture_recognition
- perception/speech_recognition
- perception/skeleton_based_action_recognition/costgcn
- perception/skeleton_based_action_recognition/pstgcn
217 changes: 217 additions & 0 deletions docs/reference/gesture-recognition-learner.md
@@ -0,0 +1,217 @@
## gesture_recognition module

The *gesture_recognition* module contains the *GestureRecognitionLearner* class and can be used to recognize and localize 18 hand gestures.
The module relies on the Nanodet object detection module.
We provide data processing scripts and a pre-trained model for the [Hagrid dataset](https://github.com/hukenovs/hagrid/tree/master).

### Class GestureRecognitionLearner
Bases: `object_detection_2d.nanodet.NanodetLearner`

The learner has the following public methods:

#### `GestureRecognitionLearner` constructor
```python
GestureRecognitionLearner(self, model_to_use, iters, lr, batch_size, checkpoint_after_iter, checkpoint_load_iter, temp_path, device,
weight_decay, warmup_steps, warmup_ratio, lr_schedule_T_max, lr_schedule_eta_min, grad_clip)
```

Constructor parameters:

- **model_to_use**: *{"plus_m_1.5x_416"}, default=plus_m_1.5x_416*\
Specifies the model to use and the corresponding config file. Currently only plus_m_1.5x_416 is supported; other models can be added by following the same config file structure.
- **iters**: *int, default=None*\
Specifies the number of epochs the training should run for.
- **lr**: *float, default=None*\
Specifies the initial learning rate to be used during training.
- **batch_size**: *int, default=None*\
Specifies the number of images bundled into a batch during training.
This heavily affects memory usage, adjust according to your system.
- **checkpoint_after_iter**: *int, default=None*\
Specifies after how many training iterations a checkpoint should be saved.
If it is set to 0 no checkpoints will be saved.
- **checkpoint_load_iter**: *int, default=None*\
Specifies which checkpoint should be loaded.
If it is set to 0, no checkpoints will be loaded.
- **temp_path**: *str, default=''*\
Specifies the path where checkpoints and log files are saved. If *''*, `cfg.save_dir` is used instead.
- **device**: *{'cpu', 'cuda'}, default='cuda'*\
Specifies the device to be used.
- **weight_decay**: *float, default=None*
- **warmup_steps**: *int, default=None*
- **warmup_ratio**: *float, default=None*
- **lr_schedule_T_max**: *int, default=None*
- **lr_schedule_eta_min**: *float, default=None*
- **grad_clip**: *int, default=None*

#### `GestureRecognitionLearner.preprocess_data`
```python
GestureRecognitionLearner.preprocess_data(self, preprocess, download, verbose, save_path)
```

This method is used for downloading the [gesture recognition dataset](https://github.com/hukenovs/hagrid/tree/master) and preprocessing it to COCO format.

Parameters:

- **preprocess**: *bool, default=True*\
Indicates whether to preprocess data located in save_path to COCO format.
- **download** : *bool, default=False*\
Indicates whether to download data to save_path.
- **verbose** : *bool, default=True*\
Enables verbosity.
- **save_path** : *str, default='./data'*\
Path where the data will be saved, or where already-downloaded data that needs preprocessing is located.
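
A minimal sketch of a typical call is shown below; the `./data` path is an arbitrary choice, and the three-way return value (train/validation/test splits) mirrors the training example at the end of this document:

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', device='cpu')

# Download the dataset into ./data and convert it to COCO format;
# the call returns the train, validation and test splits, as in the training example below.
train_set, val_set, test_set = learner.preprocess_data(preprocess=True, download=True,
                                                       verbose=True, save_path='./data')
```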

#### `GestureRecognitionLearner.fit`
```python
GestureRecognitionLearner.fit(self, dataset, val_dataset, logging_path, verbose, logging, seed, local_rank)
```

This method is used for training the algorithm on a train dataset and validating on a val dataset.

Parameters:

- **dataset**: *object*\
Object that holds the training dataset of `ExternalDataset` type.
- **val_dataset** : *object, default=None*\
Object that holds the validation dataset of `ExternalDataset` type.
- **logging_path** : *str, default=''*\
Subdirectory of temp_path where log files and TensorBoard data are saved.
- **verbose** : *bool, default=True*\
Enables verbosity.
- **logging** : *bool, default=False*\
Enables the maximum verbosity and the logger.
- **seed** : *int, default=123*\
Seed for repeatability.
- **local_rank** : *int, default=1*\
Needed if training on multiple machines.

#### `GestureRecognitionLearner.eval`
```python
GestureRecognitionLearner.eval(self, dataset, verbose, logging, local_rank)
```

This method is used to evaluate a trained model on an evaluation dataset.
Saves a .txt log file containing evaluation statistics.

Parameters:

- **dataset** : *object*\
Object that holds the evaluation dataset of type `ExternalDataset`.
- **verbose**: *bool, default=True*\
Enables verbosity.
- **logging**: *bool, default=False*\
Enables the maximum verbosity and logger.
- **local_rank** : *int, default=1*\
Needed if evaluating on multiple machines.
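
A minimal evaluation sketch, assuming the pretrained model and the preprocessed test split obtained with the methods above (the paths mirror those used in the examples at the end of this document):

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', device='cuda')

# Fetch and load the pretrained weights (paths as in the inference example below)
learner.download("./")
learner.load("./nanodet_plus_m_1.5x_416", verbose=True)

# Obtain the test split via the preprocessing helper, then evaluate on it
_, _, test_dataset = learner.preprocess_data(preprocess=True, download=True, verbose=True, save_path='./data')
learner.eval(test_dataset, verbose=True, logging=True)
```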

#### `GestureRecognitionLearner.infer`
```python
GestureRecognitionLearner.infer(self, input, conf_threshold, iou_threshold, nms_max_num)
```

This method is used to perform gesture recognition (detection) on an image.
Returns an `engine.target.BoundingBoxList` object, which contains bounding boxes described by their top-left corner, width, and height, or an empty list if no detections were made on the input image.

Parameters:
- **input** : *object*\
Object of type `engine.data.Image` to perform inference on.
- **conf_threshold**: *float, default=0.35*\
Specifies the threshold for gesture detection inference.
An object is detected if the confidence of the output is higher than the specified threshold.
- **iou_threshold**: *float, default=0.6*\
Specifies the IOU threshold for NMS in inference.
- **nms_max_num**: *int, default=100*\
Determines the maximum number of bounding boxes that will be retained after NMS.
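
A minimal sketch of an `infer` call with explicit thresholds; the image path and threshold values are illustrative only (see also the full inference example at the end of this document):

```python
from opendr.engine.data import Image
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner

learner = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416')
learner.download("./")
learner.load("./nanodet_plus_m_1.5x_416", verbose=True)

img = Image.open("./test_image.jpg")
# Raise the confidence threshold above the 0.35 default and keep at most 50 boxes after NMS
boxes = learner.infer(input=img, conf_threshold=0.5, iou_threshold=0.6, nms_max_num=50)
```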

#### `GestureRecognitionLearner.save`
```python
GestureRecognitionLearner.save(self, path, verbose)
```

This method is used to save a trained model with its metadata.
Provided with the path, it creates the *path* directory if it does not already exist.
Inside this folder, the model is saved as *nanodet_{model_name}.pth* and a metadata file *nanodet_{model_name}.json*.
If the directory already exists, the *nanodet_{model_name}.pth* and *nanodet_{model_name}.json* files are overwritten.
If optimization is performed, the optimized model is saved instead.

Parameters:

- **path**: *str, default=None*\
Path where the model is saved; if *None*, the learner's temp folder or `cfg.save_dir` is used instead.
- **verbose**: *bool, default=True*\
Enables the maximum verbosity and logger.

#### `GestureRecognitionLearner.load`
```python
GestureRecognitionLearner.load(self, path, verbose)
```

This method is used to load a previously saved model from its folder.
It loads the model from the directory given by *path*, using the metadata .json file included there.
If optimization is performed, the optimized model is loaded instead.

Parameters:

- **path**: *str, default=None*\
Path of the model to be loaded.
- **verbose**: *bool, default=True*\
Enables the maximum verbosity.

#### `GestureRecognitionLearner.download`
```python
GestureRecognitionLearner.download(self, path, model, verbose, url)
```

Downloads the provided pretrained model.

Parameters:

- **path**: *str, default=None*\
Specifies the folder where data will be downloaded. If *None*, the *self.temp_path* directory is used instead.
- **verbose**: *bool, default=True*\
Enables the maximum verbosity.
- **url**: *str, default=OpenDR FTP URL*\
URL of the FTP server.

#### Examples

* **Training example**

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner


if __name__ == '__main__':
    model_save_dir = './save_dir/'
    data_save_dir = './data/'

    gesture_model = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416', iters=100, lr=1e-3, batch_size=32,
                                              checkpoint_after_iter=1, checkpoint_load_iter=0, device="cuda",
                                              temp_path=model_save_dir)

    # Download the dataset, convert it to COCO format and obtain the train/val/test splits
    dataset, val_dataset, test_dataset = gesture_model.preprocess_data(preprocess=True, download=True,
                                                                       verbose=True, save_path=data_save_dir)

    gesture_model.fit(dataset, val_dataset, logging_path='./logs', logging=True)
    gesture_model.save()
```

* **Inference and result drawing example on a test image**

This example shows how to perform inference on an image and draw the resulting bounding boxes.

```python
from opendr.perception.gesture_recognition.gesture_recognition_learner import GestureRecognitionLearner
from opendr.engine.data import Image
from opendr.perception.object_detection_2d import draw_bounding_boxes

if __name__ == '__main__':
    gesture_model = GestureRecognitionLearner(model_to_use='plus_m_1.5x_416')
    # Download the pretrained model and load it
    gesture_model.download("./")
    gesture_model.load("./nanodet_plus_m_1.5x_416", verbose=True)

    img = Image.open("./test_image.jpg")
    boxes = gesture_model.infer(input=img)

    draw_bounding_boxes(img.opencv(), boxes, class_names=gesture_model.classes, show=True)
```

2 changes: 2 additions & 0 deletions docs/reference/index.md
@@ -32,6 +32,8 @@ Neither the copyright holder nor any applicable licensor will be liable for any
- pose estimation:
- [lightweight_open_pose Module](lightweight-open-pose.md)
- [high_resolution_pose_estimation Module](high-resolution-pose-estimation.md)
- gesture recognition:
- [gesture_recognition Module](gesture-recognition-learner.md)
- activity recognition:
- [skeleton-based action recognition](skeleton-based-action-recognition.md)
- [continual skeleton-based action recognition Module](skeleton-based-action-recognition.md#class-costgcnlearner)
1 change: 1 addition & 0 deletions projects/opendr_ws/README.md
@@ -82,6 +82,7 @@ Currently, apart from tools, opendr_ws contains the following ROS nodes (categor
14. [Landmark-based Facial Expression Recognition](src/opendr_perception/README.md#landmark-based-facial-expression-recognition-ros-node)
15. [Skeleton-based Human Action Recognition](src/opendr_perception/README.md#skeleton-based-human-action-recognition-ros-nodes)
16. [Video Human Activity Recognition](src/opendr_perception/README.md#video-human-activity-recognition-ros-node)
17. [RGB Hand Gesture Recognition](src/opendr_perception/README.md#rgb-gesture-recognition-ros-node)

## RGB + Infrared input
1. [End-to-End Multi-Modal Object Detection (GEM)](src/opendr_perception/README.md#2d-object-detection-gem-ros-node)
25 changes: 25 additions & 0 deletions projects/opendr_ws/src/opendr_perception/README.md
@@ -746,6 +746,31 @@ The node makes use of the toolkit's video human activity recognition tools which

You can find the corresponding IDs regarding activity recognition [here](https://github.com/opendr-eu/opendr/blob/master/src/opendr/perception/activity_recognition/datasets/kinetics400_classes.csv).

### RGB Gesture Recognition ROS Node

For gesture recognition, the ROS [node](./scripts/gesture_recognition_node.py) is based on the gesture recognition learner defined [here](../../../../src/opendr/perception/gesture_recognition/gesture_recognition_learner.py), and the documentation of the learner can be found [here](../../../../docs/reference/gesture-recognition-learner.md).

#### Instructions for basic usage:

1. Start the node responsible for publishing images. If you have a USB camera, then you can use the `usb_cam_node` as explained in the [prerequisites above](#prerequisites).

2. Start the gesture recognition node:
```shell
rosrun opendr_perception gesture_recognition_node.py
```
The following arguments are available (an example invocation is given after these instructions):
- `-i or --input_rgb_image_topic INPUT_RGB_IMAGE_TOPIC`: topic name for input RGB image (default=`/usb_cam/image_raw`)
- `-o or --output_rgb_image_topic OUTPUT_RGB_IMAGE_TOPIC`: topic name for output annotated RGB image (default=`/opendr/image_gesture_annotated`)
- `-d or --detections_topic DETECTIONS_TOPIC`: topic name for detection messages (default=`/opendr/gestures`)
- `--performance_topic PERFORMANCE_TOPIC`: topic name for performance messages (default=`None`, disabled)
- `--device DEVICE`: Device to use, either `cpu` or `cuda`; falls back to `cpu` if a GPU or CUDA is not found (default=`cuda`)
- `--threshold THRESHOLD`: Confidence threshold for predictions (default=0.5)
- `--model MODEL`: Config file name of the model that will be used (default=`plus_m_1.5x_416`)

3. Default output topics:
- Output images: `/opendr/image_gesture_annotated`
- Detection messages: `/opendr/gestures`
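
For example, to read images from a different camera topic and run on CPU (the input topic name here is purely illustrative), a command along the following lines can be used:
```shell
rosrun opendr_perception gesture_recognition_node.py -i /camera/color/image_raw --device cpu --threshold 0.6
```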

## RGB + Infrared input

### 2D Object Detection GEM ROS Node