Intent recognition tool (#443)
* init commit

* update readme

* update readme

* fix ros node readme

* fix ros node readmes

* style fixes

* style fixes

* style fixes

* style

* fix ros node data type

* ros data type fix

* fixed unused import

* license fix

* style fix

* style fix

* license fix

* change the way text backbone is defined in demos and ros nodes

* change file path

* added missing import

* ditto

* fixed download paths in python demos

* style fix

* style fix

* fixed ros nodes to be consistent with last changes in speech transcription

* bug fix in ros node

* bug fix

* removed unused function

* Added __init__.py to intent recognition test

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/README.MD

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update docs/reference/intent-recognition-learner.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* updated nltk download path; unittest dataset length; docs; order of methods in learner

* fix error in test

* fix pandas version-related bug

* test

* test

* Added intent recognition dependency on hri msgs and added node to catkin_install_python

* Added intent recognition in ros2 setup.py

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/README.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/README.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/README.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/scripts/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/scripts/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* test fix

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Added fix from #465 to new node

---------

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
katerynaCh and tsampazk authored Sep 25, 2023
1 parent 9a314b7 commit 30817fe
Showing 46 changed files with 4,075 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/reference/index.md
@@ -68,6 +68,7 @@ Neither the copyright holder nor any applicable licensor will be liable for any
- multimodal human centric:
- [rgbd_hand_gesture_learner Module](rgbd-hand-gesture-learner.md)
- [audiovisual_emotion_recognition_learner Module](audiovisual-emotion-recognition-learner.md)
- [intent_recognition_learner Module](intent-recognition-learner.md)
- compressive learning:
- [multilinear_compressive_learning Module](multilinear-compressive-learning.md)
- semantic segmentation:
169 changes: 169 additions & 0 deletions docs/reference/intent-recognition-learner.md
@@ -0,0 +1,169 @@
## intent_recognition module

The *intent_recognition* module contains the *IntentRecognitionLearner* class and can be used to recognize 20 intent categories of a person based on text.
It is recommended to use *IntentRecognitionModule* together with *SpeechTranscriptionModule* to enable intent recognition from transcribed speech.
The module supports multimodal training on face (vision), speech (audio), and text data to facilitate improved unimodal inference on the text modality.

We provide data processing scripts and a pre-trained model for the [MIntRec dataset](https://github.com/thuiar/MIntRec).
The class labels correspond to the following intent categories: 0 - Complain, 1 - Praise, 2 - Apologise, 3 - Thank, 4 - Criticize, 5 - Agree, 6 - Taunt, 7 - Flaunt, 8 - Joke, 9 - Oppose, 10 - Comfort, 11 - Care, 12 - Inform, 13 - Advise, 14 - Arrange, 15 - Introduce, 16 - Leave, 17 - Prevent, 18 - Greet, 19 - Ask for help.

### Class IntentRecognitionLearner

The learner has the following public methods:

#### `IntentRecognitionLearner` constructor
```python
IntentRecognitionLearner(self, text_backbone, mode, log_path, cache_path, results_path, output_path, device, benchmark)
```

Constructor parameters:

- **text_backbone**: *{"bert-base-uncased", "albert-base-v2", "prajjwal1/bert-small", "prajjwal1/bert-mini", "prajjwal1/bert-tiny"}, default="bert-base-uncased"*\
Specifies the text backbone to be used. The name matches the corresponding Hugging Face Hub model, e.g., [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small).
- **mode**: *{'language', 'joint'}, default="joint"*\
Specifies the modality of the model. 'language' corresponds to a text-only model; 'joint' corresponds to a multimodal model with vision, audio, and language modalities trained jointly.
- **log_path**: *str, default="logs"*\
Specifies the path where to store the logs.
- **cache_path**: *str, default="cache"*\
Specifies the path for cache, mainly used for tokenizer files.
- **results_path**: *str, default="results"*\
Specifies where to store the results (performance metrics).
- **output_path**: *str, default="outputs"*\
Specifies where to store the outputs: trained models, predictions, etc.
- **device**: *str, default="cuda"*\
Specifies the device to be used for training.
- **benchmark**: *{"MIntRec"}, default="MIntRec"*\
Specifies the benchmark (dataset) to be used for training. The benchmark defines the class labels, feature dimensionalities, etc.

#### `IntentRecognitionLearner.fit`
```python
IntentRecognitionLearner.fit(self, dataset, val_dataset, verbose, silent)
```

This method is used for training the algorithm on a training dataset and validating on a validation dataset.

Parameters:

- **dataset**: *object*\
Object that holds the training dataset.
- **val_dataset** : *object, default=None*\
Object that holds the validation dataset.
- **verbose** : *bool, default=False*\
Enables verbosity.
- **silent** : *bool, default=False*\
Enables training in silent mode, i.e., only critical output is produced.

#### `IntentRecognitionLearner.eval`
```python
IntentRecognitionLearner.eval(self, dataset, modality, verbose, silent, restore_best_model)
```

This method is used to evaluate a trained model on an evaluation dataset.

Parameters:

- **dataset** : *object*\
Object that holds the evaluation dataset.
- **modality**: *str*, {'audio', 'video', 'language', 'joint'}\
Specifies the modality to be used for evaluation. It should match the training mode of the learner; for a learner trained in 'joint' (multimodal) mode, any modality can be used, although we do not recommend relying on video or audio alone.
- **verbose**: *bool, default=False*\
If True, provides detailed logs.
- **silent**: *bool, default=False*\
If True, run in silent mode, i.e., with only critical output.
- **restore_best_model** : *bool, default=False*\
If True, the model that performed best on the validation set will be loaded from self.output_path. If False, the current model state will be evaluated.

#### `IntentRecognitionLearner.infer`
```python
IntentRecognitionLearner.infer(self, batch, modality)
```

This method is used to perform inference from a given language sequence (text).
It returns a list of `engine.target.Category` objects, which contain the class prediction and confidence score for each sentence in the input sequence.

Parameters:
- **batch**: *dict*\
Dictionary with input data whose keys correspond to modalities, e.g., {'text': 'Hello'}.
- **modality**: *str, default='language'*\
Modality to be used for inference. Currently, inference from raw data is only supported for the language modality (text).

#### `IntentRecognitionLearner.save`
```python
IntentRecognitionLearner.save(self, path)
```
This method is used to save a trained model.

Parameters:

- **path**: *str*\
Path to save the model.

#### `IntentRecognitionLearner.load`
```python
IntentRecognitionLearner.load(self, path)
```

This method is used to load a previously saved model.

Parameters:

- **path**: *str*\
Path of the model to be loaded.

#### `IntentRecognitionLearner.download`
```python
IntentRecognitionLearner.download(self, path)
```

Downloads the provided pretrained model into 'path'.

Parameters:

- **path**: *str*\
Specifies the folder where data will be downloaded.

#### `IntentRecognitionLearner.trim`
```python
IntentRecognitionLearner.trim(self, modality)
```

This method is used to convert a model trained in a multimodal manner ('joint' mode) for unimodal inference. It drops the layers corresponding to the other modalities for computational efficiency.

Parameters:
- **modality**: *str, default='language'*\
Modality to which the model is converted.

#### Examples

Additional configuration parameters/hyperparameters can be specified in *intent_recognition_learner/algorithm/configs/mult_bert.py*.

* **Training and evaluation example**

```python
from opendr.perception.multimodal_human_centric import IntentRecognitionLearner
from opendr.perception.multimodal_human_centric.intent_recognition_learner.algorithm.data.mm_pre import MIntRecDataset

if __name__ == '__main__':
    # Initialize the multimodal learner
    learner = IntentRecognitionLearner(text_backbone='bert-base-uncased', mode='joint', log_path='logs', cache_path='cache', results_path='results', output_path='outputs')

    # Initialize datasets
    train_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video', audio_data_path='/path/to/audio', text_backbone='bert-base-uncased', split='train')
    val_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video', audio_data_path='/path/to/audio', text_backbone='bert-base-uncased', split='dev')
    test_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video', audio_data_path='/path/to/audio', text_backbone='bert-base-uncased', split='test')

    # Train the model
    learner.fit(train_dataset, val_dataset, silent=False, verbose=True)

    # Evaluate the model that performed best on the validation set, using multimodal input
    out = learner.eval(test_dataset, 'joint', restore_best_model=True)

    # Evaluate the model that performed best on the validation set, using text-only input
    out_l = learner.eval(test_dataset, 'language', restore_best_model=True)

    # Keep only the text-specific layers of the model and drop the rest
    learner.trim('language')

    # Evaluate the trimmed model. This should produce the same result as out_l.
    out_l_2 = learner.eval(test_dataset, 'language', restore_best_model=False)
```
8 changes: 6 additions & 2 deletions projects/opendr_ws/README.md
@@ -88,14 +88,18 @@ Currently, apart from tools, opendr_ws contains the following ROS nodes (categor
1. [End-to-End Multi-Modal Object Detection (GEM)](src/opendr_perception/README.md#2d-object-detection-gem-ros-node)
## RGBD input
1. [RGBD Hand Gesture Recognition](src/opendr_perception/README.md#rgbd-hand-gesture-recognition-ros-node)
## RGB + IMU input
1. [Continual SLAM](src/opendr_perception//README.md#continual-slam-ros-nodes)
## RGB + Audio input
1. [Audiovisual Emotion Recognition](src/opendr_perception/README.md#audiovisual-emotion-recognition-ros-node)

## RGB + IMU input
1. [Continual SLAM](src/opendr_perception//README.md#continual-slam-ros-nodes)

## Audio input
1. [Speech Command Recognition](src/opendr_perception/README.md#speech-command-recognition-ros-node)
2. [Speech Transcription](src/opendr_perception/README.md#speech-transcription-ros-node)
## Text input
1. [Intent Recognition](src/opendr_perception/README.md#intent-recognition-ros-node)

## Point cloud input
1. [3D Object Detection Voxel](src/opendr_perception/README.md#3d-object-detection-voxel-ros-node)
2. [3D Object Tracking AB3DMOT](src/opendr_perception/README.md#3d-object-tracking-ab3dmot-ros-node)
1 change: 1 addition & 0 deletions projects/opendr_ws/src/opendr_perception/CMakeLists.txt
@@ -45,5 +45,6 @@ catkin_install_python(PROGRAMS
scripts/facial_emotion_estimation_node.py
scripts/continual_skeleton_based_action_recognition_node.py
scripts/point_cloud_2_publisher_node.py
scripts/intent_recognition_node.py
DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
)
44 changes: 43 additions & 1 deletion projects/opendr_ws/src/opendr_perception/README.md
@@ -974,7 +974,6 @@ whose documentation can be found here:

EdgeSpeechNets currently does not have a pretrained model available for download, only local files may be used.


### Speech Transcription ROS Node

A ROS node for speech transcription from an audio stream using Whisper or Vosk.
@@ -1020,6 +1019,49 @@ The node makes use of the toolkit's speech transcription tools:
For viewing the output, refer to the [notes above.](#notes)
----
## Text input
### Intent Recognition ROS Node
A ROS node for recognizing intents from language.
This node should be used together with the speech transcription node, which transcribes speech into text; the intent recognition node then infers the intent from that transcription.
The provided intent recognition node subscribes to the speech transcription output topic.
You can find the intent recognition ROS node python script [here](./scripts/intent_recognition_node.py) to inspect the code and modify it as you wish to fit your needs.
The node makes use of the toolkit's intent recognition [learner](../../../../src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py), and the documentation can be found [here](../../../../docs/reference/intent-recognition-learner.md).

#### Instructions for basic usage:

1. Follow the instructions of the speech transcription node and start it.

2. Start the intent recognition node

```shell
rosrun opendr_perception intent_recognition_node.py
```
The following arguments are available:
- `-i or --input_transcription_topic INPUT_TRANSCRIPTION_TOPIC`: topic name for input transcription of type OpenDRTranscription (default=`/opendr/speech_transcription`)
- `-o or --output_intent_topic OUTPUT_INTENT_TOPIC`: topic name for predicted intent (default=`/opendr/intent`)
- `--performance_topic PERFORMANCE_TOPIC`: topic name for performance messages (default=`None`, disabled)
- `--device DEVICE`: device to be used for inference (default=`cuda`)
- `--text_backbone TEXT_BACKBONE`: text backbone to be used, choices are `bert-base-uncased`, `albert-base-v2`, `bert-small`, `bert-mini`, `bert-tiny` (default=`bert-base-uncased`)
- `--cache_path CACHE_PATH`: cache path for tokenizer files (default=`./cache/`)

3. Default output topics:
- Predicted intents and confidence: `/opendr/intent`

For viewing the output, refer to the [notes above.](#notes)

**Notes**

In the table below you can find the detectable classes and their corresponding IDs:

| Class | Complain | Praise | Apologise | Thank | Criticize | Agree | Taunt | Flaunt | Joke | Oppose | Comfort | Care | Inform | Advise | Arrange | Introduce | Leave | Prevent | Greet | Ask for help |
|--------|----------|--------|-----------|-------|-----------|-------|-------|--------|------|--------|---------|------|--------|--------|---------|-----------|-------|---------|-------|--------------|
| **ID** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |


----
## Point cloud input

3 changes: 3 additions & 0 deletions projects/opendr_ws/src/opendr_perception/package.xml
@@ -12,16 +12,19 @@
<build_depend>std_msgs</build_depend>
<build_depend>vision_msgs</build_depend>
<build_depend>audio_common_msgs</build_depend>
<build_depend>hri_msgs</build_depend>
<build_export_depend>roscpp</build_export_depend>
<build_export_depend>rospy</build_export_depend>
<build_export_depend>std_msgs</build_export_depend>
<build_export_depend>vision_msgs</build_export_depend>
<build_export_depend>audio_common_msgs</build_export_depend>
<build_export_depend>hri_msgs</build_export_depend>
<exec_depend>roscpp</exec_depend>
<exec_depend>rospy</exec_depend>
<exec_depend>std_msgs</exec_depend>
<exec_depend>vision_msgs</exec_depend>
<exec_depend>audio_common_msgs</exec_depend>
<exec_depend>hri_msgs</exec_depend>
<export>
</export>
</package>