Intent recognition tool (opendr-eu#443)
* init commit

* update readme

* update readme

* fix ros node readme

* fix ros node readmes

* style fixes

* style fixes

* style fixes

* style

* fix ros node data type

* ros data type fix

* fixed unused import

* license fix

* style fix

* style fix

* license fix

* change the way text backbone is defined in demos and ros nodes

* change file path

* added missing import

* ditto

* fixed download paths in python demos

* style fix

* style fix

* fixed ros nodes to be consistent with last changes in speech transcription

* bug fix in ros node

* bug fix

* removed unused function

* Added __init__.py to intent recognition test

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/README.MD

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update docs/reference/intent-recognition-learner.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* updated nltk download path; unittest dataset length; docs; order of methods in learner

* fix error in test

* fix pandas version-related bug

* test

* test

* Added intent recognition dependency on hri msgs and added node to catkin_install_python

* Added intent recognition in ros2 setup.py

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/python/perception/multimodal_human_centric/intent_recognition/demo_speech.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/README.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/README.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/README.md

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/scripts/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws/src/opendr_perception/scripts/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Update projects/opendr_ws_2/src/opendr_perception/opendr_perception/intent_recognition_node.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* test fix

* Update src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>

* Added fix from opendr-eu#465 to new node

---------

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
2 people authored and Luca Marchionni committed Sep 25, 2023
1 parent 13d0dd9 commit 101b03a
Showing 46 changed files with 4,075 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/reference/index.md
@@ -68,6 +68,7 @@ Neither the copyright holder nor any applicable licensor will be liable for any
- multimodal human centric:
- [rgbd_hand_gesture_learner Module](rgbd-hand-gesture-learner.md)
- [audiovisual_emotion_recognition_learner Module](audiovisual-emotion-recognition-learner.md)
- [intent_recognition_learner Module](intent-recognition-learner.md)
- compressive learning:
- [multilinear_compressive_learning Module](multilinear-compressive-learning.md)
- semantic segmentation:
169 changes: 169 additions & 0 deletions docs/reference/intent-recognition-learner.md
@@ -0,0 +1,169 @@
## intent_recognition module

The *intent_recognition* module contains the *IntentRecognitionLearner* class, which can be used to recognize a person's intent from text, choosing among 20 intent categories.
It is recommended to use *IntentRecognitionLearner* together with the toolkit's speech transcription tools, so that intents can be recognized from transcribed speech.
The module supports multimodal training on face (vision), speech (audio), and text data, with the aim of improving unimodal inference on the text modality.

We provide data processing scripts and a pre-trained model for the [MIntRec dataset](https://github.com/thuiar/MIntRec).
The class labels correspond to the following intent categories: 0 - Complain, 1 - Praise, 2 - Apologise, 3 - Thank, 4 - Criticize, 5 - Agree, 6 - Taunt, 7 - Flaunt, 8 - Joke, 9 - Oppose, 10 - Comfort, 11 - Care, 12 - Inform, 13 - Advise, 14 - Arrange, 15 - Introduce, 16 - Leave, 17 - Prevent, 18 - Greet, 19 - Ask for help.
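
For post-processing predictions, the ID-to-label mapping above can be kept in a plain Python list. A minimal sketch; `INTENT_LABELS` and `intent_name` are illustrative names, not part of the OpenDR API:

```python
# Intent categories of the MIntRec benchmark, indexed by class ID (0-19).
INTENT_LABELS = [
    "Complain", "Praise", "Apologise", "Thank", "Criticize",
    "Agree", "Taunt", "Flaunt", "Joke", "Oppose",
    "Comfort", "Care", "Inform", "Advise", "Arrange",
    "Introduce", "Leave", "Prevent", "Greet", "Ask for help",
]


def intent_name(class_id: int) -> str:
    """Return the human-readable intent for a predicted class ID."""
    if not 0 <= class_id < len(INTENT_LABELS):
        raise ValueError(f"class ID must be in [0, {len(INTENT_LABELS) - 1}], got {class_id}")
    return INTENT_LABELS[class_id]


print(intent_name(12))  # Inform
```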

### Class IntentRecognitionLearner

The learner has the following public methods:

#### `IntentRecognitionLearner` constructor
```python
IntentRecognitionLearner(self, text_backbone, mode, log_path, cache_path, results_path, output_path, device, benchmark)
```

Constructor parameters:

- **text_backbone**: *{"bert-base-uncased", "albert-base-v2", "prajjwal1/bert-small", "prajjwal1/bert-mini", "prajjwal1/bert-tiny"}, default="bert-base-uncased"*\
Specifies the text backbone to be used. The name matches the corresponding Hugging Face Hub model, e.g., [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small).
- **mode**: *{'language', 'joint'}, default="joint"*\
Specifies the modality of the model. 'language' corresponds to a text-only model; 'joint' corresponds to a multimodal model with vision, audio, and language modalities trained jointly.
- **log_path**: *str, default="logs"*\
Specifies where to store the logs.
- **cache_path**: *str, default="cache"*\
Specifies the path for cache, mainly used for tokenizer files.
- **results_path**: *str, default="results"*\
Specifies where to store the results (performance metrics).
- **output_path**: *str, default="outputs"*\
Specifies where to store the outputs: trained models, predictions, etc.
- **device**: *str, default="cuda"*\
Specifies the device to be used for training.
- **benchmark**: *{"MIntRec"}, default="MIntRec"*\
Specifies the benchmark (dataset) to be used for training. The benchmark defines the class labels, feature dimensionalities, etc.
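
The documented choices and defaults can be collected in one place and checked before the learner is built. This is a minimal sketch assuming the defaults listed above; `learner_kwargs` is a hypothetical helper, not part of OpenDR, and the actual constructor call is left as a comment so the snippet runs without the toolkit installed:

```python
# Documented constructor choices and defaults (hypothetical helper, not OpenDR code).
TEXT_BACKBONES = {
    "bert-base-uncased", "albert-base-v2",
    "prajjwal1/bert-small", "prajjwal1/bert-mini", "prajjwal1/bert-tiny",
}
MODES = {"language", "joint"}
BENCHMARKS = {"MIntRec"}

DEFAULTS = {
    "text_backbone": "bert-base-uncased",
    "mode": "joint",
    "log_path": "logs",
    "cache_path": "cache",
    "results_path": "results",
    "output_path": "outputs",
    "device": "cuda",
    "benchmark": "MIntRec",
}


def learner_kwargs(**overrides):
    """Merge overrides into the documented defaults and validate the restricted choices."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown arguments: {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    for key, allowed in (("text_backbone", TEXT_BACKBONES), ("mode", MODES), ("benchmark", BENCHMARKS)):
        if kwargs[key] not in allowed:
            raise ValueError(f"{key}={kwargs[key]!r} is not one of {sorted(allowed)}")
    return kwargs


# A text-only learner with a small backbone running on the CPU.
kwargs = learner_kwargs(text_backbone="prajjwal1/bert-small", mode="language", device="cpu")
# With OpenDR installed, these could then be passed on: IntentRecognitionLearner(**kwargs)
```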

#### `IntentRecognitionLearner.fit`
```python
IntentRecognitionLearner.fit(self, dataset, val_dataset, verbose, silent)
```

This method is used for training the algorithm on a training dataset and validating on a validation dataset.

Parameters:

- **dataset**: *object*\
Object that holds the training dataset.
- **val_dataset**: *object, default=None*\
Object that holds the validation dataset.
- **verbose**: *bool, default=False*\
Enables verbosity.
- **silent**: *bool, default=False*\
Enables training in silent mode, i.e., only critical output is produced.

#### `IntentRecognitionLearner.eval`
```python
IntentRecognitionLearner.eval(self, dataset, modality, verbose, silent, restore_best_model)
```

This method is used to evaluate a trained model on an evaluation dataset.

Parameters:

- **dataset**: *object*\
Object that holds the evaluation dataset.
- **modality**: *str, {'audio', 'video', 'language', 'joint'}*\
Specifies the modality to be used for inference. It should match the mode in which the learner was trained; a learner trained in 'joint' (multimodal) mode can be evaluated on any modality, although we do not recommend using only video or only audio.
- **verbose**: *bool, default=False*\
If True, provides detailed logs.
- **silent**: *bool, default=False*\
If True, runs in silent mode, i.e., with only critical output.
- **restore_best_model**: *bool, default=False*\
If True, the best model according to validation-set performance is loaded from `self.output_path` before evaluation. If False, the current model state is evaluated.
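
The rule for the `modality` argument can be expressed as a small check. This is a hedged sketch: `check_eval_modality` is a hypothetical helper that only encodes the documented recommendation; it is not how the learner validates its arguments internally.

```python
import warnings

EVAL_MODALITIES = {"audio", "video", "language", "joint"}


def check_eval_modality(train_mode: str, modality: str) -> None:
    """Apply the documented rule for the `modality` argument of eval()."""
    if modality not in EVAL_MODALITIES:
        raise ValueError(f"unknown modality {modality!r}")
    if train_mode == "joint":
        # A jointly trained learner accepts any modality,
        # but audio-only or video-only evaluation is discouraged.
        if modality in ("audio", "video"):
            warnings.warn(f"{modality}-only inference is not recommended")
    elif modality != train_mode:
        # Otherwise the modality must match the training mode.
        raise ValueError(f"a learner trained in {train_mode!r} mode cannot be evaluated on {modality!r}")


check_eval_modality("joint", "language")  # fine: joint learners support text-only evaluation
```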

#### `IntentRecognitionLearner.infer`
```python
IntentRecognitionLearner.infer(self, batch, modality)
```

This method is used to perform inference on a given language sequence (text).
It returns a list of `engine.target.Category` objects, which contain class predictions and confidence scores for each sentence in the input sequence.

Parameters:
- **batch**: *dict*\
Dictionary of input data with keys corresponding to modalities, e.g., `{'text': 'Hello'}`.
- **modality**: *str, default='language'*\
Modality to be used for inference. Currently, inference from raw data is only supported for language modality (text).
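
Since `infer` returns one `Category` per sentence, downstream code typically iterates over the list. The sketch below uses a stand-in `Category` dataclass whose `data` and `confidence` field names are assumed to mirror `engine.target.Category`; the confidence threshold and the example values are illustrative, not real model outputs:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Category:
    """Stand-in for engine.target.Category; the field names are assumed."""
    data: int          # predicted class ID
    confidence: float  # confidence score of the prediction


def confident_intents(predictions: List[Category], min_confidence: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (sentence index, class ID, confidence) for predictions above a threshold."""
    return [
        (i, p.data, p.confidence)
        for i, p in enumerate(predictions)
        if p.confidence >= min_confidence
    ]


# One Category per sentence, e.g. for "Hello there. Could you help me?" (hand-written values)
preds = [Category(data=18, confidence=0.91), Category(data=19, confidence=0.42)]
print(confident_intents(preds))  # [(0, 18, 0.91)]
```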

#### `IntentRecognitionLearner.save`
```python
IntentRecognitionLearner.save(self, path)
```
This method is used to save a trained model.

Parameters:

- **path**: *str*\
Path to save the model.

#### `IntentRecognitionLearner.load`
```python
IntentRecognitionLearner.load(self, path)
```

This method is used to load a previously saved model.

Parameters:

- **path**: *str*\
Path of the model to be loaded.

#### `IntentRecognitionLearner.download`
```python
IntentRecognitionLearner.download(self, path)
```

Downloads the provided pretrained model into `path`.

Parameters:

- **path**: *str*\
Specifies the folder where data will be downloaded.

#### `IntentRecognitionLearner.trim`
```python
IntentRecognitionLearner.trim(self, modality)
```

This method is used to convert a model trained in a multimodal manner ('joint' mode) for unimodal inference. This will drop unnecessary layers corresponding to other modalities for computational efficiency.

Parameters:
- **modality**: *str, default='language'*\
Modality to which the model is converted.
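
Conceptually, `trim` keeps the branch of the chosen modality and discards the others. The sketch below illustrates that idea on a plain dictionary of named parameters; the `language.`/`audio.`/`video.` name prefixes and the integer stand-ins for tensors are hypothetical, and the real model's layer names may differ:

```python
from typing import Dict

# Hypothetical prefixes marking modality-specific branches of the model.
MODALITY_BRANCHES = ("language", "audio", "video")


def trim_parameters(params: Dict[str, object], modality: str = "language") -> Dict[str, object]:
    """Drop parameters that belong to the branches of all other modalities."""
    if modality not in MODALITY_BRANCHES:
        raise ValueError(f"unknown modality {modality!r}")
    dropped = tuple(f"{m}." for m in MODALITY_BRANCHES if m != modality)
    # Parameters without a modality prefix (e.g. the classifier) are kept.
    return {name: value for name, value in params.items() if not name.startswith(dropped)}


params = {
    "language.encoder.weight": 1,
    "audio.encoder.weight": 2,
    "video.encoder.weight": 3,
    "classifier.weight": 4,
}
print(sorted(trim_parameters(params)))  # ['classifier.weight', 'language.encoder.weight']
```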

#### Examples

Additional configuration parameters/hyperparameters can be specified in *intent_recognition_learner/algorithm/configs/mult_bert.py*.

* **Training, evaluation and inference example**

```python
from opendr.perception.multimodal_human_centric import IntentRecognitionLearner
from opendr.perception.multimodal_human_centric.intent_recognition_learner.algorithm.data.mm_pre import MIntRecDataset

if __name__ == '__main__':
    # Initialize the multimodal learner
    learner = IntentRecognitionLearner(text_backbone='bert-base-uncased', mode='joint', log_path='logs',
                                       cache_path='cache', results_path='results', output_path='outputs')

    # Initialize the training, validation ('dev') and test splits
    train_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video', audio_data_path='/path/to/audio', text_backbone='bert-base-uncased', split='train')
    val_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video', audio_data_path='/path/to/audio', text_backbone='bert-base-uncased', split='dev')
    test_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video', audio_data_path='/path/to/audio', text_backbone='bert-base-uncased', split='test')

    # Train the model, validating on the 'dev' split
    learner.fit(train_dataset, val_dataset, silent=False, verbose=True)

    # Evaluate the best model (selected on the validation set) on multimodal input
    out = learner.eval(test_dataset, 'joint', restore_best_model=True)

    # Evaluate the same best model on text-only input
    out_l = learner.eval(test_dataset, 'language', restore_best_model=True)

    # Keep only the text-specific layers of the model and drop the rest
    learner.trim('language')

    # Evaluate the trimmed model; it should produce the same result as out_l
    out_l_2 = learner.eval(test_dataset, 'language', restore_best_model=False)
```
8 changes: 6 additions & 2 deletions projects/opendr_ws/README.md
@@ -88,14 +88,18 @@ Currently, apart from tools, opendr_ws contains the following ROS nodes (categor
1. [End-to-End Multi-Modal Object Detection (GEM)](src/opendr_perception/README.md#2d-object-detection-gem-ros-node)
## RGBD input
1. [RGBD Hand Gesture Recognition](src/opendr_perception/README.md#rgbd-hand-gesture-recognition-ros-node)
## RGB + Audio input
1. [Audiovisual Emotion Recognition](src/opendr_perception/README.md#audiovisual-emotion-recognition-ros-node)

## RGB + IMU input
1. [Continual SLAM](src/opendr_perception//README.md#continual-slam-ros-nodes)

## Audio input
1. [Speech Command Recognition](src/opendr_perception/README.md#speech-command-recognition-ros-node)
2. [Speech Transcription](src/opendr_perception/README.md#speech-transcription-ros-node)
## Text input
1. [Intent Recognition](src/opendr_perception/README.md#intent-recognition-ros-node)

## Point cloud input
1. [3D Object Detection Voxel](src/opendr_perception/README.md#3d-object-detection-voxel-ros-node)
2. [3D Object Tracking AB3DMOT](src/opendr_perception/README.md#3d-object-tracking-ab3dmot-ros-node)
1 change: 1 addition & 0 deletions projects/opendr_ws/src/opendr_perception/CMakeLists.txt
@@ -45,5 +45,6 @@ catkin_install_python(PROGRAMS
scripts/facial_emotion_estimation_node.py
scripts/continual_skeleton_based_action_recognition_node.py
scripts/point_cloud_2_publisher_node.py
scripts/intent_recognition_node.py
DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
)
44 changes: 43 additions & 1 deletion projects/opendr_ws/src/opendr_perception/README.md
@@ -974,7 +974,6 @@ whose documentation can be found here:

EdgeSpeechNets currently does not have a pretrained model available for download, only local files may be used.


### Speech Transcription ROS Node

A ROS node for speech transcription from an audio stream using Whisper or Vosk.
@@ -1020,6 +1019,49 @@ The node makes use of the toolkit's speech transcription tools:
For viewing the output, refer to the [notes above.](#notes)
----
## Text input
### Intent Recognition ROS Node
A ROS node for recognizing intents from language.
This node is meant to be used together with the speech transcription node: the speech transcription node transcribes speech into text, and this node infers the intent from that text.
The intent recognition node subscribes to the speech transcription output topic.
You can find the intent recognition ROS node python script [here](./scripts/intent_recognition_node.py) to inspect the code and modify it to fit your needs.
The node makes use of the toolkit's intent recognition [learner](../../../../src/opendr/perception/multimodal_human_centric/intent_recognition_learner/intent_recognition_learner.py), whose documentation can be found [here](../../../../docs/reference/intent-recognition-learner.md).
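
The data flow of the node can be sketched in plain Python, assuming the node calls the learner's `infer` on each incoming transcription and publishes every returned prediction. Here `IntentPipeline` is a hypothetical class, and `infer`/`publish` are injected callables standing in for `IntentRecognitionLearner.infer` and the ROS publisher, so the sketch runs without ROS or OpenDR:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Category:
    """Stand-in for engine.target.Category (field names assumed)."""
    data: int
    confidence: float


class IntentPipeline:
    """Transcription text in, (class ID, confidence) pairs out."""

    def __init__(self, infer: Callable[[dict], List[Category]], publish: Callable[[int, float], None]):
        self.infer = infer      # stands in for IntentRecognitionLearner.infer
        self.publish = publish  # stands in for the output intent publisher

    def on_transcription(self, text: str) -> None:
        # Design choice of this sketch: skip empty transcriptions instead of running the model.
        if not text.strip():
            return
        for category in self.infer({"text": text}):
            self.publish(category.data, category.confidence)


# Usage with dummy callables in place of the learner and the publisher
published = []
pipeline = IntentPipeline(
    infer=lambda batch: [Category(data=18, confidence=0.9)],
    publish=lambda class_id, confidence: published.append((class_id, confidence)),
)
pipeline.on_transcription("Hello there!")
pipeline.on_transcription("   ")  # ignored
print(published)  # [(18, 0.9)]
```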

#### Instructions for basic usage:

1. Follow the instructions of the speech transcription node and start it.

2. Start the intent recognition node:

```shell
rosrun opendr_perception intent_recognition_node.py
```
The following arguments are available:
- `-i or --input_transcription_topic INPUT_TRANSCRIPTION_TOPIC`: topic name for input transcription of type OpenDRTranscription (default=`/opendr/speech_transcription`)
- `-o or --output_intent_topic OUTPUT_INTENT_TOPIC`: topic name for predicted intent (default=`/opendr/intent`)
- `--performance_topic PERFORMANCE_TOPIC`: topic name for performance messages (default=`None`, disabled)
- `--device DEVICE`: device to be used for inference (default=`cuda`)
- `--text_backbone TEXT_BACKBONE`: text backbone to be used, choices are `bert-base-uncased`, `albert-base-v2`, `bert-small`, `bert-mini`, `bert-tiny` (default=`bert-base-uncased`)
- `--cache_path CACHE_PATH`: cache path for tokenizer files (default=`./cache/`)

3. Default output topics:
- Predicted intents and confidence: `/opendr/intent`

For viewing the output, refer to the [notes above.](#notes)
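
The command-line arguments listed above map naturally onto `argparse`. This is a sketch mirroring the documented names and defaults, not the node's actual source; in particular, the mapping from the node's short backbone names to Hugging Face Hub model IDs (`BACKBONE_IDS`) is an assumption based on the backbone names accepted by the learner:

```python
import argparse

# Assumed mapping from the node's short backbone names to the learner's Hugging Face Hub IDs.
BACKBONE_IDS = {
    "bert-base-uncased": "bert-base-uncased",
    "albert-base-v2": "albert-base-v2",
    "bert-small": "prajjwal1/bert-small",
    "bert-mini": "prajjwal1/bert-mini",
    "bert-tiny": "prajjwal1/bert-tiny",
}


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the documented node arguments and defaults."""
    parser = argparse.ArgumentParser(description="Intent recognition node arguments (sketch)")
    parser.add_argument("-i", "--input_transcription_topic", default="/opendr/speech_transcription")
    parser.add_argument("-o", "--output_intent_topic", default="/opendr/intent")
    parser.add_argument("--performance_topic", default=None)
    parser.add_argument("--device", default="cuda")
    parser.add_argument("--text_backbone", default="bert-base-uncased", choices=sorted(BACKBONE_IDS))
    parser.add_argument("--cache_path", default="./cache/")
    return parser


args = build_parser().parse_args(["--text_backbone", "bert-tiny", "--device", "cpu"])
print(BACKBONE_IDS[args.text_backbone])  # prajjwal1/bert-tiny
```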

**Notes**

The table below lists the detectable classes and their corresponding IDs:

| Class | Complain | Praise | Apologise | Thank | Criticize | Agree | Taunt | Flaunt | Joke | Oppose | Comfort | Care | Inform | Advise | Arrange | Introduce | Leave | Prevent | Greet | Ask for help |
|--------|----------|--------|-----------|-------|-----------|-------|-------|--------|------|--------|---------|------|--------|--------|---------|-----------|-------|---------|-------|--------------|
| **ID** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |


----
## Point cloud input

3 changes: 3 additions & 0 deletions projects/opendr_ws/src/opendr_perception/package.xml
@@ -12,16 +12,19 @@
<build_depend>std_msgs</build_depend>
<build_depend>vision_msgs</build_depend>
<build_depend>audio_common_msgs</build_depend>
<build_depend>hri_msgs</build_depend>
<build_export_depend>roscpp</build_export_depend>
<build_export_depend>rospy</build_export_depend>
<build_export_depend>std_msgs</build_export_depend>
<build_export_depend>vision_msgs</build_export_depend>
<build_export_depend>audio_common_msgs</build_export_depend>
<build_export_depend>hri_msgs</build_export_depend>
<exec_depend>roscpp</exec_depend>
<exec_depend>rospy</exec_depend>
<exec_depend>std_msgs</exec_depend>
<exec_depend>vision_msgs</exec_depend>
<exec_depend>audio_common_msgs</exec_depend>
<exec_depend>hri_msgs</exec_depend>
<export>
</export>
</package>
