This repo contains the implementation of the models proposed in the paper "BETOLD: A Task-Oriented Dialog Dataset for Breakdown Detection", accepted at the COLING 2022 workshop "When creative AI meets conversational AI".
BETOLD (Breakdown Expectation for Task-Oriented Long Dialogs) is a task-oriented dialog dataset derived from real conversations between a system and a user whose goal is to book an appointment. The aim of the dataset is to predict LUHFs, i.e. user-initiated (U) hang-ups (H) and forward calls (F) that happen at a late (L) point of the conversation. The dataset is characterized by NLG and NLU intents and entities.
For more details on the data see this repo: https://github.com/telepathylabsai/BETOLD_dataset
- If you want to use CUDA, install the CUDA version that matches your system and your PyTorch build; see the PyTorch installation instructions.
- Install the requirements and the package using pip:
pip install -r requirements.txt
pip install -e .
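After installation, a quick check of whether PyTorch can see a GPU might look like the sketch below. It assumes that PyTorch is one of the installed requirements (suggested by the CUDA note above, but not verified here). The helper name `describe_device` is illustrative and not part of this repo.

```python
def describe_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch  # assumption: installed through requirements.txt
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "CUDA not available, falling back to CPU"


if __name__ == "__main__":
    print(describe_device())
```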
There are different variants of the models, depending on the dataset features used for training.
The available features are the following:
- callers, indicating whether an utterance comes from the user ("nlu") or from the system ("nlg")
- intents, an NLG or NLU intent
- entities_mh, the set of entities modeled as a multi-hot encoded representation
- entities_enc, the set of entities encoded using SBERT (not shown in the paper)
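To make the entities_mh representation more concrete, here is a minimal sketch of multi-hot encoding: each turn's set of entities becomes a 0/1 vector over a fixed vocabulary. The entity names in `VOCABULARY` and the function `multi_hot` are hypothetical and chosen only for illustration; the actual entity inventory and encoding code live in the dataset and this repo.

```python
def multi_hot(entities, vocabulary):
    """Encode a set of entity names as a 0/1 vector over a fixed vocabulary."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for name in entities:
        if name in index:  # entities outside the vocabulary are ignored
            vector[index[name]] = 1
    return vector


# Hypothetical entity vocabulary, for illustration only
VOCABULARY = ["date", "time", "location", "service"]

turn_entities = {"date", "time"}
encoded = multi_hot(turn_entities, VOCABULARY)
print(encoded)  # [1, 1, 0, 0]
```

Unlike a one-hot vector, several positions can be set at once, which is what allows a single turn to carry more than one entity.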
Run the following script to train a model:
python breakdown_detection/main_script.py [options]
Options:
- --use_features (list of strings): specifies the dataset features used to train the model. See the parameter AVAILABLE_FEATURES for the complete list of features.
- --model_param_set (integer): specifies the index of the model hyperparameter configuration defined in the script file.
- --training_param_set (integer): specifies the index of the training hyperparameters (number of epochs and whether to evaluate on the validation set) defined in the script file.
- --num_epochs (integer): specifies the number of training epochs (it overrides the value from --training_param_set).
- --results_file (str): specifies the path of the file in which to store the results.
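As an illustration of how the options above fit together, the following sketch assembles a training command from Python. It assumes that --use_features accepts space-separated values, which is not confirmed by this README. The helper `build_train_command`, the parameter indices and the results path are hypothetical.

```python
import shlex
import sys


def build_train_command(use_features, model_param_set=None,
                        training_param_set=None, num_epochs=None,
                        results_file=None):
    """Assemble the argument list for main_script.py from the documented options."""
    cmd = [sys.executable, "breakdown_detection/main_script.py"]
    # Assumption: the list of features is passed as space-separated values
    cmd += ["--use_features", *use_features]
    optional = {
        "--model_param_set": model_param_set,
        "--training_param_set": training_param_set,
        "--num_epochs": num_epochs,
        "--results_file": results_file,
    }
    for flag, value in optional.items():
        if value is not None:  # only pass the options that were set
            cmd += [flag, str(value)]
    return cmd


# Hypothetical values, for illustration only
cmd = build_train_command(["intents", "callers"], model_param_set=0,
                          num_epochs=10, results_file="results.csv")
print(shlex.join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.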
The trained model is stored in the directory trained_models. It can be loaded and analyzed. See "Analyze Results" for more details.
Instead of training a single model, you can run a grid search over a set of hyperparameters, including different combinations of dataset features. Specify the hyperparameter ranges directly in the script file, then run the script as:
python breakdown_detection/grid_search_script.py [options]
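The idea of searching over both feature combinations and hyperparameters can be sketched as follows. This is not the repo's grid search: the hyperparameter names and ranges in `PARAM_GRID` are hypothetical, and the real script may explore only a subset of the feature combinations enumerated here.

```python
from itertools import combinations, product

FEATURES = ["callers", "intents", "entities_mh", "entities_enc"]

# Hypothetical hyperparameter ranges; the real ones are set in the script file
PARAM_GRID = {"hidden_size": [64, 128], "learning_rate": [1e-3, 1e-4]}


def feature_subsets(features):
    """Yield every non-empty combination of features."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield list(subset)


def grid(features, param_grid):
    """Yield one configuration per (feature subset, hyperparameter values) pair."""
    names = list(param_grid)
    for subset in feature_subsets(features):
        for values in product(*(param_grid[name] for name in names)):
            yield {"use_features": subset, **dict(zip(names, values))}


configs = list(grid(FEATURES, PARAM_GRID))
print(len(configs))  # 60 = 15 feature subsets x 4 hyperparameter settings
```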
1) File breakdown_detection/results_analysis/compute_breakdown_probability_per_conversation.py loads a trained model and lets you inspect the results (probability of LUHF) at each step (intents, entities, ...) of a test-set conversation, selected by its index.
2) File breakdown_detection/results_analysis/compute_avg_breakdown_probability.py loads a trained model, computes the average probability of LUHF for each conversation, and saves a histogram of these results.
3) File breakdown_detection/results_analysis/explainability.py computes feature attributions using integrated gradients. In this script, choose the feature for which to compute attributions: intents, callers or entities.
4) File breakdown_detection/results_analysis/explainability_visualization.py lets you examine individual examples of explanations produced by the integrated gradients technique.
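For readers unfamiliar with integrated gradients, the sketch below shows the technique on a toy logistic model in plain Python. It is not the code in explainability.py, which may rely on a library implementation. The weights, input and all-zero baseline are hypothetical. Each attribution is the input's difference from the baseline multiplied by the average gradient along the straight path between them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy stand-in for a trained model: logistic regression."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical weights and input, for illustration only
WEIGHTS, BIAS = [1.5, -2.0, 0.5], 0.1
x = [1.0, 0.0, 1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, WEIGHTS, BIAS)
# Completeness check: attributions should sum to model(x) - model(baseline)
gap = sum(attributions) - (model(x, WEIGHTS, BIAS) - model(baseline, WEIGHTS, BIAS))
print(attributions, gap)
```

A feature that equals its baseline value (the second input here) receives zero attribution, and the attributions sum approximately to the change in the model output.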
We report here the updated results:
Model | LUHF F1 | Not LUHF F1 | Macro-avg F1 |
---|---|---|---|
Intents | 0.825 +/- 0.019 | 0.744 +/- 0.011 | 0.784 +/- 0.015 |
Entities_mh | 0.740 +/- 0.034 | 0.652 +/- 0.012 | 0.696 +/- 0.022 |
Entities_enc | 0.808 +/- 0.015 | 0.714 +/- 0.009 | 0.761 +/- 0.011 |
Intents+Entities_mh+Callers | 0.836 +/- 0.017 | 0.758 +/- 0.011 | 0.797 +/- 0.014 |
Intents+Entities_enc+Callers | 0.831 +/- 0.016 | 0.755 +/- 0.010 | 0.793 +/- 0.013 |
Text baseline | 0.862 +/- 0.012 | 0.790 +/- 0.010 | 0.826 +/- 0.010 |
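The macro-averaged column can be cross-checked against the two per-class columns. The sketch below assumes the standard definition of macro-averaged F1 (the unweighted mean of the per-class F1 scores) and uses the mean values copied from the table above; the standard deviations are left out.

```python
# Mean F1 values from the table: (LUHF F1, Not LUHF F1, reported Macro-avg F1)
RESULTS = {
    "Intents": (0.825, 0.744, 0.784),
    "Entities_mh": (0.740, 0.652, 0.696),
    "Entities_enc": (0.808, 0.714, 0.761),
    "Intents+Entities_mh+Callers": (0.836, 0.758, 0.797),
    "Intents+Entities_enc+Callers": (0.831, 0.755, 0.793),
    "Text baseline": (0.862, 0.790, 0.826),
}


def macro_f1(per_class_f1):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)


# Largest difference between the recomputed and the reported macro average
max_gap = max(
    abs(macro_f1([luhf, not_luhf]) - reported)
    for luhf, not_luhf, reported in RESULTS.values()
)
print(max_gap)
```

The largest gap stays within the rounding error of the three-decimal values in the table.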
- Silvia Terragni <silvia.terragni@telepathy.ai>
- Bruna Guedes
- Andre Manso
- Modestas Filipavicius
- Nghia Khau
- Roland Mathis
This work has been accepted at the COLING 2022 workshop "When creative AI meets conversational AI". If you use this resource, please cite:
@inproceedings{terragni2022_betold,
    title = "{BETOLD}: A Task-Oriented Dialog Dataset for Breakdown Detection",
    author = "Terragni, Silvia and Guedes, Bruna and Manso, Andre and Filipavicius, Modestas and Khau, Nghia and Mathis, Roland",
    booktitle = "Proceedings of the Second Workshop on When Creative AI Meets Conversational AI",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.cai-1.4",
    pages = "23--34",
}