This repository provides Python code to reproduce the experiments from the article A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries, accepted for publication at EACL 2023.

Clone the repository and enter its directory:
git clone git@github.com:deezer/music-ner-eacl2023.git
cd music-ner-eacl2023
Build the Docker image and run it in a container with an interactive bash session (the current Docker image requires a CUDA-capable GPU):

make build
make run-bash
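Once inside the container, you can quickly check that PyTorch sees the GPU. A minimal sanity check, assuming PyTorch is available in the Poetry environment (it is a dependency of the transformer models used below):

```python
# Minimal GPU sanity check; run inside the container with `poetry run python3`.
# Assumes PyTorch is installed in the Poetry environment.
import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA-capable GPU visible to PyTorch")
```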
Print the data statistics shown in Table 2 of the paper:
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset1
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset2
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset3
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset4
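These statistics amount to counting queries and entity mentions per type. For reference, a minimal sketch of that computation, assuming a CoNLL-style layout with one token and BIO tag per line and blank lines between queries (the actual file names and format under data/dataset*/ may differ):

```python
# Minimal sketch: count queries and entity mentions per type in a BIO file.
# Assumes "token<TAB>tag" lines with blank lines separating queries; the
# actual layout under data/dataset*/ may differ.
from collections import Counter

def count_entities(path):
    counts, n_queries, in_query = Counter(), 0, False
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if in_query:
                    n_queries += 1
                in_query = False
                continue
            in_query = True
            tag = line.rsplit("\t", 1)[-1]
            if tag.startswith("B-"):  # each B- tag opens one entity mention
                counts[tag[2:]] += 1
    if in_query:  # file may not end with a blank line
        n_queries += 1
    return n_queries, counts
```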
Prepare ground-truth sets with seen and rare / unseen entities:
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset1/ --th_seen=1 --th_rare_unseen=0
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset2/ --th_seen=1 --th_rare_unseen=0
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset3/ --th_seen=1 --th_rare_unseen=0
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset4/ --th_seen=1 --th_rare_unseen=0
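The --th_seen and --th_rare_unseen flags are frequency thresholds on how often a test entity occurs in the training split. As an illustration of this kind of partition (the exact criterion implemented in create_seen_rare_ds.py may differ):

```python
# Illustrative split of test entities by training-set frequency; the exact
# criterion in create_seen_rare_ds.py may differ from this sketch.
def split_entities(test_entities, train_counts, th_seen=1, th_rare_unseen=0):
    seen = {e for e in test_entities if train_counts.get(e, 0) > th_seen}
    rare_unseen = {e for e in test_entities if train_counts.get(e, 0) <= th_rare_unseen}
    return seen, rare_unseen
```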
Note: small variations between runs, and hence deviations from the exact scores reported in the paper, may occur, but these differences are not statistically significant.
Fine-tune multiple transformers (BERT, RoBERTa, and MPNet) to perform music NER and print results (Table 4):
./music-ner/scripts/run_ner_model_selection.sh
poetry run python3 music-ner/tables-and-stats/transformer_baselines.py --results_dir output
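transformer_baselines.py aggregates the metric files that the fine-tuning runs write under output/. To inspect the runs yourself, a hedged sketch, assuming each run produces a predict_results.json with a "predict_f1" key as in the Hugging Face token-classification examples (the repo's actual file names may differ):

```python
# Hedged sketch: average test F1 across runs found under output/.
# File and key names follow the Hugging Face run_ner.py convention and are
# assumptions here; adapt them to the repo's actual output layout.
import glob, json, statistics

scores = []
for path in glob.glob("output/**/predict_results.json", recursive=True):
    with open(path) as f:
        scores.append(json.load(f)["predict_f1"])
if scores:
    print(f"mean F1 = {statistics.mean(scores):.3f} over {len(scores)} runs")
```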
Fine-tune BERT to perform music NER, export the human annotation results in the same JSON format as the one produced by the transformers, and print results (Tables 5 and 6):
./music-ner/scripts/run_ner.sh
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset1 --output_dir output/dataset1
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset2 --output_dir output/dataset2
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset3 --output_dir output/dataset3
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset4 --output_dir output/dataset4
poetry run python3 music-ner/tables-and-stats/human_vs_bert.py --results_dir output
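Comparing human annotations against model output requires both in the same span-level representation. As an illustration of that normalization step (not the repo's exact code), converting token-level BIO tags into (start, end, type) spans:

```python
# Illustrative BIO-to-span conversion (end index exclusive); the export code in
# compute_human_performance.py may handle more edge cases, e.g. type changes
# inside an I- continuation, which this sketch ignores.
def bio_to_spans(tags):
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # the "O" sentinel flushes a final span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans

print(bio_to_spans(["O", "B-Artist", "I-Artist", "O", "B-WoA"]))
# [(1, 3, 'Artist'), (4, 5, 'WoA')]
```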
Run experiments for the seen and rare / unseen ground-truth sets and print results (Table 7):
./music-ner/scripts/run_ner_seen_ents.sh
./music-ner/scripts/run_ner_rare_unseen_ents.sh
poetry run python3 music-ner/tables-and-stats/seen_vs_unseen.py --results_dir output
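Both scripts write their metrics under output/, one set per subset. To eyeball the gap between the two conditions on one dataset, a hedged sketch (the directory and key names below are assumptions; adapt them to the actual layout of output/):

```python
# Hedged sketch: contrast F1 on the seen vs. rare/unseen subsets of dataset1.
# Directory and key names are assumptions; check the actual layout of output/.
import json

for subset in ("seen", "rare_unseen"):
    with open(f"output/dataset1_{subset}/predict_results.json") as f:
        print(subset, round(json.load(f)["predict_f1"], 3))
```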
Reproduce Figure 1 with the detailed error analysis for the BERT and human predictors:
poetry run python3 music-ner/tables-and-stats/graph_error_analysis.py --results_dir output
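The error analysis goes beyond exact-match scores by categorizing each predicted span against the gold annotations. As a generic illustration of such a taxonomy (not necessarily the categories used in the paper or in graph_error_analysis.py):

```python
# Illustrative NER error taxonomy (not necessarily the paper's categories):
# classify one predicted span (start, end, type) against the gold spans.
def classify(pred, gold_spans):
    start, end, label = pred
    for g_start, g_end, g_label in gold_spans:
        if (start, end) == (g_start, g_end):
            return "correct" if label == g_label else "wrong_type"
        if start < g_end and g_start < end:  # partial token overlap
            return "boundary" if label == g_label else "boundary_and_type"
    return "spurious"
```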
Please cite our paper if you use this data or code in your work:
@InProceedings{Epure2023,
  title={A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries},
  author={Epure, Elena and Hennequin, Romain},
  booktitle={Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  month={May},
  year={2023}
}