Files for the EMNLP 2022 submission "Communication breakdown: On the low mutual intelligibility between human and neural captioning". A preprint of the manuscript is accessible here.
- annotations: contains CSV files with the anonymized raw human annotations, divided by annotation block (we annotated 2088 cases, divided into 36 blocks of 58 questions).
- scripts: contains the (ordered) scripts to filter and organize the annotations, and compute human accuracy scores.
- files: contains all the intermediate files generated by the scripts, needed to output the final tsv with the human annotations, called human_data_final.tsv.
- psychopy_experiment: contains the PsychoPy code of our experiment, which runs online on Pavlovia. The shared script is for the first block of annotations; the following blocks run the same code with different questions.
- mini-readme (to update)
Scripts, to be run in this order (see mini-readme):
- generate_tab_delimited_file.py
- pre-process-sentences-and-get-basic-stats.py
- lmi.py
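
The ordering above could be automated with a small driver like the sketch below. It is only an illustration, not part of the repository: the helper name `run_in_order`, the `script_dir` parameter, and the assumption that each script runs without command-line arguments are all hypothetical, so check the mini-readme for the actual invocation of each script.

```python
import subprocess
import sys
from pathlib import Path

# Script names in the order listed in this README.
SCRIPTS = [
    "generate_tab_delimited_file.py",
    "pre-process-sentences-and-get-basic-stats.py",
    "lmi.py",
]


def run_in_order(scripts=SCRIPTS, script_dir=".", runner=subprocess.run):
    """Run each script with the current Python interpreter, in list order.

    Assumption (not confirmed by the README): the scripts take no
    command-line arguments and live in `script_dir`.
    """
    for name in scripts:
        path = Path(script_dir) / name
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later scripts never run on incomplete intermediate files.
        runner([sys.executable, str(path)], check=True)


# Example call (hypothetical location of the scripts):
# run_in_order(script_dir="path/to/the/scripts")
```

Stopping at the first failure matters here because each script appears to depend on the intermediate files written by the previous one.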
Frequency spectrum of POS, lemmas (ImageCode Video set - COCO)