
Files for the study "Communication breakdown: On the low mutual intelligibility between human and neural captioning", EMNLP 2022. A project by R. Dessì, E. Gualdoni, F. Franzon, G. Boleda, M. Baroni


franfranz/emecomm_context


Communication breakdown: On the low mutual intelligibility between human and neural captioning

Files for the paper "Communication breakdown: On the low mutual intelligibility between human and neural captioning" (EMNLP 2022). A preprint of the manuscript is also available.

Data

Human Data Collection

  • annotations: contains CSV files with the anonymized raw human annotations, split by annotation block (we annotated 2088 cases, divided into 36 blocks of 58 questions).
  • scripts: contains the (ordered) scripts that filter and organize the annotations and compute human accuracy scores.
  • files: contains all intermediate files generated by the scripts, which are needed to produce the final TSV file with the human annotations, human_data_final.tsv.
  • psychopy_experiment: contains the PsychoPy code of our experiment, which ran online on Pavlovia. The shared script is for the first annotation block; the subsequent blocks use the same code with different questions.
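The block structure above (36 blocks × 58 questions = 2088 cases) and the accuracy computation can be sketched as follows. This is a minimal illustration, not the code in scripts: it assumes each annotation row has hypothetical columns `block`, `response` (the image the participant picked) and `target` (the correct image), which may not match the actual CSV headers.

```python
import csv
import io
from collections import defaultdict

# Study design stated in this README: 36 blocks of 58 questions each.
N_BLOCKS = 36
QUESTIONS_PER_BLOCK = 58
TOTAL_CASES = N_BLOCKS * QUESTIONS_PER_BLOCK  # 2088 annotated cases


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per annotation."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def accuracy_by_block(rows):
    """Fraction of correct answers per block.

    Assumes hypothetical columns 'block', 'response' and 'target'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["block"]] += 1
        if row["response"] == row["target"]:
            correct[row["block"]] += 1
    return {block: correct[block] / total[block] for block in total}


def overall_accuracy(rows):
    """Fraction of correct answers across all annotations."""
    hits = sum(row["response"] == row["target"] for row in rows)
    return hits / len(rows)
```

For example, with three toy annotations (two in block 1, one of them wrong, and one correct answer in block 2), `accuracy_by_block` gives 0.5 for block 1 and 1.0 for block 2, and `overall_accuracy` gives 2/3.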

Caption Analysis

  • mini-readme (to be updated)

Preprocessing

Run the scripts in this order (see the mini-readme):

  • generate_tab_delimited_file.py
  • pre-process-sentences-and-get-basic-stats.py
  • lmi.py
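Judging by its name, lmi.py computes local mutual information (LMI), which in corpus linguistics is usually defined as LMI(w, c) = f(w, c) · log2(f(w, c) · N / (f(w) · f(c))), i.e. pointwise mutual information weighted by the joint frequency. The sketch below is an assumption about the technique, not the repository's script: it scores how strongly each word is associated with each of several token lists (e.g. hypothetical "human" vs. "model" caption corpora), where f(c) is the corpus size and N the total token count.

```python
import math
from collections import Counter


def lmi(corpora):
    """Local mutual information of each (word, corpus) pair.

    corpora: dict mapping a corpus name to its list of tokens.
    Returns a dict mapping (word, corpus_name) to its LMI score.
    """
    joint = {name: Counter(tokens) for name, tokens in corpora.items()}
    corpus_size = {name: sum(counts.values()) for name, counts in joint.items()}
    word_freq = Counter()
    for counts in joint.values():
        word_freq.update(counts)
    n = sum(corpus_size.values())

    scores = {}
    for name, counts in joint.items():
        for word, f_wc in counts.items():
            # PMI = log2(P(w, c) / (P(w) P(c))) expressed with raw counts.
            pmi = math.log2(f_wc * n / (word_freq[word] * corpus_size[name]))
            scores[(word, name)] = f_wc * pmi
    return scores
```

Weighting PMI by f(w, c) keeps rare words from dominating the ranking: a word seen once in only one corpus gets a high PMI but a modest LMI, while frequent, corpus-specific words score highest. Negative scores mark words that are under-represented in a corpus.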

Graphs

Frequency spectrum of POS tags and lemmas (ImageCode video set and COCO)
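Assuming "frequency spectrum" is meant in the usual lexical-statistics sense, V(m) is the number of types that occur exactly m times. It can be computed from a list of POS tags or lemmas as sketched below. This is an illustration, not the plotting code used for the graphs, which may instead show rank-frequency curves.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency m to V(m), the number of types seen exactly m times."""
    type_freqs = Counter(tokens)  # type -> frequency
    return dict(sorted(Counter(type_freqs.values()).items()))
```

As a sanity check, the sum of m · V(m) over the spectrum equals the number of tokens.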
