Code for the EACL 2021 paper "How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation".

Most experiments can be reproduced by running the accompanying Jupyter notebook. For the significance analysis, run scripts/r/analyse-ordinal.r anonymized_judgements/<data_file> crossed

Power analysis is not included in the steps above because it is computationally expensive. To reproduce a single power analysis step, run

python -m summaryanalysis.design_power -b <batch count> -d <docs per batch> -a <annotators per doc> <model_file> out.csv
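
Because each invocation covers a single design, comparing several designs means repeating the command with different values. The following is a minimal sketch of how such a sweep could be scripted. It only builds and prints the command lines, using the flags shown above. The helper name `design_power_command`, the design grid, the model file name `fitted_model`, and the output file names are all hypothetical, chosen for illustration.

```python
import itertools
import sys


def design_power_command(batches, docs_per_batch, annotators_per_doc,
                         model_file, out_file):
    """Build the argument list for one summaryanalysis.design_power run."""
    return [
        sys.executable, "-m", "summaryanalysis.design_power",
        "-b", str(batches),
        "-d", str(docs_per_batch),
        "-a", str(annotators_per_doc),
        model_file, out_file,
    ]


# Hypothetical design grid and file names, for illustration only.
commands = [
    design_power_command(b, d, a, "fitted_model", f"power_b{b}_d{d}_a{a}.csv")
    for b, d, a in itertools.product([5, 10], [10], [1, 3])
]

for cmd in commands:
    # Print each command; to execute one, pass it to
    # subprocess.run(cmd, check=True) from the repository root.
    print(" ".join(cmd))
```

Writing each design to its own output file keeps the results of the expensive runs separate, so an interrupted sweep can be resumed without repeating finished steps.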
