This is the codebase for our SemEval 2020 paper: Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines.
@inproceedings{duluth2020humor,
title = "Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines",
author = "Shuning Jin and Yue Yin and XianE Tang and Ted Pedersen",
booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020)",
year = "2020",
url = "https://arxiv.org/abs/2009.02795"
}
SemEval-2020 Task 7: Assessing Humor in Edited News Headlines
- Competition page
- Subtask 1: regression task to predict the funniness score of an edited headline
- Subtask 2: classification task to predict which of two edited headlines is funnier
- Leaderboard for official evaluation
  - webpage version: “Evaluation-Task-1” is for Subtask 1 and “Evaluation-Task-2” is for Subtask 2.
  - cleaned csv version
- Our system ranks 11/49 (0.531 RMSE) in Subtask 1, and 9/32 (0.632 accuracy) in Subtask 2.
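The two subtasks are scored differently: Subtask 1 by root mean squared error (RMSE, lower is better) and Subtask 2 by accuracy. A minimal sketch of both metrics follows, only to illustrate what the numbers above measure; the function names are hypothetical and this is not the official task scorer.

```python
import math
from typing import Sequence


def rmse(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between gold and predicted funniness scores (Subtask 1)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of pairs whose predicted funnier headline matches the gold label (Subtask 2)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    print(round(rmse([1.0, 0.4, 2.2], [0.8, 0.6, 1.6]), 3))  # 0.383
    print(accuracy([1, 2, 2, 1], [1, 2, 1, 1]))  # 0.75
```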
conda environment
- packages are specified in `environment.yml`
- requires conda for Python 3: Anaconda 3 or Miniconda 3
- create the conda environment:
  conda env create -f environment.yml
- activate/deactivate the environment:
  # linux/mac (conda>=4.4):
  conda activate humor
  conda deactivate
  # linux/mac (conda<4.4):
  source activate humor
  source deactivate
  # windows:
  activate humor
  deactivate
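spaCy
Download the English model (the `en` shortcut works with spaCy 2.x; spaCy 3 requires a full package name such as `en_core_web_sm`):
python -m spacy download en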
HuggingFace Transformers Cache Directory
We use the HuggingFace Transformers library. Set a cache directory so that each pretrained model is downloaded only once:
# replace `/path/to/cache/directory` with your directory
CACHE=/path/to/cache/directory
bash scripts/path_setup.sh HUGGINGFACE_TRANSFORMERS_CACHE $CACHE
# this will add the following line to ~/.bash_profile (mac) or ~/.bashrc (linux)
# export HUGGINGFACE_TRANSFORMERS_CACHE=/path/to/cache/directory
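This README does not show how the code consumes `HUGGINGFACE_TRANSFORMERS_CACHE`. A minimal sketch, assuming the variable is read from the environment and forwarded as the `cache_dir` argument of `from_pretrained`; `resolve_cache_dir` is a hypothetical helper and `bert-base-uncased` is only an example model name.

```python
import os
from typing import Optional


def resolve_cache_dir(var: str = "HUGGINGFACE_TRANSFORMERS_CACHE") -> Optional[str]:
    """Return the cache directory named by the environment variable, or None if unset or empty.

    Hypothetical helper: None makes from_pretrained fall back to its default cache.
    """
    path = os.environ.get(var)
    if not path:
        return None
    return os.path.expanduser(path)


# Usage sketch (needs transformers installed and network access on the first call):
# from transformers import AutoModel
# model = AutoModel.from_pretrained("bert-base-uncased", cache_dir=resolve_cache_dir())
```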
Data
- see the `data` directory
- datasets: Humicroedit (official task data) and Funlines (additional training data)
- you can download the data from the source website, or simply run
  bash scripts/download_data.sh
  This gives the same data as in the `data` directory.
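In the Humicroedit CSV files, each row is assumed to hold the original headline with the replaced word wrapped as `<word/>`, plus the replacement word in a separate column; the column names `original` and `edit` are an assumption about the file layout. A sketch of recovering both headline versions from such a row follows; `apply_edit` and `headline_pairs` are hypothetical helpers and the example headline is made up.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed marker format: the replaced word appears as <word/> in the original headline.
EDIT_MARKER = re.compile(r"<(.+?)/>")


def apply_edit(original: str, edit: str) -> Tuple[str, str]:
    """Return (original headline, edited headline) from a headline with a <word/> marker."""
    match = EDIT_MARKER.search(original)
    if match is None:
        raise ValueError(f"no <word/> marker in: {original!r}")
    before, after = original[:match.start()], original[match.end():]
    return before + match.group(1) + after, before + edit + after


def headline_pairs(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (original, edited) pairs from dict rows, e.g. those produced by csv.DictReader."""
    for row in rows:
        yield apply_edit(row["original"], row["edit"])


if __name__ == "__main__":
    # Made-up headline, not taken from the dataset.
    print(apply_edit("Senate passes <budget/> bill", "pizza"))
```

With a real file, passing `csv.DictReader(f)` over an opened CSV to `headline_pairs` would yield one pair per row, provided the column names match the assumption above.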
Each run writes its outputs to an experiment directory with this layout:
experiment_directory
├── log.log
├── params.json
# if args.save_model
├── model_state.th
# if args.tensorboard
├── tensorboard_train
├── tensorboard_val
# if args.do_eval
└── output-{eval_data_name}.csv
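Which entries appear depends on the run's flags. A hypothetical helper that mirrors the tree above can be used to check that a run finished as expected; the flag names come from the comments in the tree, and one `output-{eval_data_name}.csv` per evaluation dataset is an assumption read off the placeholder, not confirmed by the code.

```python
import os
from typing import Iterable, List


def expected_outputs(save_model: bool, tensorboard: bool, do_eval: bool,
                     eval_data_names: Iterable[str] = ()) -> List[str]:
    """List the entries an experiment directory should contain for the given flags."""
    entries = ["log.log", "params.json"]  # written for every run
    if save_model:
        entries.append("model_state.th")
    if tensorboard:
        entries += ["tensorboard_train", "tensorboard_val"]
    if do_eval:
        entries += [f"output-{name}.csv" for name in eval_data_names]
    return entries


def missing_outputs(exp_dir: str, entries: Iterable[str]) -> List[str]:
    """Return the expected entries that do not exist under exp_dir."""
    return [e for e in entries if not os.path.exists(os.path.join(exp_dir, e))]
```

For example, `missing_outputs("path/to/experiment_directory", expected_outputs(True, False, True, ["dev"]))` returns an empty list when the model state, logs, and the `dev` predictions are all present.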
To see tensorboard output (run from inside the experiment directory):
tensorboard --logdir tensorboard_train
# you may need to wait a few seconds and refresh the page
open http://localhost:6006
tensorboard --logdir tensorboard_val
# you may need to wait a few seconds and refresh the page
open http://localhost:6006
Baseline
- Baseline 1 uses the average score (Subtask 1); Baseline 2 uses the majority label (Subtask 2).
- output will be in the `baseline_output` directory:
  bash scripts/baseline.sh > baseline_output/results.log
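As described, both baselines are constant predictors. A minimal sketch, assuming Baseline 1 predicts the mean of the training funniness scores for every test headline and Baseline 2 predicts the most frequent training label for every test pair; the function names are hypothetical and this is not the contents of `scripts/baseline.sh`.

```python
from collections import Counter
from typing import List, Sequence


def mean_baseline(train_scores: Sequence[float], n_test: int) -> List[float]:
    """Baseline 1: predict the training-set mean funniness score for every test headline."""
    if not train_scores:
        raise ValueError("need at least one training score")
    mean = sum(train_scores) / len(train_scores)
    return [mean] * n_test


def majority_baseline(train_labels: Sequence[int], n_test: int) -> List[int]:
    """Baseline 2: predict the most frequent training label for every test pair.

    Ties go to the label seen first, following Counter.most_common.
    """
    if not train_labels:
        raise ValueError("need at least one training label")
    label, _count = Counter(train_labels).most_common(1)[0]
    return [label] * n_test


if __name__ == "__main__":
    print(mean_baseline([0.5, 1.0, 1.5, 2.0], n_test=2))  # [1.25, 1.25]
    print(majority_baseline([1, 2, 2, 1, 2], n_test=3))  # [2, 2, 2]
```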
- To reproduce our main experiment based on the contrastive approach (Table 2 in the paper):
- To reproduce the additional analysis on the non-contrastive approach (Table 3 in the paper), use this script:
  bash scripts/table3.sh