This is the codebase for our SemEval 2020 paper: Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines.
title = "Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines",
author = "Shuning Jin and Yue Yin and XianE Tang and Ted Pedersen",
booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020)",
year = "2020",
url = ""
SemEval-2020 Task 7: Assessing Humor in Edited News Headlines
- Competition page
- Subtask 1: regression task to predict the funniness score of an edited headline
- Subtask 2: classification task to predict the funnier between two edited headlines
- Leaderboard for official evaluation
- webpage version: “Evaluation-Task-1” is for Subtask 1 and “Evaluation-Task-2” is for Subtask 2.
- cleaned csv version
- Our system ranks 11/49 (0.531 RMSE) in Subtask 1, and 9/32 (0.632 accuracy) in Subtask 2.
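The two metrics quoted above (RMSE for Subtask 1, accuracy for Subtask 2) can be sketched as follows. This is a minimal illustration, not the official scorer: the function names `rmse` and `accuracy` and the toy values are assumptions made for this example, and the integer labels are arbitrary placeholders.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between gold and predicted funniness scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of headline pairs whose predicted label matches the gold label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, made up for illustration only (not task data).
gold_scores = [0.2, 1.0, 1.8]
pred_scores = [0.5, 1.0, 1.5]
gold_labels = [1, 2, 1, 2]
pred_labels = [1, 2, 2, 2]

print("RMSE:", round(rmse(gold_scores, pred_scores), 3))
print("accuracy:", accuracy(gold_labels, pred_labels))
```

Lower RMSE is better for Subtask 1, while higher accuracy is better for Subtask 2.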
conda environment
packages are specified in environment.yml
requires conda3: Anaconda 3 or Miniconda 3
create conda environment:
conda env create -f environment.yml
activate/deactivate the environment:
# linux/mac (conda>=3.6):
conda activate humor
conda deactivate
# linux/mac (conda<3.6):
source activate humor
source deactivate
# windows:
activate humor
deactivate
python -m spacy download en
HuggingFace Transformers Cache Directory
We use the transformers library by HuggingFace. Set a cache directory so that each pretrained model is downloaded only once.
# replace `/path/to/cache/directory` with your directory
# this will add the following line to ~/.bash_profile (mac) or ~/.bashrc (linux)
# export HUGGINGFACE_TRANSFORMERS_CACHE=/path/to/cache/directory
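A quick way to check that the variable is visible to Python is sketched below. The variable name `HUGGINGFACE_TRANSFORMERS_CACHE` comes from the comment above; the helper `resolve_cache_dir` and its fallback path are hypothetical, and how this repository actually reads the variable is not shown here.

```python
import os
from pathlib import Path

# Name taken from the export line above.
ENV_VAR = "HUGGINGFACE_TRANSFORMERS_CACHE"


def resolve_cache_dir(default: str = "~/.cache/humor_transformers") -> Path:
    """Return the cache directory from the environment, or a fallback.

    The fallback path is a hypothetical placeholder, not a repository default.
    """
    raw = os.environ.get(ENV_VAR) or default
    return Path(raw).expanduser()


cache_dir = resolve_cache_dir()
print(f"{ENV_VAR} resolves to: {cache_dir}")
# The resolved path could then be passed as `cache_dir=...` to a
# transformers `from_pretrained(...)` call (assumption about usage).
```

If the printed path is the fallback rather than your directory, open a new shell or re-source your profile so that the export takes effect.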
- see data directory
- datasets: Humicroedit (official task data) and Funlines (additional training data)
- you can download the data from the source website, or simply run
This gives the same data as in
bash scripts/
├── log.log
├── params.json
# if args.save_model
# if args.tensorboard
├── tensorboard_train
├── tensorboard_val
# if args.do_eval
└── output-{eval_data_name}.csv
To see tensorboard output:
open http://localhost:6006
tensorboard --logdir tensorboard_train
# you may need to wait a few seconds and refresh the page
open http://localhost:6006
tensorboard --logdir tensorboard_val
# you may need to wait a few seconds and refresh the page
Baseline 1 uses the average score; Baseline 2 uses the majority label.
output will be in the baseline_output directory
bash scripts/ > baseline_output/results.log
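The idea behind the two baselines can be sketched in a few lines. This is an illustration only, not the repository's baseline script: it assumes the average score and the majority label are computed on the training set, and the function names and toy data are made up for this example.

```python
import math
from collections import Counter
from typing import Sequence


def mean_baseline(train_scores: Sequence[float]) -> float:
    """Baseline 1: predict the average training score for every headline."""
    return sum(train_scores) / len(train_scores)


def majority_baseline(train_labels: Sequence[int]) -> int:
    """Baseline 2: predict the most frequent training label for every pair."""
    return Counter(train_labels).most_common(1)[0][0]


# Toy data, made up for illustration only (not task data).
train_scores = [0.0, 1.0, 2.0]
test_scores = [1.0, 3.0]
train_labels = [1, 2, 2, 1, 2]
test_labels = [2, 1, 2, 2]

# Subtask 1: a constant prediction, evaluated with RMSE.
constant = mean_baseline(train_scores)
baseline_rmse = math.sqrt(
    sum((s - constant) ** 2 for s in test_scores) / len(test_scores)
)

# Subtask 2: a constant label, evaluated with accuracy.
majority = majority_baseline(train_labels)
baseline_acc = sum(label == majority for label in test_labels) / len(test_labels)

print("Baseline 1 RMSE:", round(baseline_rmse, 3))
print("Baseline 2 accuracy:", baseline_acc)
```

Both baselines ignore the headline text entirely, so they give a floor against which the trained models can be compared.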
To reproduce our main experiment based on the contrastive approach (Table 2 in the paper):
To reproduce the additional analysis on the non-contrastive approach (Table 3 in the paper), use this script:
bash scripts/