Code for Multidomain Language Models for Green NLP.

RAW TEXT DOMAIN DATA

Amazon Reviews
Arxiv Papers
Realnews
Reddit Comments

SUPERVISED TASK DATA

ACL-ARC
AG-News
ChemProt
Clothing Reviews
HyperPartisan
IMDB
MultiNLI
PubMed-RCT
SARC
SciCite
TalkDown

CODE

Code is split into multiple evaluation files, one for each task. Models are not provided, but they can be pretrained separately using the run_language_modeling.py script provided here (or the one distributed by HuggingFace).
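The README does not specify which pretraining objective was used. run_language_modeling.py supports both causal and masked language modeling. Assuming the masked (BERT-style) objective, the core of each training step is the token-masking scheme below. This is a minimal, standard-library sketch for illustration only. The function name `mask_tokens`, the `mask_id`/`vocab_size` arguments, and the 15% / 80-10-10 proportions are the conventional BERT defaults, not values confirmed by this repository. Unlike a production implementation, it does not exclude special tokens or padding from masking.

```python
import random
from typing import List, Optional, Tuple

# Label value ignored by the loss (PyTorch CrossEntropyLoss's default ignore_index).
IGNORE_INDEX = -100


def mask_tokens(token_ids: List[int], mask_id: int, vocab_size: int,
                mlm_probability: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: build (inputs, labels) for masked language modeling.

    Each position is selected with probability mlm_probability. Of the
    selected positions, about 80% are replaced by mask_id, about 10% by a
    random token id, and about 10% keep their original token. Unselected
    positions get IGNORE_INDEX as their label, so they add nothing to the loss.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mlm_probability:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # replace with a random token
        # otherwise keep the original token unchanged
    return inputs, labels


# Usage: mask a toy sequence of 100 token ids with a fixed seed.
inputs, labels = mask_tokens(list(range(5, 105)), mask_id=4, vocab_size=1000,
                             rng=random.Random(0))
```

Keeping some selected tokens unchanged or randomized, instead of always using the mask token, reduces the mismatch between pretraining inputs and fine-tuning inputs, which never contain the mask token.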

Each script is indicative of the code run on our machines. Train/dev/test splits are not provided, because they were randomly sampled. Nevertheless, the scripts were tested with multiple samples, and performance stayed as close to the reported results as the sampling variance allowed.
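Because the splits were randomly sampled and not released, results are easier to reproduce across runs if the split is seeded. The sketch below shows one way to draw a seeded train/dev/test split using only the Python standard library. The function name `random_split`, the 10% dev / 10% test fractions, and the seed value are illustrative assumptions. They are not the procedure or proportions used for the reported results.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(examples: Sequence[T], dev_frac: float = 0.1,
                 test_frac: float = 0.1,
                 seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle with a fixed seed, then slice into train/dev/test."""
    if dev_frac < 0 or test_frac < 0 or dev_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(examples)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # private RNG: same seed -> same split
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


# Usage: split 100 example indices; rerunning with the same seed gives the same split.
train, dev, test = random_split(list(range(100)), seed=13)
```

Using a dedicated `random.Random(seed)` instance avoids touching the global random state, so other code that draws random numbers does not change the split.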
