# Downstream Datasets Make Surprisingly Good Pretraining Corpora

This repository contains links to the models pretrained in the paper *Downstream Datasets Make Surprisingly Good Pretraining Corpora*. The models are pretrained from scratch on text from the train splits of popular downstream datasets. They are hosted on Hugging Face, and their links are given in the table below, along with links to the datasets used for pretraining.

| Pretraining dataset | Corpus size (MB) | ELECTRA model | RoBERTa model |
| --- | --- | --- | --- |
| CoNLL-2012 | 6.4 | link | link |
| SQuAD-v1.1 | 19 | link | link |
| SWAG | 22 | link | link |
| AG News | 27 | link | link |
| HellaSwag | 30 | link | link |
| QQP | 43 | link | link |
| Jigsaw | 59 | link | link |
| MNLI | 65 | link | link |
| Sentiment140 | 114 | link | link |
| PAWS | 139 | link | link |
| DBPedia14 | 151 | link | link |
| Yahoo Answers Topics | 461 | link | link |
| Discovery | 293 | link | link |
| Amazon Polarity | 1427 | link | link |
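
The checkpoints can be loaded with the Hugging Face `transformers` library in the usual way. A minimal sketch is shown below; the repository id is a placeholder, so substitute the actual model link from the table above.

```python
# Minimal sketch of loading one of the checkpoints with `transformers`.
# The repository id below is hypothetical -- replace it with the model
# link from the table above.
from transformers import AutoModel, AutoTokenizer

model_id = "some-user/roberta-pretrained-on-squad"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Downstream datasets make good pretraining corpora.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```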

## Pretraining Hyperparameters

| Hyperparameter | ELECTRA | RoBERTa |
| --- | --- | --- |
| Size | Small | Base |
| Parameter count | 14M | 110M |
| Training steps | 1M | 100K |
| Warmup steps | 10K | 6K |
| Batch size | 128 | 512 |
| Peak learning rate | 5e-4 | 5e-4 |
| Sequence length | 128 | 512 |
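
As a rough illustration, the RoBERTa column above corresponds approximately to the `transformers` configuration sketched below. This is an assumption-laden approximation for orientation only, not the training script used in the paper; the output path and the batch-size split across devices are made up.

```python
# Rough sketch of the RoBERTa-base pretraining setup from the table above,
# expressed with Hugging Face `transformers`. Illustrative only.
from transformers import RobertaConfig, RobertaForMaskedLM, TrainingArguments

config = RobertaConfig(
    max_position_embeddings=514,  # accommodates the 512-token sequence length
)
model = RobertaForMaskedLM(config)  # randomly initialized, pretrained from scratch

training_args = TrainingArguments(
    output_dir="roberta-from-scratch",  # hypothetical output path
    max_steps=100_000,                  # training steps
    warmup_steps=6_000,                 # warmup steps
    learning_rate=5e-4,                 # peak learning rate
    per_device_train_batch_size=64,     # 64 x 8 accumulation steps
    gradient_accumulation_steps=8,      # = effective batch size 512
)
```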

More details can be found in the paper: https://arxiv.org/abs/2209.14389

If you use these models, please use the citation given below:

@article{krishna2022downstream,
  title={Downstream datasets make surprisingly good pretraining corpora},
  author={Krishna, Kundan and Garg, Saurabh and Bigham, Jeffrey P and Lipton, Zachary C},
  journal={arXiv preprint arXiv:2209.14389},
  year={2022}
}