Description | Parameters | Dataset | Model and Test set(s)
---|---|---|---
Adaptive Inputs (Baevski and Auli, 2018) | 1026M | Google Billion Words | download (.tar.bz2)
Adaptive Inputs (Baevski and Auli, 2018) | 247M | WikiText-103 | download (.tar.bz2)
First, see the general language modeling README for instructions on preprocessing the WikiText-103 data.
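For reference, a minimal preprocessing sketch in the spirit of that README, assuming the raw WikiText-103 files (`wiki.train.tokens`, `wiki.valid.tokens`, `wiki.test.tokens`) have already been downloaded into a `wikitext-103/` directory (the paths below are assumptions; adjust them to wherever the corpus lives):

```bash
# Binarize WikiText-103 for language modeling.
# --only-source: language modeling has no target side.
TEXT=wikitext-103
fairseq-preprocess \
    --only-source \
    --trainpref $TEXT/wiki.train.tokens \
    --validpref $TEXT/wiki.valid.tokens \
    --testpref $TEXT/wiki.test.tokens \
    --destdir data-bin/wikitext-103 \
    --workers 20
```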
Then use the following training command to train a model with adaptive inputs using the `transformer_lm_wiki103` model architecture:
```bash
fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/transformer_wikitext-103 \
    --arch transformer_lm_wiki103 \
    --max-update 286000 --max-lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 \
    --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
    --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d
```
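After training, held-out perplexity can be checked with `fairseq-eval-lm`. The command below is a sketch rather than the exact evaluation setup from the paper; the context-window and batching values are illustrative assumptions:

```bash
# Evaluate the best checkpoint on the WikiText-103 test set.
fairseq-eval-lm data-bin/wikitext-103 \
    --path checkpoints/transformer_wikitext-103/checkpoint_best.pt \
    --sample-break-mode complete \
    --max-tokens 3072 \
    --context-window 2560
```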
```bibtex
@inproceedings{baevski2018adaptive,
  title={Adaptive Input Representations for Neural Language Modeling},
  author={Alexei Baevski and Michael Auli},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=ByxZX20qFQ},
}
```