Readd code link for the Mogrifier LSTM
Gabor Melis committed Jan 29, 2020
1 parent dc60028 commit 6f6c3bb
Showing 1 changed file (english/language_modeling.md) with 12 additions and 12 deletions.
@@ -20,12 +20,12 @@ per-word log-probability (lower is better).

| Model | Validation perplexity | Test perplexity | Number of params | Paper / Source | Code |
| ------------- | :-----:| :-----: | :-----: | -------------- | ---- |
-| Mogrifier LSTM + dynamic eval (Melis et al., 2019) | 44.9 | 44.8 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM + dynamic eval (Melis et al., 2019) | 44.9 | 44.8 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| AdvSoft + AWD-LSTM-MoS + dynamic eval (Wang et al., 2019) | 46.63 | 46.01 | 22M | [Improving Neural Language Modeling via Adversarial Training](http://proceedings.mlr.press/v97/wang19f/wang19f.pdf) | [Official](https://github.com/ChengyueGongR/advsoft) |
| FRAGE + AWD-LSTM-MoS + dynamic eval (Gong et al., 2018) | 47.38 | 46.54 | 22M | [FRAGE: Frequency-Agnostic Word Representation](https://arxiv.org/abs/1809.06858) | [Official](https://github.com/ChengyueGongR/Frequency-Agnostic) |
| AWD-LSTM-DOC x5 (Takase et al., 2018) | 48.63 | 47.17 | 185M | [Direct Output Connection for a High-Rank Language Model](https://arxiv.org/abs/1808.10143) | [Official](https://github.com/nttcslab-nlp/doc_lm) |
| AWD-LSTM-MoS + dynamic eval (Yang et al., 2018)* | 48.33 | 47.69 | 22M | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | [Official](https://github.com/zihangdai/mos) |
-| Mogrifier LSTM (Melis et al., 2019) | 51.4 | 50.1 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM (Melis et al., 2019) | 51.4 | 50.1 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| AWD-LSTM + dynamic eval (Krause et al., 2017)* | 51.6 | 51.1 | 24M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) | [Official](https://github.com/benkrause/dynamic-evaluation) |
| AWD-LSTM-DOC + Partial Shuffle (Press, 2019) ***preprint*** | 53.79 | 52.00 | 23M | [Partially Shuffling the Training Data to Improve Language Models](https://arxiv.org/abs/1903.04167) | [Official](https://github.com/ofirpress/PartialShuffle) |
| AWD-LSTM-DOC (Takase et al., 2018) | 54.12 | 52.38 | 23M | [Direct Output Connection for a High-Rank Language Model](https://arxiv.org/abs/1808.10143) | [Official](https://github.com/nttcslab-nlp/doc_lm) |
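The perplexity figures in these tables follow directly from the per-word log-probability mentioned in the hunk header: perplexity is the exponentiated average negative per-word log-probability, which is why lower is better. A quick illustration with made-up log-probabilities (not values from any listed model):

```python
import math

def perplexity(per_word_log_probs):
    # Average negative log-likelihood (natural log) over the words,
    # then exponentiate to get perplexity.
    avg_nll = -sum(per_word_log_probs) / len(per_word_log_probs)
    return math.exp(avg_nll)

# Hypothetical per-word log-probabilities for a 4-word held-out sequence.
log_probs = [-3.2, -5.1, -2.4, -4.0]
print(perplexity(log_probs))  # ≈ 39.45
```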
@@ -46,14 +46,14 @@ consists of around 2 million words extracted from Wikipedia articles.

| Model | Validation perplexity | Test perplexity | Number of params | Paper / Source | Code |
| ------------- | :-----:| :-----: | :-----: | -------------- | ---- |
-| Mogrifier LSTM + dynamic eval (Melis et al., 2019) | 40.2 | 38.6 | 35M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM + dynamic eval (Melis et al., 2019) | 40.2 | 38.6 | 35M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| AdvSoft + AWD-LSTM-MoS + dynamic eval (Wang et al., 2019) | 40.27 | 38.65 | 35M | [Improving Neural Language Modeling via Adversarial Training](http://proceedings.mlr.press/v97/wang19f/wang19f.pdf) | [Official](https://github.com/ChengyueGongR/advsoft) |
| FRAGE + AWD-LSTM-MoS + dynamic eval (Gong et al., 2018) | 40.85 | 39.14 | 35M | [FRAGE: Frequency-Agnostic Word Representation](https://arxiv.org/abs/1809.06858) | [Official](https://github.com/ChengyueGongR/Frequency-Agnostic) |
| AWD-LSTM-MoS + dynamic eval (Yang et al., 2018)* | 42.41 | 40.68 | 35M | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | [Official](https://github.com/zihangdai/mos) |
| AWD-LSTM + dynamic eval (Krause et al., 2017)* | 46.4 | 44.3 | 33M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) | [Official](https://github.com/benkrause/dynamic-evaluation) |
| AWD-LSTM + continuous cache pointer (Merity et al., 2017)* | 53.8 | 52.0 | 33M | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) | [Official](https://github.com/salesforce/awd-lstm-lm) |
| AWD-LSTM-DOC x5 (Takase et al., 2018) | 54.19 | 53.09 | 185M | [Direct Output Connection for a High-Rank Language Model](https://arxiv.org/abs/1808.10143) | [Official](https://github.com/nttcslab-nlp/doc_lm) |
-| Mogrifier LSTM (Melis et al., 2019) | 57.3 | 55.1 | 35M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM (Melis et al., 2019) | 57.3 | 55.1 | 35M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| AWD-LSTM-DOC + Partial Shuffle (Press, 2019) ***preprint*** | 60.16 | 57.85 | 37M | [Partially Shuffling the Training Data to Improve Language Models](https://arxiv.org/abs/1903.04167) | [Official](https://github.com/ofirpress/PartialShuffle) |
| AWD-LSTM-DOC (Takase et al., 2018) | 60.29 | 58.03 | 37M | [Direct Output Connection for a High-Rank Language Model](https://arxiv.org/abs/1808.10143) | [Official](https://github.com/nttcslab-nlp/doc_lm) |
| AWD-LSTM-MoS (Yang et al., 2018) | 63.88 | 61.45 | 35M | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | [Official](https://github.com/zihangdai/mos) |
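Many of the strongest rows above add "dynamic eval" (Krause et al., 2017): the model keeps updating on the evaluation text it has already scored, so later predictions benefit from earlier test context. The papers apply gradient updates to neural LMs; the count-based toy below is only a sketch of the adaptation idea, with a made-up repetitive stream where adapting clearly helps:

```python
import math
from collections import defaultdict

def eval_bpc(text, vocab, dynamic=True):
    # Toy character bigram model with add-one smoothing.
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    prev = text[0]
    for ch in text[1:]:
        num = counts[prev][ch] + 1
        den = sum(counts[prev].values()) + len(vocab)
        total_bits += -math.log2(num / den)
        if dynamic:
            counts[prev][ch] += 1  # adapt on already-scored evaluation text
        prev = ch
    return total_bits / (len(text) - 1)

text = "abababababababab"  # hypothetical, highly repetitive test stream
vocab = set(text)
static = eval_bpc(text, vocab, dynamic=False)    # exactly 1.0 bits/char here
adaptive = eval_bpc(text, vocab, dynamic=True)   # lower: model adapted
```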
@@ -115,14 +115,14 @@ Within these 100 million bytes are 205 unique tokens.
| ---------------- | :-----: | :-----: | -------------- | ---- |
| Transformer-XL + RMS dynamic eval (Krause et al., 2019)* ***arxiv preprint*** | 0.94 | 277M | [Dynamic Evaluation of Transformer Language Models](https://arxiv.org/pdf/1904.08378.pdf) | [Official](https://github.com/benkrause/dynamiceval-transformer) |
| Compressive Transformer (Rae et al., 2019) ***arxiv preprint*** | 0.97 | - | [Compressive Transformers for Long-Range Sequence Modelling](https://arxiv.org/pdf/1911.05507.pdf) | - |
-| Mogrifier LSTM + dynamic eval (Melis et al., 2019) | 0.988 | 96M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM + dynamic eval (Melis et al., 2019) | 0.988 | 96M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| 24-layer Transformer-XL (Dai et al., 2018) ***under review*** | 0.99 | 277M | [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/pdf/1901.02860.pdf) | [Official](https://github.com/kimiyoung/transformer-xl) |
| 18-layer Transformer-XL (Dai et al., 2018) ***under review*** | 1.03 | 88M | [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/pdf/1901.02860.pdf) | [Official](https://github.com/kimiyoung/transformer-xl) |
| 12-layer Transformer-XL (Dai et al., 2018) ***under review*** | 1.06 | 41M | [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/pdf/1901.02860.pdf) | [Official](https://github.com/kimiyoung/transformer-xl) |
| 64-layer Character Transformer Model (Al-Rfou et al., 2018) | 1.06 | 235M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444) ||
| mLSTM + dynamic eval (Krause et al., 2017)* | 1.08 | 46M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) | [Official](https://github.com/benkrause/dynamic-evaluation) |
| 12-layer Character Transformer Model (Al-Rfou et al., 2018) | 1.11 | 44M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444) ||
-| Mogrifier LSTM (Melis et al., 2019) | 1.122 | 96M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM (Melis et al., 2019) | 1.122 | 96M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| 3-layer AWD-LSTM (Merity et al., 2018) | 1.232 | 47M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) | [Official](https://github.com/salesforce/awd-lstm-lm) |
| Large mLSTM +emb +WN +VD (Krause et al., 2017) | 1.24 | 46M | [Multiplicative LSTM for sequence modelling](https://arxiv.org/abs/1609.07959) | [Official](https://github.com/benkrause/mLSTM) |
| Large FS-LSTM-4 (Mujika et al., 2017) | 1.245 | 47M | [Fast-Slow Recurrent Neural Networks](https://arxiv.org/abs/1705.08639) | [Official](https://github.com/amujika/Fast-Slow-LSTM) |
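The character-level tables report bits per character (BPC): the average negative base-2 log-probability the model assigns to each character of the test text. A small worked example with made-up per-character probabilities:

```python
import math

def bpc_from_probs(probs):
    # BPC = -(1/N) * sum(log2 p_i) over the N characters of the text.
    # (A cross-entropy measured in nats converts to bits via division by ln 2.)
    return -sum(math.log2(p) for p in probs) / len(probs)

# Hypothetical probabilities for a 5-character snippet.
probs = [0.5, 0.25, 0.5, 0.125, 0.5]
print(bpc_from_probs(probs))  # (1 + 2 + 1 + 3 + 1) / 5 = 1.6
```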
@@ -150,8 +150,8 @@ The vocabulary of the words in the character-level dataset is limited to 10 000

| Model | Bit per Character (BPC) | Number of params | Paper / Source | Code |
| ---------------- | :-----: | :-----: | -------------- | ---- |
-| Mogrifier LSTM + dynamic eval (Melis et al., 2019)| 1.083 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
-| Mogrifier LSTM (Melis et al., 2019) | 1.120 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM + dynamic eval (Melis et al., 2019)| 1.083 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
+| Mogrifier LSTM (Melis et al., 2019) | 1.120 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| Trellis Network (Bai et al., 2019) | 1.159 | 13.4M | [Trellis Networks for Sequence Modeling](https://openreview.net/pdf?id=HyeVtoRqtQ) | [Official](https://github.com/locuslab/trellisnet)
| 3-layer AWD-LSTM (Merity et al., 2018) | 1.175 | 13.8M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) | [Official](https://github.com/salesforce/awd-lstm-lm) |
| 6-layer QRNN (Merity et al., 2018) | 1.187 | 13.8M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) | [Official](https://github.com/salesforce/awd-lstm-lm) |
@@ -168,17 +168,17 @@ The character-based [MWC](http://k-kawakami.com/research/mwc) dataset is a collection of Wikipedia articles in multiple languages.

| Model | Validation BPC | Test BPC | Number of params | Paper / Source | Code |
| ------------- | :-----:| :-----: | :-----: | -------------- | ---- |
-| Mogrifier LSTM + dynamic eval (Melis et al., 2019)| 1.200 | 1.187 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
-| Mogrifier LSTM (Melis et al., 2019) | 1.312 | 1.298 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM + dynamic eval (Melis et al., 2019)| 1.200 | 1.187 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
+| Mogrifier LSTM (Melis et al., 2019) | 1.312 | 1.298 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| HCLM with Cache (Kawakami et al. 2017) | 1.591 | 1.538 | 8M | [Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling](https://arxiv.org/abs/1704.06986) | |
| LSTM (Kawakami et al. 2017) | 1.793 | 1.736 | 8M | [Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling](https://arxiv.org/abs/1704.06986) | |

#### MWC Finnish in the single text, large setting.

| Model | Validation BPC | Test BPC | Number of params | Paper / Source | Code |
| ------------- | :-----:| :-----: | :-----: | -------------- | ---- |
-| Mogrifier LSTM + dynamic eval (Melis et al., 2019)| 1.202 | 1.191 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
-| Mogrifier LSTM (Melis et al., 2019) | 1.327 | 1.313 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | |
+| Mogrifier LSTM + dynamic eval (Melis et al., 2019)| 1.202 | 1.191 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
+| Mogrifier LSTM (Melis et al., 2019) | 1.327 | 1.313 | 24M | [Mogrifier LSTM](http://arxiv.org/abs/1909.01792) | [Official](https://github.com/deepmind/lamb) |
| HCLM with Cache (Kawakami et al. 2017) | 1.754 | 1.711 | 8M | [Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling](https://arxiv.org/abs/1704.06986) | |
| LSTM (Kawakami et al. 2017) | 1.943 | 1.913 | 8M | [Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling](https://arxiv.org/abs/1704.06986) | |
