DEBLEU example with TeacherMaskSoftmaxEmbeddingHelper and Triggers #45

Status: Open. Wants to merge 68 commits into base branch `master`.
Commits (68):
d7e3910
add differentiable_expected_bleu loss
wwt17 Sep 30, 2018
b83a145
modify DEBLEU loss interface from logits to probs
wwt17 Oct 5, 2018
87bd449
add TeacherMaskSoftmaxEmbeddingHelper
wwt17 Oct 5, 2018
a730a25
change API of sess
wwt17 Oct 5, 2018
86c2f9e
add xe ; refine configs
wwt17 Oct 5, 2018
e10c78b
fix a typo in doc
wwt17 Oct 6, 2018
bdbca3b
add summary and checkpoints ; add train configs
wwt17 Oct 6, 2018
ffbf14d
remove duplicated config
wwt17 Oct 6, 2018
7e5b92b
copy tf.batch_gather
wwt17 Oct 6, 2018
d8f7449
config dataset val=test
zcyang Oct 6, 2018
4bf0d1e
Merge branch 'DEBLEU' of https://github.com/wwt17/texar into DEBLEU
zcyang Oct 6, 2018
9bdbe09
add triggers ; now the whole code is runnable
wwt17 Oct 7, 2018
d04e4e0
add learning rate
zcyang Oct 7, 2018
92f1498
Merge branch 'DEBLEU' of https://github.com/wwt17/texar into DEBLEU
zcyang Oct 7, 2018
3f74126
add mask summary ; fix action
zcyang Oct 8, 2018
69129f4
fix random shift bug
zcyang Oct 8, 2018
142665f
don't restore Adam status
zcyang Oct 8, 2018
c836e53
fix save path
Oct 8, 2018
ffe568d
add flags.restore_adam
Oct 8, 2018
73d1c7b
add global_step onto saved ckpt
Oct 8, 2018
bf92f2e
add flags.restore_mask
Oct 10, 2018
9fe74cb
remove config_model_full.py ; rename debleu ; rename some arguments ;…
Oct 13, 2018
9dcde6a
fix checkpoint save and restore bug
Oct 13, 2018
038478e
refine trigger
Oct 13, 2018
101d5a1
refine trigger
Oct 13, 2018
b293bc1
add trigger save & restore (not tested yet)
Oct 14, 2018
9b2b382
move module triggers into texar/utils
Oct 14, 2018
190d5b3
refine codes
Oct 14, 2018
a4fdd5a
add comments to debleu.py
Oct 14, 2018
77c0a52
add name_scope to TeacherMaskSoftmaxEmbeddingHelper
Oct 14, 2018
c70b8e2
fix lr decay boundaries
Oct 14, 2018
6daaac8
fix save trigger path
Oct 14, 2018
afacfe9
add docs
wwt17 Oct 14, 2018
0794ddc
add more trigger docs
wwt17 Oct 15, 2018
0d3e187
update README.md
wwt17 Oct 15, 2018
b095785
rename some filenames ; add val/test datasets
Oct 15, 2018
06c5727
add config_train_iwslt14_en-fr.py
Oct 15, 2018
5305d38
update README.md
Oct 15, 2018
3aab0a6
replace moses bleu by nltk bleu
Oct 16, 2018
8ca85a9
modify model
Oct 16, 2018
78b6994
refine models
Oct 17, 2018
82bc6a8
refine summary ; batch_size=160
Oct 18, 2018
fffd648
remove exponential decay configs ; fix summary bug
Oct 18, 2018
6d07aa1
add stages
Oct 19, 2018
923ea8c
add config_train
Oct 19, 2018
56b44c7
modify 2-layer encoder to 1-layer
Oct 20, 2018
c6991c8
change configs to bowen's
Oct 20, 2018
7e89acf
open trigger file in binary mode
Oct 20, 2018
de78471
add binary mode
Oct 20, 2018
b822f39
use new datasets ; reinitialize optimizer when annealing ; modify con…
Oct 22, 2018
9d6e4bb
replace name_scope by variable_scope in TeacherMaskSoftmaxEmbeddingHe…
Oct 22, 2018
ed1f6f3
fix lr bug
Oct 22, 2018
3cfc217
reset model and configs to those in pytorch codes ; fix connector bug…
Oct 29, 2018
18da2c7
anneal to bs160 4:2 mask ; reinitialize mask after restoring
Oct 29, 2018
1ac619d
add lr1e6_1_0.py config
Oct 30, 2018
c227c28
add more model configs
Nov 2, 2018
316e41c
refine code ; now everything is automatic
Nov 3, 2018
0f157e8
make mask pattern Tensors and use placeholder
Nov 4, 2018
c4c4288
reconstruct triggers ; modify code
Nov 4, 2018
2b1fe5a
add test units for triggers
Nov 5, 2018
ec20a9e
rewrite ScheduledStepsTrigger; correct and refine some docs TODO: 1.…
Nov 5, 2018
ad56c3e
fix final annealing bug
Nov 5, 2018
1f3e212
add config restore_from
Nov 5, 2018
8988209
add test units for ScheduledStepsTrigger and fix some bugs
Nov 6, 2018
5851220
fix docs for triggers
wwt17 Nov 6, 2018
8fdf62e
remove unfinished MovingAverageConvergenceTrigger
wwt17 Nov 6, 2018
3b58883
update README.md
wwt17 Nov 6, 2018
7b673ab
merge master
Nov 7, 2018
8 changes: 8 additions & 0 deletions docs/code/losses.rst
@@ -68,6 +68,14 @@ Entropy
.. autofunction:: texar.losses.sequence_entropy_with_logits


DEBLEU
==================

:hidden:`debleu`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: texar.losses.debleu


Loss Utils
===========

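A hedged usage sketch for the new loss: per the commit history ("modify DEBLEU loss interface from logits to probs") it consumes probabilities rather than logits, but the argument names (`labels`, `probs`, `sequence_length`) are assumptions here, not the confirmed signature.

```python
import tensorflow as tf
import texar as tx

# `outputs` is assumed to be a decoder output with a `logits` field and
# `batch` a PairedTextData batch; all argument names are assumptions.
probs = tf.nn.softmax(outputs.logits)
loss = tx.losses.debleu(
    labels=batch['target_text_ids'][:, 1:],        # gold tokens sans BOS
    probs=probs,
    sequence_length=batch['target_length'] - 1)
```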
5 changes: 5 additions & 0 deletions docs/code/modules.rst
@@ -134,6 +134,11 @@ Decoders
.. autoclass:: texar.modules.GumbelSoftmaxEmbeddingHelper
:members:

:hidden:`TeacherMaskSoftmaxEmbeddingHelper`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.modules.TeacherMaskSoftmaxEmbeddingHelper
:members:

:hidden:`get_helper`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: texar.modules.get_helper
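A sketch of how the new helper might be constructed. The argument names are guesses modeled on the sibling GumbelSoftmaxEmbeddingHelper plus the `tau` and `mask_pattern` values in this PR's `config_train.py`; consult the class docs for the real signature.

```python
import texar as tx

# Hypothetical constructor call: every argument name is an assumption.
helper = tx.modules.TeacherMaskSoftmaxEmbeddingHelper(
    inputs=batch['target_text_ids'],        # ground-truth tokens to mask
    sequence_length=batch['target_length'],
    embedding=tgt_embedder,                 # a WordEmbedder instance
    tau=1.,                                 # softmax temperature
    mask_pattern=(2, 2))                    # teacher-mask pattern

# Like other TF decoding helpers, it would then be passed to the decoder:
outputs, _, _ = rnn_decoder(helper=helper, max_decoding_length=50)
```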
18 changes: 18 additions & 0 deletions docs/code/utils.rst
@@ -278,3 +278,21 @@ AverageRecorder
==========================
.. autoclass:: texar.utils.AverageRecorder
:members:

Trigger
==========================

:hidden:`Trigger`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.utils.Trigger
:members:

:hidden:`ScheduledStepsTrigger`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.utils.ScheduledStepsTrigger
:members:

:hidden:`BestEverConvergenceTrigger`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.utils.BestEverConvergenceTrigger
:members:
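A hypothetical usage sketch for the triggers. The constructor parameters mirror the `threshold_steps` and `minimum_interval_steps` values in this PR's `config_train.py`; the call signature is an assumption.

```python
import texar as tx

def anneal():
    # Placeholder action, e.g. switch to the next training phase.
    pass

# Assumed semantics: fire `anneal` when the tracked metric has not
# improved on its best-ever value for `threshold_steps` steps, at most
# once every `minimum_interval_steps` steps.
trigger = tx.utils.BestEverConvergenceTrigger(
    anneal, threshold_steps=10000, minimum_interval_steps=10000)

for step, val_bleu in validation_results:   # user-supplied (step, score) pairs
    trigger.trigger(step, val_bleu)          # assumed call signature
```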
48 changes: 48 additions & 0 deletions examples/differentiable_expected_bleu/README.md
@@ -0,0 +1,48 @@
# Seq2seq Model #

This example builds an attentional seq2seq model for machine translation, trained with the Differentiable Expected BLEU (DEBLEU) loss and a Teacher Mask. See https://openreview.net/pdf?id=S1x2aiRqFX for the paper this example implements.

### Dataset ###

* iwslt14: The benchmark [IWSLT2014](https://sites.google.com/site/iwsltevaluation2014/home) (de-en) machine translation dataset.

Download the data with the following command:

```bash
python prepare_data.py --data de-en
```

### Train the model ###

Train the model with the following command:

```bash
python differentiable_expected_bleu.py --config_model config_model_medium --config_data config_data_iwslt14_de-en --config_train config_train --expr_name iwslt14_de-en --restore_from "" --reinitialize
```

Here:
* `--config_model` specifies the model config. Do not include the `.py` suffix.
* `--config_data` specifies the data config.
* `--config_train` specifies the training config.
* `--expr_name` specifies the experiment name. Used as the directory name to save and restore all information.
* `--restore_from` specifies the checkpoint path to restore from. If not specified (or an empty string is specified), the latest checkpoint in `expr_name` is restored.
* `--reinitialize` is a flag that indicates whether to reinitialize the optimizer states before training and after each annealing step. Enabled by default.

[config_model_medium.py](./config_model_medium.py) specifies a single-layer seq2seq model with Luong attention and bi-directional RNN encoder.

[config_model_large.py](./config_model_large.py) specifies a seq2seq model with Luong attention, 2-layer bi-directional RNN encoder, single-layer RNN decoder, and a connector between the final state of the encoder and the initial state of the decoder. The size of this model is quite large.

[config_data_iwslt14_de-en.py](./config_data_iwslt14_de-en.py) specifies the IWSLT'14 German-English dataset.

[config_train.py](./config_train.py) specifies the training (including annealing) configs.

## Results ##

On the IWSLT'14 German-English dataset, we ran each config 4 to 5 times. Here are the average BLEU scores attained:

| config | inference beam size | Cross-Entropy baseline | DEBLEU | improvement |
| :------------------------------------------------: | :-----------------: | :--------------------: | :----: | :---------: |
| [config_model_medium.py](./config_model_medium.py) | 1 | 26.12 | 27.40 | 1.28 |
| [config_model_medium.py](./config_model_medium.py) | 5 | 27.03 | 27.72 | 0.70 |
| [config_model_large.py](./config_model_large.py) | 1 | 25.24 | 26.47 | 1.23 |
| [config_model_large.py](./config_model_large.py) | 5 | 26.33 | 26.87 | 0.54 |
59 changes: 59 additions & 0 deletions examples/differentiable_expected_bleu/config_data_iwslt14_de-en.py
@@ -0,0 +1,59 @@
source_vocab_file = 'data/iwslt14_de-en/vocab.de'
target_vocab_file = 'data/iwslt14_de-en/vocab.en'

train_0 = {
'batch_size': 80,
'allow_smaller_final_batch': False,
'source_dataset': {
"files": 'data/iwslt14_de-en/train.de',
'vocab_file': source_vocab_file,
'max_seq_length': 50
},
'target_dataset': {
'files': 'data/iwslt14_de-en/train.en',
'vocab_file': target_vocab_file,
'max_seq_length': 50
},
}

train_1 = {
'batch_size': 160,
'allow_smaller_final_batch': False,
'source_dataset': {
"files": 'data/iwslt14_de-en/train.de',
'vocab_file': source_vocab_file,
'max_seq_length': 50
},
'target_dataset': {
'files': 'data/iwslt14_de-en/train.en',
'vocab_file': target_vocab_file,
'max_seq_length': 50
},
}


val = {
'batch_size': 80,
'shuffle': False,
'source_dataset': {
"files": 'data/iwslt14_de-en/valid.de',
'vocab_file': source_vocab_file,
},
'target_dataset': {
'files': 'data/iwslt14_de-en/valid.en',
'vocab_file': target_vocab_file,
},
}

test = {
'batch_size': 80,
'shuffle': False,
'source_dataset': {
"files": 'data/iwslt14_de-en/test.de',
'vocab_file': source_vocab_file,
},
'target_dataset': {
'files': 'data/iwslt14_de-en/test.en',
'vocab_file': target_vocab_file,
},
}
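A minimal sketch, assuming Texar-TF's standard paired-data API, of how the dicts above are consumed (run in the same module, or after importing this config):

```python
import texar as tx

# Build paired source/target datasets from the dicts defined above.
train_data = tx.data.PairedTextData(hparams=train_0)
val_data = tx.data.PairedTextData(hparams=val)
test_data = tx.data.PairedTextData(hparams=test)

# One iterator that can switch between the splits within a session.
iterator = tx.data.TrainTestDataIterator(
    train=train_data, val=val_data, test=test_data)
batch = iterator.get_next()   # dict with *_text_ids, *_length, ...
```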
45 changes: 45 additions & 0 deletions examples/differentiable_expected_bleu/config_data_iwslt14_en-fr.py
@@ -0,0 +1,45 @@
source_vocab_file = 'data/iwslt14_en-fr/vocab.en'
target_vocab_file = 'data/iwslt14_en-fr/vocab.fr'

batch_size = 80

train = {
'batch_size': batch_size,
'allow_smaller_final_batch': False,
'source_dataset': {
"files": 'data/iwslt14_en-fr/train.en',
'vocab_file': source_vocab_file,
'max_seq_length': 50
},
'target_dataset': {
'files': 'data/iwslt14_en-fr/train.fr',
'vocab_file': target_vocab_file,
'max_seq_length': 50
},
}

val = {
'batch_size': batch_size,
'shuffle': False,
'source_dataset': {
"files": 'data/iwslt14_en-fr/valid.en',
'vocab_file': source_vocab_file,
},
'target_dataset': {
'files': 'data/iwslt14_en-fr/valid.fr',
'vocab_file': target_vocab_file,
},
}

test = {
'batch_size': batch_size,
'shuffle': False,
'source_dataset': {
"files": 'data/iwslt14_en-fr/test.en',
'vocab_file': source_vocab_file,
},
'target_dataset': {
'files': 'data/iwslt14_en-fr/test.fr',
'vocab_file': target_vocab_file,
},
}
39 changes: 39 additions & 0 deletions examples/differentiable_expected_bleu/config_model_large.py
@@ -0,0 +1,39 @@
# Attentional Seq2seq model.
# Hyperparameters not specified here will take the default values.

num_units = 1000
embedding_dim = 500

embedder = {
'dim': embedding_dim
}

encoder = {
'rnn_cell_fw': {
'kwargs': {
'num_units': num_units
},
'num_layers': 2
},
'output_layer_fw': {
'dropout_rate': 0
}
}

connector = {
'activation_fn': 'tanh'
}

decoder = {
'rnn_cell': {
'kwargs': {
'num_units': num_units
},
},
'attention': {
'kwargs': {
'num_units': num_units,
},
'attention_layer_size': num_units
}
}
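The `connector` dict plausibly configures a `tx.modules.MLPTransformConnector` that maps the encoder's final state to the decoder's initial state, per the README's description of this config; a sketch under that assumption (`rnn_decoder` and `enc_final_state` are placeholders from the surrounding model code):

```python
import texar as tx

# Assumption: `connector` above feeds an MLPTransformConnector.
state_connector = tx.modules.MLPTransformConnector(
    rnn_decoder.state_size, hparams=connector)   # {'activation_fn': 'tanh'}
dec_initial_state = state_connector(enc_final_state)
```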
40 changes: 40 additions & 0 deletions examples/differentiable_expected_bleu/config_model_medium.py
@@ -0,0 +1,40 @@
# Attentional Seq2seq model.
# Hyperparameters not specified here will take the default values.

num_units = 256
embedding_dim = 256
dropout = 0.2

embedder = {
'dim': embedding_dim
}

encoder = {
'rnn_cell_fw': {
'kwargs': {
'num_units': num_units
},
'dropout': {
'input_keep_prob': 1. - dropout
}
}
}

connector = None

decoder = {
'rnn_cell': {
'kwargs': {
'num_units': num_units
},
'dropout': {
'input_keep_prob': 1. - dropout
}
},
'attention': {
'kwargs': {
'num_units': num_units,
},
'attention_layer_size': num_units
}
}
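A hedged sketch of how these dicts plug into the standard Texar-TF seq2seq modules. `batch`, `src_vocab_size`, and `tgt_vocab_size` are placeholders from the data pipeline, and the wiring follows Texar's seq2seq_attn example rather than this PR's exact code:

```python
import tensorflow as tf
import texar as tx

# Assumes the dicts above are in scope (e.g. via
# `import config_model_medium`).
src_embedder = tx.modules.WordEmbedder(
    vocab_size=src_vocab_size, hparams=embedder)
rnn_encoder = tx.modules.BidirectionalRNNEncoder(hparams=encoder)
enc_outputs, _ = rnn_encoder(
    src_embedder(batch['source_text_ids']),
    sequence_length=batch['source_length'])

rnn_decoder = tx.modules.AttentionRNNDecoder(
    memory=tf.concat(enc_outputs, axis=2),   # concat fw/bw outputs
    memory_sequence_length=batch['source_length'],
    vocab_size=tgt_vocab_size,
    hparams=decoder)
```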
80 changes: 80 additions & 0 deletions examples/differentiable_expected_bleu/config_train.py
@@ -0,0 +1,80 @@
max_epochs = 1000
steps_per_eval = 500
tau = 1.
infer_beam_width = 1
infer_max_decoding_length = 50

threshold_steps = 10000
minimum_interval_steps = 10000
phases = [
# (config_data, config_train, mask_pattern)
("train_0", "xe_0", None),
("train_0", "xe_1", None),
("train_0", "debleu_0", (2, 2)),
("train_1", "debleu_0", (4, 2)),
("train_1", "debleu_1", (1, 0)),
]

train_xe_0 = {
"optimizer": {
"type": "AdamOptimizer",
"kwargs": {
"learning_rate": 1e-3
}
},
"gradient_clip": {
"type": "clip_by_global_norm",
"kwargs": {
"clip_norm": 5.
}
},
"name": "XE_0"
}

train_xe_1 = {
"optimizer": {
"type": "AdamOptimizer",
"kwargs": {
"learning_rate": 1e-5
}
},
"gradient_clip": {
"type": "clip_by_global_norm",
"kwargs": {
"clip_norm": 5.
}
},
"name": "XE_1"
}

train_debleu_0 = {
"optimizer": {
"type": "AdamOptimizer",
"kwargs": {
"learning_rate": 1e-5
}
},
"gradient_clip": {
"type": "clip_by_global_norm",
"kwargs": {
"clip_norm": 5.
}
},
"name": "DEBLEU_0"
}

train_debleu_1 = {
"optimizer": {
"type": "AdamOptimizer",
"kwargs": {
"learning_rate": 1e-6
}
},
"gradient_clip": {
"type": "clip_by_global_norm",
"kwargs": {
"clip_norm": 5.
}
},
"name": "DEBLEU_1"
}
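Finally, a sketch of how a training script might walk the `phases` schedule above; the loop shape and the helpers `make_helper` and `run_phase` are hypothetical, not this PR's actual code:

```python
import importlib

config_data = importlib.import_module("config_data_iwslt14_de-en")
config_train = importlib.import_module("config_train")

for data_name, train_name, mask_pattern in config_train.phases:
    data_hparams = getattr(config_data, data_name)                  # e.g. train_0
    train_hparams = getattr(config_train, "train_" + train_name)    # e.g. train_xe_0
    # XE phases have mask_pattern None (plain teacher forcing);
    # DEBLEU phases supply a teacher-mask pattern to the helper.
    helper = make_helper(mask_pattern) if mask_pattern else None    # hypothetical
    run_phase(data_hparams, train_hparams, helper)                  # until a trigger fires
```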