Improve a baseline NMT system trained on a very small parallel corpus using either monolingual data or parallel data in other languages
https://docs.google.com/presentation/d/1J6Xh0YfCSnQIcA6doUm7skZEG1ZqE50bUFehq72i6V4
https://yadi.sk/d/xUKsoX-G3T6ZYc
en-ru parallel data: https://translate.yandex.ru/corpus (just fill in the fields and you immediately get the corpus)
https://trello.com/b/f3kcPkqm/low-resource-nmt
Install from a local directory:
pip install -e tensor2tensor/
or directly from GitHub:
pip install git+https://github.com/AlAntonov/tensor2tensor
Unsupervised Neural Machine Translation Using Monolingual Corpora Only
Dual learning for Machine Translation
Transfer Learning for Low-Resource Neural Machine Translation
Adversarial Neural Machine Translation
Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets
On Using Monolingual Corpora in Neural Machine Translation
Improving Neural Machine Translation Models with Monolingual Data
Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization
Exploiting Source-side Monolingual Data in Neural Machine Translation
Unsupervised Pretraining for Sequence to Sequence Learning
Universal Neural Machine Translation for Extremely Low Resource Languages
Joint Training for Neural Machine Translation Models with Monolingual Data
Unsupervised Neural Machine Translation
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
Learning bilingual word embeddings with (almost) no bilingual data
Effective Domain Mixing for Neural Machine Translation
Phrase-Based & Neural Unsupervised Machine Translation
- place your data in data/t2t_data/* (en.train.txt, he.train.txt; the train, dev, and test sets are generated from these files)
- run he-en_translation.sh (with options such as train/dev/test sizes; check the script for more info)
- Sizes can be fractional. For example: he-en_translation.sh --train_size 0.4 --test_size 0.3 --dev_size 0.3
- compute_bleu.py with the --bootstrap flag returns a 95% confidence interval
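The fractional sizes above presumably split the corpus by proportion; below is a minimal stdlib-only sketch of such a split (the function name and signature are assumptions for illustration, not the actual logic of he-en_translation.sh):

```python
import random

def split_corpus(lines, train_size, dev_size, test_size, seed=42):
    """Split sentences into train/dev/test by fractional sizes.

    Fractions are assumed to sum to 1.0, mirroring the
    --train_size/--dev_size/--test_size flags of he-en_translation.sh
    (hypothetical reimplementation, not the script itself).
    """
    assert abs(train_size + dev_size + test_size - 1.0) < 1e-9
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # deterministic shuffle for reproducibility
    n = len(lines)
    n_train = int(n * train_size)
    n_dev = int(n * dev_size)
    train = lines[:n_train]
    dev = lines[n_train:n_train + n_dev]
    test = lines[n_train + n_dev:]
    return train, dev, test

# Example: 40% train, 30% dev, 30% test, as in the command above
train, dev, test = split_corpus([f"sent {i}" for i in range(10)], 0.4, 0.3, 0.3)
```

Putting the remainder into test (rather than truncating) guarantees no sentence is dropped when the fractions do not divide the corpus size evenly.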
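The standard way to get such a confidence interval is bootstrap resampling over per-sentence scores: resample the test set with replacement many times and take percentiles of the resampled means. A stdlib-only sketch of the idea (illustrative, not the code of compute_bleu.py):

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """95% bootstrap confidence interval for the mean of per-sentence scores.

    Draws n_resamples bootstrap samples (with replacement) and returns the
    alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(scores) for _ in scores]
        means.append(statistics.mean(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Toy per-sentence scores; the interval should bracket their mean
lo, hi = bootstrap_ci([0.2, 0.35, 0.4, 0.25, 0.3, 0.45, 0.5, 0.15])
```

Note that corpus-level BLEU is not a simple mean of sentence scores, so a real implementation recomputes BLEU on each resampled test set; the percentile machinery stays the same.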