
lowResourceNMT

Improve a baseline NMT system trained on a very small parallel corpus using either monolingual data or parallel data in other languages

Presentation

https://docs.google.com/presentation/d/1J6Xh0YfCSnQIcA6doUm7skZEG1ZqE50bUFehq72i6V4

Datasets:

https://yadi.sk/d/xUKsoX-G3T6ZYc

en-ru parallel data: https://translate.yandex.ru/corpus (just fill in the fields and you get the corpus immediately)

Workflow board:

https://trello.com/b/f3kcPkqm/low-resource-nmt

Tensor2Tensor

This project uses a forked Tensor2Tensor version.

Install it from a local checkout:

pip install -e tensor2tensor/

or install it directly from GitHub:

pip install git+https://github.com/AlAntonov/tensor2tensor
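
If the install succeeds, the usual Tensor2Tensor entry points should be available. A quick sanity check, assuming the fork keeps the standard t2t-trainer entry point:

```bash
# Confirm the package imports and the CLI entry point is on PATH
python -c "import tensor2tensor; print(tensor2tensor.__file__)"
t2t-trainer --help
```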

Relevant papers:

Attention Is All You Need

Unsupervised Neural Machine Translation Using Monolingual Corpora Only

Zero-shot translation

Dual Learning for Machine Translation

Transfer Learning for Low-Resource Neural Machine Translation

Adversarial Neural Machine Translation

Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

On Using Monolingual Corpora in Neural Machine Translation

Improving Neural Machine Translation Models with Monolingual Data

Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization

Exploiting Source-side Monolingual Data in Neural Machine Translation

Unsupervised Pretraining for Sequence to Sequence Learning

Universal Neural Machine Translation for Extremely Low Resource Languages

Joint Training for Neural Machine Translation Models with Monolingual Data

Unsupervised Neural Machine Translation

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance

Learning bilingual word embeddings with (almost) no bilingual data

Effective Domain Mixing for Neural Machine Translation

Phrase-Based & Neural Unsupervised Machine Translation

How to run training and evaluation on he-en:

  1. Place your data in data/t2t_data/ (en.train.txt and he.train.txt; the train, dev, and test sets are generated from these files).
  2. Run he-en_translation.sh (options set the train/dev/test sizes, etc.; check the script for details). See the sketch below.
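
A minimal sketch of both steps; the paths are placeholders, and the size flags are the ones shown in the Features section below (check he-en_translation.sh for the full option list):

```bash
# Step 1: put the raw parallel files where the script expects them
mkdir -p data/t2t_data
cp /path/to/he.train.txt /path/to/en.train.txt data/t2t_data/

# Step 2: run training and evaluation with explicit train/dev/test split sizes
./he-en_translation.sh --train_size 0.4 --dev_size 0.3 --test_size 0.3
```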

Features:

  1. Split sizes can be fractional, for example: he-en_translation.sh --train_size 0.4 --test_size 0.3 --dev_size 0.3
  2. compute_bleu.py with the --bootstrap flag returns a 95% confidence interval (see the example below).
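
An example invocation; only the --bootstrap flag is documented above, so the input-file arguments below are assumptions (check compute_bleu.py for its actual interface):

```bash
# --bootstrap is the documented flag; the hypothesis/reference file arguments are assumed
python compute_bleu.py --bootstrap decoded.he-en.txt en.test.txt
```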
