Gorkem Ozkaya
This repo contains a pair of NMT (neural machine translation) models for English and Turkish, one for each direction. The pre-trained models can be downloaded from the releases
section, and the Jupyter/Colab notebooks in the root directory serve as a reference for downloading and running them.
Interactive demos for the TF2 version of the models are available on HuggingFace 🤗 Spaces:
The models were trained on Google Cloud TPUs, using the tensor2tensor library for the TF1 version and TensorFlow's official models library for the TF2 version. The network is a Transformer with the transformer_tpu hyperparameter configuration.
- TFRC (TensorFlow Research Cloud) program, for Cloud TPU hours.
- Opus, for making the Turkish/English parallel corpus available.
- Open Subtitles, as the original source of the movie subtitles parallel corpus. See also:
  P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles.
  In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).
- SETIMES, as the original source of the news articles corpus.