This is a list of initiatives for adding new languages to open-source machine translation models (such as NLLB).
Some notable projects for improving translation quality for an already supported low-resource language are also highlighted.
The first part of the document lists individual languages in alphabetical order of their English names.
The second part of the document lists multilingual initiatives.
New additions are welcome (in the form of pull requests or issues)!
- Code and description: https://github.com/lolismek/AroTranslate
- Paper: https://arxiv.org/abs/2410.17728
- Interface: https://arotranslate.com/
- Press: https://burunen.ru/news/society/107048 (in Russian)
- Interface: https://translate-bur.ru/
- Model: https://huggingface.co/SaranaAbidueva/nllb-200-bxr-ru
- Interface: https://www.zedzek.com/en
- Interface: https://lango.to/
- Paper (for an old version): https://aclanthology.org/2022.fieldmatters-1.6/
Additionally, see TartuNLP.
See TartuNLP
- Model: https://huggingface.co/Salavat/nllb-200-distilled-600M-finetuned-isv_v2
- Demo: https://huggingface.co/spaces/Salavat/Interslavic-Translator-NLLB200
- Presentation: https://www.youtube.com/watch?v=BiNrza83Gvw
- Interface: https://tahrirchi.uz/uz/translator
See TartuNLP
lez, lezg1247
- Model: https://huggingface.co/leks-forever/nllb-200-distilled-600M
- Code: https://github.com/leks-forever/nllb-tuning
- Demo: https://huggingface.co/spaces/leks-forever/lezghian-nllb-200-distilled-600M
- Description (in Russian): in a Telegram channel
See TartuNLP
See TartuNLP
See TartuNLP
See TartuNLP
- GitHub: https://github.com/TBSj/Qarachay_Malqar_translator
- Model: https://huggingface.co/TSjB/NLLB-201-600M-QM-V1
- Blog post (in Russian): https://habr.com/ru/articles/829248/
- Blog: https://cointegrated.medium.com/a37fc706b865
- Interface: https://tyvan.ru/
See TartuNLP
Multiple Finno-Ugric languages (including Komi, Udmurt, Hill and Meadow Mari, Erzya, Livonian, Mansi, Moksha and Livvi Karelian)
- Paper (an early one): https://aclanthology.org/2022.wmt-1.33/
- Paper: https://aclanthology.org/2023.nodalida-1.77.pdf
- Interface: https://translate.ut.ee/
- Model: https://huggingface.co/tartuNLP/smugri3_14-finno-ugric-nmt
Indigenous languages of the Americas (including Ashaninka, Aymara, Bribri, Chatino, Guarani, Hñähñu, Nahuatl, Quechua, Raramuri, Shipibo-Konibo, and Wixarika from the AmericasNLP MT shared task, plus Wayuunaiki, Arhuaco, Inga, and Nasa)
- Paper: https://aclanthology.org/2023.americasnlp-1.19.pdf
- Paper: https://aclanthology.org/2024.americasnlp-1.22.pdf
- Paper: https://aclanthology.org/2024.americasnlp-1.2.pdf
Apertium is a rule-based machine translation system.
Currently, it has linguistic tools (such as dictionaries and morphological parsers) for a very large number of languages, but only a few of them (51 language pairs) have been developed to a state considered stable enough for a publicly released translation service.
- Code: https://github.com/apertium
- Interface (with only a subset of the most stable language pairs): https://www.apertium.org/