GitHub - altsoph/EENLP: The broad index of NLP resources for Eastern European languages. The best EEML 2021 project.

About

This repo contains a curated meta-index of NLP datasets and models for Eastern European languages. It started as a project at the Eastern European Machine Learning Summer School (EEML 2021), self-organized by a group of participants, which explains its scope. You can read more details about this initial summer school project here.

We hope this broad index of NLP resources for Eastern European languages can help to:

facilitate the synergy of Eastern European NLP research communities;
highlight the underrepresented languages of Eastern Europe;
understand cross-cultural and cross-linguistic differences;
decrease the digital language divide.

Initially, EENLP was biased towards datasets for semantic NLP tasks such as sentiment analysis, NLI, word sense disambiguation, etc. However, we are expanding and improving this index further, so feel free to contribute new relevant resources. We are also happy to hear your feedback and suggestions via issues or at altsoph@gmail.com.

Resources

The datasets

Browse the datasets index or select your language of interest:

🇦🇱 🇦🇲 🇧🇾 🇧🇦 🇧🇬 🇭🇷 🇨🇿 🇪🇪 🇬🇪 🇭🇺 🇰🇿 🇱🇻 🇱🇹 🇲🇰 🇲🇩 🇲🇪 🇵🇱 🇷🇴 🇷🇺 🇷🇸 🇸🇰 🇸🇮 🇺🇦

The models

Browse the models index or select your language of interest:

🇦🇱 🇦🇲 🇧🇾 🇧🇦 🇧🇬 🇭🇷 🇨🇿 🇪🇪 🇬🇪 🇭🇺 🇰🇿 🇱🇻 🇱🇹 🇲🇰 🇲🇩 🇲🇪 🇵🇱 🇷🇴 🇷🇺 🇷🇸 🇸🇰 🇸🇮 🇺🇦

Contribution

Feel free to contribute. The details are in our contributing guidelines.

Citation

@misc{tikhonov2021eenlp,
      title={EENLP: Cross-lingual Eastern European NLP Index}, 
      author={Alexey Tikhonov and Alex Malkhasov and Andrey Manoshin and George Dima and Réka Cserháti and Md. Sadek Hossain Asif and Matt Sárdi},
      year={2021},
      eprint={2108.02605},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Licensing

This index is licensed under the Apache-2.0 License. However, please note that each indexed resource has its own licensing terms.

Development

This is mostly internal documentation for us.

See developing this repository.
