Lemmatization Dataset

The dataset has been curated from https://www.isical.ac.in/~utpal/resources.php. The raw text comes from Rabindranath Tagore’s short stories and from news articles across various domains.

Dataset

Each of the following files contains words paired with their lemma forms.

  • train.txt
  • dev.txt
  • test.txt
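
The exact file layout is not documented here, so the following is only a minimal sketch for reading a split. It assumes each non-empty line holds a word followed by its lemma, separated by whitespace (tab or space); adjust the parsing if the files use a different layout. The helper names `parse_pairs` and `load_split` are hypothetical and not part of this repository.

```python
from pathlib import Path


def parse_pairs(lines):
    """Turn an iterable of text lines into (word, lemma) tuples.

    Assumption: each line is "<word><whitespace><lemma>".
    Lines with fewer than two fields (e.g. blank separators) are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        pairs.append((fields[0], fields[1]))
    return pairs


def load_split(path):
    """Read one split file (e.g. train.txt) as UTF-8 and parse it."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_pairs(handle)


if __name__ == "__main__":
    # Print the number of pairs found in each split that exists locally.
    for name in ("train.txt", "dev.txt", "test.txt"):
        if Path(name).exists():
            print(name, len(load_split(name)))
```

Reading with an explicit `encoding="utf-8"` avoids platform-dependent defaults, which matters for Bangla text.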

Licensing

The original dataset does not provide any license information.

Citation

Please cite the following papers if you use the data:

@article{alam2021review,
  title={A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models},
  author={Alam, Firoj and Hasan, Md Arid and Alam, Tanvir and Khan, Akib and Tajrin, Janntatul and Khan, Naira and Chowdhury, Shammur Absar},
  journal={arXiv preprint arXiv:2107.03844},
  year={2021}
}

@inproceedings{chakrabarty-etal-2017-context,
 address = {Vancouver, Canada},
 author = {Chakrabarty, Abhisek  and Pandit, Onkar Arun  and Garain, Utpal},
 booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
 doi = {10.18653/v1/P17-1136},
 pages = {1481--1491},
 publisher = {Association for Computational Linguistics},
 title = {Context Sensitive Lemmatization Using Two Successive Bidirectional Gated Recurrent Networks},
 url = {https://www.aclweb.org/anthology/P17-1136},
 year = {2017}
}