
News Categorization Dataset

The original dataset is hosted at https://github.com/AI4Bharat/indicnlp_corpus. We curated this version from the paper "Bangla Text Classification using Transformers" (Alam et al., 2020).

Dataset

The dataset contains six class labels for the news categorization task. It is split into training, development, and test sets of 11,284, 1,411, and 1,411 news articles, respectively.
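
As a quick sanity check on the split sizes quoted above, the short Python sketch below computes each split's share of the total. The counts are taken from this README; the variable names are illustrative only.

```python
# Split sizes as stated in this README.
split_sizes = {"train": 11284, "dev": 1411, "test": 1411}

total = sum(split_sizes.values())

# Share of each split, as a fraction of all articles.
split_shares = {name: size / total for name, size in split_sizes.items()}

for name, share in split_shares.items():
    print(f"{name}: {split_sizes[name]} articles ({share:.1%})")
print(f"total: {total} articles")
```

This works out to roughly an 80/10/10 split over 14,106 articles.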

Directory Structure:

train.tsv
dev.tsv
test.tsv
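
The three files can be read with a few lines of Python. This is a minimal sketch, not an official loader: the directory name `news_categorization` and the helpers `load_split` and `load_all` are assumptions made for illustration, and this README does not state the column layout or whether the files have a header row, so rows are returned as raw lists of strings.

```python
from pathlib import Path

# Assumption: the three files sit together in one directory (hypothetical path).
DATA_DIR = Path("news_categorization")
SPLITS = ("train", "dev", "test")


def load_split(path):
    """Read one TSV file and return its non-empty rows as lists of strings."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                # Fields are tab-separated; no quoting rules are assumed.
                rows.append(line.split("\t"))
    return rows


def load_all(data_dir=DATA_DIR):
    """Load every split file found under data_dir, keyed by split name."""
    data = {}
    for split in SPLITS:
        path = Path(data_dir) / f"{split}.tsv"
        if path.exists():
            data[split] = load_split(path)
    return data


if __name__ == "__main__":
    for split, rows in load_all().items():
        # Row counts include a header row if the files have one.
        print(split, len(rows))
```

Inspect the first few rows of each split before training to confirm which column holds the text and which holds the label.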

Licensing

The dataset is licensed under CC BY-NC-SA 4.0.

Citation

Please cite the following papers if you are using the data:

@article{alam2021review,
  title={A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models},
  author={Alam, Firoj and Hasan, Md Arid and Alam, Tanvir and Khan, Akib and Tajrin, Janntatul and Khan, Naira and Chowdhury, Shammur Absar},
  journal={arXiv preprint arXiv:2107.03844},
  year={2021}
}

@article{alam2020bangla,
  title={Bangla Text Classification using Transformers},
  author={Alam, Tanvirul and Khan, Akib and Alam, Firoj},
  journal={arXiv preprint arXiv:2011.04446},
  year={2020}
}

@article{kunchukuttan2020ai4bharat,
  title={AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  author={Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal={arXiv preprint arXiv:2005.00085},
  year={2020}
}
