This repository contains state-of-the-art language models and a classifier for the Punjabi language (spoken in the Indian subcontinent).

The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).

Architecture/Dataset | Punjabi Wikipedia Articles |
---|---|
ULMFiT | 24.40 |
TransformerXL | 14.03 |

Dataset | Accuracy | MCC | Notebook to Reproduce results |
---|---|---|---|
IndicNLP News Article Classification Dataset - Punjabi | 97.12 | 96.17 | Link |
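
For readers who want to check the accuracy and MCC (Matthews correlation coefficient) columns against their own predictions, here is a minimal sketch in plain Python. It uses the standard multi-class generalization of MCC and assumes the table reports both metrics on a 0-100 scale. The function name `accuracy_and_mcc` and the integer labels are hypothetical; this is not the code used in the repository's notebook.

```python
import math
from collections import Counter


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, MCC), both scaled to 0-100 (assumed table convention)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))

    # Per-class counts of how often each label occurs as truth / as prediction.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)

    # Multi-class MCC expressed through covariance-like terms.
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())

    denom = math.sqrt(cov_tt * cov_pp)
    # By convention, MCC is 0 when the denominator vanishes
    # (for example, when every prediction is the same class).
    mcc = cov_tp / denom if denom else 0.0

    return 100.0 * correct / n, 100.0 * mcc
```

Calling `accuracy_and_mcc(y_true, y_pred)` with two equal-length lists of class labels returns the pair of scores. Unlike accuracy, MCC accounts for how predictions are distributed across classes, so it is less flattering when a classifier favors the majority class.
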

Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |
TransformerXL | Embeddings projection |

Architecture | Visualization |
---|---|
ULMFiT | Encodings projection |

Download the pretrained language models from here.

Tokenization uses a model trained unsupervised with Google's SentencePiece.

Download the trained model and vocabulary from here.
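
Once the SentencePiece model file is downloaded, it can be loaded with the `sentencepiece` Python package. The sketch below is illustrative only: the helper name `tokenize` and the file name in the commented usage are hypothetical, so substitute the path of the file you actually downloaded.

```python
def tokenize(text, model_path):
    """Split `text` into subword pieces using a trained SentencePiece model.

    `model_path` is the path to the downloaded .model file
    (the exact file name is an assumption; use your own).
    """
    # Imported here so the module loads even if sentencepiece is not installed.
    import sentencepiece as spm

    processor = spm.SentencePieceProcessor(model_file=model_path)
    # out_type=str returns subword strings instead of integer ids.
    return processor.encode(text, out_type=str)


# Example usage (hypothetical file name):
# pieces = tokenize("ਪੰਜਾਬੀ ਭਾਸ਼ਾ", "punjabi_tokenizer.model")
# print(pieces)
```

Passing `out_type=int` to `encode` instead would return vocabulary ids, which is the form a language model consumes.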