This repository contains a language model for code-mixed Hinglish (Hindi and English), which is spoken in the Indian subcontinent.
The methodology followed in this repository is detailed in this paper, accepted at Dravidian-Codemix-HASOC2020@FIRE2020.

Architecture/Dataset | Synthetically Generated Wikipedia Articles Dataset |
---|---|
ULMFiT | 86.48 |

Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |

Download the pretrained ULMFiT language model from here.

The tokenizer was trained using Google's SentencePiece. Download the trained tokenizer model and vocabulary from here.