NLP

A collection of NLP projects, mostly relating to Mongolian. Datasets and model outputs are not pushed to GitHub because of their file size. The notebooks assume that you have extracted the data into a data folder.
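Since the data is not in the repository, it can help to check the data folder before running a notebook. The following is a minimal sketch only: the function name `find_missing_inputs` and the file name `news.csv` are hypothetical, and the location of the data folder relative to each notebook is an assumption.

```python
from pathlib import Path


def find_missing_inputs(data_dir, expected_files):
    """Return the names in expected_files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected_files if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # "data" and "news.csv" are placeholders; use the paths the notebook reads.
    missing = find_missing_inputs("data", ["news.csv"])
    if missing:
        print("Extract the dataset first; missing files:", missing)
    else:
        print("All expected input files were found.")
```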

Eduge Classification

Folder: 01_eduge_classification

This is a classification task on the Eduge dataset of 70 thousand Mongolian online news articles labeled with news categories. This project uses Fast.ai. A shortened, two-step process is used:

Train a ULMFiT language model on the Eduge dataset
Train the classifier
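The two steps above can be sketched roughly as follows. This is not the code from the notebooks: it assumes the current fastai (v2) text API, which may differ from the version the notebooks were written against, and the column names `news` and `label`, the encoder file name, the learning rate, and the validation split are all hypothetical. Tokenization is left at fastai's defaults, which is also an assumption.

```python
# Sketch of the shortened process, assuming the fastai v2 text API.
# Column names, file names, and hyperparameters below are hypothetical.

ENCODER_NAME = "eduge_lm_encoder"  # hypothetical name for the saved encoder


def train_language_model(df, text_col="news", epochs=10):
    """Step 1: train an AWD-LSTM language model on the article text."""
    # Imported here so the module can be loaded without fastai installed.
    from fastai.text.all import (AWD_LSTM, Perplexity, TextDataLoaders,
                                 accuracy, language_model_learner)

    dls_lm = TextDataLoaders.from_df(df, text_col=text_col, is_lm=True,
                                     valid_pct=0.1)
    # pretrained=False: fastai's bundled weights are English, so start fresh.
    learn = language_model_learner(dls_lm, AWD_LSTM, pretrained=False,
                                   metrics=[accuracy, Perplexity()])
    learn.fit_one_cycle(epochs, 1e-2)
    learn.save_encoder(ENCODER_NAME)  # reused by the classifier in step 2
    return dls_lm


def train_classifier(df, dls_lm, text_col="news", label_col="label", epochs=10):
    """Step 2: train the news-category classifier on top of the saved encoder."""
    from fastai.text.all import (AWD_LSTM, TextDataLoaders, accuracy,
                                 text_classifier_learner)

    # Reuse the language model vocabulary so token ids match the encoder.
    dls_clas = TextDataLoaders.from_df(df, text_col=text_col,
                                       label_col=label_col,
                                       text_vocab=dls_lm.vocab, valid_pct=0.1)
    learn = text_classifier_learner(dls_clas, AWD_LSTM, pretrained=False,
                                    metrics=accuracy)
    learn.load_encoder(ENCODER_NAME)
    learn.unfreeze()  # train all layers; the notebooks may use another schedule
    learn.fit_one_cycle(epochs, 1e-2)
    return learn
```

Given a pandas DataFrame `df` with those two columns, the calls would be `dls_lm = train_language_model(df)` followed by `train_classifier(df, dls_lm)`. Skipping `load_encoder` corresponds to training the classifier without the language model.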

After training a language model using the Eduge dataset, a classifier accuracy of 93.5% is reached after 10 epochs. Training the classifier without the language model gives an accuracy of roughly 90.5% after 10 epochs.
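To put the two accuracies in perspective, the gap can also be read as a reduction in error rate. The snippet below is a small worked calculation using the figures quoted above; since the baseline is only given as "roughly 90.5%", the result is approximate, and the helper name `error_reduction` is made up for illustration.

```python
def error_reduction(baseline_acc, improved_acc):
    """Return (gain in percentage points, relative drop in error rate)."""
    gain = improved_acc - baseline_acc
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return gain, (baseline_err - improved_err) / baseline_err


gain, relative = error_reduction(90.5, 93.5)
print(f"{gain:.1f} points, {relative:.1%} fewer errors")
# prints "3.0 points, 31.6% fewer errors"
```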

ULMFiT Mongolian Language Model

Folder: 02_mongolian_language_model

This is a project to create a general Mongolian language model using the ULMFiT method proposed by Jeremy Howard of Fast.ai and Sebastian Ruder. See the folder for more details and a link to the completed model.

The result is a general language model that can be used for transfer learning on a variety of tasks. The notebooks show the complete three-stage process:

Pre-training the language model
Fine-tuning the model on the target dataset
Training the classifier
