
RooieRakkert/Natural-Language-Identification-Graduate-Project

Natural Language Identification Machine Learning Pipeline

Graduate Project for Harvard's Python for Data Science (CSCI E-29)

In this project, I pulled text data from the European Parliament Proceedings (Europarl) corpus in 21 languages. Using scikit-learn, I transformed the raw text into a numerical feature matrix and trained a multinomial naive Bayes classifier to identify the language of input text with greater than 99% accuracy.
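The following is a minimal sketch of such a pipeline in scikit-learn. It is not the project's actual notebook code. It assumes character n-gram counts as the feature matrix (the description above does not say which features were used) and uses a tiny made-up training set in place of Europarl. `CountVectorizer`, `MultinomialNB`, and `make_pipeline` are real scikit-learn APIs; `build_language_classifier` and the sample sentences are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_language_classifier():
    """Return an unfitted text -> language-code pipeline (hypothetical helper).

    Assumption: character n-grams of length 1 to 3, taken within word
    boundaries, are used as features.
    """
    return make_pipeline(
        # Raw text -> sparse matrix of character n-gram counts.
        CountVectorizer(analyzer="char_wb", ngram_range=(1, 3), lowercase=True),
        # Multinomial naive Bayes over those counts; alpha=1.0 is add-one (Laplace) smoothing.
        MultinomialNB(alpha=1.0),
    )


# Tiny illustrative training set (made up, not Europarl data).
train_texts = [
    "the committee has adopted the report",
    "we must vote on this proposal today",
    "la commission a adopté le rapport",
    "nous devons voter sur cette proposition aujourd'hui",
    "der Ausschuss hat den Bericht angenommen",
    "wir müssen heute über diesen Vorschlag abstimmen",
]
train_labels = ["en", "en", "fr", "fr", "de", "de"]

model = build_language_classifier()
model.fit(train_texts, train_labels)

# Predict the language of an unseen sentence.
print(model.predict(["the report was adopted by the committee"]))
```

Character n-grams are a common choice for language identification because letter combinations are strongly language-specific and no per-language tokenizer is needed. To measure held-out accuracy on real data, the corpus could be split with `sklearn.model_selection.train_test_split` and predictions scored with `sklearn.metrics.accuracy_score`.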

Data Source: http://www.statmt.org/europarl/
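A sketch of loading the corpus into parallel lists of texts and labels is shown below. It assumes the extracted download has one sub-directory per language code (for example `txt/en/`, `txt/fr/`) containing UTF-8 `.txt` files whose markup lines (such as `<CHAPTER ...>` or `<SPEAKER ...>`) begin with `<`. Check these assumptions against your copy of the data; `load_europarl` and its `max_files_per_lang` parameter are hypothetical, not part of the original project.

```python
from pathlib import Path


def load_europarl(root, max_files_per_lang=None):
    """Collect (text, language) pairs from root/<lang>/*.txt (hypothetical helper).

    Assumptions: one sub-directory per language code, UTF-8 text files,
    and markup lines starting with "<" that should be skipped.
    """
    texts, labels = [], []
    # Each sub-directory name is treated as the language label.
    for lang_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = sorted(lang_dir.glob("*.txt"))
        if max_files_per_lang is not None:
            # Optionally cap the number of files per language to limit memory use.
            files = files[:max_files_per_lang]
        for path in files:
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    line = line.strip()
                    # Keep only non-empty lines that are not markup.
                    if line and not line.startswith("<"):
                        texts.append(line)
                        labels.append(lang_dir.name)
    return texts, labels
```

The returned `texts` and `labels` lists have the shape that scikit-learn text classifiers expect in `fit(texts, labels)`; for example, `load_europarl("txt", max_files_per_lang=50)` would load a small sample, assuming the corpus was extracted to `txt/`.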