COGS 109 Modeling and Data Analysis (Final Project)
- Data Processing & Modeling
- Natural Language Processing
With natural language processing (NLP) as our motivation, we chose a topic that uses text-based data for our analysis. Since this is our first NLP project, we decided to build a text classifier as a way to ease into the domain. This choice partly led us to the well-known 20 Newsgroups dataset, which can be imported directly from scikit-learn.
By implementing a text classifier, we hoped to gain some insight into how the features of a body of text contribute to its "meaning/purpose". This analysis examines methods for classifying documents, here newsgroup posts, by identifying what a given text is about as a whole. We hope this project raises awareness of how a document's characteristics affect how well it can be classified.
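To make the idea of "features contributing to a text's purpose" concrete, below is a minimal sketch of one common text classifier, multinomial Naive Bayes over bag-of-words counts with Laplace smoothing. It is written in plain Python so it runs without downloading data. This is not necessarily the method used in this project. The toy documents are hypothetical, and the labels only borrow the naming style of 20 Newsgroups categories. The `top_words` helper is also a hypothetical illustration. It ranks words by how much more likely they are in one class than in the others, which is one simple way to see which features drive a classification.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase and keep only alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def _log_prob(self, word, label):
        # Smoothed log P(word | class).
        num = self.word_counts[label][word] + self.alpha
        den = self.total_words[label] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:        # words unseen in training are ignored
                    score += self._log_prob(word, label)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def top_words(self, label, k=5):
        # Rank words by log P(w | label) minus the mean log P(w | other classes).
        others = [c for c in self.class_counts if c != label]

        def score(word):
            own = self._log_prob(word, label)
            if not others:
                return own
            return own - sum(self._log_prob(word, c) for c in others) / len(others)

        # Ties are broken alphabetically so the output is deterministic.
        return sorted(self.vocab, key=lambda w: (-score(w), w))[:k]


# Hypothetical toy corpus for illustration only.
docs = [
    "the rocket launch reached orbit",
    "nasa shuttle orbit mission",
    "the engine needs new brakes",
    "car dealer sold the sedan engine",
]
labels = ["sci.space", "sci.space", "rec.autos", "rec.autos"]

clf = NaiveBayesTextClassifier().fit(docs, labels)
print(clf.predict("shuttle launch into orbit"))  # sci.space
print(clf.predict("new brakes for the car"))     # rec.autos
print(clf.top_words("sci.space", k=1))           # ['orbit']
```

With scikit-learn, the same pipeline is typically built by loading the data with `sklearn.datasets.fetch_20newsgroups` and combining `CountVectorizer` (or `TfidfVectorizer`) with `MultinomialNB`. The hand-rolled version above simply makes the per-word probabilities visible.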
Final Project Write-up: Text Classifier: Performance Analysis
Data Set: 20newsgroups