
This project shows the process for exploratory data analysis, text vectorization, and model selection for a classifier predicting sentiment for movie reviews.


nplotko/IMDB-movie-reviews


IMDB Movie Reviews

In this project, I performed the following steps:

  • Imported the data and cleaned the HTML markup out of the reviews
  • Used Sklearn's CountVectorizer to generate word frequencies for a word cloud
  • Used SpaCy's part-of-speech tagging to generate noun frequencies for a word cloud of nouns
  • Generated seaborn plots to visualize review length and the class balance of sentiment
  • Used Sklearn's TF-IDF vectorizer to transform the reviews into a sparse matrix
  • Used Sklearn's validation_curve to cross-validate hyperparameters for logistic regression and K-nearest neighbors, and visualized how accuracy changes with each hyperparameter setting
  • Trained Sklearn's LinearSVC to compare against the best-scoring models
  • Concluded that logistic regression (with C ≈ 5) is the best-performing model
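
The cleaning, TF-IDF, and validation-curve steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the toy reviews, the `strip_html` helper, the range of C values, and the 2-fold split are all assumptions chosen so that the snippet runs on its own.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline


def strip_html(text):
    """Replace HTML tags (e.g. <br />) with spaces and collapse whitespace.

    Hypothetical helper; the project may clean HTML differently.
    """
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


# Toy stand-in corpus (assumption); the real project uses IMDB reviews.
positive = [
    "A great film.<br /><br />I loved the acting",
    "Wonderful story and a great cast",
    "I loved it, a brilliant movie",
    "Great fun with <b>wonderful</b> direction",
]
negative = [
    "A terrible film.<br /><br />I hated the acting",
    "Boring story and an awful cast",
    "I hated it, a dull movie",
    "Awful pacing with <i>terrible</i> direction",
]
reviews = [strip_html(t) for t in positive + negative]
labels = np.array([1] * len(positive) + [0] * len(negative))

# TF-IDF turns each review into a sparse vector; the pipeline refits the
# vectorizer inside each cross-validation fold, so no vocabulary leaks
# from the validation split.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Candidate values for the regularization strength C (assumed range).
c_values = np.logspace(-2, 2, 5)

train_scores, val_scores = validation_curve(
    pipeline,
    reviews,
    labels,
    param_name="logisticregression__C",  # step name generated by make_pipeline
    param_range=c_values,
    cv=2,
    scoring="accuracy",
)

# One row per C value, one column per fold; average across folds.
mean_val = val_scores.mean(axis=1)
best_c = c_values[mean_val.argmax()]
print("mean validation accuracy per C:", dict(zip(c_values, mean_val)))
print("best C on this toy data:", best_c)
```

On a corpus this small the scores say nothing about the real task; the point is only the shape of the workflow. The same call should work for K-nearest neighbors by swapping in `KNeighborsClassifier` and setting `param_name="kneighborsclassifier__n_neighbors"`.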
