IMDB Movie Reviews

In this project, I performed the following steps:

Imported the data and cleaned HTML out of the reviews
Used Sklearn's CountVectorizer to generate word frequencies for a word cloud
Used spaCy's part-of-speech tagging to generate noun frequencies for a word cloud of nouns
Generated seaborn plots to visualize the length of the reviews and the class balance of sentiment
Used Sklearn's TfidfVectorizer to transform the reviews into sparse TF-IDF features
Used Sklearn's validation_curve to cross-validate hyperparameters for logistic regression and k-nearest neighbors, and to visualize how accuracy changes across hyperparameter settings
Trained Sklearn's LinearSVC to compare against the best-scoring models
Concluded that logistic regression (with C ≈ 5) is the best-performing model
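
The TF-IDF and validation_curve steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the toy reviews and labels, the grid of C values, the 3-fold split, and the max_iter setting are all made up so the snippet runs on its own.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline

# Toy stand-in for the cleaned IMDB reviews (assumption: 1 = positive, 0 = negative).
reviews = [
    "a wonderful and moving film", "great acting and a great story",
    "I loved every minute of it", "brilliant, funny and touching",
    "one of the best movies this year", "a beautiful and clever script",
    "a dull and boring mess", "terrible acting and a weak plot",
    "I hated every minute of it", "awful, slow and pointless",
    "one of the worst movies this year", "a lazy and clumsy script",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Vectorizing inside the pipeline means TF-IDF is refit on each training fold,
# so no vocabulary or IDF statistics leak from the validation fold.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Hypothetical grid for the inverse regularization strength C.
c_values = np.logspace(-2, 2, 5)

train_scores, val_scores = validation_curve(
    pipeline,
    reviews,
    labels,
    param_name="logisticregression__C",  # step name generated by make_pipeline
    param_range=c_values,
    cv=3,
    scoring="accuracy",
)

# Average over folds and pick the C with the highest validation accuracy.
mean_val = val_scores.mean(axis=1)
best_c = float(c_values[int(np.argmax(mean_val))])
print("mean validation accuracy per C:", dict(zip(c_values.tolist(), mean_val.tolist())))
print("best C on this toy data:", best_c)
```

Plotting mean_val against c_values on a log-scaled x-axis gives the kind of accuracy curve described above. The same pattern should carry over to k-nearest neighbors by swapping in KNeighborsClassifier and setting param_name to "kneighborsclassifier__n_neighbors".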
