- Preparing the IMDb movie review data for text processing
    - Obtaining the IMDb movie review dataset
    - Preprocessing the movie dataset into a more convenient format
- Introducing the bag-of-words model
    - Transforming words into feature vectors
    - Assessing word relevancy via term frequency-inverse document frequency
    - Cleaning text data
    - Processing documents into tokens
- Training a logistic regression model for document classification
- Working with bigger data – online algorithms and out-of-core learning
- Topic modeling
    - Decomposing text documents with Latent Dirichlet Allocation
    - Latent Dirichlet Allocation with scikit-learn
- Summary

Please refer to the README.md file in ../ch01
for more information about running the code examples.