
Itssanthoshhere/Sentiment-Analysis-with-IMDB-Movie-Reviews


🎬 Sentiment Analysis with IMDB Movie Reviews 🎥

Project Overview 📝

Dive into the world of sentiment analysis with this project! We analyze IMDB movie reviews to classify each one as positive or negative using classical machine learning techniques. From data preprocessing and text cleaning to TF-IDF feature extraction and model training, we explore it all with Naive Bayes and Support Vector Machine (SVM) classifiers.

  • Type: Natural Language Processing (NLP)
  • Language: Python


🛠️ Libraries Used

Explore the powerful libraries that drive this project:

  • Pandas: For seamless data manipulation and analysis
  • NumPy: For efficient numerical operations
  • Matplotlib: To visualize the data distribution and confusion matrices
  • Scikit-Learn: To implement and evaluate machine learning models
  • NLTK: For tokenization and lemmatization
  • Regular Expressions (re): To clean text data, such as stripping HTML tags

📊 Dataset

We’re working with the IMDB Movie Reviews Dataset – a treasure trove of movie reviews! The dataset file, IMDB Dataset.csv, includes:

  • review: The actual movie review text
  • sentiment: The sentiment label (positive or negative)
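A minimal sketch of loading and inspecting the data with pandas, assuming the CSV has exactly the `review` and `sentiment` columns listed above. `load_and_inspect` is a hypothetical helper name, and the tiny inline sample only stands in for `IMDB Dataset.csv` so the snippet runs on its own.

```python
import io

import pandas as pd


def load_and_inspect(source):
    """Load the reviews CSV and print a quick summary.

    `source` may be a path such as "IMDB Dataset.csv" or any
    file-like object that pandas.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Assumption: the file has exactly the two columns described in the README.
    missing_cols = {"review", "sentiment"} - set(df.columns)
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")
    print(df.head())
    print("Missing values per column:")
    print(df.isnull().sum())
    print("Label distribution:")
    print(df["sentiment"].value_counts())
    return df


if __name__ == "__main__":
    # Hypothetical two-row sample used in place of the real dataset file.
    sample = io.StringIO(
        "review,sentiment\n"
        '"A wonderful, moving film.",positive\n'
        '"Dull plot and wooden acting.",negative\n'
    )
    load_and_inspect(sample)
```

With the real file, `load_and_inspect("IMDB Dataset.csv")` prints the first rows, the per-column count of missing values, and how many reviews carry each label.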

📝 Steps

Here’s how we bring this project to life:

  1. Import Libraries: Get the essential tools ready for data processing, visualization, and machine learning.
  2. Load and Inspect Data: Peek into the dataset, check for any missing values, and understand the data distribution.
  3. Data Preprocessing: Transform text to lowercase, clean out HTML tags, tokenize reviews, and perform lemmatization.
  4. Data Preparation: Split the data into training and testing sets, encode labels, and convert text into TF-IDF features.
  5. Model Training and Evaluation: Train and test Naive Bayes and Support Vector Machine models, then evaluate their performance with accuracy scores, confusion matrices, and classification reports.
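The preprocessing in step 3 (lowercase, strip HTML tags, tokenize, lemmatize) can be sketched with the standard `re` module alone. This is an illustration under assumptions: the regex tokenizer stands in for whatever NLTK tokenizer the project uses, and `preprocess` is a hypothetical name. Lemmatization is passed in as an optional callable so the sketch runs without downloading NLTK data.

```python
import re
from typing import Callable, List, Optional

TAG_RE = re.compile(r"<[^>]+>")                   # HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")   # words, keeping contractions like "didn't"


def preprocess(review: str,
               lemmatize: Optional[Callable[[str], str]] = None) -> List[str]:
    """Lowercase, strip HTML tags, tokenize, and optionally lemmatize a review."""
    text = review.lower()
    # Replace tags with a space so words on either side do not merge.
    text = TAG_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    if lemmatize is not None:
        tokens = [lemmatize(tok) for tok in tokens]
    return tokens


if __name__ == "__main__":
    print(preprocess("This movie was GREAT!<br /><br />I didn't expect it."))
    # To lemmatize with NLTK (requires nltk.download("wordnet") first):
    # from nltk.stem import WordNetLemmatizer
    # print(preprocess("Cats running", WordNetLemmatizer().lemmatize))
```

Keeping the lemmatizer pluggable makes the cleaning logic easy to test, while still allowing NLTK's `WordNetLemmatizer` to be dropped in for the real run.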

✨ Features

Our project shines with the following features:

  • Data Preprocessing: Clean and tokenize text, strip HTML tags, and normalize text.
  • Feature Extraction: Convert text into numerical features using TF-IDF vectorization.
  • Model Training: Build and train Naive Bayes and SVM classifiers.
  • Evaluation: Assess model performance with accuracy scores, confusion matrices, and detailed classification reports.
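A minimal sketch of the data-preparation and TF-IDF feature-extraction stage with scikit-learn. The split ratio, `random_state`, `max_features=5000`, and English stop-word removal are illustrative assumptions, not settings confirmed by the project, and `prepare_features` is a hypothetical helper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def prepare_features(reviews, labels, test_size=0.2, random_state=42):
    """Split the data, encode labels, and build TF-IDF features."""
    x_train, x_test, y_train, y_test = train_test_split(
        reviews, labels, test_size=test_size,
        random_state=random_state, stratify=labels,  # keep class balance in both splits
    )
    # LabelEncoder sorts classes alphabetically: "negative" -> 0, "positive" -> 1.
    encoder = LabelEncoder()
    y_train = encoder.fit_transform(y_train)
    y_test = encoder.transform(y_test)

    # Assumed vectorizer settings; tune as needed.
    vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
    x_train_tfidf = vectorizer.fit_transform(x_train)  # learn vocabulary and IDF on train only
    x_test_tfidf = vectorizer.transform(x_test)        # reuse them on the test set
    return x_train_tfidf, x_test_tfidf, y_train, y_test, vectorizer, encoder
```

Fitting the vectorizer only on the training split keeps test-set vocabulary and IDF statistics from leaking into training.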

Usage 🚀

  1. Preprocess the data: Clean and tokenize the text data.
  2. Train the model: Fit a machine learning model on the training data.
  3. Evaluate the model: Test the model on the test data and calculate metrics such as accuracy, precision, recall, and F1 score.
  4. Predict sentiment: Use the trained model to predict the sentiment of new reviews.
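For the final usage step, one option is to wrap the TF-IDF vectorizer and a Multinomial Naive Bayes classifier in a single scikit-learn pipeline, so new reviews are vectorized exactly like the training data. This is a sketch, assuming raw review strings as input; `train_sentiment_model` and `predict_sentiment` are hypothetical names, and the four training reviews are toy data rather than the IMDB set. In a real run you would fit on the cleaned training reviews from the preprocessing step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_sentiment_model(reviews, labels):
    """Fit TF-IDF + Multinomial Naive Bayes on review strings."""
    # The pipeline stores the fitted vocabulary alongside the classifier.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(reviews, labels)
    return model


def predict_sentiment(model, new_reviews):
    """Return one predicted label ("positive"/"negative") per review."""
    return list(model.predict(new_reviews))


if __name__ == "__main__":
    # Toy training data, for illustration only.
    train_x = ["a wonderful, moving film", "great acting and a great story",
               "dull, boring and far too long", "terrible script, awful acting"]
    train_y = ["positive", "positive", "negative", "negative"]
    model = train_sentiment_model(train_x, train_y)
    print(predict_sentiment(model, ["what a wonderful story", "boring and awful"]))
```

Because the vectorizer lives inside the pipeline, a saved model can score new text directly without re-creating the TF-IDF step by hand.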

Modeling 🧠

The project explores several machine learning models, including:

  • Logistic Regression
  • Support Vector Machines (SVM)
  • Naive Bayes
  • Random Forest

We also experimented with hyperparameter tuning to improve model performance.
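A sketch of how the four listed models could be compared with cross-validation, plus a small grid search. The README does not say which hyperparameters were tuned, so the grid values are illustrative assumptions, and `LinearSVC` stands in for "SVM". `CANDIDATES`, `compare_models`, and `tune_svm` are hypothetical names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Candidate classifiers; LinearSVC is an assumed stand-in for the SVM.
CANDIDATES = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (linear)": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}


def compare_models(reviews, labels, cv=5):
    """Mean cross-validated accuracy of each candidate on TF-IDF features."""
    scores = {}
    for name, clf in CANDIDATES.items():
        # The vectorizer sits inside the pipeline so it is refit on each training fold.
        pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
        scores[name] = cross_val_score(pipe, reviews, labels, cv=cv,
                                       scoring="accuracy").mean()
    return scores


def tune_svm(reviews, labels, cv=5):
    """Grid-search the SVM's C and the TF-IDF n-gram range (assumed grid)."""
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
    grid = {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]}
    search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
    search.fit(reviews, labels)
    return search.best_params_, search.best_score_
```

Putting the vectorizer inside each pipeline means every cross-validation fold builds its TF-IDF vocabulary from that fold's training data only, which keeps the comparison fair.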

Evaluation 📈

The performance of each model is evaluated using metrics such as:

  • Accuracy
  • Precision
  • Recall
  • F1 Score

Confusion matrices are also used to visualize each model's correct and incorrect predictions for both classes.
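All four metrics derive from the confusion-matrix counts (true/false positives and negatives). As a worked illustration, here is a stdlib-only sketch that treats `"positive"` as the positive class (an assumption). `binary_metrics` is a hypothetical helper. In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` (with `pos_label="positive"`) and `confusion_matrix` compute the same quantities.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="positive"):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1."""
    # Count (is_actually_positive, is_predicted_positive) pairs.
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = counts[(True, True)]
    fn = counts[(True, False)]
    fp = counts[(False, True)]
    tn = counts[(False, False)]
    total = tp + tn + fp + fn

    accuracy = (tp + tn) / total if total else 0.0       # share of correct predictions
    precision = tp / (tp + fp) if tp + fp else 0.0       # TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0          # TP / (TP + FN)
    f1 = (2 * precision * recall / (precision + recall)  # harmonic mean of P and R
          if precision + recall else 0.0)
    return {
        # Rows = true class, columns = predicted class, negative first.
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
    y_pred = ["positive", "positive", "negative", "negative", "negative", "positive"]
    print(binary_metrics(y_true, y_pred))
```

For the example labels there are 2 true positives, 1 false negative, 1 false positive, and 2 true negatives, so accuracy is 4/6 and precision, recall, and F1 all equal 2/3.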

📈 Results

See how well our models perform! Each model is evaluated on the test set using accuracy scores, confusion matrices, and classification reports (per-class precision, recall, and F1 score).
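A sketch of how each model's results could be reported with scikit-learn and Matplotlib, assuming you already have true and predicted labels for the test set. `report_results` and its `out_path` parameter are hypothetical, and the `"Blues"` colour map is just a style choice.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (ConfusionMatrixDisplay, accuracy_score,
                             classification_report)


def report_results(name, y_true, y_pred, out_path=None):
    """Print accuracy and a classification report; optionally save a confusion-matrix plot."""
    acc = accuracy_score(y_true, y_pred)
    report = classification_report(y_true, y_pred, digits=3)
    print(f"{name} accuracy: {acc:.3f}")
    print(report)
    if out_path is not None:
        # Builds the matrix from the labels and draws it on a new figure.
        disp = ConfusionMatrixDisplay.from_predictions(y_true, y_pred, cmap="Blues")
        disp.ax_.set_title(f"{name} confusion matrix")
        disp.figure_.savefig(out_path, bbox_inches="tight")
        plt.close(disp.figure_)
    return acc, report
```

Calling it once per model, for example `report_results("Naive Bayes", y_test, nb_predictions, "nb_confusion.png")`, where `nb_predictions` is a placeholder for that model's test-set output, gives a consistent side-by-side view of the classifiers.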

Contributing 🤝

Contributions are welcome! If you have suggestions for improvements, feel free to fork the repository and create a pull request.

🙏 Acknowledgements

A big shoutout to:

  • Dataset: The amazing IMDB movie reviews dataset, courtesy of Kaggle.
  • Libraries: Our project’s backbone includes pandas, numpy, matplotlib, scikit-learn, and nltk.
  • Inspiration: Inspired by fantastic sentiment analysis tutorials and groundbreaking NLP research.

👨‍💻 Author

  • Santhosh VS - Connect with me on LinkedIn

📧 Contact

Got questions or feedback? Drop me a line at santhosh02vs@gmail.com. I’d love to hear from you!

