This project performs sentiment analysis on 150 blog posts scraped from various websites. Using Natural Language Processing (NLP) techniques and machine learning models, it predicts the sentiment of each post.
Sentiment analysis, also known as opinion mining, is the task of determining the attitude expressed in a piece of text. This project classifies each blog post as positive, negative, or neutral.
The dataset consists of 150 blog posts scraped from different websites. The posts cover a wide range of topics so that the dataset reflects a variety of writing styles and sentiments. They were collected with web scraping tools such as BeautifulSoup and Scrapy.
Preprocessing steps include:
- Cleaning the text (removing HTML tags, punctuation, numbers, and special characters)
- Tokenization
- Stop words removal
- Lemmatization
These steps ensure that the text data is in a suitable format for modeling.
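The preprocessing steps above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the project's preprocess.py: it uses the standard library's html.parser in place of BeautifulSoup for tag removal, a small hypothetical STOP_WORDS set, and assumes English text (the regex drops non-ASCII letters). Lemmatization is passed in as a function and defaults to a no-op so the sketch runs without extra downloads.

```python
import re
from html.parser import HTMLParser
from typing import Callable, List

# Hypothetical, deliberately small stop-word list; a real pipeline would
# typically use a fuller list (e.g. NLTK's or scikit-learn's).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were",
    "in", "on", "of", "to", "for", "with", "this", "that", "it", "i",
}


class _TextExtractor(HTMLParser):
    """Collects visible text and drops tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(raw_html: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.parts)


def clean_text(raw_html: str) -> str:
    """Remove HTML, then lowercase and drop punctuation, digits and symbols."""
    text = strip_html(raw_html).lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # assumes English (ASCII) letters
    return re.sub(r"\s+", " ", text).strip()


def preprocess(raw_html: str,
               lemmatize: Callable[[str], str] = lambda t: t) -> List[str]:
    tokens = clean_text(raw_html).split()               # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [lemmatize(t) for t in tokens]                # lemmatization hook
```

For example, preprocess("<p>I <b>loved</b> this post! 10/10</p>") returns ['loved', 'post']. With NLTK installed and its WordNet data downloaded via nltk.download("wordnet"), one option is to pass lemmatize=WordNetLemmatizer().lemmatize, imported from nltk.stem.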
Several machine learning models were applied to predict the sentiments:
- Logistic Regression
- Support Vector Machines (SVM)
- Random Forest
- Naive Bayes
Additionally, the text was represented with TF-IDF features and word embeddings to improve model performance.
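The two feature representations can be sketched from scratch as follows. This is a minimal illustration, not the project's code: tfidf uses the textbook weighting tf = count / document length and idf = ln(N / df), whereas scikit-learn's TfidfVectorizer defaults to a smoothed idf and L2 normalization, so its numbers differ. mean_embedding represents a post as the average of its word vectors; the vectors mapping is a hypothetical input, since the source does not say which embeddings were used.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf(docs: Sequence[List[str]]) -> List[Dict[str, float]]:
    """Textbook TF-IDF weights for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def mean_embedding(tokens: List[str],
                   vectors: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```

A term that appears in every document gets idf = ln(1) = 0, so words shared by all posts contribute nothing, while rarer words are weighted up.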
The models were evaluated on accuracy, precision, recall, and F1-score. Cross-validation was used to check that performance is consistent across different train/test splits rather than an artifact of a single split.
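A cross-validated comparison of the four models on TF-IDF features might look like the sketch below, written with scikit-learn. It is an assumption-laden illustration, not the project's train.py or evaluate.py: the SVM is a linear SVM (LinearSVC) because the source does not name a kernel, and the hyperparameters, the five stratified folds, and the macro-averaged metrics are all hypothetical choices.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical model settings; cross_validate clones each estimator per fold.
MODELS = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive_bayes": MultinomialNB(),
}

# Macro averaging weights positive, negative and neutral equally.
SCORING = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]


def compare_models(texts, labels, n_splits=5):
    """Return mean cross-validated scores per model on TF-IDF features."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, clf in MODELS.items():
        # Keeping the vectorizer inside the pipeline means each fold learns
        # its vocabulary and IDF weights from its training split only.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_validate(pipeline, texts, labels, cv=cv, scoring=SCORING)
        results[name] = {m: float(scores[f"test_{m}"].mean()) for m in SCORING}
    return results
```

Stratified folds keep the proportion of positive, negative, and neutral posts similar in every split, which matters with only 150 posts.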
The best-performing model achieved an accuracy of XX% (update with actual result) on the test set. Detailed results, including confusion matrices and performance metrics for each model, can be found in the results directory.
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/yourusername/sentiment-analysis-blog-posts.git
- Navigate to the project directory:
  cd sentiment-analysis-blog-posts
- Install the required dependencies:
  pip install -r requirements.txt
- Run the preprocessing script:
  python preprocess.py
- Train the models:
  python train.py
- Evaluate the models:
  python evaluate.py
This project was developed by Himanshu Mahajan.
This project is licensed under the MIT License - see the LICENSE file for details.