This project performs sentiment analysis on social media posts about social impact topics. It classifies each post as positive, negative, or neutral to show how the public views these issues.
The key steps include:
- Data Preprocessing: Text is cleaned by removing URLs, stopwords, and punctuation, then tokenized and lemmatized.
- Dataset Splitting: The preprocessed data is split into training and test sets so the model can be evaluated on unseen posts.
- Model Building: A machine learning pipeline combines a TfidfVectorizer for text vectorization with a RandomForestClassifier for sentiment classification.
- Hyperparameter Tuning: RandomizedSearchCV is used to search for the best hyperparameters for the model.
- Handling Class Imbalance: SMOTE is applied to the training data to address class imbalance.
- Model Training and Evaluation: The model is trained on the training set and evaluated on the test set using classification metrics.
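The preprocessing step could look roughly like the sketch below. It uses only the Python standard library so that it runs as-is. The `preprocess` function, the tiny `STOPWORDS` set, and the URL pattern are illustrative assumptions, not the project's actual code, and lemmatization is only marked with a comment because it needs an external library such as NLTK or spaCy.

```python
import re
import string

# Tiny illustrative stopword list (an assumption); a real run would use a
# fuller list such as the English stopwords shipped with NLTK.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "this", "it"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip URLs and punctuation, tokenize, and drop stopwords."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()  # simple whitespace tokenization
    # Lemmatization would go here (for example a WordNet-based lemmatizer);
    # it is left out so the sketch runs without extra downloads.
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("Donating to the food bank is great! https://example.org/give"))
# → ['donating', 'food', 'bank', 'great']
```

The returned tokens can be joined back into a string before vectorization, since TfidfVectorizer expects raw text by default.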
By classifying the sentiment of social media posts, the project gives a picture of the sentiment landscape around social impact topics.