This project was developed as part of a school project.
Its objective is to build a model that predicts the sentiment of a tweet, trained on a labeled dataset using machine learning and natural language processing techniques.
- Natural Language Processing
- Machine Learning
- Data Visualization
- Predictive Modeling
- Data Analysis
- Data Cleaning
- etc.
- Python
- NLTK
- RegEx
- Pandas
- Jupyter Notebook
- scikit-learn
- WordCloud
- etc.
The project uses tweets and their labels, together with supervised machine learning techniques, to predict a binary sentiment (positive or negative). The learning phase runs after preprocessing, which consists of cleaning, tokenizing, and vectorizing the tweets.
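The three preprocessing steps can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the regular expressions, the whitespace tokenizer (a stand-in for an NLTK tokenizer), and the bag-of-words counts (a stand-in for a scikit-learn vectorizer) are all assumptions, as are the function names.

```python
import re
from collections import Counter

# Hypothetical cleaning patterns; the project's real rules are not shown in this README.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean(tweet):
    """Lowercase a tweet and replace URLs, @mentions, and non-letters with spaces."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return NON_LETTER_RE.sub(" ", text)


def tokenize(text):
    """Split cleaned text on whitespace (assumed stand-in for an NLTK tokenizer)."""
    return text.split()


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Turn a token list into a bag-of-words count vector."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocabulary]


tweets = ["@bob I love this! http://t.co/x", "I hate mondays..."]
token_lists = [tokenize(clean(t)) for t in tweets]
vocabulary = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocabulary) for tokens in token_lists]
```

The resulting `vectors`, paired with the sentiment labels, are the kind of input a supervised classifier would be trained on.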
- Clone this repo.
- The raw data is available at this link.
- Extract the CSV file training.1600000.processed.noemoticon.csv into the Data folder.
- Imports are kept here.
- The notebook contains the whole project, using resources from the Data folder and the .py files.
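Once the CSV file is extracted, it could be loaded along these lines. This is a sketch under stated assumptions, not the notebook's loader: it assumes the file has no header row, is Latin-1 encoded, uses the column order shown below, and encodes negative as 0 and positive as 4. The function name `load_tweets` is hypothetical, and `pandas.read_csv` could be used for the same purpose.

```python
import csv

# Assumed column order for the dataset file; verify it against your copy.
COLUMNS = ["target", "id", "date", "flag", "user", "text"]

# Assumed label encoding: "0" is negative, "4" is positive.
LABELS = {"0": 0, "4": 1}


def load_tweets(path, limit=None):
    """Read (text, label) pairs from the CSV, skipping rows with other targets."""
    rows = []
    # Assumption: no header row and Latin-1 encoding.
    with open(path, newline="", encoding="latin-1") as handle:
        for i, record in enumerate(csv.reader(handle)):
            if limit is not None and i >= limit:
                break
            row = dict(zip(COLUMNS, record))
            label = LABELS.get(row.get("target"))
            if label is None:
                continue  # not a binary positive/negative row
            rows.append((row["text"], label))
    return rows
```

For example, `load_tweets("Data/training.1600000.processed.noemoticon.csv", limit=1000)` would read the first thousand rows for a quick look before processing the full file.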
- https://github.com/tthustla for the clear steps he followed in his project, which helped me complete my first NLP project.
- Data credit: Stanford University.