
About The Project

Clustering tweets using the cosine distance metric and the K-means clustering algorithm.

Introduction

Data redundancy is a significant problem on Twitter. Users are likely to generate similar tweets (e.g., using the Retweet function) about popular topics and events. The result is a huge number of near-identical tweets, and users who only want to follow a topic lose time reading many tweets that say the same thing.
By clustering similar tweets together, we can generate a more concise and organized representation of the raw tweets, so that busy Twitter users can read only one tweet per cluster.

Project Objectives

The aim of this project is to cluster and label text tweets.
When a new tweet is added to the corpus, it should be easy to label without performing the full clustering again.
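
One common way to label a new tweet without re-clustering is to assign it to the nearest existing cluster centroid. The sketch below illustrates that idea under stated assumptions and is not taken from the project notebook: the function names `cosine_distance` and `label_new_tweet`, the centroids, and the three-term vocabulary are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def label_new_tweet(vector, centroids):
    """Return the index of the centroid closest to `vector`."""
    distances = [cosine_distance(vector, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids from an earlier clustering run (3-term vocabulary).
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

# Hypothetical vector for a newly arrived tweet.
new_tweet_vector = [0.0, 0.5, 0.4]
label = label_new_tweet(new_tweet_vector, centroids)
print(label)
```

For this to work, the new tweet has to be vectorized with the same vocabulary and term weights that were fitted on the original corpus, so that its vector is comparable to the stored centroids.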

Keywords

Text mining / clustering / NLP / Tweepy / NLTK / Twitter API

Project scope

Data gathering (streaming tweets)

Data processing and wrangling (cleaning the tweet text and applying NLP to it)

Vectorization (numerical representation of the text)

Cosine distance (from NLTK)

K-means clustering

Cluster labeling
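
The cleaning and vectorization steps above can be sketched as follows. This is a minimal standard-library illustration, not the project's code: the stopword set is a tiny stand-in for NLTK's stopwords corpus, the function names `clean_tweet` and `tfidf_vectors` are hypothetical, and the IDF formula (`log(N / df) + 1`) is only one of several common variants.

```python
import math
import re
from collections import Counter

# Tiny stand-in stopword list; the project itself relies on NLTK's stopwords.
STOPWORDS = {"a", "an", "the", "is", "in", "on", "to", "of", "and", "rt"}

def clean_tweet(text):
    """Lowercase, strip URLs, mentions and non-letters, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only
    return [tok for tok in text.split() if tok not in STOPWORDS]

def tfidf_vectors(token_lists):
    """Return (vocabulary, vectors): one dense TF-IDF vector per document."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    idf = {tok: math.log(n_docs / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty tweets
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors

# Made-up example tweets.
tweets = [
    "RT @user: The match is on! https://t.co/xyz #football",
    "Football match tonight",
    "New phone released today",
]
tokens = [clean_tweet(t) for t in tweets]
vocab, vectors = tfidf_vectors(tokens)
print(tokens[0])
print(vocab)
```

Each tweet ends up as one vector over the shared vocabulary, which is the form the distance and clustering steps operate on.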

Installation

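Get a Twitter API key. Try this link:
https://www.youtube.com/watch?v=vlvtqp44xoQ

Install Tweepy (the leading ! runs the command from a notebook cell):

  !pip install tweepy

Install NLTK with pip:

  !pip install nltk

or with conda:

  conda install -c anaconda nltk

Download the stopwords corpus with the NLTK downloader: calling nltk.download() opens the graphical downloader, and nltk.download('stopwords') fetches the corpus directly.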

Evaluating Results

The K-means algorithm was run with the following settings:

Data representation method: TF-IDF
Distance metric: cosine distance
k: values from 2 to 6 clusters
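
To make the setup above concrete, here is a minimal standard-library sketch of K-means with cosine distance, run for k = 2 to 6. It is not the project's implementation: NLTK ships its own `nltk.cluster.KMeansClusterer` and `nltk.cluster.util.cosine_distance`, while the function names, the toy vectors, the fixed seed, and the "mean distance to assigned centroid" score used here are all assumptions made for illustration.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine_distance(u, v):
    """1 - cosine similarity, assuming u and v are already unit length."""
    return 1.0 - sum(a * b for a, b in zip(u, v))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: cosine_distance(p, centroids[c]))
        for p in points
    ]

def kmeans_cosine(vectors, k, iterations=20, seed=0):
    """Return (labels, centroids, mean distance to the assigned centroid)."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = rng.sample(points, k)  # k sampled points as initial centroids
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    labels = assign(points, centroids)  # final labels match final centroids
    cost = sum(
        cosine_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    return labels, centroids, cost

# Hypothetical TF-IDF-like vectors standing in for real tweet vectors.
vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0], [0.0, 0.0, 0.9],
]

results = {k: kmeans_cosine(vectors, k) for k in range(2, 7)}
for k, (labels, _, cost) in results.items():
    print(k, labels, round(cost, 4))
```

The score always approaches zero as k grows toward the number of points, so on its own it cannot pick the best k; it is only useful for comparing how much each extra cluster helps.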

