GitHub - vickruto/we-rate-dogs-data-wrangling: A data wrangling project that uses dog ratings data from Twitter account @dog_rates

Python Data Wrangling Project

@dog_rates Tweets Data Wrangling and Analysis

The aim of the project is to gather, assess and clean dog ratings data to get it ready for downstream analysis, visualization and/or modeling. A brief preliminary analysis and visualization is included.

The Data Used

twitter-archive-enhanced.csv : A static text file containing an archive of tweets from the Twitter account @dog_rates from 2015-11-15 to 2017-08-01. Available here
tweets_json.txt : Text file constructed by querying the Twitter API using Tweepy. Used to enrich the archive data with more information, including the number of retweets and the number of likes
image_predictions.tsv : File containing dog breed predictions from a machine learning model. Available for download from here

twitter_archive_master.csv : The final master dataset stored after transforming the three data components above through all the wrangling steps
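
As a rough illustration of how the tweet JSON file can enrich the archive, the sketch below parses line-delimited tweet JSON and joins it to archive rows on the tweet id. It assumes one JSON object per line and the `id`, `retweet_count` and `favorite_count` fields of the Tweet object; the helper names `parse_tweet_lines` and `merge_counts` are hypothetical and the inner join is an assumption, not necessarily what the notebook does.

```python
import json


def parse_tweet_lines(lines):
    """Extract the id, retweet count and favorite count from JSON lines.

    Assumes each non-blank line holds one complete Tweet object.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return rows


def merge_counts(archive_rows, count_rows):
    """Inner-join archive rows with count rows on tweet_id (hypothetical helper)."""
    counts_by_id = {row["tweet_id"]: row for row in count_rows}
    merged = []
    for row in archive_rows:
        counts = counts_by_id.get(row["tweet_id"])
        if counts is not None:  # drop archive rows with no matching tweet
            merged.append({**row, **counts})
    return merged
```

With the real file, the lines could be supplied by opening `tweets_json.txt` and passing the file object to `parse_tweet_lines`.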

PROJECT STRUCTURE

.
├── wrangle_act.ipynb : Main Project notebook containing code for gathering, assessing, cleaning, analyzing, and visualizing data
├── wrangle_report.html : A documentation of the data wrangling steps taken: gathering, assessing, and cleaning
├── wrangle_report.ipynb : The notebook used to generate the wrangle_report
├── act_report.html : A documentation of the insights and the visualizations produced from the wrangled data.
├── act_report.ipynb : The notebook used to generate the act_report
├── no_status_tweet_ids.txt : Text File used to store the tweet ids that do not return a Tweet object as expected when queried using Tweepy
├── twitter-archive-enhanced.csv
├── image_predictions.tsv
├── tweets_json.txt
├── twitter_archive_master.csv
└── README.md
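
The role of no_status_tweet_ids.txt can be pictured with a small gathering loop: each tweet id is queried, successful results are kept, and ids that fail are set aside. This is only a sketch under stated assumptions. `fetch_status` is a hypothetical stand-in for whatever Tweepy call the notebook uses, and the broad `except` is a simplification; a real client would catch the library's specific error class.

```python
import json


def gather_tweets(tweet_ids, fetch_status):
    """Query each tweet id and separate successes from failures.

    fetch_status is a hypothetical callable that returns a dict for a
    tweet id or raises an exception when no Tweet object is available.
    """
    fetched, missing = [], []
    for tweet_id in tweet_ids:
        try:
            fetched.append(fetch_status(tweet_id))
        except Exception:  # simplification: catch the specific API error in practice
            missing.append(tweet_id)
    return fetched, missing


def to_json_lines(tweets):
    """Serialize tweets as one JSON object per line."""
    return "".join(json.dumps(tweet) + "\n" for tweet in tweets)
```

The string returned by `to_json_lines` could then be written to a text file, and the `missing` ids to a second file, one id per line.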

Insights

The following insights were gathered from analysis of the cleaned master dataset:

Tweets posted on Wednesdays tend to get the fewest favorites and retweets, while tweets posted on Tuesdays tend to get the most favorites.
Tweets with GoFundMe (a crowdfunding platform) links do not get more retweets or favorites for exposure, as would be expected since WeRateDogs® is also a non-profit organization concerned with rescuing dogs and seeking treatment for sick dogs. Instead, they get fewer retweets and favorites.
Tweets with videos tend to get 5 times as many likes and 7 times as many retweets as tweets with only pictures.
Reply tweets tend to get half as many retweets and favorites as original tweets posted directly by @dog_rates. However, dogs in reply tweets get higher ratings on average, slightly above $\frac{12}{10}$, while dogs in tweets posted directly by @dog_rates average slightly below $\frac{11}{10}$.
A tweet with a high favorite count is likely to have a high retweet count as well. On the other hand, the rating given to a dog does not directly predict the number of retweets or likes a tweet is likely to get.
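
Comparisons like the weekday and favorites-versus-retweets insights above can be sketched with a per-weekday mean and a Pearson correlation. The column names `timestamp`, `favorite_count` and `retweet_count`, the timestamp layout, and both helper functions are assumptions made for illustration; the notebook may compute these differently.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_by_weekday(rows, field):
    """Average a numeric field per weekday.

    Assumes timestamps start with 'YYYY-MM-DD HH:MM:SS'; any trailing
    timezone suffix is ignored.
    """
    groups = defaultdict(list)
    for row in rows:
        stamp = datetime.strptime(row["timestamp"][:19], "%Y-%m-%d %H:%M:%S")
        groups[WEEKDAYS[stamp.weekday()]].append(row[field])
    return {day: mean(values) for day, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)
```

A coefficient near 1 between favorite and retweet counts would match the observation that the two rise together, while a coefficient near 0 between rating and either count would match the observation that rating is not a direct predictor.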

Useful Links

Tweet Object Data Dictionary
