The Twitterlyzer is a set of Tools and Batch Jobs to make data collection in Twitter easier for researchers.

plotti/twitterlyzer

Short Description

  • This is a Rails 2.3.7 application that helps you collect Twitter data.

How to Install

  • Clone the repository:
    
      git clone git@github.com:plotti/twitterlyzer.git
    

Install RVM

  • Install RVM if you are not already using it (http://beginrescueend.com/rvm/install/)
    
     $ bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
     $ source ~/.bash_profile
     $ rvm requirements
    
  • The rvm spec file is already in the repo
  • Install ruby 1.8.7
    
      $ rvm install 1.8.7
    
  • Create your gemset
    
     rvm gemset create 'socializer' 
    

    * If you run into permission problems, try creating the gem spec directory:
    
    sudo mkdir -p ~/.gem/specs
    sudo chmod 777 ~/.gem/specs
    

Setup Files

  • To get it running you will need to create:
    • twitter.yml, containing your Twitter credentials
    • bitly.yml, containing your bitly credentials
    • database.yml, containing the database credentials
    • See twitter.example.yml or bitly.example.yml for details
    • If your Twitter account is NOT whitelisted, make sure you do not exhaust your API rate limit by running too many workers
    • Create the directory “/data” under your Rails root to store the lists
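
The checklist above can be verified with a few lines of Ruby before the app is started for the first time. This is only a minimal sketch: the helper names are hypothetical, and placing twitter.yml and bitly.yml under config/ is an assumption based on Rails conventions (only database.yml is known to live there), so adjust the paths if the example files in the repo sit elsewhere.

```ruby
require 'fileutils'

# Files the setup list asks for. Looking for them under config/ is an
# assumption based on Rails conventions, not something the repo confirms.
REQUIRED_CONFIG_FILES = %w[twitter.yml bitly.yml database.yml].freeze

# Returns the names of required config files missing under <rails_root>/config.
def missing_config_files(rails_root)
  REQUIRED_CONFIG_FILES.reject do |name|
    File.exist?(File.join(rails_root, 'config', name))
  end
end

# Creates <rails_root>/data (where the lists are stored) if it does not
# exist yet and returns its path.
def ensure_data_dir(rails_root)
  data_dir = File.join(rails_root, 'data')
  FileUtils.mkdir_p(data_dir)
  data_dir
end
```

Calling `missing_config_files(Dir.pwd)` from the Rails root returns an empty array once all three files are in place, and `ensure_data_dir(Dir.pwd)` is safe to run repeatedly.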

Install Gems Dependencies

  • The app uses Rails 2.3.7, so all gems are chosen to match that version of the framework
    
    gem install rails -v =2.3.7
    gem update --system 1.5.3
    
  • To install the remaining gem dependencies, first install rails_gem_install and then run it:
    
    gem install rails_gem_install
    RAILS_ENV=development rails_gem_install
    

Test

  • Test whether the application works correctly
  • You will need rspec/rspec-rails and factory_girl to run the tests.
  • Start Solr in test mode before running the specs:
    
    RAILS_ENV=test rake sunspot:solr:start
    spec spec
    

Get Delayed Jobs working

  • Create the necessary files with: script/generate delayed_job
  • To start collecting persons or feeds you need to start a few delayed job workers. To do so, use the script:
    
    ./script/delayed_job -n 4 start
    
    • Benchmarks I measured for collecting tweets, by number of workers (n):
      • n = 4: 40,000 tweets in 10 min
      • n = 8: 90,000 tweets in 10 min
      • n = 16: 180,000 tweets in 10 min (70% CPU usage)
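
To compare the benchmark rows, it helps to normalize them to tweets per worker per minute. The sketch below only does arithmetic on the three figures listed above; the constant and method names are made up for illustration.

```ruby
# Benchmark figures from the list above: workers => tweets collected in 10 minutes.
BENCHMARKS = { 4 => 40_000, 8 => 90_000, 16 => 180_000 }.freeze
WINDOW_MINUTES = 10

# Tweets collected per worker per minute for one benchmark row.
def tweets_per_worker_minute(workers, tweets, minutes = WINDOW_MINUTES)
  tweets.to_f / (workers * minutes)
end

BENCHMARKS.each do |workers, tweets|
  rate = tweets_per_worker_minute(workers, tweets)
  puts format('n = %2d: %.0f tweets per worker per minute', workers, rate)
end
# n =  4: 1000 tweets per worker per minute
# n =  8: 1125 tweets per worker per minute
# n = 16: 1125 tweets per worker per minute
```

In these measurements the per-worker rate stays roughly constant, so throughput grew about linearly with the number of workers up to n = 16.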

Start Solr and Webserver

  • All tweets are indexed in the background by an Apache Solr (Lucene) server
  • Indexing uses the sunspot and solr gems
  • Make sure to start Solr before starting the web server:
    
    rake sunspot:solr:start 
    ./script/server
    

Dumping the DB and restoring it

  • To let you exchange results, the app includes a rake task that dumps the existing DB into /dump
  • It uses the dump plugin for Rails 2.3: https://github.com/toy/dump
  • There is a small example DB in dump containing 57 persons in one project and ~100K tweets, including retweets
  • You can use it to experiment with the data
    
    rake dump
    rake dump:restore # to restore a db
    

FEATURES

It does the following:

It uses Delayed Jobs to run the collection in the background.
The Twitter API is wrapped using the grackle and twitter gems

Projects
Persons are organized in projects; each project contains a set of persons

Persons
collect one person
collect multiple persons based on a CSV import
collect the ego network of a given person
show all people
show statistics of the people collected (friends, follower distributions, origin, etc.)
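
As an illustration of the CSV import, the following sketch extracts screen names from CSV text using Ruby's standard csv library. The expected layout of the import file is not documented here, so the assumption of one person per row with the screen name in the first column is hypothetical, as is the method name.

```ruby
require 'csv'

# Extracts Twitter screen names from CSV text.
# Assumption (not confirmed by the app): one person per row, screen name in
# the first column. A leading "@" and surrounding whitespace are stripped;
# blank rows and duplicates are dropped.
def screen_names_from_csv(csv_text)
  CSV.parse(csv_text)
     .map { |row| row.first.to_s.strip.sub(/\A@/, '') }
     .reject(&:empty?)
     .uniq
end
```

The resulting array could then be handed to whatever job collects a single person, one screen name at a time.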

Connections between persons
Connections between persons are stored not in the DB but on disk in a PStore
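
For readers unfamiliar with PStore, here is a minimal sketch of how connections could be kept in such a file with Ruby's standard pstore library. The key layout (person name mapped to an array of connected names) and both helper names are assumptions for illustration, not the app's actual schema.

```ruby
require 'pstore'

# Hypothetical helper: records that `from` is connected to `to`.
# Each person is a root key whose value is an array of connected persons.
def add_connection(store, from, to)
  store.transaction do
    store[from] ||= []
    store[from] << to unless store[from].include?(to)
  end
end

# Reads the stored connections of one person in a read-only transaction.
# Returns an empty array for persons that have no entry.
def connections_of(store, person)
  store.transaction(true) { store.fetch(person, []) }
end
```

With `store = PStore.new('data/connections.pstore')` (a made-up path), `add_connection(store, 'alice', 'bob')` persists the edge to disk and `connections_of(store, 'alice')` reads it back without touching the database.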

Tweets
collect the tweets of a person
collect the tweets of all persons
collect tweets based on a CSV list
collect all retweets of all collected tweets
export all tweets into a CSV file
show statistics on the tweets (links used, keywords, timeline)

Networks
export the friendship network of the collected persons in a project in the following formats: UCINET, Gephi
export the retweet networks of persons
export the @ networks between persons
export the person stats
export the twitter links of persons
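
To give an idea of what a network export involves, the sketch below turns a list of directed connections into GDF, a plain-text graph format that Gephi can import. Which file format the app actually writes for Gephi is not stated here, so the choice of GDF and the method name are assumptions.

```ruby
# Builds a GDF document from a list of [source, target] screen-name pairs.
# Twitter screen names contain only letters, digits and underscores, so no
# quoting or escaping is applied to them.
def edges_to_gdf(edges)
  nodes = edges.flatten.uniq
  lines = ['nodedef>name VARCHAR']
  lines.concat(nodes)
  lines << 'edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN'
  edges.each { |from, to| lines << "#{from},#{to},true" }
  lines.join("\n") + "\n"
end
```

Writing the result with `File.write('friends.gdf', edges_to_gdf(edges))` gives a file that can be opened directly in Gephi; the same edge list could feed a UCINET export in a similar way.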

Tasks
It has some built-in scrapers under tasks that scrape the following websites:
Murack.com
Google
Twellow
Wefollow

It can compute basic sentiment scores for German tweets
