- This is a RAILS 2.3.7 based application that helps you to collect Twitter data.
- Checkout with :
git clone git@github.com:plotti/twitterlyzer.git
- Install rvm if you are not using it (http://beginrescueend.com/rvm/install/)
$ bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer) $ source ~/.bash_profile $ rvm requirements
- The rvm spec file is already in the repo
- Install ruby 1.8.7
$ rvm install 1.8.7
- Create your gemset
rvm gemset create 'socializer'
*If you have permission problems try to create the gem dir:
sudo mkdir ~/.gem/specs sudo chmod 777 ~/.gem/specs
- To get it running you will need to create a:
- twitter.yml that contains your twitter credentials
- bitly.yml that contains your bitly credentials
- database.yml containting the database credentials
- see twitter.example.yml or bitly.example.yml for details
- Make sure when your Twitter account is NOT whitelisted that you dont use up your API limitations when using too many workers
- Create the directory “/data” under your rails root to store the lists
- The app is using 2.3.7 rails so all gems are chosen to match that framework
gem install rails -v =2.3.7 gem update --system 1.5.3
- To install them first install:
gem install rails_gem_install RAILS_ENV=development rails_gem_install
- Test if the application works correctly
- You will need rspec/rspec-rails and factory girl to test it.
- You will need to start solr in test mode
RAILS_ENV=test rake sunspot:solr:start spec spec
- Create the necessary files with: script/generate delayed_job
- To start collecting persons or feeds you need to start a couple of delayed job workers. To do so use the script
"./script/delayed_job -n 4"
- The Benchmarks I measured are depending on the number of workers (n):
- Collecting Tweets: n 4: 40.000 tweets in 10min
- n 8: 90.000 tweets in 10min
- n 16: 180.000 tweets in 10 min (70% CPU usage)
- The Benchmarks I measured are depending on the number of workers (n):
- All of the tweets are indexed by a lucene solr server in the background
- It uses sunspot and solr gems.
- Before starting the server make sure to start solr.
rake sunspot:solr:start ./script/server
- In order to exchange your results it contains a rake task that dumps the existing DB into /dump
- It uses the dump plugin for Rails 2.3 https://github.com/toy/dump
- There is a small example db in dump containing 57 persons in one project and ~ 100K Tweets inc. Retweets
- You can use it to experiment on the data
rake dump rake dump:restore # to restore a db
It does the following:
It uses Delayed Jobs to get the collection done.
The Twitter API is wrapped using grackle and twitter gems
Projects
Persons are organized in projects that contain a set of people
Persons
collect one person
collect multiple persons based on a csv import
collect the egonetwork of a given person
show all people
show statistics of the people collected (friends, follower distributions, origin etc..)
Connections between persons
Connections between persons are stored not in the DB but on the HD in a PStore
Tweets
collect the tweets of a person
collect the tweets of all persons
collects tweets based on a csv list
collect all retweets of all collected tweets
export all tweets into a csv
show statistics on the tweets (links used, keywords, timeline)
Networks
export the friendship network of the collected persons in a project the formats:
UCINET
Gephi
export the retweet networks of persons
export the @ networks between persons
export the person stats
export the twitter links of persons
Tasks
It has some onboard scrapers under tasks that scrape the following websites
Murack.com
Google
Twellow
Wefollow
It can compute some sentiment for german tweets