tweet_learn

This project uses machine learning methods to predict the following about tweets (posts on Twitter):

  1. Are tweets informative or non-informative?
  2. Will a tweet be re-tweeted by another user?

Installation

1. Clone the repository

git clone https://github.com/bgold09/tweet_learn.git
cd tweet_learn

2. Install required Python packages

Use your preferred method (pip, apt-get, etc.) to install the following Python packages required by tweet_learn:

3. Install Stanford Named Entity Recognizer (NER)

Download and unpack the Stanford Named Entity Recognizer:

wget http://nlp.stanford.edu/software/stanford-ner-2014-01-04.zip
unzip stanford-ner-2014-01-04.zip

Start a local NER Java server. Run it in the foreground of a separate terminal window; starting the process in the background causes the server to function improperly:

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/ner-eng-ie.crf-3-all2008-distsim.ser.gz -port 8080 -outputFormat inlineXML 
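Because the server is started with -outputFormat inlineXML, recognized entities come back wrapped in tags named after their class. The following is a minimal sketch of how such a response could be turned into a dictionary of entities. The function name parse_inline_xml and the sample string are hypothetical and written by hand for illustration; they are not part of tweet_learn, and the sketch assumes tags of the form <PERSON>...</PERSON> with no nesting.

```python
import re
from collections import defaultdict

# Matches one inline-XML entity span such as <PERSON>Ada Lovelace</PERSON>.
# The backreference \1 requires the closing tag to match the opening tag.
ENTITY_PATTERN = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.DOTALL)


def parse_inline_xml(tagged_text):
    """Group entity strings by tag name (hypothetical helper, not in tweet_learn)."""
    entities = defaultdict(list)
    for tag, value in ENTITY_PATTERN.findall(tagged_text):
        entities[tag].append(value.strip())
    return dict(entities)


# Hand-written sample in the assumed inlineXML style, not real server output.
sample = "<PERSON>Ada Lovelace</PERSON> gave a talk in <LOCATION>London</LOCATION>."
print(parse_inline_xml(sample))
```

Text without any tagged spans yields an empty dictionary, so callers can treat "no entities" and "entities found" uniformly.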

4. Create a MySQL database and required tables

mysql -u <username> -p -e 'CREATE DATABASE twitter;'
mysql -u <username> -p twitter < data/users_backup.sql

Then, from a Python session, store the initial data and add the centrality feature:

>>> import tweet_learn as tl
>>> tl.store_initial_data("train_test_set")
>>> tl.add_centrality_feature("train_test_set")

5. Extract the data and targets

From a Python session (with tweet_learn imported as tl, as in step 4):

>>> ml = tl.extract_transform_data("train_test_set", 0, 1001)

6. Run tests

See confusion.py, score.py and roc.py for several ways to evaluate the quality of your models.
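For orientation, here is a minimal sketch of the kind of quantities a confusion matrix, a score and a ROC curve are built from for a binary task such as informative vs. non-informative. It is not the contents of confusion.py, score.py or roc.py; the function names and the label lists are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def summarize(counts):
    """Derive common rates from the counts; a zero denominator gives 0.0."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # true positive rate, one ROC axis
        "false_positive_rate": ratio(fp, fp + tn),  # the other ROC axis
    }


# Made-up labels for illustration only (1 = informative, 0 = non-informative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
print(summarize(counts))
```

Recall and false positive rate at a single decision threshold give one point on a ROC curve; sweeping the threshold over a model's scores would trace out the full curve.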

License

Copyright (c) 2014 Scott Bickel, Brian Golden, Stephen Styer

Licensed under the MIT license.
