sfm-twitter-harvester

Harvesters for Twitter content, part of Social Feed Manager.

Provides harvesters for the Twitter REST API and the Streaming API.

Harvesting is performed by Twarc and captured by a modified version of WarcProx.

Development

For information on development and running tests, see the development documentation.

When running tests, provide Twitter credentials either in a test_config.py file or as environment variables (TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN, and TWITTER_ACCESS_TOKEN_SECRET). An example test_config.py looks like:

    TWITTER_CONSUMER_KEY = "EHdoTksBfgGflP5nUalEfhaeo"
    TWITTER_CONSUMER_SECRET = "ZtUpemtBkf2cEmaqiy52Dd343ihFu9PAiLebuMOmqN0QtXeAlen"
    TWITTER_ACCESS_TOKEN = "411876914-c2yZjbk1np0Z5MWEFYYQKSQNFFGBXd8T4k90YkJl"
    TWITTER_ACCESS_TOKEN_SECRET = "jK9QOmn5VRF5mfgAN6"
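A minimal sketch of how a test helper might gather these credentials. It assumes environment variables take precedence over test_config.py; the README allows either source but does not say which wins. The function name `load_test_credentials` is hypothetical and not part of the repository's test suite.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names shared by test_config.py and the environment.
CREDENTIAL_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_test_credentials(
    config: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Hypothetical helper: return all four credentials, or None if any is missing.

    Environment variables are checked first (an assumed precedence);
    `config` is a fallback mapping, e.g. vars(test_config).
    """
    config = config or {}
    creds = {}
    for key in CREDENTIAL_KEYS:
        value = os.environ.get(key) or config.get(key)
        if not value:
            return None  # a caller could skip live-API tests in this case
        creds[key] = value
    return creds


if __name__ == "__main__":
    try:
        import test_config  # the optional file from the example above
        config = vars(test_config)
    except ImportError:
        config = {}
    found = load_test_credentials(config) is not None
    print("credentials found" if found else "credentials missing")
```

Returning None instead of raising lets a test runner decide whether missing credentials mean "skip" or "fail".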

Running as a service

Running as a service for the REST API.

The twitter harvester will act on harvest start messages received from the twitter_harvester queue for REST harvests. To run as a service:

    python twitter_harvester.py service <mq host> <mq username> <mq password>

Running as a service for the Streaming API.

The twitter harvester will act on harvest start and stop messages received from the twitter_harvester queue for streaming harvests. To run as a service:

    stream_consumer.py <mq host> <mq username> <mq password> twitter_harvester <comma-separated list of harvest start routing keys, e.g., harvest.start.twitter.twitter_filter> <filepath of twitter_harvester.py>
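A sketch that assembles the stream_consumer.py arguments in the order given above. The README shows only one routing key, `harvest.start.twitter.twitter_filter`; generalizing it to `harvest.start.twitter.<harvest type>` is an assumption, and `stream_consumer_args` is a hypothetical helper name.

```python
from typing import List, Sequence

# Assumed pattern, extrapolated from harvest.start.twitter.twitter_filter.
ROUTING_KEY_PREFIX = "harvest.start.twitter."


def stream_consumer_args(
    mq_host: str,
    mq_username: str,
    mq_password: str,
    harvest_types: Sequence[str],
    harvester_path: str = "twitter_harvester.py",
) -> List[str]:
    """Build the stream_consumer.py argument list (hypothetical helper)."""
    if not harvest_types:
        raise ValueError("at least one harvest type is required")
    # Routing keys are passed as a single comma-separated argument.
    routing_keys = ",".join(ROUTING_KEY_PREFIX + t for t in harvest_types)
    return [
        "stream_consumer.py",
        mq_host,
        mq_username,
        mq_password,
        "twitter_harvester",  # queue name
        routing_keys,
        harvester_path,
    ]


if __name__ == "__main__":
    args = stream_consumer_args("mq", "sfm_user", "password", ["twitter_filter", "twitter_sample"])
    print(" ".join(args))
```

The list can be handed to `subprocess.run` as-is; passing a list rather than a shell string avoids quoting problems with the password.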

The twitter harvester uses supervisord to run streaming harvests. When it receives a harvest start message, it writes the message to a file and registers a new process with supervisord; that process runs the twitter harvester on the harvest start file. When it receives a harvest stop message, it removes the process from supervisord, which terminates the running harvester.
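A sketch of the start half of that cycle, under stated assumptions: the message has an `id` field, the registered process runs the harvester in `seed` mode on the saved file, and the program naming scheme is hypothetical. supervisord reads `[program:name]` sections from INI-style files; one way to register such a file is to place it in a directory covered by supervisord's `[include]` setting and run `supervisorctl reread` then `supervisorctl update`, with `supervisorctl stop <name>` and `supervisorctl remove <name>` for the stop message. This README does not say which mechanism sfm-twitter-harvester itself uses.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple


def prepare_stream_process(
    message: Dict[str, Any],
    work_dir: str,
    harvester_path: str = "twitter_harvester.py",
    python: str = "python",
) -> Tuple[str, str]:
    """Write the start message to disk; return (program name, supervisord config).

    Assumes an "id" field in the message. Names and paths are hypothetical.
    """
    harvest_id = message["id"]
    program_name = "twitter_stream_" + harvest_id
    message_path = os.path.join(work_dir, harvest_id + ".json")
    with open(message_path, "w") as f:
        json.dump(message, f)
    # One [program:x] section in supervisord's INI format.
    config = (
        f"[program:{program_name}]\n"
        f"command={python} {harvester_path} seed {message_path}\n"
        "autostart=true\n"
        "autorestart=true\n"  # restart the stream process whenever it exits
    )
    return program_name, config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        name, config = prepare_stream_process({"id": "abc123", "type": "twitter_filter"}, work_dir)
        print(config)
```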

Process harvest start files

The twitter harvester can process harvest start files. The format of a harvest start file is the same as a harvest start message. To run without sending any messages:

    python twitter_harvester.py seed <path to file>
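A sketch that sanity-checks a harvest start file before handing it to the `seed` command. It assumes the file is JSON with a top-level `type` field holding one of the four harvest types documented below; `seed_command` is a hypothetical helper.

```python
import json
import os
import sys
import tempfile
from typing import List

# Harvest types documented in this README.
HARVEST_TYPES = {"twitter_search", "twitter_user_timeline", "twitter_filter", "twitter_sample"}


def seed_command(path: str, harvester_path: str = "twitter_harvester.py") -> List[str]:
    """Validate a harvest start file, then build the `seed` command line for it.

    Assumes a JSON file with a top-level "type" field.
    """
    with open(path) as f:
        message = json.load(f)
    harvest_type = message.get("type")
    if harvest_type not in HARVEST_TYPES:
        raise ValueError(f"unsupported harvest type: {harvest_type!r}")
    return [sys.executable, harvester_path, "seed", path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "harvest_start.json")
        with open(path, "w") as f:
            json.dump({"type": "twitter_sample"}, f)
        print(seed_command(path))
        # To actually run it: subprocess.run(seed_command(path), check=True)
```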

Harvest start messages

The following information is needed to construct a harvest start message for the twitter harvester.

For all harvest types:

Summary: tweet

Extracted URLs: URLs are extracted from entities.url and entities.media.
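A sketch of URL extraction that honors the `media` and `web_resources` options listed for each harvest type. It assumes v1.1 tweet JSON, in which link entities sit under `entities["urls"]` (plural, with `expanded_url`) and media entities under `entities["media"]` (with `media_url_https`); the harvester's actual extraction code may differ.

```python
from typing import Any, Dict, List


def extract_urls(tweet: Dict[str, Any], media: bool = False, web_resources: bool = False) -> List[str]:
    """Collect URLs from a tweet's entities according to the two options.

    Assumes v1.1 tweet JSON: entities["urls"] and entities["media"].
    """
    entities = tweet.get("entities", {})
    urls = []
    if web_resources:
        for entity in entities.get("urls", []):
            # Prefer the expanded link over the t.co wrapper.
            url = entity.get("expanded_url") or entity.get("url")
            if url:
                urls.append(url)
    if media:
        for entity in entities.get("media", []):
            url = entity.get("media_url_https") or entity.get("media_url")
            if url:
                urls.append(url)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(urls))
```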

Search harvest type

Type: twitter_search

API methods called:

search/tweets

Required parameters:

token (for query)

Optional parameters:

incremental: True (default) or False
media: True or False (default) to extract media urls
web_resources: True or False (default) to extract web resource urls
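A sketch of a twitter_search harvest start message. The `type` value, the seed `token`, and the option names and defaults come from this section; the surrounding field names (`id`, `seeds`, `options`, `credentials`) are assumptions, since the message envelope is not documented here.

```python
import json
from typing import Any, Dict


def search_message(
    harvest_id: str,
    query: str,
    credentials: Dict[str, str],
    incremental: bool = True,
    media: bool = False,
    web_resources: bool = False,
) -> Dict[str, Any]:
    """Hypothetical twitter_search message; envelope field names are assumptions."""
    return {
        "id": harvest_id,
        "type": "twitter_search",
        "seeds": [{"token": query}],  # the search query goes in token
        "options": {
            "incremental": incremental,
            "media": media,
            "web_resources": web_resources,
        },
        "credentials": credentials,
    }


if __name__ == "__main__":
    print(json.dumps(search_message("search-1", "gwu libraries", {}), indent=2))
```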

User timeline harvest type

Type: twitter_user_timeline

API methods called:

statuses/user_timeline
users/lookup (to lookup screen names and user ids)

Required parameters:

token (for screen name) and/or uid (for user id)

Optional parameters:

incremental: True (default) or False
media: True or False (default) to extract media urls
web_resources: True or False (default) to extract web resource urls
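A sketch of user timeline seeds, where `token` carries a screen name and `uid` a user id, either or both per seed; the users/lookup call listed above is what relates the two. As in the other sketches, the envelope field names (`id`, `seeds`, `options`, `credentials`) and helper names are assumptions.

```python
from typing import Any, Dict, Iterable, Optional


def user_timeline_seed(screen_name: Optional[str] = None, uid: Optional[str] = None) -> Dict[str, str]:
    """Build one seed: token holds a screen name, uid holds a user id."""
    if screen_name is None and uid is None:
        raise ValueError("provide a screen name (token) and/or a user id (uid)")
    seed = {}
    if screen_name is not None:
        seed["token"] = screen_name
    if uid is not None:
        seed["uid"] = uid
    return seed


def user_timeline_message(
    harvest_id: str,
    seeds: Iterable[Dict[str, str]],
    credentials: Dict[str, str],
    incremental: bool = True,
    media: bool = False,
    web_resources: bool = False,
) -> Dict[str, Any]:
    """Hypothetical twitter_user_timeline message; envelope field names are assumptions."""
    return {
        "id": harvest_id,
        "type": "twitter_user_timeline",
        "seeds": list(seeds),
        "options": {"incremental": incremental, "media": media, "web_resources": web_resources},
        "credentials": credentials,
    }
```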

Filter harvest type

Type: twitter_filter

API methods called:

statuses/filter

Required parameters:

token: a dictionary containing track, follow, locations, and/or language

Optional parameters:

media: True or False (default) to extract media urls
web_resources: True or False (default) to extract web resource urls
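A sketch of a check on the filter token dictionary. Twitter's v1.1 statuses/filter documentation required at least one of track, follow, or locations (language alone was not enough) and described locations as longitude,latitude pairs forming bounding boxes, four numbers per box. The sketch assumes each value is the comma-separated string the Streaming API accepted; the README does not specify the value format, and `validate_filter_token` is hypothetical.

```python
from typing import Dict

FILTER_KEYS = {"track", "follow", "locations", "language"}
PREDICATE_KEYS = {"track", "follow", "locations"}


def validate_filter_token(token: Dict[str, str]) -> Dict[str, str]:
    """Reject filter tokens that statuses/filter would not accept."""
    if not isinstance(token, dict):
        raise TypeError("filter token must be a dictionary")
    unknown = set(token) - FILTER_KEYS
    if unknown:
        raise ValueError("unexpected keys: " + ", ".join(sorted(unknown)))
    present = {key for key, value in token.items() if value}
    if not present & PREDICATE_KEYS:
        raise ValueError("need at least one of track, follow, or locations")
    if token.get("locations"):
        parts = [p.strip() for p in token["locations"].split(",")]
        if len(parts) % 4:
            raise ValueError("locations must be bounding boxes of four numbers each")
        for part in parts:
            float(part)  # raises ValueError on a non-numeric coordinate
    return token
```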

Sample harvest type

Type: twitter_sample

API methods called:

statuses/sample

Optional parameters:

media: True or False (default) to extract media urls
web_resources: True or False (default) to extract web resource urls
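A sketch that encodes the optional parameters and defaults listed for the four harvest types, so a message builder can reject an option a type does not take (for example, `incremental` for the sample and filter types). The table mirrors this README; the helper names and the message envelope fields are assumptions.

```python
from typing import Any, Dict

# Optional parameters per harvest type, as listed in this README.
OPTIONAL_PARAMETERS = {
    "twitter_search": {"incremental", "media", "web_resources"},
    "twitter_user_timeline": {"incremental", "media", "web_resources"},
    "twitter_filter": {"media", "web_resources"},
    "twitter_sample": {"media", "web_resources"},
}
DEFAULTS = {"incremental": True, "media": False, "web_resources": False}


def harvest_options(harvest_type: str, **overrides: bool) -> Dict[str, bool]:
    """Return defaults for the type's options, with overrides applied."""
    if harvest_type not in OPTIONAL_PARAMETERS:
        raise ValueError(f"unknown harvest type: {harvest_type}")
    allowed = OPTIONAL_PARAMETERS[harvest_type]
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"{harvest_type} does not accept: {', '.join(sorted(unknown))}")
    options = {name: DEFAULTS[name] for name in sorted(allowed)}
    options.update(overrides)
    return options


def sample_message(harvest_id: str, credentials: Dict[str, str], **options: bool) -> Dict[str, Any]:
    """Hypothetical twitter_sample message: no seeds, since nothing is required."""
    return {
        "id": harvest_id,
        "type": "twitter_sample",
        "options": harvest_options("twitter_sample", **options),
        "credentials": credentials,
    }
```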

Authentication

Required parameters:

consumer_key
consumer_secret
access_token
access_token_secret
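A sketch that checks a credentials dictionary for the four required keys and builds one from the TWITTER_* environment variables used by the tests. The mapping from each key to its uppercase `TWITTER_` variable matches the names in the Development section; the helper names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_CREDENTIALS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def check_credentials(credentials: Mapping[str, str]) -> Dict[str, str]:
    """Raise if any required credential is missing or empty; return only those four."""
    missing = [key for key in REQUIRED_CREDENTIALS if not credentials.get(key)]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return {key: credentials[key] for key in REQUIRED_CREDENTIALS}


def credentials_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map consumer_key -> TWITTER_CONSUMER_KEY, and so on."""
    environ = os.environ if environ is None else environ
    return {key: environ.get("TWITTER_" + key.upper(), "") for key in REQUIRED_CREDENTIALS}
```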
