Simple utilities for analyzing tweet data.
Mostly lazy: expect a generator unless the result is a reduction of some sort.
Entities can be any of:
- hashtags
- user_mentions
- urls
- media
Or any combination of the above.
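These names correspond to keys inside the entities object of a Twitter API tweet payload, which looks roughly like this (an illustrative fragment with made-up values, not output from this library):

# illustrative Twitter v1.1 tweet fragment; values are made up
tweet = {
    "entities": {
        "hashtags": [{"text": "maga"}],
        "user_mentions": [{"screen_name": "WHO"}],
        "urls": [{"expanded_url": "https://example.com/article"}],
        "media": [{"media_url_https": "https://example.com/photo.jpg"}],
    }
}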
prep_tweets
prepares tweets for entity analysis. It does the following:
- Throws away all information that isn't the entities.
- Removes duplicates.
- Considers the retweeted_status if a tweet is a retweet. It also removes duplicates here (so if tweet A is retweeted 20 times in the data, it is only included once).
- Handles truncation by getting entities from extended tweets.
from enzymes.entities import prep_tweets, entity_counts, entity_cooccurrence
# entities.prep_tweets prepares tweets for analyzing entities
dat = prep_tweets(dat)
# calling "list" brings everything into memory, necessary if
# using more than once
dat = list(dat)
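Under the hood, the preparation amounts to the normalization steps listed above. A minimal sketch, assuming standard Twitter v1.1 tweet dictionaries (not the library's actual implementation):

# sketch only: illustrates the normalization described above
def _prep_tweets_sketch(tweets):
    seen = set()
    for tweet in tweets:
        # use the retweeted_status if the tweet is a retweet
        tweet = tweet.get('retweeted_status', tweet)
        # drop duplicates, e.g. a tweet retweeted many times
        if tweet['id'] in seen:
            continue
        seen.add(tweet['id'])
        # handle truncation: the extended_tweet (assumed v1.1 field)
        # carries the full set of entities
        if tweet.get('truncated') and 'extended_tweet' in tweet:
            yield tweet['extended_tweet']['entities']
        else:
            yield tweet['entities']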
entity_counts retrieves counts of each unique entity value; it can also be used to get the unique entity values themselves:
entity_counts(['urls', 'hashtags'], dat)
# {'maga': 2789, 'who': 93}
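Since the return value is a dict-like mapping (it is iterated with .items() below), the unique entity values are just its keys:

counts = entity_counts(['urls', 'hashtags'], dat)
unique_values = set(counts)  # iterating a dict yields its keys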
Create a co-occurrence matrix (scipy.sparse.dok_matrix) of the included entities.
# requires a vocab dictionary mapping entity values to indices,
# which can be created from the entity_counts output.
counts = entity_counts(['urls', 'hashtags'], dat)
terms = {k for k, v in counts.items() if v > 5}
vocab = {k: i for i, k in enumerate(terms)}
entity_cooccurrence(['urls', 'hashtags'], dat, vocab)
# <2318x2318 sparse matrix of type '<class 'numpy.float64'>'
# with 124336 stored elements in Dictionary Of Keys format>
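The vocab maps each entity value to a row/column index in the returned matrix, so a pairwise count can be looked up directly. A sketch, assuming the matrix is indexed by the vocab on both axes:

cooc = entity_cooccurrence(['urls', 'hashtags'], dat, vocab)
# 'maga' and 'who' are the hashtags from the counts example above;
# dok_matrix supports direct (i, j) indexing
i, j = vocab['maga'], vocab['who']
print(cooc[i, j])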