twitter-search-spark

PySpark scripts for searching JSON Twitter stream data, primarily to find the context in which a particular emoji or character is used.

Searches Twitter archive data (from archive.org) for the characters that occur before and after a chosen target within a given window. This is useful for analyzing how emoji are used in context and how they are combined. For more information, see the original non-Spark version.
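
As a rough illustration of the windowed search described above, the following is a minimal sketch in plain Python, not the code used in this repository. The function name `context_counts` and the sample text are hypothetical. It treats each Unicode code point as one character, so emoji built from several code points (for example ZWJ sequences) would need extra handling.

```python
from collections import Counter


def context_counts(text, target, window):
    """Count the characters found up to `window` positions before and
    after each occurrence of `target` in `text`.

    Hypothetical helper for illustration only; assumes `target` is a
    single Unicode code point.
    """
    before, after = Counter(), Counter()
    chars = list(text)  # one entry per code point
    for i, ch in enumerate(chars):
        if ch != target:
            continue
        # Slice the window on each side, clamped at the start of the text.
        before.update(chars[max(0, i - window):i])
        after.update(chars[i + 1:i + 1 + window])
    return before, after


before, after = context_counts("so funny 😂😂 lol", "😂", 2)

# The most frequent neighbours, similar in spirit to the `top` argument.
top_before = before.most_common(3)
top_after = after.most_common(3)
```

Because the two emoji in the sample are adjacent, each one appears in the other's window, which is how repeated or combined emoji would show up in the counts.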

Uses Spark to parallelize reads of large Twitter datasets.
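
One way such a parallel read could look is sketched below. This is an assumption about the general approach, not the contents of `full_search_spark.py`. The helper names `tweet_text` and `read_tweet_texts` are hypothetical, and the sketch assumes the archive stores one JSON object per line with the tweet body in a `text` field. The Spark session is passed in by the caller, so the parsing helper can be tried without Spark installed.

```python
import json


def tweet_text(line):
    """Return the tweet body from one line of JSON, or None.

    Hypothetical helper: assumes one JSON object per line with a "text"
    field. Lines that fail to parse, and records without text (such as
    delete notices), yield None.
    """
    try:
        record = json.loads(line)
    except ValueError:
        return None
    if not isinstance(record, dict):
        return None
    text = record.get("text")
    return text if isinstance(text, str) else None


def read_tweet_texts(spark, data_path):
    """Build an RDD of tweet bodies from the files under `data_path`.

    `spark` is expected to be a pyspark.sql.SparkSession created by the
    caller. Spark splits the input into partitions, so the parsing runs
    in parallel across executors.
    """
    lines = spark.sparkContext.textFile(data_path)
    return lines.map(tweet_text).filter(lambda text: text is not None)
```

Keeping the per-line parsing in a plain function means it can be unit tested on its own and then mapped over the RDD unchanged.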

Edit setup-submit.sh to change the Spark job. Arguments to the full_search_spark.py job are:

data_path : Path to the Twitter archive
emoji_match : Name of emoji to match
window : Window size for adjacency
top : Number of top characters to output/display
verbose : Print info messages
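
For readers who want to see how the arguments listed above might be declared, here is a hypothetical `argparse` sketch. Only the argument names come from this README. Whether each one is positional or a flag, the default values, and the sample inputs are all assumptions, so check `full_search_spark.py` and `setup-submit.sh` for the actual interface.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the argument names in this README.

    The positional/flag split and the defaults are assumptions.
    """
    parser = argparse.ArgumentParser(
        description="Find characters adjacent to a target emoji."
    )
    parser.add_argument("data_path", help="Path to the Twitter archive")
    parser.add_argument("emoji_match", help="Name of emoji to match")
    parser.add_argument("--window", type=int, default=5,
                        help="Window size for adjacency (assumed default)")
    parser.add_argument("--top", type=int, default=10,
                        help="Number of top characters to output "
                             "(assumed default)")
    parser.add_argument("--verbose", action="store_true",
                        help="Print info messages")
    return parser


# Example values only; the path and emoji name are placeholders.
args = build_parser().parse_args(
    ["/data/twitter", "face with tears of joy", "--window", "3"]
)
```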

Repository: jzmnd/twitter_search_spark

Folders and files

Latest commit

History

Repository files navigation

twitter-search-spark

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages