Twitter Emoji search to find context of keyword/emoji using Spark for data pipeline

jzmnd/twitter_search_spark

twitter-search-spark

PySpark scripts for searching JSON Twitter stream data, primarily to find the context in which a particular emoji or character appears.

Searches Twitter archive data (from archive.org) for the characters that occur before and after a chosen target within a given window. This is useful for analyzing how emoji are used in context and how they are combined with one another. For more information, see the original non-Spark version of this project.
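The window search described above can be illustrated for a single tweet in plain Python. This is a minimal sketch, not the code from this repository: the function name `context_counts` is hypothetical, and it assumes that a "character" is a single Unicode code point, so multi-code-point emoji (for example ZWJ sequences) would be split into their parts.

```python
from collections import Counter


def context_counts(text, target, window):
    """Count the characters found within `window` positions before and
    after every occurrence of `target` in `text`.

    Hypothetical helper for illustration; each Unicode code point is
    treated as one character.
    """
    chars = list(text)
    before, after = Counter(), Counter()
    for i, ch in enumerate(chars):
        if ch != target:
            continue
        # Slice bounds are clamped so the window never runs off either end.
        before.update(chars[max(0, i - window):i])
        after.update(chars[i + 1:i + 1 + window])
    return before, after


# Usage: the `top` most frequent neighbours on each side of the target.
before, after = context_counts("so funny 😂😂 lol", "😂", window=2)
top = 3
top_before = before.most_common(top)
top_after = after.most_common(top)
```

In a Spark job, a function like this would typically be applied to each tweet's text and the per-tweet counters summed, but how this repository structures that step is not stated here.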

Spark is used to read large Twitter datasets in parallel.

Edit setup-submit.sh to change the Spark job. Arguments to the full_search_spark.py job are:

  • data_path : Path to the Twitter archive
  • emoji_match : Name of emoji to match
  • window : Window size for adjacency
  • top : Number of top characters to output/display
  • verbose : Print info messages
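As a rough illustration of how the arguments listed above might be declared, here is a sketch using `argparse`. It is an assumption, not the script's actual interface: the README does not say whether the arguments are positional or flags, what their defaults are, or whether `argparse` is used at all, so the choices and default values below (and the function name `build_parser`) are placeholders.

```python
import argparse


def build_parser():
    """Build a parser for the five documented arguments.

    Positional vs. optional form and all defaults are assumptions
    made for this sketch.
    """
    parser = argparse.ArgumentParser(
        description="Search Twitter archive data for the context of an emoji."
    )
    parser.add_argument("data_path", help="Path to the Twitter archive")
    parser.add_argument("emoji_match", help="Name of emoji to match")
    parser.add_argument("--window", type=int, default=5,
                        help="Window size for adjacency (assumed default)")
    parser.add_argument("--top", type=int, default=20,
                        help="Number of top characters to output (assumed default)")
    parser.add_argument("--verbose", action="store_true",
                        help="Print info messages")
    return parser


# Usage with an explicit argument list (the path is a placeholder).
args = build_parser().parse_args(
    ["/path/to/archive", "FACE WITH TEARS OF JOY", "--window", "3", "--verbose"]
)
```

Whatever form the real script uses, the same values would be passed on the `spark-submit` line inside setup-submit.sh.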
