
Live Twitter streaming through the twitter4j API with spam detection, using Apache Spark for stream data processing, Elasticsearch to index the data, and Kibana for live interactive dashboards


sanirudh94/spark-twitter-streaming


Spark Twitter Streaming and Spam Detection

This project reveals trends and patterns across demographics from a live Twitter data stream. It also filters out spam messages so that the trending information stays relevant.

Motivation

  • Subscribe to the live Twitter stream
  • Transform the Twitter stream to
    • Extract the most relevant fields from the large tweet blob
    • Detect possible spam messages
  • Index the data to make it searchable
  • Create analytical and interactive dashboards to reveal trends

Built With

  • Apache Spark
  • Elasticsearch
  • Kibana
  • twitter4j API
  • sbt, an interactive build tool

Overall Workflow

  • Connect to the Twitter streaming API using OAuth to continuously ingest new tweets
  • Create a Spark StreamingContext (DStreams) with a batch interval of 10 seconds
  • Extract the important fields and produce JSON with the processed fields for each incoming tweet
  • Pass each tweet through a spam detector and flag spam messages
  • Convert the JSON into a Map and send it to an Elasticsearch cluster with the relevant mappings
  • Visualize analytical results from the data in Elasticsearch with Kibana dashboards
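
The per-tweet steps in the workflow (extract fields, flag spam, build a map for indexing, group into 10-second batches) can be sketched without any Spark or Elasticsearch dependency. The project itself builds with sbt, so its real code presumably runs on the JVM; the Python below is only an illustration of the logic, not the repository's implementation. The chosen fields follow the Twitter v1.1 tweet JSON layout and are an assumption about what "important fields" means here. The spam rules (`SPAM_PHRASES`, `MAX_HASHTAGS`) and all function names are hypothetical, since the README does not describe how the detector works.

```python
import json
from typing import Any, Dict, Iterable, List

BATCH_INTERVAL_SECONDS = 10  # batch interval stated in the workflow

# Hypothetical spam markers; the real detector's rules are not documented.
SPAM_PHRASES = ("free followers", "click here", "buy now")
MAX_HASHTAGS = 5


def extract_fields(raw_tweet: str) -> Dict[str, Any]:
    """Keep a handful of fields from the full tweet JSON blob (assumed set)."""
    tweet = json.loads(raw_tweet)
    user = tweet.get("user") or {}
    entities = tweet.get("entities") or {}
    return {
        "id": tweet.get("id_str"),
        "text": tweet.get("text") or "",
        "lang": tweet.get("lang"),
        "timestamp_ms": int(tweet.get("timestamp_ms") or 0),
        "user": user.get("screen_name"),
        "followers": user.get("followers_count", 0),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
    }


def is_spam(fields: Dict[str, Any]) -> bool:
    """Toy heuristic: known spam phrases or an excessive number of hashtags."""
    text = fields["text"].lower()
    if any(phrase in text for phrase in SPAM_PHRASES):
        return True
    return len(fields["hashtags"]) > MAX_HASHTAGS


def to_document(raw_tweet: str) -> Dict[str, Any]:
    """Build the map that would be sent to the search index."""
    doc = extract_fields(raw_tweet)
    doc["spam"] = is_spam(doc)
    return doc


def micro_batches(
    docs: Iterable[Dict[str, Any]],
    interval_s: int = BATCH_INTERVAL_SECONDS,
) -> List[List[Dict[str, Any]]]:
    """Group documents into fixed time windows, mimicking micro-batching."""
    windows: Dict[int, List[Dict[str, Any]]] = {}
    for doc in docs:
        window = doc["timestamp_ms"] // (interval_s * 1000)
        windows.setdefault(window, []).append(doc)
    return [windows[w] for w in sorted(windows)]


if __name__ == "__main__":
    sample = json.dumps({
        "id_str": "1",
        "text": "Click here for free followers",
        "timestamp_ms": "12000",
        "user": {"screen_name": "example_user", "followers_count": 3},
        "entities": {"hashtags": [{"text": "promo"}]},
    })
    print(to_document(sample))
```

In a streaming job, `to_document` would be the function mapped over each incoming tweet, and the resulting maps would be written to the index once per batch. Keeping the spam flag as a field on the document, rather than dropping flagged tweets, lets a dashboard filter on it later.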


Results
