Wikipedia updates streaming, transformation and visualisation

renardeinside/wikiflow

Latest commit: 3b466ff · Aug 13, 2019

Spark flow on top of the Wikipedia SSE Stream

How to run

  • Create the docker network:
make create-network
  • Run the streaming appliance:
make run-appliance
  • To consume the stream via the legacy API (DStreams), run:
make run-legacy-consumer
  • To consume the stream via the structured API, run:
make run-structured-consumer
  • To consume the stream via the structured API and write the results to Delta, run:
make run-analytics-consumer

While a consumer job is running, you can also access its Spark UI at http://localhost:4040/jobs
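
Checking that a consumer has actually come up can be scripted. The helper below is a minimal sketch and not part of this repository: the name `wait_for_spark_ui` is hypothetical, and it assumes `curl` is installed and that the Spark UI answers at the URL above once the job is running. The timeout is approximate, since it counts only the one-second pauses between attempts.

```shell
# Hypothetical helper (not part of this repository): poll the Spark UI
# until it answers or roughly <timeout> seconds have passed.
wait_for_spark_ui() {
  local url="${1:-http://localhost:4040/jobs}"
  local timeout="${2:-60}"
  local waited=0
  # --fail makes curl exit non-zero on HTTP errors (status >= 400)
  until curl --silent --fail --output /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Spark UI not reachable at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Spark UI is up at $url"
}
```

For example, after starting one of the consumers in another terminal, `wait_for_spark_ui http://localhost:4040/jobs 120` would wait up to about two minutes for the UI to respond.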

Known issues

  • You may need to increase the Docker memory limit on your machine (on Mac the default is 2.0 GB).
  • To check the memory usage and status of the containers, run:
docker stats
  • Docker sometimes fails to stop the consuming applications gracefully. If a container hangs, run:
docker-compose -f <name of compose file with the job>.yaml down
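
The two commands above can be wrapped in one small helper. This is a sketch under stated assumptions, not something shipped with the repository: `stop_job` is a hypothetical name, and the compose file path must be supplied by the caller, since it depends on which job is running.

```shell
# Hypothetical helper (not part of this repository): print a one-off
# resource snapshot, then tear down the job defined in a compose file.
stop_job() {
  local compose_file="${1:-}"
  if [ -z "$compose_file" ] || [ ! -f "$compose_file" ]; then
    echo "usage: stop_job <compose-file.yaml>" >&2
    return 1
  fi
  # --no-stream prints a single snapshot instead of refreshing continuously;
  # a failure here should not block the teardown
  docker stats --no-stream || true
  docker-compose -f "$compose_file" down
}
```

Call it with the compose file of the hung job as its only argument. It returns a non-zero status if the file is missing, or if `docker-compose` itself fails.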
