A basic data pipeline that powers a Streamlit dashboard showing real-time event counts for the most active users on GitHub.

Here’s a very brief breakdown of what each service does:

  • Streamlit Dash Service: Displays a Streamlit dashboard that polls the Data API and renders the data as a chart and a table (when running, it's available at http://localhost:8031).

  • Data API: Uses Flask to serve a minimal REST API that queries the database and returns the results as JSON.

  • Postgres Database: Stores the continuously updating event count data. The Data API queries this database.

  • Postgres Writer Service: Reads from a topic and continuously updates the database with new data.

  • Aggregation Service: Refines the raw event logs and continuously aggregates them into event counts broken down by GitHub display name.

  • Streaming Data Producer: Reads from a real-time public feed of activity on GitHub and streams the data to a topic in Redpanda (our local message broker).

  • Redpanda Server: Manages the flow of streaming data via topics (buffers for streaming data).
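
To make the Aggregation Service step more concrete, here is a minimal sketch of the counting logic in plain Python. It is not the code used in this project: the function names `count_events_by_user` and `top_users` are hypothetical, the broker is left out entirely, and the sketch assumes each raw event carries the user's display name under `actor.display_login`, as events from the public GitHub events feed do.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Tuple


def count_events_by_user(
    events: Iterable[Mapping], counts: Optional[Counter] = None
) -> Counter:
    """Add one count per event to the running total for that event's user.

    Passing in an existing Counter lets the caller keep aggregating as new
    batches of events arrive (hypothetical helper, not the project's code).
    """
    if counts is None:
        counts = Counter()
    for event in events:
        # Assumption: the display name lives under actor.display_login.
        actor = event.get("actor") or {}
        name = actor.get("display_login")
        if name:  # skip events that carry no usable display name
            counts[name] += 1
    return counts


def top_users(counts: Counter, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most active users as (display_name, event_count) pairs."""
    return counts.most_common(n)
```

A short usage example with made-up events:

```python
batch = [
    {"type": "PushEvent", "actor": {"display_login": "alice"}},
    {"type": "WatchEvent", "actor": {"display_login": "bob"}},
    {"type": "PushEvent", "actor": {"display_login": "alice"}},
]
running = count_events_by_user(batch)
print(top_users(running, 2))
```

In the real pipeline the events would arrive from a Redpanda topic and the resulting counts would be published to another topic for the Postgres Writer Service, rather than being held in memory.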