Data Pipeline in K8S

This is my Insight DevOps Engineering project for the NY 2019A session. The pipeline used in this project comes from agdsouza/OnTheSamePage, which supports real-time monitoring of website pages.

A video demo of the pipeline operation in Kubernetes can be found here.

Motivation

While running containerized stateless applications in Kubernetes has proven very effective, deploying stateful components such as Kafka and Cassandra is still in its early stages, and the Kubernetes ecosystem is evolving quickly to support it. This project should serve as a good starting point for anyone who wants a basic understanding of how things work within Kubernetes and how the deployed components interact.

Pipeline in Kubernetes

The following is the configuration for the Kubernetes cluster provisioned on AWS:

EKS Control Plane
Worker nodes with three m4.xlarge EC2 instances

The following are the objects deployed in Kubernetes.

Confluent Kafka
Cassandra
Apache Spark Streaming
Apache Spark Batch + Airflow
Python application as Kafka producer

The Python producer publishes weblog messages to a Kafka topic. Spark Streaming picks them up from that topic and writes them to the raw_data table in Cassandra. A Spark batch job, scheduled by Airflow to run every minute, computes the averages from raw_data and writes them to the page_averages table in Cassandra.
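The averaging done by the batch step can be illustrated with a minimal sketch in plain Python. This is not the project's Spark job. The row shape (a page name paired with a numeric value) and the function name page_averages are assumptions made for illustration only, and the real job reads from and writes to Cassandra rather than working on in-memory lists.

```python
from collections import defaultdict


def page_averages(raw_rows):
    """Return {page: mean value} for an iterable of (page, value) pairs.

    The (page, value) row shape is hypothetical; the actual columns of
    raw_data are not described in this README.
    """
    totals = defaultdict(lambda: [0.0, 0])  # page -> [running sum, row count]
    for page, value in raw_rows:
        acc = totals[page]
        acc[0] += value
        acc[1] += 1
    # Divide each running sum by its row count to get the per-page mean.
    return {page: total / count for page, (total, count) in totals.items()}


# Hypothetical rows standing in for the contents of raw_data.
sample_rows = [("/home", 10), ("/home", 20), ("/about", 5)]
averages = page_averages(sample_rows)
```

In a Spark job the same grouping would typically be expressed as a group-by on the page column followed by an average aggregation, with the result written to page_averages.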

Getting Started

See the wiki page for instructions on getting started.


sontivr/data-pipeline-in-k8s

Folders and files

Latest commit

History

Repository files navigation

Data Pipeline in K8S

Motivation

Pipeline in Kubernetes

Getting Started

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages