data-processing-pipeline-wykop

This project is a machine learning pipeline whose components scrape, clean, preprocess, and model data from a Polish social media site in order to predict the number of likes a given post will receive. Statistics about the process are shown in a Grafana dashboard.
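The four stages named above can be pictured as a chain of small functions. The following is only an illustrative sketch in plain Python, not the project's code: the function names, the hard-coded sample posts, the word-count feature, and the mean-likes baseline are all assumptions made for the example, and the actual modeling step in this repository uses PySpark.

```python
import re
from statistics import mean


def scrape():
    # Stand-in for the real scraper: returns made-up posts (hypothetical data).
    return [
        {"text": "<p>Hello   world</p>", "likes": 10},
        {"text": "  Short post ", "likes": 4},
        {"text": "", "likes": 0},
    ]


def clean(posts):
    # Strip HTML tags, collapse whitespace, and drop posts with no text left.
    cleaned = []
    for post in posts:
        text = re.sub(r"<[^>]+>", " ", post["text"])
        text = " ".join(text.split())
        if text:
            cleaned.append({"text": text, "likes": post["likes"]})
    return cleaned


def preprocess(posts):
    # Turn each post into (feature, target); the only feature here is word count.
    return [(len(post["text"].split()), post["likes"]) for post in posts]


def fit_baseline(rows):
    # Trivial baseline "model": always predict the mean number of likes.
    average_likes = mean(likes for _, likes in rows)
    return lambda word_count: average_likes


model = fit_baseline(preprocess(clean(scrape())))
```

Calling `model(5)` returns the baseline prediction for a five-word post. A real model would replace `fit_baseline`, while the stage boundaries stay the same.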

Architecture

Deployment

The project was deployed in two different versions, each placed in a separate directory.

docker-only-version

The whole project was deployed using only Docker. A Jupyter Notebook was used to present the results of the ML modeling (PySpark).

helm-grpc-version

The whole project was deployed on Kubernetes using Helm. Helm charts from Artifact Hub (mongo, grafana, graphite, rabbit) were used alongside custom charts (grpc, celery-app, celery-app-beater; created from the Dockerfile in '../docker-only-version'). In addition, a gRPC server was deployed to provide a way of communicating with the trained model.
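Since both Graphite and Grafana charts are deployed, pipeline statistics presumably reach the dashboard through Graphite. The sketch below shows how a component could emit one metric using Graphite's plaintext protocol (one `<path> <value> <timestamp>` line per metric, sent over TCP). The metric name, the host, and the helper function names are hypothetical. Port 2003 is Graphite's default plaintext port, but the chart in this repository may be configured differently.

```python
import socket
import time


def format_metric(path, value, timestamp):
    # One line of Graphite's plaintext protocol: "<path> <value> <timestamp>",
    # terminated by a newline.
    return f"{path} {value} {timestamp}\n".encode("ascii")


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    # host and port are assumptions; adjust them to the deployed Graphite service.
    ts = int(time.time()) if timestamp is None else timestamp
    payload = format_metric(path, value, ts)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
    return payload
```

For example, `send_metric("wykop.scraper.posts_scraped", 42)` would report a count of 42 under a hypothetical metric path, which a Grafana panel could then query from the Graphite data source.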


JanKulbinski/data-processing-pipeline-wykop
