Machine learning pipeline consisting of components that scrape, clean, preprocess, and model data from a Polish social media site in order to predict the number of likes for a given post.

JanKulbinski/data-processing-pipeline-wykop

data-processing-pipeline-wykop

This project is a machine learning pipeline consisting of components that scrape, clean, preprocess, and model data from a Polish social media site in order to predict the number of likes for a given post. Statistics about this process can be viewed in a Grafana dashboard.
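The stages named above (clean, preprocess, model) can be sketched in plain Python. This is only an illustrative sketch, not the repository's code: the sample posts, the function names (`clean`, `preprocess`, `fit_baseline`, `run_pipeline`, `predict`), the single word-count feature, and the likes-per-word baseline are all assumptions made for the example. The actual project models the data with PySpark.

```python
import re

# Hypothetical raw posts; in the real project these would come from the scraper.
RAW_POSTS = [
    {"text": "  <p>Hello   World!</p> ", "likes": 10},
    {"text": "Short", "likes": 2},
    {"text": "", "likes": 5},
]


def clean(post):
    """Strip HTML tags and collapse whitespace; return None for empty posts."""
    text = re.sub(r"<[^>]+>", " ", post["text"])
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return None
    return {"text": text, "likes": post["likes"]}


def preprocess(post):
    """Turn a cleaned post into numeric features (assumed: word count only)."""
    return {"n_words": len(post["text"].split()), "likes": post["likes"]}


def fit_baseline(rows):
    """Fit a trivial baseline model: average number of likes per word."""
    total_words = sum(row["n_words"] for row in rows)
    total_likes = sum(row["likes"] for row in rows)
    return total_likes / total_words if total_words else 0.0


def run_pipeline(raw_posts):
    """Chain the stages: clean -> preprocess -> model."""
    cleaned = [c for c in (clean(p) for p in raw_posts) if c is not None]
    features = [preprocess(p) for p in cleaned]
    return fit_baseline(features)


def predict(likes_per_word, text):
    """Predict likes for a new post using the baseline model."""
    return likes_per_word * len(text.split())


if __name__ == "__main__":
    model = run_pipeline(RAW_POSTS)
    print(predict(model, "three word post"))
```

Keeping each stage as a separate function mirrors the component split described above, so any stage could be swapped (for example, replacing the baseline with a PySpark regressor) without touching the others.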

Architecture

(architecture diagram)

Deployment

The project was deployed in two different versions, each placed in a separate directory.

docker-only-version

The whole project was deployed using Docker only. A Jupyter notebook was used to present the results of the ML modeling (PySpark).

helm-grpc-version

The whole project was deployed on Kubernetes using Helm. Helm charts from Artifact Hub (MongoDB, Grafana, Graphite, RabbitMQ) were used alongside custom charts (grpc, celery-app, celery-app-beater, created from the Dockerfile in '../docker-only-version'). In addition, a gRPC server was deployed to provide a way to communicate with the trained model.
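Since Graphite and Grafana are both deployed, pipeline statistics presumably reach the dashboard through Graphite. The sketch below shows one way a pipeline component could report a metric, assuming Graphite's plaintext protocol (one `path value timestamp` line per metric, by default on TCP port 2003). The metric path `wykop.pipeline.posts_scraped`, the host name, and the helper names are hypothetical; the repository may report metrics differently.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003, timeout=2.0):
    """Send a formatted metric line to a Graphite (carbon) plaintext listener.

    The host is an assumption; inside Kubernetes it would be the Graphite
    service name created by the Helm chart.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; requires a reachable Graphite instance.
    send_metric(format_metric("wykop.pipeline.posts_scraped", 42))
```

Separating formatting from sending keeps the formatting logic testable without a running Graphite instance.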
