niconomist98/DE-Microbatch-Pipeline

DE-Microbatch-Pipeline with Docker

A custom pipeline for microbatch data ingestion into a sqlite3 database. It ingests data file by file and line by line, tracking and updating summary statistics in near real time.

The "Files" folder contains the initial testing notebooks.

The modularized code is in the "pipeline" folder and the main.py file.


DE-Microbatch-Pipeline

A microbatch, near-real-time processing pipeline, containerized for easy replication and review in data engineering projects.

Getting Started

Installation

Pull the docker image

docker pull niconomist98/pragma-microbatch-pipeline-v2:latest

Run the pipeline with the docker container

docker run niconomist98/pragma-microbatch-pipeline-v2

Usage

  • Run the Docker image to start the pipeline workflow. The pipeline begins with a preprocessing step that handles missing values in the raw data before ingestion.

  • Once preprocessing is complete, the pipeline starts ingestion. It inserts rows one line at a time and updates the average, minimum, and maximum price and the row count in near real time, without querying the final sqlite3 table.

  • Once microbatch ingestion is complete, the pipeline runs queries against the sqlite3 prices table (the final table after ingestion) to verify the reported average, minimum, maximum, and count of the price column.

  • Once the pipeline has finished with the general datasets, the process starts again for the validation dataset: it preprocesses the data to handle null values, inserts rows one by one into the database while updating the stats in near real time, and queries the final table to validate the results.

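The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the names RunningStats, preprocess, ingest, and verify, the timestamp column, the sample values, and the choice to drop rows with a missing price (rather than fill them) are all assumptions made for the example. Only the prices table and its price column come from the description above.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean, min and max, updated one value at a time."""
    count: int = 0
    mean: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean: move the old mean toward the new value.
        self.mean += (value - self.mean) / self.count
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


def preprocess(rows):
    """Assumed missing-value rule: skip rows whose price is empty."""
    for row in rows:
        if (row.get("price") or "").strip():
            yield row


def ingest(conn, rows, stats):
    """Insert rows one at a time, updating the stats after each insert."""
    for row in rows:
        price = float(row["price"])
        conn.execute("INSERT INTO prices (price) VALUES (?)", (price,))
        conn.commit()
        stats.update(price)  # no query against the table is needed here


def verify(conn):
    """Recompute the same stats from the final table for comparison."""
    return conn.execute(
        "SELECT COUNT(price), AVG(price), MIN(price), MAX(price) FROM prices"
    ).fetchone()


# In-memory stand-in for one CSV file (made-up values, one missing price).
SAMPLE_CSV = "timestamp,price\n1,10\n2,\n3,30\n4,20\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (price REAL)")
stats = RunningStats()
ingest(conn, preprocess(csv.DictReader(io.StringIO(SAMPLE_CSV))), stats)

print(stats)         # stats tracked during ingestion
print(verify(conn))  # stats recomputed from the table
```

Because the running mean is updated as mean += (value - mean) / count, each new row costs constant time, and the final SELECT serves only as a cross-check that the tracked values match the table.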

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Nicolas Restrepo Carvajal https://www.linkedin.com/in/niconomist98/
