Custom pipeline for micro-batch data ingestion into a SQLite database, ingesting file by file and line by line with near-real-time tracking and updating of statistics.
Check the "Files" folder for the initial testing notebooks.
The modularized code lives in the "pipeline" folder and in main.py.
A micro-batch, near-real-time processing pipeline, containerized for easy replication and review in data engineering projects.
Pull the Docker image:
docker pull niconomist98/pragma-microbatch-pipeline-v2:latest
Run the pipeline with the Docker container:
docker run niconomist98/pragma-microbatch-pipeline-v2
- Run the Docker image and the pipeline starts its workflow. The pipeline includes a preprocessing step that handles missing values in the raw data before ingestion (a sketch of this step follows the list).
- Once preprocessing is complete, ingestion starts: rows are inserted line by line while the average, minimum, maximum and row count of the price column are updated without querying the final SQLite table, keeping these figures current in near real time (see the running-stats sketch below).
- Once the micro-batch ingestion is complete, the pipeline runs queries against the SQLite prices table (the final table after ingestion) to verify the reported avg, min, max and count of the price column (see the verification sketch below).
- Once the pipeline for the general datasets is complete, the process starts again for the validation dataset: preprocessing the data to handle null values, inserting rows one by one into the database while updating the stats in real time, and querying the final table to validate the results.
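A minimal sketch of the null-handling idea in the preprocessing step. The column names (price, user_id) and the drop/fill strategy are assumptions for illustration, not necessarily what the code in the "pipeline" folder does.

```python
import pandas as pd

def preprocess(csv_path: str) -> pd.DataFrame:
    """Handle missing values in a raw CSV before it is ingested."""
    df = pd.read_csv(csv_path)
    # A row without a price cannot contribute to the price stats, so drop it.
    df = df.dropna(subset=["price"])
    # Fill other missing fields with a sentinel so inserts never fail on NULLs.
    df["user_id"] = df["user_id"].fillna(-1).astype(int)
    return df
```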
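The running-stats idea can be sketched as follows: each inserted row also updates an in-memory accumulator, so avg, min, max and count are always available without scanning the table. The prices table and price column come from this README; the database file name, CSV path and extra columns are assumptions.

```python
import sqlite3
import pandas as pd

class RunningStats:
    """Tracks count, sum, min and max so avg/min/max/count never need a table scan."""
    def __init__(self):
        self.count, self.total = 0, 0.0
        self.min_, self.max_ = float("inf"), float("-inf")

    def update(self, price: float) -> None:
        self.count += 1
        self.total += price
        self.min_ = min(self.min_, price)
        self.max_ = max(self.max_, price)

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0

conn = sqlite3.connect("pipeline.db")  # assumed database file name
conn.execute("CREATE TABLE IF NOT EXISTS prices (timestamp TEXT, price REAL, user_id INTEGER)")

stats = RunningStats()
df = pd.read_csv("data/batch_1.csv")   # assumed micro-batch file name
for row in df.itertuples(index=False):
    conn.execute("INSERT INTO prices VALUES (?, ?, ?)",
                 (row.timestamp, row.price, row.user_id))
    stats.update(row.price)
    print(f"rows={stats.count} avg={stats.avg:.2f} min={stats.min_} max={stats.max_}")
conn.commit()
```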
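The verification step boils down to a single aggregate query over the final prices table, which can then be compared with the running stats (again, the database file name is an assumption):

```python
import sqlite3

conn = sqlite3.connect("pipeline.db")  # assumed database file name
count, avg, min_, max_ = conn.execute(
    "SELECT COUNT(price), AVG(price), MIN(price), MAX(price) FROM prices"
).fetchone()
print(f"count={count} avg={avg:.2f} min={min_} max={max_}")
```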
Distributed under the MIT License. See LICENSE.txt for more information.
Nicolas Restrepo Carvajal https://www.linkedin.com/in/niconomist98/