vitkx/jumia_scrapy

Overview

An Extract-Load-Transform (ELT) data ingestion pipeline connecting data producers and consumers.
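
In an ELT flow, raw scraped records are loaded into storage unchanged first, and cleaning happens in a separate later step. A minimal stdlib-only sketch of that ordering (the field names, price format, and file layout are illustrative assumptions, not taken from this repository):

```python
import json
import tempfile
from pathlib import Path

def load_raw(records: list[dict], raw_dir: Path) -> Path:
    """Load step: persist scraped records exactly as extracted."""
    raw_path = raw_dir / "products_raw.json"
    raw_path.write_text(json.dumps(records))
    return raw_path

def transform(raw_path: Path) -> list[dict]:
    """Transform step: clean the already-loaded raw data.
    The 'KSh' currency prefix is an assumed example format."""
    records = json.loads(raw_path.read_text())
    return [
        {
            "name": r["name"].strip().title(),
            "price": float(r["price"].replace("KSh", "").replace(",", "").strip()),
        }
        for r in records
    ]

if __name__ == "__main__":
    scraped = [{"name": "  wireless mouse ", "price": "KSh 1,299"}]
    with tempfile.TemporaryDirectory() as d:
        path = load_raw(scraped, Path(d))
        print(transform(path))  # [{'name': 'Wireless Mouse', 'price': 1299.0}]
```

Keeping the untouched raw dump around means the transform can be re-run or revised later without re-scraping.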

Prerequisites

  • Python >= 3.6
  • Airflow >= 1.10.0
  • Docker >= 19.03.0
  • Terraform >= 1.2.0
  • AWS account credentials configured

Project Structure

The project is structured as follows:

  • airflow/: Directory containing the Airflow plugins and DAGs.
  • dataset/: Directory containing the scraped data.
  • jumiascraper/: Directory containing crawler entrypoint and docker files.
    • jumiascraper/: Contains the scrapy project files and configurations.
      • spiders/: Directory containing the spiders.
  • terraform_s3/: Directory containing the Terraform configuration files for creating an S3 bucket.
  • README.md: Documentation file providing an overview of the project and instructions for setup and usage.
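
The spiders under jumiascraper/jumiascraper/spiders/ handle the Extract step. As a rough stdlib-only sketch of the selector-based extraction such a spider performs (the CSS class names "name" and "prc" and the sample markup are assumptions, not the repository's actual selectors):

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collects text from elements whose class attribute matches a target
    class, mimicking the CSS-selector extraction a Scrapy spider does.
    Target class names here are illustrative assumptions."""

    def __init__(self, target_classes: set[str]):
        super().__init__()
        self.target_classes = set(target_classes)
        self._capture = None   # class name we are currently capturing text for
        self.items = {}        # class name -> list of extracted strings

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        hit = self.target_classes.intersection(cls.split())
        if hit:
            self._capture = hit.pop()

    def handle_data(self, data):
        if self._capture:
            self.items.setdefault(self._capture, []).append(data.strip())
            self._capture = None

html = '<div class="name">Blender 2L</div><div class="prc">KSh 3,499</div>'
parser = ProductParser({"name", "prc"})
parser.feed(html)
print(parser.items)  # {'name': ['Blender 2L'], 'prc': ['KSh 3,499']}
```

In the actual project, Scrapy's response.css() and item pipelines would replace this hand-rolled parser; the sketch only shows the shape of the extraction.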

Getting Started

To set up the project, clone the repository:

git clone https://github.com/vitkx/jumia_scrapy.git

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the Apache License 2.0.

Please note that this configuration assumes you have AWS account credentials properly configured and have the necessary permissions to create and manage AWS resources.

For more information on Terraform and AWS, refer to their official documentation.
