The Credit Scoring Data Platform is an end-to-end solution for managing and analyzing credit-related data. It includes components for data extraction, transformation, loading, modeling, and visualization, enabling organizations to gain insights into customer behavior and loan performance.
# README file for the project
- MySQL Warehouse Database: Stores transformed and modeled data for analysis.
- MySQL Production Database: Contains production data from which information is extracted.
- generate_data.py: Python script to generate sample data for testing and development.
- etl_script.py: Python script to extract data from the production database, transform it, and load it into the warehouse database.
- dbt Project: Models and transforms data in the warehouse database according to the Kimball data model.
- Docker Compose File: Sets up and orchestrates the entire data platform environment.
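
As a rough illustration of the extract-transform-load flow that `etl_script.py` is described as performing, the sketch below moves rows from one database to another. It is not the project's actual script: the table names (`loans`, `fact_loans`), the columns, and the derived `is_default` flag are hypothetical, and in-memory SQLite connections stand in for the MySQL production and warehouse databases so the example runs without a server.

```python
import sqlite3


def extract(conn):
    """Read raw loan rows from the (stand-in) production database."""
    cur = conn.execute("SELECT loan_id, customer_id, amount, status FROM loans")
    return cur.fetchall()


def transform(rows):
    """Normalize the status text and derive a hypothetical default flag."""
    cleaned = []
    for loan_id, customer_id, amount, status in rows:
        status = status.strip().lower()
        is_default = int(status == "defaulted")
        cleaned.append((loan_id, customer_id, float(amount), status, is_default))
    return cleaned


def load(conn, rows):
    """Write transformed rows into the (stand-in) warehouse database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_loans ("
        "loan_id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "amount REAL, status TEXT, is_default INTEGER)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the job is re-run.
    conn.executemany("INSERT OR REPLACE INTO fact_loans VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(source, target):
    """Run extract, transform, and load in order; return the row count."""
    rows = transform(extract(source))
    load(target, rows)
    return len(rows)


if __name__ == "__main__":
    production = sqlite3.connect(":memory:")
    production.execute(
        "CREATE TABLE loans (loan_id INTEGER, customer_id INTEGER, amount REAL, status TEXT)"
    )
    production.executemany(
        "INSERT INTO loans VALUES (?, ?, ?, ?)",
        [(1, 10, 1000.5, " Defaulted "), (2, 11, 250, "PAID")],
    )
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(production, warehouse), "rows loaded")
```

When adapting this to MySQL, note that MySQL drivers for Python generally use `%s` placeholders instead of `?`, and that `INSERT OR REPLACE` is SQLite syntax (MySQL offers `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`).
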
- Clone Repository: Clone this repository to your local machine.
- Environment Variables: Set up necessary environment variables for database connections and configurations. Update the `.env` file with your credentials.
- Docker: Ensure Docker is installed on your system.
- Build Docker Images: Run `docker-compose build` to build the Docker images.
- Start Containers: Run `docker-compose up -d` to start the containers in detached mode.
- Execute ETL: The Python ETL scripts and dbt project will automatically run when the containers start. Check logs for any errors.
- Access Data: Once the containers are running, you can access the MySQL warehouse database to analyze the transformed data.
- Customization: Modify the ETL scripts, dbt models, and Docker configurations as needed to fit your specific requirements.
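
The Environment Variables step above does not list the variable names, so the following is only a minimal sketch of how a script might read database settings from the environment. The `<PREFIX>_DB_*` naming scheme, the `DbConfig` class, and the `load_db_config` helper are assumptions made for illustration; check the repository's `.env` file for the names actually used.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    user: str
    password: str
    database: str


def load_db_config(prefix, env=None):
    """Build a DbConfig from variables such as WAREHOUSE_DB_HOST.

    The variable names are hypothetical. `env` defaults to os.environ and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    required = ("HOST", "USER", "PASSWORD", "NAME")
    missing = [f"{prefix}_DB_{key}" for key in required if not env.get(f"{prefix}_DB_{key}")]
    if missing:
        # Fail early with a clear message instead of a late connection error.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DbConfig(
        host=env[f"{prefix}_DB_HOST"],
        port=int(env.get(f"{prefix}_DB_PORT", "3306")),  # 3306 is the MySQL default port
        user=env[f"{prefix}_DB_USER"],
        password=env[f"{prefix}_DB_PASSWORD"],
        database=env[f"{prefix}_DB_NAME"],
    )
```

Under these assumptions, a script would call `load_db_config("WAREHOUSE")` or `load_db_config("PRODUCTION")` once at startup and pass the resulting fields to its MySQL driver.
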
Contributions to this project are welcome! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License.