- Overview
- Features
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- Deployment
- Contributing
- License
- Contact
This Python application extracts PDF file information from Azure Blob Storage and upserts the data into an Azure SQL Database. It is containerized using Docker and deployed to an Azure Kubernetes Service (AKS) cluster, leveraging Azure Workload Identity for secure authentication without the need for connection strings or SQL credentials.
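As a rough illustration of credential-free authentication, the sketch below obtains a token through `DefaultAzureCredential` (whose credential chain includes workload identity) and passes it to `pyodbc` instead of a password. It assumes the `azure-identity` and `pyodbc` packages are installed. The `connect` and `pack_access_token` helpers are hypothetical names and are not taken from this repository.

```python
import struct

# ODBC connection attribute used to pass a Microsoft Entra ID access token.
SQL_COPT_SS_ACCESS_TOKEN = 1256
# Token scope for Azure SQL Database.
AZURE_SQL_SCOPE = "https://database.windows.net/.default"


def pack_access_token(token: str) -> bytes:
    """Encode a token as a 4-byte length prefix followed by UTF-16-LE bytes."""
    raw = token.encode("utf-16-le")
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)


def connect(server: str, database: str, driver: str):
    """Open a pyodbc connection with a token (hypothetical helper)."""
    # Imported lazily so pack_access_token works without these packages.
    import pyodbc
    from azure.identity import DefaultAzureCredential

    token = DefaultAzureCredential().get_token(AZURE_SQL_SCOPE).token
    # No UID/PWD in the connection string: the token carries the identity.
    conn_str = f"Driver={{{driver}}};Server={server};Database={database};Encrypt=yes;"
    return pyodbc.connect(
        conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

A call such as `connect(os.environ["MSSQL_SERVER"], os.environ["MSSQL_DATABASE"], os.environ["DB_DRIVER"])` would then reuse the variables described under Configuration.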
- Azure Managed Identity Authentication
- Dockerized Application
- Azure Kubernetes Service (AKS) Deployment
- Configurable Extraction Patterns and Target Tables
- Bulk Upsert Operations with Temporary Tables: Stages incoming rows in temporary SQL tables for efficient processing; the temporary tables are cleaned up automatically after the operation.
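The temporary-table upsert can be pictured with the sketch below, which only builds the T-SQL statements. It is one plausible approach, not necessarily what this project runs: the `build_upsert_statements` helper, the `#staging` table name, and the example columns are all assumptions. Identifiers are interpolated directly, so they should come only from trusted configuration.

```python
def build_upsert_statements(
    target: str, key: str, columns: list[str], staging: str = "#staging"
) -> list[str]:
    """Return T-SQL for a staged upsert: create temp table, load it, merge it."""
    col_list = ", ".join(columns)
    src_list = ", ".join(f"s.{c}" for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    updates = ", ".join(f"t.{c} = s.{c}" for c in columns if c != key)

    # Empty copy of the target's columns; a local temp table is dropped
    # automatically when the session that created it closes.
    create = f"SELECT TOP 0 {col_list} INTO {staging} FROM {target};"
    # Parameterized insert, intended for cursor.executemany(insert, rows).
    insert = f"INSERT INTO {staging} ({col_list}) VALUES ({placeholders});"

    # Skip the UPDATE branch when the key is the only column.
    matched = f"WHEN MATCHED THEN UPDATE SET {updates} " if updates else ""
    merge = (
        f"MERGE {target} AS t USING {staging} AS s ON t.{key} = s.{key} "
        f"{matched}"
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({src_list});"
    )
    return [create, insert, merge]
```

With a `pyodbc` cursor, the three statements would run in order on a single connection, with `cursor.fast_executemany = True` set before the insert to speed up bulk loading.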
- Azure Blob Storage: Source for PDF files.
- Azure SQL Database: Target for upserted data.
- Python Script: Handles file extraction and database upserts.
- Docker Container: Containerizes the application.
- AKS Deployment: Deploys the containerized app on AKS.
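The first hop in this flow, reading PDF blob names from the container, might look like the sketch below. It assumes the `azure-storage-blob` and `azure-identity` packages and the public-cloud endpoint suffix `blob.core.windows.net`. The helper names are hypothetical and not taken from this repository.

```python
from typing import Iterator


def account_url(account_name: str) -> str:
    """Build the Blob service endpoint (assumes the Azure public cloud)."""
    return f"https://{account_name}.blob.core.windows.net"


def iter_pdf_blob_names(account_name: str, container_name: str) -> Iterator[str]:
    """Yield the names of blobs ending in .pdf (hypothetical helper)."""
    # Imported lazily so account_url works without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url=account_url(account_name),
        credential=DefaultAzureCredential(),
    )
    container = service.get_container_client(container_name)
    for blob in container.list_blobs():
        if blob.name.lower().endswith(".pdf"):
            yield blob.name
```

The names produced here would then be matched against the extraction pattern before being upserted into the target table.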
- Python 3.11
- Docker
- Azure CLI
- kubectl
- Azure Account
git clone https://github.com/arqs-io/azure-blob-mssql-sync.git
cd azure-blob-mssql-sync
Create a .env file in the project root directory with the following variables:
AZURE_STORAGE_ACCOUNT_NAME=your_storage_account_name
AZURE_BLOB_CONTAINER_NAME=your_blob_container_name
MSSQL_SERVER=your_sql_server.database.windows.net
MSSQL_DATABASE=your_database_name
DB_DRIVER=ODBC Driver 18 for SQL Server
TARGET_SQL_TABLE=etl.my_table
EXTRACTION_PATTERN=([A-Za-z0-9]{12})\.pdf$
| Variable | Description |
|---|---|
| AZURE_STORAGE_ACCOUNT_NAME | Azure Storage account name |
| AZURE_BLOB_CONTAINER_NAME | Name of the Azure Blob container |
| MSSQL_SERVER | SQL server URL |
| MSSQL_DATABASE | Name of the target database |
| DB_DRIVER | ODBC driver for SQL Server |
| TARGET_SQL_TABLE | SQL table for upserting data |
| EXTRACTION_PATTERN | Regex pattern for extracting file information |
Instead of setting environment variables manually, you can use a .env file as shown in the configuration step above.
Run the script locally with:
python src/your_script.py
docker build -t your-dockerhub-username/your-image-name:tag .
docker run --env-file .env your-dockerhub-username/your-image-name:tag
- Fork the Repository
- Create a New Branch
git checkout -b feature/YourFeatureName
- Make Your Changes
- Commit Your Changes
git commit -m "Add feature XYZ"
- Push to Your Fork
git push origin feature/YourFeatureName
- Create a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or support, please contact clem@arqs.io.