arqs-io/azure-blob-mssql-sync

Azure Blob Storage to SQL Database Upsert

Overview

This Python application extracts PDF file information from Azure Blob Storage and upserts the data into an Azure SQL Database. It is containerized using Docker and deployed to an Azure Kubernetes Service (AKS) cluster, leveraging Azure Workload Identity for secure authentication without the need for connection strings or SQL credentials.
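As a rough illustration of what connecting without connection strings or SQL credentials can look like, the sketch below requests a Microsoft Entra access token and hands it to the ODBC driver. It is an assumption about the approach, not the repository's actual code: the helper names (`build_connection_string`, `pack_access_token`, `connect`) are hypothetical, and it assumes `pyodbc` and `azure-identity` are installed and that `DefaultAzureCredential` picks up the workload identity inside the pod.

```python
import struct

# Connection attribute defined by the Microsoft ODBC driver (msodbcsql.h) for
# supplying an access token before the connection is opened.
SQL_COPT_SS_ACCESS_TOKEN = 1256


def build_connection_string(server: str, database: str, driver: str) -> str:
    """Build an ODBC connection string that carries no user name or password."""
    return (
        f"Driver={{{driver}}};"
        f"Server=tcp:{server},1433;"
        f"Database={database};"
        "Encrypt=yes;TrustServerCertificate=no;"
    )


def pack_access_token(token: str) -> bytes:
    """Encode the token as a 4-byte little-endian length followed by UTF-16-LE bytes."""
    raw = token.encode("utf-16-le")
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)


def connect(server: str, database: str, driver: str):
    """Hypothetical helper: open a token-authenticated connection to Azure SQL."""
    # Imported lazily so the pure helpers above work without these packages.
    import pyodbc
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    token = credential.get_token("https://database.windows.net/.default").token
    return pyodbc.connect(
        build_connection_string(server, database, driver),
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

With the values from the .env example, this would be called as `connect(MSSQL_SERVER, MSSQL_DATABASE, DB_DRIVER)`.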

Features

  • Azure Managed Identity Authentication
  • Dockerized Application
  • Azure Kubernetes Service (AKS) Deployment
  • Configurable Extraction Patterns and Target Tables
  • Bulk Upsert Operations with Temporary Tables: Stages rows in temporary SQL tables for efficient processing; the temporary tables are cleaned up automatically after the operation.
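The temporary-table upsert could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the function name, the `#staging` table name, and the example columns (`file_code`, `blob_name`) are all hypothetical. It only builds the T-SQL statements: create a session-scoped temp table shaped like the target, bulk insert into it, then `MERGE` into the target.

```python
def build_upsert_statements(target_table: str, key: str, columns: list[str]) -> list[str]:
    """Return [create, insert, merge] T-SQL statements for a staged upsert.

    `key` is the column used to match rows; `columns` are the non-key columns
    to update. All identifiers are assumed to be trusted configuration values,
    since they are interpolated directly into the SQL text.
    """
    if not columns:
        raise ValueError("at least one non-key column is required")

    temp = "#staging"  # hypothetical name; local temp tables vanish with the session
    all_cols = [key, *columns]
    col_list = ", ".join(all_cols)
    placeholders = ", ".join("?" for _ in all_cols)

    # Copy the column definitions of the target without copying any rows.
    create = f"SELECT TOP 0 {col_list} INTO {temp} FROM {target_table};"
    # Parameterized insert, intended for cursor.executemany().
    insert = f"INSERT INTO {temp} ({col_list}) VALUES ({placeholders});"

    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    source_cols = ", ".join(f"s.{c}" for c in all_cols)
    merge = (
        f"MERGE {target_table} AS t USING {temp} AS s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({source_cols});"
    )
    return [create, insert, merge]
```

With a pyodbc cursor, the statements would typically be run in order on the same connection: `cursor.execute(create)`, `cursor.executemany(insert, rows)`, then `cursor.execute(merge)` followed by a commit.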

Architecture

  1. Azure Blob Storage: Source for PDF files.
  2. Azure SQL Database: Target for upserted data.
  3. Python Script: Handles file extraction and database upserts.
  4. Docker Container: Containerizes the application.
  5. AKS Deployment: Deploys the containerized app on AKS.
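For the first component, a sketch of how PDF blob names might be listed from the container is shown below. The function names are hypothetical and the code assumes the `azure-storage-blob` and `azure-identity` packages; how the repository's script actually enumerates blobs may differ.

```python
from typing import Iterator


def account_url(account_name: str) -> str:
    """Blob endpoint of a storage account in the public Azure cloud."""
    return f"https://{account_name}.blob.core.windows.net"


def iter_pdf_blob_names(account_name: str, container_name: str) -> Iterator[str]:
    """Yield the names of blobs in the container that end in .pdf (any case)."""
    # Imported lazily so account_url() is usable without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url(account_name), credential=DefaultAzureCredential()
    )
    container = service.get_container_client(container_name)
    for blob in container.list_blobs():
        if blob.name.lower().endswith(".pdf"):
            yield blob.name
```

The yielded names can then be matched against EXTRACTION_PATTERN and the results upserted into TARGET_SQL_TABLE.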

Prerequisites

  • Python 3.11
  • Docker
  • Azure CLI
  • kubectl
  • Azure Account

Installation

1. Clone the Repository

git clone https://github.com/arqs-io/azure-blob-mssql-sync.git
cd azure-blob-mssql-sync

2. Configure Environment Variables

Create a .env file in the project root directory with the following variables:

AZURE_STORAGE_ACCOUNT_NAME=your_storage_account_name
AZURE_BLOB_CONTAINER_NAME=your_blob_container_name
MSSQL_SERVER=your_sql_server.database.windows.net
MSSQL_DATABASE=your_database_name
DB_DRIVER=ODBC Driver 18 for SQL Server
TARGET_SQL_TABLE=etl.my_table
EXTRACTION_PATTERN=([A-Za-z0-9]{12})\.pdf$
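One way the script might read these values at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical names introduced for illustration; the sketch only assumes that the variables above are present in the process environment (for example via `docker run --env-file .env`) and fails early if any are missing.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_BLOB_CONTAINER_NAME",
    "MSSQL_SERVER",
    "MSSQL_DATABASE",
    "DB_DRIVER",
    "TARGET_SQL_TABLE",
    "EXTRACTION_PATTERN",
)


@dataclass(frozen=True)
class Settings:
    storage_account: str
    container: str
    server: str
    database: str
    driver: str
    target_table: str
    pattern: re.Pattern


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate that every required variable is set, then build a Settings object."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        storage_account=env["AZURE_STORAGE_ACCOUNT_NAME"],
        container=env["AZURE_BLOB_CONTAINER_NAME"],
        server=env["MSSQL_SERVER"],
        database=env["MSSQL_DATABASE"],
        driver=env["DB_DRIVER"],
        target_table=env["TARGET_SQL_TABLE"],
        # Compile once so an invalid regex is reported at startup.
        pattern=re.compile(env["EXTRACTION_PATTERN"]),
    )
```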

Configuration

Environment Variables

Variable                     Description
AZURE_STORAGE_ACCOUNT_NAME   Azure Storage account name
AZURE_BLOB_CONTAINER_NAME    Name of the Azure Blob container
MSSQL_SERVER                 Fully qualified host name of the SQL server (for example, your_sql_server.database.windows.net)
MSSQL_DATABASE               Name of the target database
DB_DRIVER                    ODBC driver for SQL Server
TARGET_SQL_TABLE             SQL table for upserting data
EXTRACTION_PATTERN           Regex pattern for extracting file information
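To make the behavior of the example EXTRACTION_PATTERN concrete, the sketch below applies it to a few made-up blob names. The `extract_code` helper and the sample names are hypothetical; only the pattern itself comes from the .env example.

```python
import re
from typing import Optional

# Pattern from the .env example: exactly 12 alphanumeric characters
# immediately before a trailing ".pdf".
EXTRACTION_PATTERN = re.compile(r"([A-Za-z0-9]{12})\.pdf$")


def extract_code(blob_name: str) -> Optional[str]:
    """Return the captured 12-character code, or None if the name does not match."""
    match = EXTRACTION_PATTERN.search(blob_name)
    return match.group(1) if match else None


print(extract_code("invoices/2024/ABC123DEF456.pdf"))  # ABC123DEF456
print(extract_code("notes.pdf"))                       # None (fewer than 12 characters)
print(extract_code("ABC123DEF456.PDF"))                # None (the extension is case-sensitive)
print(extract_code("2024ABC123DEF456.pdf"))            # ABC123DEF456 (last 12 characters only)
```

Because the pattern is unanchored on the left, a longer alphanumeric file name still matches on its final 12 characters; add a leading delimiter to the pattern if that is not wanted.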

Using a .env File (Optional)

Instead of setting environment variables manually, you can use a .env file as shown in step 2 of Installation (Configure Environment Variables).

Usage

Run the script locally with:

python src/your_script.py

Deployment

Building the Docker Image

docker build -t your-dockerhub-username/your-image-name:tag .

Running the Docker Container Locally

docker run --env-file .env your-dockerhub-username/your-image-name:tag

Contributing

  1. Fork the Repository
  2. Create a New Branch
     git checkout -b feature/YourFeatureName
  3. Make Your Changes
  4. Commit Your Changes
     git commit -m "Add feature XYZ"
  5. Push to Your Fork
     git push origin feature/YourFeatureName
  6. Create a Pull Request

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or support, please contact clem@arqs.io.
