This repository showcases SQLMesh for building ETL pipelines that transform data into the OMOP Common Data Model (CDM), a standard widely used in healthcare research.
To dive deeper into SQLMesh and its features, check out SQLMesh.
- Define using simple SQL — no need for complex Jinja or YAML.
- Self-documenting queries with native SQL comments.
- Easily write SQL in any dialect, and SQLMesh will automatically transpile it to your target SQL dialect instantly.
MODEL (
name omop.stg__person, -- Name of the model
kind VIEW, -- Specify the model type
grain (
person_source_value -- Define the grain (unique identifier)
),
audits (UNIQUE_VALUES(
columns = (person_source_value) -- Column to check for uniqueness
))
);
-- Source table containing patient data
SELECT *
FROM omop.patients;
- Plan and apply workflows to evaluate the potential impact of changes.
- Automatically track column-level lineage and enforce data contracts.
- Manage versions of SQL models and transformations.
demo-etl-sqlmesh-omop/
├── sqlmesh_project/ # SQLMesh project configuration and models
│ ├── models/ # SQL model definitions
│ ├── care_site/ # Care site data model
│ ├── location/ # Location data model
│ ├── person/ # Person data model
│ ├── provider/ # Provider data model
│ ├── seed/ # Models using static CSV datasets
│ ├── seeds/ # Seed data files
│ ├── csv/ # CSV seed data (or other static data sources)
│ ├── sqlmeshenv/ # Virtual environment
│ ├── config.yaml # SQLMesh project configuration
├── .env # Environment variables for secrets
├── docker-compose.yml # Docker Compose configuration
├── Dockerfile # Docker image build instructions
├── pyproject.toml # Python project metadata and dependencies
- Sample Data: For demo purposes, using data generated by the SyntheaTM Patient Generator.
- Python: Version 3.8 or higher.
- SQLMesh: Installed and configured (see SQLMesh Documentation).
- Database: A supported SQL database (e.g., PostgreSQL, MySQL).
- Optional:
- Docker (for running the demo environment).
- Poetry (for Python dependency management).
git clone https://github.com/sidataplus/demo-etl-sqlmesh-omop.git
To install dependencies, you can use Poetry:
poetry install
Alternatively, it is recommended (but not required) to use a Python virtual environment with SQLMesh:
python3 -m venv sqlmeshenv
source sqlmeshenv/bin/activate # On macOS/Linux
# or
sqlmeshenv\Scripts\activate # On Windows
pip install "sqlmesh[web,dbt,github,llm,postgres]"
Create a .env
file to securely store your credentials. Add the following environment variables:
POSTGRES_USER=<POSTGRES_USER>
POSTGRES_PASSWORD=<POSTGRES_PASSWORD>
POSTGRES_DB=<POSTGRES_DB>
POSTGRES_HOST=<POSTGRES_HOST>
POSTGRES_PORT=<POSTGRES_PORT>
To build and start the Docker containers, run:
docker-compose up --build
This command will set up the PostgreSQL database and launch the SQLMesh application within a Docker container.
Inside sqlmesh_project/
, initialize a SQLMesh project to use PostgreSQL as the default database dialect.
sqlmesh init postgres
Add the following configuration:
gateways:
local:
connection:
type: postgres
host: <POSTGRES_HOST>
port: <POSTGRES_PORT>
database: <POSTGRES_DB>
user: <POSTGRES_USER>
password: <POSTGRES_PASSWORD>
default_gateway: local
Test the connection by running the command:
sqlmesh info
After starting up, the SQLMesh web UI is served at http://localhost:8000 by default