Skip to content

The OMOP implementation on SQLMesh for the demo at the OHDSI Symposium 2024.

Notifications You must be signed in to change notification settings

sidataplus/demo-etl-sqlmesh-omop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sidataplus logo

This repository showcases SQLMesh for building ETL pipelines that transform data into the OMOP Common Data Model (CDM), a standard widely used in healthcare research.

To dive deeper into SQLMesh and its features, check out SQLMesh.

overview pipeline

Core Features

Work with SQL Anywhere

  • Define using simple SQL — no need for complex Jinja or YAML.
  • Self-documenting queries with native SQL comments.
  • Easily write SQL in any dialect, and SQLMesh will automatically transpile it to your target SQL dialect instantly.
MODEL (
  name omop.stg__person,             -- Name of the model
  kind VIEW,                         -- Specify the model type            
  grain (
    person_source_value              -- Define the grain (unique identifier)
  ),
  audits (UNIQUE_VALUES(
    columns = (person_source_value)  -- Column to check for uniqueness
  ))
);

-- Source table containing patient data
SELECT *
FROM omop.patients;                 

Isolate Data Environments

  • Plan and apply workflows to evaluate the potential impact of changes.

OHDSI Symposium 2024 (6)

Easily track changes

  • Automatically track column-level lineage and enforce data contracts.
  • Manage versions of SQL models and transformations.
Screenshot 2567-12-05 at 02 59 02

Getting Started

Project Structure

demo-etl-sqlmesh-omop/
├── sqlmesh_project/               # SQLMesh project configuration and models
│   ├── models/                    # SQL model definitions
│      ├── care_site/              # Care site data model
│      ├── location/               # Location data model
│      ├── person/                 # Person data model
│      ├── provider/               # Provider data model
│      ├── seed/                   # Models using static CSV datasets
│   ├── seeds/                     # Seed data files
│      ├── csv/                    # CSV seed data (or other static data sources)
│   ├── sqlmeshenv/                # Virtual environment
│   ├── config.yaml                # SQLMesh project configuration
├── .env                           # Environment variables for secrets
├── docker-compose.yml             # Docker Compose configuration
├── Dockerfile                     # Docker image build instructions
├── pyproject.toml                 # Python project metadata and dependencies

Prerequisites

  • Sample Data: For demo purposes, using data generated by the SyntheaTM Patient Generator.
  • Python: Version 3.8 or higher.
  • SQLMesh: Installed and configured (see SQLMesh Documentation).
  • Database: A supported SQL database (e.g., PostgreSQL, MySQL).
  • Optional:
    • Docker (for running the demo environment).
    • Poetry (for Python dependency management).

Installation

Step 1: Clone the Repository

git clone https://github.com/sidataplus/demo-etl-sqlmesh-omop.git

Step 2: Install Dependencies

To install dependencies, you can use Poetry:

poetry install

Alternatively, it is recommended (but not required) to use a Python virtual environment with SQLMesh:

python3 -m venv sqlmeshenv
source sqlmeshenv/bin/activate  # On macOS/Linux
# or
sqlmeshenv\Scripts\activate  # On Windows
pip install "sqlmesh[web,dbt,github,llm,postgres]"

Step 3: Set Up Environment Variables

Create a .env file to securely store your credentials. Add the following environment variables:

POSTGRES_USER=<POSTGRES_USER>
POSTGRES_PASSWORD=<POSTGRES_PASSWORD>
POSTGRES_DB=<POSTGRES_DB>
POSTGRES_HOST=<POSTGRES_HOST>
POSTGRES_PORT=<POSTGRES_PORT>

Step 4: Docker Setup (optional, if you're using Docker)

To build and start the Docker containers, run:

docker-compose up --build

This command will set up the PostgreSQL database and launch the SQLMesh application within a Docker container.

Configuring the Connection to PostgreSQL (or Other Databases)

Inside sqlmesh_project/, initialize a SQLMesh project to use PostgreSQL as the default database dialect.

sqlmesh init postgres

Add the following configuration:

gateways:
  local:
    connection:
      type: postgres
      host: <POSTGRES_HOST>
      port: <POSTGRES_PORT>
      database: <POSTGRES_DB>
      user: <POSTGRES_USER>
      password: <POSTGRES_PASSWORD>
default_gateway: local

Test the connection by running the command:

sqlmesh info

After starting up, the SQLMesh web UI is served at http://localhost:8000 by default

Our Work for OHDSI Symposium 2024

From dbt to SQLMesh Enhancing OMOP CDM Data Conversion Efficiency

About

The OMOP implementation on SQLMesh for the demo at the OHDSI Symposium 2024.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published