This work-in-progress dbt package is designed to facilitate the linkage of OMOP CDM databases from different sources—specifically EHR (Electronic Health Records) and Claims databases. Currently, the package supports PostgreSQL with 2 incoming data sources. The plan is to extend support to more data sources. The package provides a series of SQL models to:
- Match persons across databases based on
person_source_value
- Pool other tables (e.g.,
visit_occurrence
,condition_occurrence
,observation_period
) from OMOP CDM with updatedperson_id
s - Validate and identify discrepancies in key attributes like gender and year of birth
- Identify temporal discrepancies in
condition_occurrence
- Create New PostgresDB for Linked Data: Initialize a new PostgreSQL database where the linked data will reside.
- Set Up FDW (Foreign Data Wrapper): Configure FDW for the two incoming data sources (EHR and Claims) to enable cross-database queries.
You may leverage this DDL template to create the FDW for each incoming data source:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER omop_ehr FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '{hostname}', port '{port}', dbname '{dbname}');
CREATE USER MAPPING FOR {host_user} SERVER omop_ehr OPTIONS (user '{foreign_user}', password '{password}');
CREATE SCHEMA IF NOT EXISTS cdm_ehr;
IMPORT FOREIGN SCHEMA {foreign_schema} FROM SERVER omop_ehr INTO cdm_ehr;
- Run OMOP DDL for the New Linked Schema: Execute the DDL scripts to create the necessary OMOP tables in the new linked database.
- Install dbt: If you haven't already, install dbt. Make sure to install the Postgres-compatible version (
dbt-postgres
). - Clone this repository: Clone this dbt package repository into your local machine.
- Configure your
profiles.yml
: Update your dbt profiles to point to your Postgres instances where the OMOP databases reside. - Configure
models/sources.yml
: Edit this file to specify where the incoming data sources (EHR and Claims) are located.
Here is a list of the main models in this package:
linked_person.sql
: Matches persons from EHR and Claims databases based onperson_source_value
.
person_discrepancy.sql
: Identifies discrepancies ingender_concept_id
andyear_of_birth
in theperson
table.temporal_discrepancy_condition_occurrence.sql
: Identifies temporal discrepancies in thecondition_occurrence
table.
person_summary.sql
: Provides a summary of the number of matched, unmatched, and total persons in each source.
To build the models, navigate to the directory containing the dbt_project.yml
file and run:
dbt run
If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request.