A guide to combining data from CDISC and OMOP formats for the purpose of analysis for external control research



Guide: Converting OMOP Data to CDISC Format for External Controls

This guide provides a step-by-step process for converting OMOP data into CDISC format so that it can be analyzed as an external control. It covers an overview of the CDISC and OMOP formats, the mappings between them, and sample transformation code in Python, R, STATA, and SPSS.

Table of Contents

Overview of CDISC and OMOP Formats



Extract Data from OMOP

Transform Data to CDISC Format

Export Transformed Data to CDISC-Compliant Format

Additional Code Samples in R, STATA, and SPSS

Validate the CDISC Output

1. Overview of CDISC and OMOP Formats

CDISC Format (Clinical Data Standards)

CDISC is designed to standardize clinical trial data, allowing for consistency and reproducibility. The main CDISC standards relevant to clinical trials include:

  • SDTM (Study Data Tabulation Model): Organizes clinical trial data into domains like DM (Demographics), AE (Adverse Events), LB (Laboratory), and more.
  • ADaM (Analysis Data Model): Structures datasets for statistical analysis, with a focus on traceability.
  • SEND (Standard for Exchange of Nonclinical Data): Used for nonclinical studies, primarily in regulatory submissions.

CDISC SDTM Domains Overview

| Domain | Description | Commonly Used Variables |
| --- | --- | --- |
| DM | Demographics: Basic participant data | SUBJID, AGE, SEX, RACE |
| AE | Adverse Events: Reported adverse events | AESEQ, AESTDTC, AEENDTC, AETERM |
| LB | Laboratory: Lab test results | LBSEQ, LBDTC, LBORRES, LBTEST |
| CM | Concomitant Medications: Medications taken during the trial | CMSEQ, CMSTDTC, CMENDTC, CMTRT |
| EX | Exposure: Study treatment details | EXSEQ, EXSTDTC, EXENDTC, EXDOSE |
| VS | Vital Signs: Measurements of vitals | VSSEQ, VSDTC, VSORRES, VSTEST |
| MH | Medical History: Participant medical history | MHSEQ, MHTERM, MHDTC |
| PR | Procedures: Performed medical procedures | PRSEQ, PRSTDTC, PRENDTC, PRTRT |
| EG | ECG Test Results: Electrocardiogram results | EGSEQ, EGDTC, EGORRES, EGTEST |
| DS | Disposition: Status and reason for discontinuation | DSDECOD, DSTERM, DSSTDTC |

OMOP Format (Observational Medical Outcomes Partnership)

OMOP is a common data model used primarily for observational research. It organizes observational health data into standardized tables optimized for large-scale analytics. Below is an overview of the key OMOP tables relevant for mapping to CDISC:

| Table | Description | Commonly Used Variables |
| --- | --- | --- |
| Person | Demographics: Patient data | person_id, gender_concept_id, year_of_birth, race_concept_id |
| Condition Occurrence | Conditions: Recorded conditions | condition_occurrence_id, person_id, condition_concept_id, condition_start_date, condition_end_date |
| Drug Exposure | Drug Exposure: Medication data | drug_exposure_id, person_id, drug_concept_id, drug_exposure_start_date, drug_exposure_end_date |
| Measurement | Measurements: Lab tests, vitals | measurement_id, person_id, measurement_concept_id, measurement_date, value_as_number |
| Observation | Observations: General observations | observation_id, person_id, observation_concept_id, observation_date, value_as_number |
| Procedure Occurrence | Procedures: Recorded procedures | procedure_occurrence_id, person_id, procedure_concept_id, procedure_date |
| Visit Occurrence | Visits: Patient visits | visit_occurrence_id, person_id, visit_concept_id, visit_start_date, visit_end_date |



  1. Define Data Mappings by identifying mappings between OMOP tables and CDISC SDTM domains.
  2. Extract data from OMOP tables.
  3. Map OMOP fields to CDISC-compliant fields.
  4. Transform data to align with CDISC formatting.
  5. Export transformed data.
  6. Validate using CDISC validation tools.
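
One way to make the mapping step concrete is to write the mappings down as data before touching any records. The sketch below is only one possible layout: the names `OMOP_TO_SDTM` and `sdtm_columns` are hypothetical, and the column pairs simply mirror the renames used in this guide's examples rather than a reviewed mapping specification.

```python
# Hypothetical mapping spec: OMOP table -> SDTM domain and column renames.
# The pairs mirror the renames used in this guide; a real study would
# review and extend them.
OMOP_TO_SDTM = {
    "person": {
        "domain": "DM",
        "columns": {
            "person_id": "SUBJID",
            "gender_concept_id": "SEX",
            "year_of_birth": "BRTHDTC",
        },
    },
    "condition_occurrence": {
        "domain": "AE",
        "columns": {
            "condition_occurrence_id": "AESEQ",
            "condition_start_date": "AESTDTC",
            "condition_end_date": "AEENDTC",
            "condition_concept_id": "AETERM",
        },
    },
    "measurement": {
        "domain": "LB",
        "columns": {
            "measurement_id": "LBSEQ",
            "measurement_date": "LBDTC",
            "value_as_number": "LBORRES",
        },
    },
}


def sdtm_columns(omop_table):
    """Return (domain, rename map) for an OMOP table; fail loudly if unmapped."""
    if omop_table not in OMOP_TO_SDTM:
        raise KeyError(f"No SDTM mapping defined for OMOP table '{omop_table}'")
    spec = OMOP_TO_SDTM[omop_table]
    return spec["domain"], dict(spec["columns"])


domain, renames = sdtm_columns("person")
print(domain, renames)
```

Keeping the mapping in one structure means the same rename map can be passed to `DataFrame.rename(columns=...)` in the transformation step, and unmapped tables are caught early instead of silently skipped.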

Extract Data from OMOP

Load OMOP data using SQL, Python, or R. In Python:

import pandas as pd
import sqlite3  # or any library to access your OMOP database

# Connect to OMOP database
conn = sqlite3.connect("omop_db.sqlite")

# Extract data from OMOP tables
person_df = pd.read_sql_query("SELECT * FROM person", conn)
condition_occurrence_df = pd.read_sql_query("SELECT * FROM condition_occurrence", conn)
measurement_df = pd.read_sql_query("SELECT * FROM measurement", conn)
drug_exposure_df = pd.read_sql_query("SELECT * FROM drug_exposure", conn)

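# Close the connection
conn.close()

# Display the data for verification
print("Person Data:")
print(person_df.head())
print("\nCondition Occurrence Data:")
print(condition_occurrence_df.head())
print("\nMeasurement Data:")
print(measurement_df.head())
print("\nDrug Exposure Data:")
print(drug_exposure_df.head())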
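
OMOP stores most coded values as concept IDs, so an extraction query often needs to join the `concept` table to recover readable names. The following is a minimal sketch using an in-memory SQLite database as a stand-in for a real OMOP instance. The concept IDs, concept names, and the helper `fetch_conditions` are placeholders invented for illustration, not real vocabulary entries.

```python
import sqlite3

# In-memory stand-in for an OMOP database; the rows are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
    INSERT INTO concept VALUES (1001, 'Example condition A');
    INSERT INTO condition_occurrence VALUES (1, 10, 1001, '2020-01-15');
    INSERT INTO condition_occurrence VALUES (2, 11, 9999, '2020-02-01');
""")


def fetch_conditions(conn):
    """Select only the needed columns and resolve concept IDs to names.

    LEFT JOIN keeps rows whose concept ID has no match, so they can be
    reviewed rather than silently dropped.
    """
    cur = conn.execute("""
        SELECT co.person_id            AS person_id,
               co.condition_start_date AS condition_start_date,
               c.concept_name          AS concept_name
        FROM condition_occurrence AS co
        LEFT JOIN concept AS c ON c.concept_id = co.condition_concept_id
        ORDER BY co.condition_occurrence_id
    """)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


conditions = fetch_conditions(conn)
conn.close()
for row in conditions:
    print(row)
```

The same SQL can be passed to `pd.read_sql_query` to get a DataFrame instead of a list of dictionaries.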

Transform Data to CDISC Format

Use identified mappings to transform OMOP variables into CDISC-compliant formats. See Python example code below.

# Map Demographics (DM) data
dm = person_df.rename(columns={
    'person_id': 'SUBJID',
    'gender_concept_id': 'SEX',
    'year_of_birth': 'BRTHDTC'
})
dm['STUDYID'] = 'YourStudyID'

# Map Adverse Events (AE) data
ae = condition_occurrence_df.rename(columns={
    'condition_occurrence_id': 'AESEQ',
    'condition_start_date': 'AESTDTC',
    'condition_end_date': 'AEENDTC',
    'condition_concept_id': 'AETERM'
})
ae['STUDYID'] = 'YourStudyID'


# Map Laboratory (LB) data
lb = measurement_df.rename(columns={
    'measurement_id': 'LBSEQ',
    'measurement_date': 'LBDTC',
    'value_as_number': 'LBORRES'
})
lb['STUDYID'] = 'YourStudyID'
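
Renaming columns changes the variable names but not the values, so concept IDs and raw record IDs still need converting. The sketch below shows one way this might look in plain Python so that it runs without a database. The helpers `to_dm` and `add_seq` are hypothetical, and the `USUBJID` construction (study ID plus person ID) and the fallback to `U` for unmapped genders are assumptions. The gender concepts 8507 (male) and 8532 (female) are the standard OMOP ones.

```python
from collections import defaultdict

# Standard OMOP gender concepts mapped to SDTM SEX codes.
GENDER_TO_SEX = {8507: "M", 8532: "F"}


def to_dm(person_rows, study_id):
    """Build DM records with decoded SEX and a year-only ISO 8601 BRTHDTC."""
    dm_rows = []
    for p in person_rows:
        dm_rows.append({
            "STUDYID": study_id,
            "DOMAIN": "DM",
            # Assumption: USUBJID is built as study ID + person ID.
            "USUBJID": f"{study_id}-{p['person_id']}",
            "SUBJID": str(p["person_id"]),
            # Assumption: anything unmapped is reported as U (unknown).
            "SEX": GENDER_TO_SEX.get(p["gender_concept_id"], "U"),
            "BRTHDTC": f"{p['year_of_birth']:04d}",
        })
    return dm_rows


def add_seq(rows, subject_key, date_key, seq_name):
    """Number records 1..n within each subject, ordered by date."""
    counters = defaultdict(int)
    numbered = []
    for r in sorted(rows, key=lambda r: (r[subject_key], r[date_key])):
        counters[r[subject_key]] += 1
        numbered.append({**r, seq_name: counters[r[subject_key]]})
    return numbered


persons = [
    {"person_id": 1, "gender_concept_id": 8507, "year_of_birth": 1960},
    {"person_id": 2, "gender_concept_id": 0, "year_of_birth": 1975},
]
dm_rows = to_dm(persons, "YourStudyID")

events = [
    {"USUBJID": "YourStudyID-1", "AESTDTC": "2020-03-01"},
    {"USUBJID": "YourStudyID-1", "AESTDTC": "2020-01-15"},
    {"USUBJID": "YourStudyID-2", "AESTDTC": "2020-02-01"},
]
ae_rows = add_seq(events, "USUBJID", "AESTDTC", "AESEQ")
print(dm_rows)
print(ae_rows)
```

Deriving the sequence number per subject, rather than reusing the OMOP record ID, keeps `AESEQ` small and unique within each subject.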

Export Transformed Data to CDISC-Compliant Format

Export transformed DataFrames in formats compatible with CDISC (e.g., .xpt or .csv for SDTM datasets).

dm.to_csv("DM.csv", index=False)
ae.to_csv("AE.csv", index=False)
lb.to_csv("LB.csv", index=False)
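
If the datasets will eventually be written as .xpt files, it can help to check variable names before export, because the SAS XPORT version 5 format limits names to 8 characters. The sketch below is a conservative check, not a full conformance test. The helper `invalid_xpt_names` is hypothetical, and the pattern (an uppercase letter followed by uppercase letters, digits, or underscores) is an assumption about how strict the check should be.

```python
import re

# Conservative naming rule: starts with an uppercase letter, then uppercase
# letters, digits, or underscores, 8 characters at most.
XPT_NAME = re.compile(r"[A-Z][A-Z0-9_]{0,7}")


def invalid_xpt_names(columns):
    """Return the column names that would not fit the naming rule above."""
    return [c for c in columns if not XPT_NAME.fullmatch(c)]


columns = ["STUDYID", "SUBJID", "SEX", "BRTHDTC", "race_concept_id", "1ABC"]
print(invalid_xpt_names(columns))
```

Leftover OMOP columns such as `race_concept_id` show up immediately, which is a useful prompt to either map them to an SDTM variable or drop them before export.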

Additional Code Samples in R, STATA, and SPSS

R Code Sample

# Load necessary libraries
library(DBI)
library(RSQLite)
library(dplyr)

# Connect to OMOP database
con <- dbConnect(RSQLite::SQLite(), "omop_db.sqlite")

# Extract tables
person_df <- dbGetQuery(con, "SELECT * FROM person")
condition_occurrence_df <- dbGetQuery(con, "SELECT * FROM condition_occurrence")

# Transform Demographics (DM) Data
dm <- person_df %>%
  rename(SUBJID = person_id, SEX = gender_concept_id, BRTHDTC = year_of_birth) %>%
  mutate(STUDYID = "YourStudyID")

# Export to CSV
write.csv(dm, "DM.csv", row.names = FALSE)

# Disconnect from database
dbDisconnect(con)

STATA Code Sample

// Load OMOP data
use person, clear

// Transform Demographics for CDISC DM
gen SUBJID = person_id
gen SEX = gender_concept_id
gen BRTHDTC = year_of_birth
gen STUDYID = "YourStudyID"

// Save transformed DM data
save DM, replace

SPSS Code Sample

RENAME VARIABLES (person_id = SUBJID)(gender_concept_id = SEX)(year_of_birth = BRTHDTC).

Validate the CDISC Output

Validation using Pinnacle 21 or similar tools is essential to ensure SDTM compliance.
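
Before running a full validator, a few inexpensive checks can catch obvious problems early. The sketch below is not a substitute for Pinnacle 21 and does not implement any of its rules. The helper `precheck` is hypothetical, and it looks only for missing identifier values and for date variables that do not have an ISO 8601 date shape (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`). It does not handle time components or check that the date is a real calendar date.

```python
import re

# Shape check only: YYYY, YYYY-MM, or YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def precheck(rows, required, date_vars):
    """Return (row index, variable, problem) tuples for basic issues."""
    problems = []
    for i, row in enumerate(rows):
        for var in required:
            if row.get(var) in (None, ""):
                problems.append((i, var, "missing"))
        for var in date_vars:
            value = row.get(var)
            if value not in (None, "") and not ISO_DATE.fullmatch(str(value)):
                problems.append((i, var, "not ISO 8601"))
    return problems


rows = [
    {"STUDYID": "YourStudyID", "USUBJID": "YourStudyID-1", "AESTDTC": "2020-01-15"},
    {"STUDYID": "YourStudyID", "USUBJID": "", "AESTDTC": "01/15/2020"},
]
issues = precheck(rows, required=["STUDYID", "USUBJID"], date_vars=["AESTDTC"])
print(issues)
```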

Pinnacle 21 Overview

Pinnacle 21 is a software product for validating compliance with CDISC standards and is widely used by regulatory agencies. It is available in two versions:

  • Community Version: Free, open-source tool for validation and Define.xml generation.
  • Enterprise Version: Advanced features for continuous compliance checks and submission readiness.

Pinnacle 21 plays a role in:

  • Standardizing Data: Making sure datasets comply with CDISC standards (SDTM, ADaM) and facilitating easier transformations from formats like OMOP to CDISC.
  • Regulatory Compliance: Helping organizations achieve "submission-ready" datasets, which are critical for regulatory approval.
  • Supporting CDISC Transformation Projects: Serving as a primary tool for data validation, Pinnacle 21 is invaluable for projects like the OMOP to CDISC transformation detailed in this guide.
