Create / Get access to mock OMOP data #15

Closed
3 of 4 tasks
linear bot opened this issue Aug 25, 2024 · 15 comments
linear bot commented Aug 25, 2024

TL;DR: Get example OMOP data so we can map it to the RAMSES data model.

Background: RAMSES's data model has a particular focus, which differs from that of OMOP. We need some example OMOP data to match against RAMSES data and then build a transformation tool.

Tasks


razekmh commented Aug 25, 2024

The BigQuery dataset can be viewed on the web interface in the Preview tab.

To view the data you will need a Google account. Then create a project on BigQuery/GCP, and you will find the dataset in the public datasets list. You cannot easily export or download this data.


razekmh commented Aug 25, 2024

While you can query the data directly on BigQuery using SQL, I do not think this is ideal: there is a limited quota of 1 TB of scanned data, which might easily be exceeded during experimentation.
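One thing worth knowing about the scanned-data quota: BigQuery bills by the columns a query reads, so adding `LIMIT` does not reduce the bytes scanned. Selecting only the columns you need does. A sketch (the column choice is illustrative):

```sql
-- Scans only two columns of the table, not all of them.
-- A LIMIT clause would NOT have reduced the bytes billed;
-- narrowing the SELECT list does.
SELECT
  care_site_id,
  care_site_name
FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site`
LIMIT 20
```

The web console shows an estimate of the bytes a query will process before you run it, which is a cheap way to check against the quota.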


razekmh commented Aug 25, 2024

Another way to access the data is the R package bigrquery. To do so, you still need to create a project and a service account. Here are some guiding steps:

  • After creating a project, enable the BigQuery API, then go to the Credentials page and click Create credentials.
  • Choose Service account.
  • Fill in a name of your choice for the service account.
  • Add a role for the service account. I am not sure how the permissions work here, so please be careful with your choices. (I gave my account Owner permissions, which is too open and should be restricted.)
  • Click Done to close the wizard.
  • Your account will now appear on the Credentials page under Service Accounts.
  • Click on your account to open the account page.
  • Click on the Keys tab.
  • Click Add key and choose Create new key. (Google recommends Workload Identity Federation, but for this use case I found it easier to use the JSON-based key.)
  • Choose JSON and create the key.
  • The key will be downloaded to your device.
  • Store the key somewhere secure and make sure it is not readable by other people. Please revoke the key once you are done with project exploration.
  • In your RStudio console, install the bigrquery library: install.packages("bigrquery")
  • Load the libraries into the session: library(bigrquery) and library(DBI)
  • You might need to load the gargle library too: library(gargle)
  • Set a variable to the location of your service account JSON file: sa_key_path <- "put the path here"
  • Authenticate your session using bq_auth(path = sa_key_path)
  • Get your Google project ID.
  • Set a variable with the project name: project_id <- "put the project name here"
  • Create a connection to the BigQuery database:
con <- dbConnect(bigquery(), project = project_id, dataset = "bigquery-public-data")
  • Build an SQL statement to test your connection (note that the table name bigquery-public-data.cms_synthetic_patient_data_omop.care_site includes the project and dataset names):
sql_string <- "SELECT * FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site` LIMIT 20"
  • Send the query to BigQuery: result <- dbGetQuery(con, sql_string)
  • You may save the results to disk using the write.table() function.
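Taken together, the steps above amount to a short script. This is a sketch: the key path and project ID are placeholders, and the fully qualified table name in the query means the `dataset` argument to `dbConnect()` is largely cosmetic here.

```r
library(bigrquery)
library(DBI)

sa_key_path <- "path/to/service-account-key.json"  # keep this file private
project_id  <- "your-gcp-project-id"               # queries are billed here

# Authenticate with the service-account key
bq_auth(path = sa_key_path)

# Open a DBI connection to BigQuery
con <- dbConnect(
  bigquery(),
  project = project_id,
  dataset = "cms_synthetic_patient_data_omop"
)

# The table is addressed by its fully qualified name:
# <project>.<dataset>.<table>
sql_string <- "
  SELECT *
  FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site`
  LIMIT 20
"
result <- dbGetQuery(con, sql_string)

# Persist the sample locally
write.table(result, "care_site_sample.csv", sep = ",", row.names = FALSE)
```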


razekmh commented Aug 25, 2024

One issue with using the BigQuery dataset is that the tables which I believe RAMSES might depend on have some null values. This is connected to issues #13 and #14.
As far as I understand it currently, RAMSES detects therapy episodes based on the timing of ordering and administration of medication. OMOP has concepts such as drug era, dose era, and drug exposure which might be useful in extracting information about drug administration. However, the drug order (prescription) is a bit unclear. There might be some relevant information to extract from the tables VISIT_OCCURRENCE, PROCEDURE_OCCURRENCE, CONDITION_OCCURRENCE and OBSERVATION, but I found that the VISIT_OCCURRENCE table is not listed in the dataset, and the columns condition_start_datetime and condition_end_datetime in the table CONDITION_OCCURRENCE are null.

Edit: the columns condition_start_date and condition_end_date in the table CONDITION_OCCURRENCE contain data, so this might be useful after all.
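One way to check which of the date columns are actually populated is a single aggregate query (a sketch; it assumes a `con` connection as set up in the bigrquery comment above, and BigQuery's `COUNTIF` aggregate):

```r
# Compare how often the *_date and *_datetime columns are filled
# in CONDITION_OCCURRENCE, without pulling the rows down.
sql_null_check <- "
  SELECT
    COUNTIF(condition_start_date IS NOT NULL)     AS start_date_filled,
    COUNTIF(condition_start_datetime IS NOT NULL) AS start_datetime_filled,
    COUNT(*)                                      AS total_rows
  FROM `bigquery-public-data.cms_synthetic_patient_data_omop.condition_occurrence`
"
dbGetQuery(con, sql_null_check)
```

Because this aggregates on the server, only the two columns are scanned and a single row comes back.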


razekmh commented Aug 29, 2024

@zsenousy and @razekmh did some investigation into the omock package and we think it can be used to generate a fair number of the OMOP tables.

The package creates a Common Data Model (CDM) reference and fills it with tables. For example, the following code snippet

cdm <- mockCdmReference() |>
  mockPerson() |>
  mockObservationPeriod() |>
  mockConditionOccurrence(recordPerson = 2) |>
  mockDrugExposure(recordPerson = 3) |>
  mockMeasurement(recordPerson = 5) |>
  mockDeath(recordPerson = 1)

will create a CDM

• omop tables: person, observation_period, cdm_source,
concept, vocabulary, concept_relationship,
concept_synonym, concept_ancestor, drug_strength,
condition_occurrence, drug_exposure, measurement, death
• cohort tables: -
• achilles tables: -
• other tables: -

The function mockVisitOccurrence throws an error: could not find function "mockVisitOccurrence".
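For mapping against RAMSES, the individual tables of the mock CDM can be pulled out and inspected. A sketch (assuming the cdm reference supports `$` access by table name, and that `mockPerson()` accepts an `nPerson` argument):

```r
library(omock)

# Build a small mock CDM as in the snippet above
cdm <- mockCdmReference() |>
  mockPerson(nPerson = 100) |>
  mockObservationPeriod() |>
  mockDrugExposure(recordPerson = 3)

# Inspect individual tables by name, e.g. to compare their
# columns against what RAMSES expects as input
head(cdm$drug_exposure)
names(cdm$person)
```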

@zsenousy

The omock reference manual (https://cran.r-project.org/web/packages/omock/omock.pdf) describes how to use the omock functions for generating a CDM object, including the tables required for fitting to RAMSES.


zsenousy commented Aug 29, 2024

I see an issue in the omock repo which indicates they are still working on mockVisitOccurrence.


zsenousy commented Sep 3, 2024

@skeating, @razekmh, and @zsenousy met with Steve and Tim on 30 August 2024 to discuss initial issues and walk through OMOP. We discussed our understanding of the RAMSES package's focus on exploring drug prescriptions and administrations, and how we will link this to the OMOP data extract that Tim will provide to us by Wednesday 4 September.

Conclusions from the meeting:

  • Initial steps for us will be to understand RAMSES primary data and identify any derived data, so that we can map the required data from OMOP to RAMSES validation and visualisation functions.
  • Build a high-level bridge between RAMSES and OMOP, highlighting which RAMSES functions operate correctly on OMOP data and listing functions with possible issues for later.
  • Meet with medical professionals to get concrete use cases, which will identify how to test RAMSES after integration with OMOP.
  • Use Slab (https://slab.com/) for writing notes, as Steve recommended.


Neat! Would you be able to do a quick demo of omock at one of the Monday SAFEHR meetings? @stef only if you think this is a good idea?


stef commented Sep 5, 2024

you have my blessing, may the devil of demos be kind to you! also maybe, tag the correct person next time you look for approval... :)


razekmh commented Sep 5, 2024

pinging @stefpiatek for this


👋 Stef (apologies, I think this might be the second time).

Yeah I think it'd be a nice demo to get in for sure, maybe 2 weeks on Monday?


that was freaky, the @stef misfire thing …

@stefpiatek

I think it might have been because my Linear username was @stef, and perhaps it hadn't linked the accounts on import? I've updated my username to be consistent between the two.


AngharadGreen commented Sep 17, 2024

@razekmh @zsenousy - following the instructions from https://github.com/OHDSI/ETL-CMS.git, I have produced the .csv files from the DE-SynPUF dataset that can be loaded into an OMOP CDM v5.2 database. This is the same data that is available from here - https://console.cloud.google.com/marketplace/product/hhs/synpuf?project=ag-amr.

I found accessing the database via BigQuery in the google cloud confusing, so switched to trying to access the data this way.

The data contains information on 2.33 million synthetic patients, and the total size of the .csv files together is 97.2 GB. I have uploaded them as a zipped file to a Teams channel SharePoint that I've created and given you access to - https://liveuclac.sharepoint.com/:f:/s/AMRRAMSESOMOPproject/EhOa4CvrelVNu0F2su_hmxIBruJ3j1ItS-3DuXJGBASXtw?e=9jBnwQ

It has created the following OMOP tables:
drug_cost.csv
observation.csv
procedure_occurrence.csv
device_exposure.csv
procedure_cost.csv
observation_period.csv
provider.csv
measurement_occurrence.csv
person.csv
visit_cost.csv
drug_exposure.csv
condition_occurrence.csv
care_site.csv
visit_occurrence.csv
location.csv
specimen.csv
death.csv
payer_plan_period.csv
device_cost.csv

For some reason the csv tables for specimen, visit_cost and device_cost are blank, except for the column headers, so I am trying to figure out why.

Let me know if you have problems accessing these .csv files. I am in the process of working through loading them into the OMOP CDM so they can be loaded into RAMSES.

Edit 18 September 2024: I have also uploaded the OMOP_Input files (OMOP_Input.zip) to the Teams SharePoint; these contain the vocabulary data that also needs to be loaded to create the OMOP database. I am currently trying to create the OMOP database in PostgreSQL because, following "Validate and load electronic health records", RAMSES will only connect to a PostgreSQL server to load external healthcare records.
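For getting the .csv files into PostgreSQL so RAMSES can connect to them, one route is DBI with the RPostgres driver. A sketch; the database name, credentials, and file path are placeholders:

```r
library(DBI)
library(RPostgres)

# Connect to a local PostgreSQL server (placeholder credentials)
con <- dbConnect(
  Postgres(),
  dbname   = "omop_cdm",
  host     = "localhost",
  port     = 5432,
  user     = "postgres",
  password = Sys.getenv("PGPASSWORD")  # avoid hard-coding the password
)

# Load one of the extracted CSVs and write it as a table
care_site <- read.csv("care_site.csv")
dbWriteTable(con, "care_site", care_site, overwrite = TRUE)

dbDisconnect(con)
```

For the multi-GB tables (e.g. drug_exposure), a bulk load via psql's `\copy` will be much faster than `read.csv()` plus `dbWriteTable()`.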
