Create / Get access to mock OMOP data #15

Closed
3 of 4 tasks
linear bot opened this issue Aug 25, 2024 · 15 comments
linear bot commented Aug 25, 2024

TL;DR: Get example OMOP data so we can map it to the RAMSES data model.

Background: RAMSES's data model has a particular focus, which differs from that of OMOP. We need some example OMOP data to match against RAMSES data and then build a transformation tool.

Tasks


razekmh commented Aug 25, 2024

The BigQuery dataset can be viewed on the web interface in the Preview tab.

To view the data you will need a Google account. Then create a project on BigQuery/GCP, and you will find the dataset in the public datasets list. You cannot easily export or download this data.


razekmh commented Aug 25, 2024

While you can query the data directly on BigQuery using SQL, I do not think this is ideal: there is a limited quota of 1 TB of scanned data, which might easily be exceeded during experimentation.
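One thing worth knowing about the scanned-data quota: BigQuery bills by the columns a query reads, so adding `LIMIT` does not reduce the bytes scanned. Selecting only the columns you need does. A sketch (the column choice is illustrative):

```sql
-- Scans only two columns of the table, not all of them.
-- A LIMIT clause would NOT have reduced the bytes billed;
-- narrowing the SELECT list does.
SELECT
  care_site_id,
  care_site_name
FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site`
LIMIT 20
```

The web console shows an estimate of the bytes a query will process before you run it, which is a cheap way to check against the quota.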


razekmh commented Aug 25, 2024

Another way to access the data is the R package bigrquery. To do so, you still need to create a project and a service account. Here are some guiding steps:

  • After creating a project, enable the BigQuery API, then go to the Credentials page and click Create credentials.
  • Choose Service account.
  • Fill in a name of your choice for the service account.
  • Add a role for the service account. I am not sure how the permissions work here, so please be careful with your choices. (I gave my account Owner permissions, which is too open and should be restricted.)
  • Click Done to close the wizard.
  • Your account will now appear on the Credentials page under Service Accounts.
  • Click on your account to open the account page.
  • Click on the Keys tab.
  • Click Add key and choose Create new key. (Google recommends Workload Identity Federation, but for this use case I found it easier to use the JSON-based key.)
  • Choose JSON and create the key.
  • The key will be downloaded to your device.
  • Store the key somewhere secure and make sure it is not readable by other people. Please revoke the key once you are done with project exploration.
  • In your RStudio console, install the bigrquery library: install.packages("bigrquery")
  • Load the libraries into the session: library(bigrquery) and library(DBI)
  • You might need to load the gargle library too: library(gargle)
  • Set a variable to the location of your service account JSON file: sa_key_path <- "put the path here"
  • Authenticate your session using bq_auth(path = sa_key_path)
  • Get your Google project ID.
  • Set a variable with the project name: project_id <- "put the project name here"
  • Create a connection to the BigQuery database:
con <- dbConnect(bigquery(), project = project_id, dataset = "bigquery-public-data")
  • Build an SQL statement to test your connection (note that the table name bigquery-public-data.cms_synthetic_patient_data_omop.care_site includes the project and dataset names):
sql_string <- "SELECT * FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site` LIMIT 20"
  • Send the query to BigQuery: result <- dbGetQuery(con, sql_string)
  • You may save the results to disk using the write.table() function.
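Taken together, the steps above amount to a short script. This is a sketch: the key path and project ID are placeholders, and the fully qualified table name in the query means the `dataset` argument to `dbConnect()` is largely cosmetic here.

```r
library(bigrquery)
library(DBI)

sa_key_path <- "path/to/service-account-key.json"  # keep this file private
project_id  <- "your-gcp-project-id"               # queries are billed here

# Authenticate with the service-account key
bq_auth(path = sa_key_path)

# Open a DBI connection to BigQuery
con <- dbConnect(
  bigquery(),
  project = project_id,
  dataset = "cms_synthetic_patient_data_omop"
)

# The table is addressed by its fully qualified name:
# <project>.<dataset>.<table>
sql_string <- "
  SELECT *
  FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site`
  LIMIT 20
"
result <- dbGetQuery(con, sql_string)

# Persist the sample locally
write.table(result, "care_site_sample.csv", sep = ",", row.names = FALSE)
```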


razekmh commented Aug 25, 2024

One issue with using the BigQuery dataset is that the tables which I believe RAMSES might depend on have some null values. This is connected to issues #13 and #14.
As far as I understand it currently, RAMSES detects therapy episodes based on the timing of ordering and administration of medication. OMOP has concepts such as drug era, dose era, and drug exposure which might be useful in extracting information about drug administration. However, the drug order (prescription) is a bit unclear. There might be some relevant information to extract from the tables VISIT_OCCURRENCE, PROCEDURE_OCCURRENCE, CONDITION_OCCURRENCE and OBSERVATION, but I found that the VISIT_OCCURRENCE table is not listed in the dataset, and the columns condition_start_datetime and condition_end_datetime in the table CONDITION_OCCURRENCE are null.

Edit: the columns condition_start_date and condition_end_date in the table CONDITION_OCCURRENCE contain data, so this might be useful after all.
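One way to check which of the date columns are actually populated is a single aggregate query (a sketch; it assumes a `con` connection as set up in the bigrquery comment above, and BigQuery's `COUNTIF` aggregate):

```r
# Compare how often the *_date and *_datetime columns are filled
# in CONDITION_OCCURRENCE, without pulling the rows down.
sql_null_check <- "
  SELECT
    COUNTIF(condition_start_date IS NOT NULL)     AS start_date_filled,
    COUNTIF(condition_start_datetime IS NOT NULL) AS start_datetime_filled,
    COUNT(*)                                      AS total_rows
  FROM `bigquery-public-data.cms_synthetic_patient_data_omop.condition_occurrence`
"
dbGetQuery(con, sql_null_check)
```

Because this aggregates on the server, only the two columns are scanned and a single row comes back.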


razekmh commented Aug 29, 2024

@zsenousy and @razekmh did some investigation into the omock package and we think it can be used to generate a fair number of the OMOP tables.

The package creates a Common Data Model (CDM) reference and fills it with tables. For example, the following code snippet

cdm <- mockCdmReference() |>
  mockPerson() |>
  mockObservationPeriod() |>
  mockConditionOccurrence(recordPerson = 2) |>
  mockDrugExposure(recordPerson = 3) |>
  mockMeasurement(recordPerson = 5) |>
  mockDeath(recordPerson = 1)

will create a CDM

• omop tables: person, observation_period, cdm_source,
concept, vocabulary, concept_relationship,
concept_synonym, concept_ancestor, drug_strength,
condition_occurrence, drug_exposure, measurement, death
• cohort tables: -
• achilles tables: -
• other tables: -

The function mockVisitOccurrence throws an error: could not find function "mockVisitOccurrence".
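For mapping against RAMSES, the individual tables of the mock CDM can be pulled out and inspected. A sketch (assuming the cdm reference supports `$` access by table name, and that `mockPerson()` accepts an `nPerson` argument):

```r
library(omock)

# Build a small mock CDM as in the snippet above
cdm <- mockCdmReference() |>
  mockPerson(nPerson = 100) |>
  mockObservationPeriod() |>
  mockDrugExposure(recordPerson = 3)

# Inspect individual tables by name, e.g. to compare their
# columns against what RAMSES expects as input
head(cdm$drug_exposure)
names(cdm$person)
```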

@zsenousy

The omock reference manual (https://cran.r-project.org/web/packages/omock/omock.pdf) describes how to use the omock functions for generating a CDM object, including the tables required for fitting to RAMSES.


zsenousy commented Aug 29, 2024

I see an issue in the omock repo which indicates they are still working on mockVisitOccurrence.


zsenousy commented Sep 3, 2024

@skeating, @razekmh, and @zsenousy met with Steve and Tim on 30 August 2024 to discuss initial issues and walk through OMOP. We discussed our understanding of the RAMSES package's focus on exploring drug prescriptions and administrations, and how we will link this to the OMOP data extract that Tim will provide to us by Wednesday 4 September.

Conclusions from the meeting:

  • Initial steps for us will be to understand RAMSES primary data and identify any derived data, so that we can map the required data from OMOP to RAMSES validation and visualisation functions.
  • Build a high-level bridge between RAMSES and OMOP, highlighting which RAMSES functions operate correctly on OMOP data and listing functions with possible issues for later.
  • Meet with medical professionals to get concrete use cases, which will identify how to test RAMSES after integration with OMOP.
  • Use Slab (https://slab.com/) for writing notes, as Steve recommended.


Neat! Would you be able to do a quick demo of omock at one of the Monday SAFEHR meetings? @stef only if you think this is a good idea?


stef commented Sep 5, 2024

you have my blessing, may the devil of demos be kind to you! also maybe, tag the correct person next time you look for approval... :)


razekmh commented Sep 5, 2024

pinging @stefpiatek for this


👋 Stef (apologies, I think this might be the second time).

Yeah I think it'd be a nice demo to get in for sure, maybe 2 weeks on Monday?


that was freaky, the @stef misfire thing …

@stefpiatek

I think it might have been because my Linear username was @stef, and perhaps it hadn't linked the accounts on import? I've updated my username to be consistent between the two.


AngharadGreen commented Sep 17, 2024

@razekmh @zsenousy - following the instructions from https://github.com/OHDSI/ETL-CMS.git, I have produced the .csv files from the DE-SynPUF dataset that can be loaded into an OMOP CDM v5.2 database. This is the same data that is available from here - https://console.cloud.google.com/marketplace/product/hhs/synpuf?project=ag-amr.

I found accessing the database via BigQuery in the google cloud confusing, so switched to trying to access the data this way.

The data contains information on 2.33 million synthetic patients, and the total size of the .csv files together is 97.2 GB. I have uploaded them as a zipped file to a Teams channel SharePoint that I've created and given you access to - https://liveuclac.sharepoint.com/:f:/s/AMRRAMSESOMOPproject/EhOa4CvrelVNu0F2su_hmxIBruJ3j1ItS-3DuXJGBASXtw?e=9jBnwQ

It has created the following OMOP tables:
drug_cost.csv
observation.csv
procedure_occurrence.csv
device_exposure.csv
procedure_cost.csv
observation_period.csv
provider.csv
measurement_occurrence.csv
person.csv
visit_cost.csv
drug_exposure.csv
condition_occurrence.csv
care_site.csv
visit_occurrence.csv
location.csv
specimen.csv
death.csv
payer_plan_period.csv
device_cost.csv

For some reason the csv tables for specimen, visit_cost and device_cost are blank, except for the column headers, so I am trying to figure out why.

Let me know if you have problems accessing these .csv files. I am in the process of working through loading them into the OMOP CDM so they can be loaded into RAMSES.

Edit 18 September 2024: I have also uploaded the OMOP_Input files (OMOP_Input.zip) to the Teams SharePoint; these contain the vocabulary data that also needs to be loaded to create the OMOP database. I am currently trying to create the OMOP database in PostgreSQL because, following "Validate and load electronic health records", RAMSES will only connect to a PostgreSQL server to load external healthcare records.
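For getting the .csv files into PostgreSQL so RAMSES can connect to them, one route is DBI with the RPostgres driver. A sketch; the database name, credentials, and file path are placeholders:

```r
library(DBI)
library(RPostgres)

# Connect to a local PostgreSQL server (placeholder credentials)
con <- dbConnect(
  Postgres(),
  dbname   = "omop_cdm",
  host     = "localhost",
  port     = 5432,
  user     = "postgres",
  password = Sys.getenv("PGPASSWORD")  # avoid hard-coding the password
)

# Load one of the extracted CSVs and write it as a table
care_site <- read.csv("care_site.csv")
dbWriteTable(con, "care_site", care_site, overwrite = TRUE)

dbDisconnect(con)
```

For the multi-GB tables (e.g. drug_exposure), a bulk load via psql's `\copy` will be much faster than `read.csv()` plus `dbWriteTable()`.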
