Create / Get access to mock OMOP data #15
Comments
The BigQuery dataset can be viewed in the web interface, in the preview tab. To view the data you will need a Google account and a project on BigQuery/GCP; you will then find the dataset in the public datasets list. You cannot easily export or download this data. |
You can query the data directly on BigQuery using SQL, but I do not think this is ideal, as there is a limited quota of 1TB on the size of retrieved data, which might easily be exceeded during experimentation. |
Another way to access the data is using the R package bigrquery. To do so, you still need to create a project and even a service account. Here are some guiding steps:
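library(bigrquery)  # authenticate first, e.g. with the service account key
con <- DBI::dbConnect(bigquery(), project = "bigquery-public-data", dataset = "cms_synthetic_patient_data_omop", billing = project_id)  # billing is your own GCP project
sql_string <- "SELECT * FROM `bigquery-public-data.cms_synthetic_patient_data_omop.care_site` LIMIT 20"
care_site <- DBI::dbGetQuery(con, sql_string)  # run the query and return a data frame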
|
One issue with using the BigQuery dataset is that the tables which I believe RAMSES might depend on have some null values. This is connected to issues #13 and #14. Edit: columns |
@zsenousy and @razekmh did some investigation into the omock package, and we think it can be used to generate a fair number of the OMOP tables. The package creates a Common Data Model (CDM) and fills it with tables. For example the following code snippet
will create a CDM
The function mockVisitOccurrence throws an error |
https://cran.r-project.org/web/packages/omock/omock.pdf describes how to use the omock functions to generate a CDM object, including the tables required to fit RAMSES. |
I see an issue in the omock repo which indicates they are still working on mockVisitOccurrence. |
@skeating, @razekmh, and @zsenousy met with Steve and Tim on 30 August 2024 to discuss initial issues and walk through OMOP. We discussed our understanding of the RAMSES package, whose focus is exploring drug prescriptions and administrations, and how we link this to the OMOP data extract that Tim will provide to us by Wednesday 4 September. Conclusions from the meeting:
|
Neat! Would you be able to do a quick demo of omock at one of the Monday SAFEHR meetings? @stef only if you think this is a good idea? |
you have my blessing, may the devil of demos be kind to you! also maybe, tag the correct person next time you look for approval... :) |
pinging @stefpiatek for this |
👋 Stef (apologies, I think this might be the second time). Yeah I think it'd be a nice demo to get in for sure, maybe 2 weeks on Monday? |
that was freaky the |
I think it might have been because my linear username was |
@razekmh @zsenousy - following the instructions from https://github.com/OHDSI/ETL-CMS.git, I have produced the .csv files from the DE-SynPUF dataset that can be loaded into an OMOP CDM v5.2 database. This is the same data that is available from https://console.cloud.google.com/marketplace/product/hhs/synpuf?project=ag-amr. I found accessing the database via BigQuery in Google Cloud confusing, so I switched to accessing the data this way. The data contains information from 2.33 million synthetic patients, and the total size of the .csv files together is 97.2 GB. I have uploaded them as a zipped file to a Teams channel SharePoint that I have created and given you access to: https://liveuclac.sharepoint.com/:f:/s/AMRRAMSESOMOPproject/EhOa4CvrelVNu0F2su_hmxIBruJ3j1ItS-3DuXJGBASXtw?e=9jBnwQ It has created the following OMOP tables: For some reason the csv tables for specimen, visit_cost and device_cost are blank, except for the column headers, so I am trying to figure out why. Let me know if you have problems accessing these .csv files. I am in the process of working them into the OMOP CDM to load into RAMSES.

Edit 18 September 2024: I have also uploaded the OMOP_Input files (OMOP_Input.zip) to the Teams SharePoint. These contain the vocabulary data that also needs to be loaded to create the OMOP database. I am currently trying to create the OMOP database in PostgreSQL because, following Validate and load electronic health records, RAMSES will only connect to a PostgreSQL server to load in external healthcare records. |
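Before loading the files into PostgreSQL, it may help to confirm which tables are header-only, as reported above for specimen, visit_cost and device_cost. Below is a minimal sketch in Python (standard library only). The function name `find_header_only_csvs` and the directory layout are hypothetical, and it assumes the files are comma-delimited with a single header row; it is not part of ETL-CMS or RAMSES.

```python
import csv
from pathlib import Path


def find_header_only_csvs(directory):
    """Return sorted file names of .csv files that have a header but no data rows.

    Hypothetical helper: assumes comma-delimited files with one header row.
    """
    header_only = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)  # None means the file is completely empty
            # any() stops at the first non-empty row, so large tables are not read in full
            has_data = any(row for row in reader)
        if header is not None and not has_data:
            header_only.append(path.name)
    return header_only


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder holding the unzipped .csv files
    for name in find_header_only_csvs("."):
        print(f"{name}: header only, no data rows")
```

Because the check stops at the first data row of each file, it should stay fast even on the larger tables in the 97.2 GB extract.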
TLDR: Get example OMOP data to map it to the RAMSES data model
Background: The RAMSES data model has a particular focus, which differs from that of OMOP data. We need some example OMOP data to match against the RAMSES data, and then build a transformation tool.
Tasks