Mapping of OMOP CDM data to RAMSES DB #17
Comments
Initial mapping RAMSES - OMOP
1. Patient Information
OMOP CDM Tables:
RAMSES DB Equivalent:
Mapping Details:
2. Clinical Events
OMOP CDM Tables:
RAMSES DB Equivalent:
Mapping Details:
3. Microbiology and Laboratory Data
OMOP CDM Tables:
RAMSES DB Equivalent:
Mapping Details:
4. Healthcare Encounters
OMOP CDM Tables:
RAMSES DB Equivalent:
Mapping Details:
5. Missing or Unmapped Items
Gaps in RAMSES Database:
|
Thank you for this @zsenousy. I'm creating a visualisation of the OMOP and RAMSES mappings as well |
I am using omock to match the OMOP data to Ramses. My target in the mapping is to match the structure of the dummy data Ramses uses in the validate article. I am starting the matching with the two tables below:
> str(drug_prescriptions)
'data.frame': 367 obs. of 12 variables:
$ patient_id : chr "5124578766" "4874231672" "6292626973" "6292626973" ...
$ prescription_id : chr "66cac1c5eab88d72c8b7687966357f5b" "8ccd67f4730b62ceafb8bcb27996c10c" "72cf4b592b0f4e2143b4bb9d7c569c97" "806c86b55cf50505a20b722f081c4075" ...
$ rxsummary : chr "Piperacillin / Tazobactam IVI 4.5 g TDS" "Ciprofloxacin ORAL 500 mg BD" "Flucloxacillin ORAL 500 mg 6H" "Metronidazole ORAL 400 mg TDS" ...
$ authoring_date : POSIXct, format: "2015-08-04 13:07:16" "2017-07-06 09:49:31" "2017-01-13 17:34:41" "2017-11-12 09:09:13" ...
$ prescription_start: POSIXct, format: "2015-08-04 14:45:16" "2017-07-06 10:26:31" "2017-01-13 18:48:41" "2017-11-12 09:44:13" ...
$ prescription_end : POSIXct, format: "2015-08-07 14:45:16" "2017-07-07 22:26:31" "2017-01-17 18:48:41" "2017-11-14 09:44:13" ...
$ tr_DESC : chr "Piperacillin / Tazobactam" "Ciprofloxacin" "Flucloxacillin" "Metronidazole" ...
$ route : chr "IV" "ORAL" "ORAL" "ORAL" ...
$ dose : num 4.5 500 500 400 4.5 500 4.5 4.5 600 2 ...
$ units : chr "g" "mg" "mg" "mg" ...
$ frequency : chr "TDS" "BD" "6H" "TDS" ...
$ duration : num 3 1.5 4 2 4 3 3 2 1 2 ...
> str(drug_administrations)
'data.frame': 2818 obs. of 7 variables:
$ patient_id : chr "5124578766" "5124578766" "5124578766" "5124578766" ...
$ prescription_id : chr "66cac1c5eab88d72c8b7687966357f5b" "66cac1c5eab88d72c8b7687966357f5b" "66cac1c5eab88d72c8b7687966357f5b" "66cac1c5eab88d72c8b7687966357f5b" ...
$ tr_DESC : chr "Piperacillin / Tazobactam" "Piperacillin / Tazobactam" "Piperacillin / Tazobactam" "Piperacillin / Tazobactam" ...
$ route : chr "IV" "IV" "IV" "IV" ...
$ dose : num 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 500 ...
$ units : chr "g" "g" "g" "g" ...
$ administration_date: POSIXct, format: "2015-08-04 14:45:16" "2015-08-04 23:45:16" "2015-08-05 08:45:16" "2015-08-05 17:45:16" ... |
I am creating a dummy dataset using omock by passing in the following code. The variables are set to the default examples from the omock code and comments. I am also creating a Ramses dataset to match against.
Tackling the
> cdm$drug_exposure
# A tibble: 180 × 6
drug_concept_id person_id drug_exposure_start_date drug_exposure_end_date drug_exposure_id drug_type_concept_id
* <dbl> <int> <date> <date> <int> <dbl>
1 10 9 1991-07-23 1995-03-08 1 1
2 10 4 1994-10-23 2003-10-24 2 1
3 10 7 2014-03-15 2014-03-22 3 1
4 10 1 2002-04-13 2004-12-14 4 1
5 10 2 1999-12-08 2001-02-27 5 1
6 10 7 2014-03-13 2014-03-26 6 1
7 10 2 1999-04-22 2002-11-26 7 1
8 10 3 2015-03-01 2015-05-28 8 1
9 10 1 2002-08-13 2005-10-05 9 1
10 10 5 2012-05-16 2012-09-02 10 1
# ℹ 170 more rows
# ℹ Use `print(n = ...)` to see more rows
We can see that
However the attributes |
Tracing the
> cdm$concept
# A tibble: 3,245 × 10
concept_id concept_name domain_id vocabulary_id standard_concept concept_class_id concept_code valid_start_date
* <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Musculoskele… Condition SNOMED S Clinical Finding 1234 NA
2 2 Osteoarthros… Condition SNOMED S Clinical Finding 1234 NA
3 3 Arthritis Condition SNOMED S Clinical Finding 1234 NA
4 4 Osteoarthrit… Condition SNOMED S Clinical Finding 1234 NA
5 5 Osteoarthrit… Condition SNOMED S Clinical Finding 1234 NA
6 6 Osteonecrosis Condition SNOMED S Clinical Finding 1234 NA
7 7 Degenerative… Condition Read NA Diagnosis 1234 NA
8 8 Knee osteoar… Condition Read NA Diagnosis 1234 NA
9 9 H/O osteoart… Observat… LOINC S Observation 1234 NA
10 10 Adalimumab Drug RxNorm S Ingredient 1234 NA
# ℹ 3,235 more rows
# ℹ 2 more variables: valid_end_date <chr>, invalid_reason <chr>
# ℹ Use `print(n = ...)` to see more rows
The
> unique(cdm$concept$domain_id)
[1] "Condition" "Observation" "Drug" "Ethnicity" "Gender" "Race" "Unit" "Visit"
[9] "Measurement"
We can filter the
> cdm$concept |> filter(domain_id == 'Drug')
# A tibble: 6 × 10
concept_id concept_name domain_id vocabulary_id standard_concept concept_class_id concept_code valid_start_date
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10 Adalimumab Drug RxNorm S Ingredient 1234 NA
2 11 Injection Drug OMOP NA Dose Form 1234 NA
3 12 ALIMENTARY TR… Drug ATC NA ATC 1st 1234 NA
4 13 Descendant dr… Drug RxNorm S Drug 1234 NA
5 14 Injectable Drug OMOP NA Dose Form 1234 NA
6 19 Other ingredi… Drug RxNorm S Ingredient 1234 NA
# ℹ 2 more variables: valid_end_date <chr>, invalid_reason <chr> |
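To make the tracing step concrete, here is a minimal base R sketch of attaching concept names to exposure rows. The two data frames are small hand-made stand-ins for cdm$drug_exposure and cdm$concept (the values are illustrative, not taken from omock), and the name `exposure_named` is just something I made up for this example.

```r
# Toy stand-ins for cdm$drug_exposure and cdm$concept (illustrative values only).
drug_exposure <- data.frame(
  drug_exposure_id = 1:4,
  person_id = c(9L, 4L, 7L, 1L),
  drug_concept_id = c(10, 10, 13, 19),
  stringsAsFactors = FALSE
)
concept <- data.frame(
  concept_id = c(3, 10, 13, 19),
  concept_name = c("Arthritis", "Adalimumab", "Descendant drug", "Other ingredient"),
  domain_id = c("Condition", "Drug", "Drug", "Drug"),
  stringsAsFactors = FALSE
)

# Keep only Drug-domain concepts, then attach their names to each exposure.
drug_concepts <- concept[concept$domain_id == "Drug", c("concept_id", "concept_name")]
exposure_named <- merge(
  drug_exposure, drug_concepts,
  by.x = "drug_concept_id", by.y = "concept_id",
  all.x = TRUE  # keep exposures even when no Drug concept matches
)

# merge() sorts by the key, so restore the original exposure order.
exposure_named <- exposure_named[order(exposure_named$drug_exposure_id), ]
print(exposure_named)
```

With the real cdm object the same join could probably be written with dplyr, but base R keeps the sketch free of extra dependencies.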
Let's save this table to prepare it for more processing:
> drug_concepts <- cdm$concept |> filter(domain_id == 'Drug')
One of the steps of the validation process is to match the drug codes using the AMR package:
# attempting to map drug name using AMR package
drug_prescriptions$drug_code <- AMR::as.ab(drug_prescriptions$tr_DESC)
We can try the same with the list of drugs that were found in the
> drug_concepts$drug_code <- drug_concepts$concept_name |> AMR::as.ab()
Warning message:
in as.ab(): these values could not be coerced to a valid antimicrobial ID: "Adalimumab", "Injectable",
"Injection", and "Other ingredient".
We get a partial match as seen in the table
> drug_concepts |> select(concept_id, concept_name, drug_code)
# A tibble: 6 × 3
concept_id concept_name drug_code
<dbl> <chr> <ab>
1 10 Adalimumab NA
2 11 Injection NA
3 12 ALIMENTARY TRACT AND METABOLISM PPA
4 13 Descendant drug SLF9
5 14 Injectable NA
6 19 Other ingredient NA
Maybe manual editing is required for this step, as in the example from the validate article?
# editing drug names
drug_prescriptions$drug_code <- gsub("Vancomycin protocol",
"Vancomycin",
drug_prescriptions$tr_DESC)
# mapping drug name using AMR package
drug_prescriptions$drug_code <- AMR::as.ab(drug_prescriptions$drug_code)
drug_prescriptions$drug_name <- AMR::ab_name(drug_prescriptions$drug_code)
I suspect the actual data will have more drugs to match. |
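If the real data does need more manual edits than the single gsub() call above, one option is to keep the fixes in a named vector and apply them in a loop. This is only a sketch: `name_fixes` and `clean_drug_names()` are hypothetical names, and the "Pip/Taz" entry is invented for illustration (only "Vancomycin protocol" comes from the validate article).

```r
# Hypothetical lookup: local spelling -> name the matcher is more likely to recognise.
# Only the first entry comes from the validate article; the second is made up.
name_fixes <- c(
  "Vancomycin protocol" = "Vancomycin",
  "Pip/Taz" = "Piperacillin / Tazobactam"
)

clean_drug_names <- function(x, fixes) {
  for (pattern in names(fixes)) {
    # fixed = TRUE treats the pattern literally, so "/" or "." need no escaping
    x <- gsub(pattern, fixes[[pattern]], x, fixed = TRUE)
  }
  x
}

tr_desc <- c("Vancomycin protocol", "Ciprofloxacin", "Pip/Taz")
cleaned <- clean_drug_names(tr_desc, name_fixes)
print(cleaned)
```

The cleaned vector would then go through AMR::as.ab() and AMR::ab_name() exactly as in the article snippet, so the lookup only grows when as.ab() warns about a new unmatched name.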
Going back to the
> cdm$drug_exposure |> filter(drug_concept_id %in% drug_concepts$concept_id)
# A tibble: 2,400 × 6
drug_concept_id person_id drug_exposure_start_date drug_exposure_end_date drug_exposure_id drug_type_concept_id
<dbl> <int> <date> <date> <int> <dbl>
1 10 9 1987-03-04 2004-10-31 1 1
2 10 4 1997-04-16 2005-03-29 2 1
3 10 7 2014-03-09 2014-03-24 3 1
4 10 1 2004-05-16 2008-08-06 4 1
5 10 2 1999-11-17 2001-07-19 5 1
6 10 7 2014-03-13 2014-03-22 6 1
7 10 2 1999-10-06 2001-08-11 7 1
8 10 3 2015-01-26 2015-05-26 8 1
9 10 1 2002-04-04 2007-06-28 9 1
10 10 5 2012-05-13 2012-06-28 10 1
# ℹ 2,390 more rows
# ℹ Use `print(n = ...)` to see more rows
> cdm$drug_exposure
# A tibble: 2,400 × 6
Inspecting the distribution of the drug codes in the
> cdm$drug_exposure |>
+ filter(drug_concept_id %in% drug_concepts$concept_id) |>
+ group_by(drug_concept_id) |>
+ summarise(count = n())
# A tibble: 6 × 2
drug_concept_id count
<dbl> <int>
1 10 400
2 11 400
3 12 400
4 13 400
5 14 400
6 19 400 |
I believe the next steps are to explore the data in one of the example datasets and use it to replicate the steps in the article on Validate and load electronic health records. Some patterns and problems are expected to arise from this process. Since we do not have access to the actual data yet, we are not aiming for a full solution. What we aim for now is a general understanding of the feasibility of matching the two data models. As you can see above, I started working with two data frames provided by RAMSES as examples of what the EHR data would look like. I think it is worth trying to reproduce the structure of both data frames from OMOP data, using either example dataset we have access to. Working with
> cdm$concept |> filter(domain_id == 'Drug')
# A tibble: 6 × 10
concept_id concept_name domain_id vocabulary_id standard_concept concept_class_id concept_code valid_start_date
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10 Adalimumab Drug RxNorm S Ingredient 1234 NA
2 11 Injection Drug OMOP NA Dose Form 1234 NA
3 12 ALIMENTARY TR… Drug ATC NA ATC 1st 1234 NA
4 13 Descendant dr… Drug RxNorm S Drug 1234 NA
5 14 Injectable Drug OMOP NA Dose Form 1234 NA
6 19 Other ingredi… Drug RxNorm S Ingredient 1234 NA
# ℹ 2 more variables: valid_end_date <chr>, invalid_reason <chr>
Also we can see that the step of matching the drug names using the
> drug_concepts |> select(concept_id, concept_name, drug_code)
# A tibble: 6 × 3
concept_id concept_name drug_code
<dbl> <chr> <ab>
1 10 Adalimumab NA
2 11 Injection NA
3 12 ALIMENTARY TRACT AND METABOLISM PPA
4 13 Descendant drug SLF9
5 14 Injectable NA
6 19 Other ingredient NA
I am not sure if this step will be absolutely necessary, but based on the note in the article it seems to be essential to extract the DDD.
|
I would suggest we try the same steps with the data from the big data open dataset. The concept table has 3,902,588 entries which holds much more potential than the omock dataset. @AngharadGreen Please refer to #15 on accessing the data. Please comment here if you face any issues with accessing the data |
I have put together this visualisation to map the OMOP tables to Ramses tables: |
I have found this website is very useful for understanding the OMOP CDM - https://ohdsi.github.io/CommonDataModel/cdm54.html |
@razekmh I have struggled trying to access the data from BigQuery public datasets but have found this https://github.com/OHDSI/ETL-CMS.git and I am working through the instructions to download the dataset as it's the same one on the BigQuery public datasets |
Mapping RAMSES fields to OMOP DRUG_EXPOSURE fields
Load the necessary libraries
Connect to local database for Ramses
Generate mock OMOP CDM data
Simulate RAMSES
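As a rough illustration of the field mapping (not the code used in the steps above), the sketch below reshapes a drug_exposure-like data frame into the column layout of the Ramses drug_prescriptions example. The pairings person_id to patient_id and drug_exposure_id to prescription_id are my assumptions, the input values are hand-made, and the fields with no counterpart among the six omock columns are left as NA.

```r
# Hand-made stand-in for cdm$drug_exposure (same columns as the omock table).
drug_exposure <- data.frame(
  drug_exposure_id = 1:2,
  person_id = c(9L, 7L),
  drug_concept_id = c(10, 10),
  drug_exposure_start_date = as.Date(c("2014-03-15", "2014-03-13")),
  drug_exposure_end_date = as.Date(c("2014-03-22", "2014-03-26")),
  stringsAsFactors = FALSE
)

# Assumed mapping into the drug_prescriptions layout from the validate article.
prescriptions <- data.frame(
  patient_id = as.character(drug_exposure$person_id),
  prescription_id = as.character(drug_exposure$drug_exposure_id),
  # OMOP dates carry no time of day here, so both timestamps fall at midnight UTC.
  prescription_start = as.POSIXct(format(drug_exposure$drug_exposure_start_date), tz = "UTC"),
  prescription_end = as.POSIXct(format(drug_exposure$drug_exposure_end_date), tz = "UTC"),
  # duration in days, matching the example data (3 days, 1.5 days, ...)
  duration = as.numeric(difftime(drug_exposure$drug_exposure_end_date,
                                 drug_exposure$drug_exposure_start_date,
                                 units = "days")),
  # No source among the six omock columns: left missing for now.
  route = NA_character_,
  dose = NA_real_,
  units = NA_character_,
  frequency = NA_character_,
  stringsAsFactors = FALSE
)
print(prescriptions)
```

The NA columns make the gaps explicit: route, dose, units and frequency would have to come from other DRUG_EXPOSURE fields or be derived once we see the real data.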
|
I think this issue has served its purpose. We will have to split it into multiple issues, one per table from the validate article |
Edit: @razekmh is modifying the description of this issue to list out the task and our current understanding.
The aim of this issue is to gain a full understanding of how the OMOP data model translates to the RAMSES data model.
A more specific target would be to add OMOP data to RAMSES such that we would be able to export a report of the Defined Daily Dose (DDD) (per drug?) per 1000 bed days, for both prescription and administration, per ward and per specialisation.
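To pin down what that target number means, here is a small worked example of the usual arithmetic: total amount of drug divided by the DDD value gives the number of DDDs, which is then scaled to 1000 bed days. All inputs are hypothetical, including `ddd_g`, which is a placeholder and should be looked up in the WHO ATC/DDD index for the drug and route in question. This is plain arithmetic, not how Ramses computes it.

```r
# Illustrative prescriptions of one drug on one ward (made-up values).
rx <- data.frame(
  drug = c("Ciprofloxacin", "Ciprofloxacin"),
  dose_mg = c(500, 500),
  doses_per_day = c(2, 2),     # e.g. BD
  duration_days = c(1.5, 3),
  stringsAsFactors = FALSE
)

ddd_g <- 1       # placeholder DDD in grams; take the real value from the WHO ATC/DDD index
bed_days <- 200  # hypothetical occupied bed days on the ward over the same period

# Total grams prescribed, then number of DDDs, then the rate per 1000 bed days.
total_g <- sum(rx$dose_mg * rx$doses_per_day * rx$duration_days) / 1000
n_ddd <- total_g / ddd_g
ddd_per_1000_bed_days <- n_ddd / bed_days * 1000

print(ddd_per_1000_bed_days)
```

Reporting per ward or per specialisation would mean repeating the same calculation within each group, with bed days counted for that group.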
Naturally, achieving this task requires us to have access to an accurate data model for each of the origin and target standards. We will tackle each separately here:
The RAMSES data model is partially described in the article on Objects and classes in Ramses. The article does not explain how to import data into RAMSES. We can see a glimpse of the data expected by RAMSES in the article on Validate and load electronic health records. In the validation article we can see that RAMSES offers some tools to process the EHR data and prepare it to be consumed by RAMSES. Our understanding of what RAMSES expects will develop as we replicate the steps in the article on OMOP data.
OMOP is a standard in itself, which means we can read about its structure on the official website. However, the actual data structure we will work with depends on the implementation at UCLH. This is due to two facts:
We try to approximate the expected data structure from OMOP using two sources:
Each dataset/tool has its own limitations, but we expect that working with them will help us at least partially understand the OMOP mapping to RAMSES while we wait for access to the UCLH OMOP data. Accessing these datasets is described in issue Create / Get access to mock OMOP data #15
Here we describe the mapping of OMOP Data tables into RAMSES DB. We identify the mapping required between various attributes and the missing data items that need to be derived.