Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map OMOP data to inpatient_microbiology table #29

Open
razekmh opened this issue Sep 26, 2024 · 16 comments
Open

Map OMOP data to inpatient_microbiology table #29

razekmh opened this issue Sep 26, 2024 · 16 comments
Assignees
Labels
design Design, scoping and ToR work

Comments

@razekmh
Copy link

razekmh commented Sep 26, 2024

Extract data from the example OMOP data to fill the inpatient_microbiology table from the validate article. This is a split from #17

Please feel free to assign yourself to the issue. Please the respective branch for development.

@razekmh razekmh added the design Design, scoping and ToR work label Sep 26, 2024
@AngharadGreen AngharadGreen self-assigned this Sep 27, 2024
@AngharadGreen
Copy link

  • It has been difficult to map the simulated OMOP data I have been working with to the RAMSES inpatient_microbiology table.
  • There isn't a clear way of storing lab microbiology results in the OMOP CDM.
  • I have found this paper useful - https://pubmed.ncbi.nlm.nih.gov/35612079/
  • When we receive OMOP extract data from UCLH it will be useful to see how lab microbiology results are stored within the CDM.
  • I have been working on issue Map OMOP data to inpatient_episodes table  #28 instead

@zsenousy
Copy link

zsenousy commented Oct 9, 2024

  • It has been difficult to map the simulated OMOP data I have been working with to the RAMSES inpatient_microbiology table.
  • There isn't a clear way of storing lab microbiology results in the OMOP CDM.
  • I have found this paper useful - https://pubmed.ncbi.nlm.nih.gov/35612079/
  • When we receive OMOP extract data from UCLH it will be useful to see how lab microbiology results are stored within the CDM.
  • I have been working on issue Map OMOP data to inpatient_episodes table  #28 instead

@AngharadGreen, Would you like to discuss this further in a call?

@AngharadGreen
Copy link

  • It has been difficult to map the simulated OMOP data I have been working with to the RAMSES inpatient_microbiology table.
  • There isn't a clear way of storing lab microbiology results in the OMOP CDM.
  • I have found this paper useful - https://pubmed.ncbi.nlm.nih.gov/35612079/
  • When we receive OMOP extract data from UCLH it will be useful to see how lab microbiology results are stored within the CDM.
  • I have been working on issue Map OMOP data to inpatient_episodes table  #28 instead

@AngharadGreen, Would you like to discuss this further in a call?

Hi @zsenousy I have found some useful resources to help me with this mapping, I am just working through those today. I can meet tomorrow if there is a time that works for you?

@AngharadGreen
Copy link

AngharadGreen commented Oct 9, 2024

Following this paper https://pubmed.ncbi.nlm.nih.gov/35612079/ I will try to map the OMOP CDM to the inpatient_microbiology ramses table following this figure -
image

I have put together a summary presentation going over how the OMOP CDM can be mapped to the Ramses fata frame
Mapping of OMOP to RAMSES.pdf
image

image

@AngharadGreen
Copy link

AngharadGreen commented Dec 12, 2024

I now want to just pull out the relevant columns from the Merged_df_1 table created in code from #45 that represent the Ramses Inpatient_microbiology table to see if I can use the OMOP data to create the data frames required by Ramses

Ramses OMOP
patient_id person_id (Observation table)
specimen_id specimen_id (Specimen table)
status All entries are NA so will ignore
specimen_type_display concept_name_specimen (Concept table)
specimen_datetime specimen_datetime (Specimen table)
organism_display_name concept_name_observation (Concept table)
isolate_id observation_id (Observation Table)
agent_display_name Antibiotics filtered from the Drug Exposure and Concept tables
rsi_code Measurement Table??
isolation_datetime observation_datetime (Observastion table)

I want to tidy up the code from issue #45 that produced the merged specimen and observation table, and also use the dplyr function left_join instead of merge following advice from colleagues:

library(arrow) ## to read in the parquet files

## Read in the required tables 
df_observation <- read_parquet("~/Ramses_extract_Nov2024/OBSERVATION.parquet")
df_specimen <- read_parquet("~/Ramses_extract_Nov2024/SPECIMEN.parquet")
df_concept <- read_parquet("~/data/omop-metadata/data/concept.parquet")

library(tidyverse)## for the left_join and filter functions

## I now want to create the observation table showing just the microbial_isolates as described previously
Joined_concept_observation <- left_join(df_observation, df_concept, by = c("observation_concept_id" = "concept_id"))

SNOMED_Organisms_Code <-read.csv("~/OMOP_RAMSES/SNOMED_Organisms.csv", header = TRUE, sep = ",")

Observation_SNOMED_Microorganisms <- filter(Joined_concept_observation, vocabulary_id == 'SNOMED',
                                            concept_code %in% SNOMED_Organisms_Code$SNOMED_Code)

## As these data frames are large, I will remove any older ones from the R environment as I go along 
rm(Joined_concept_observation)

## I will now pull out the person_ids, observation_ids and observation_date from Observation_SNOMED_Microorganisms as these will be used to do a join with the specimen table

subset_observation_df_1 <- subset(Observation_SNOMED_Microorganisms, select = c("person_id", "observation_id", "observation_date"))

## Join df_specimen with observation_Observation_SNOMED_Microorganisms_2 via person_id and observation_date and specimen_date

Specimen_joined_df_1 <- left_join(subset_observation_df_1, df_specimen, by=c("person_id" = "person_id"))

##Warning message: 
##In left_join(subset_observation_df_1, df_specimen, by = c(person_id = "person_id")) :
##Detected an unexpected many-to-many relationship between `x` and `y`.
##ℹ Row 1 of `x` matches multiple rows in `y`.
##ℹ Row 152 of `y` matches multiple rows in `x`.
##ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
## This warning message is to be expected as many-to-many relationships are to be expected so will silence this for future joins where there will be many-to-many relationships

## The specimen date needs to be before the observation date as observation date is the date a bacteria was isolated from a specimen, so a specimen would need to be collected before a bacteria is identified. 
## In the inpatient_microbiology table in Ramses the isolation_date (observation_date in OMOP) is always greater than the specimen_date
## I will now just filter to show the specimen information where the specimen_date is less than the observation_date - this was not done in #45 

Specimen_joined_df_2 <- Specimen_joined_df_1 %>%
  filter(specimen_date < observation_date) #this reduces the number of rows from 988252 to 536039

## As these data frames are large, I will remove any older ones from the R environment as I go along 
rm(subset_observation_df_1)
rm(Specimen_joined_df_1)

## Join df_2 with concept based on concept_id to retrieve the information about specimen type
Specimen_joined_df_3 <- left_join(Specimen_joined_df_2, df_concept, by=c("specimen_concept_id" = "concept_id"))

rm(Specimen_joined_df_2)

## In this join there are columns that just contain NA values so I want to remove these 
Specimen_joined_df_4 <- Filter(function(x)!all(is.na(x)), Specimen_joined_df_3) ## this reduces the columns from 26 to 19

rm(Specimen_joined_df_3)

# I only want the information relevant to specimen so I will check what the values are in the domain_id table
unique(Specimen_joined_df_4$domain_id) ##[1] "Metadata" "Specimen"

## I now want to remove any rows related to metadata in this data frame 
Specimen_joined_df_5 <- filter(Specimen_joined_df_4, domain_id == 'Specimen')## this reduces the number of rows from 535039 to 527611

rm(Specimen_joined_df_4)

## I will now check if metadata occurs in the domain_id column of Observation_SNOMED_Microorganisms_6
unique(Specimen_joined_df_5$domain_id) ## [1] "Specimen

## I now want to rename the column called concept_name in this data frame to specimen_type_display as information in this column shows type of specimen collected
## I also want to rename the column concept_code to SNOMED_specimen_code as these are the SNOMED codes for specimen type
## I also want to rename valid_start_date and valid_end_date to valid_start_date_specimen and valid_end_date_specimen so these can be distinguished in future joins


Specimen_joined_df_6 <- Specimen_joined_df_5%>% rename(specimen_type_display = concept_name, 
                       SNOMED_specimen_code = concept_code,
                       valid_start_date_specimen = valid_start_date, 
                       valid_end_date_specimen = valid_end_date)

rm(Specimen_joined_df_5)

## Before doing any further joins I want to tidy up the Observation_SNOMED_Microorganisms data frame
## Lets get rid of any columns that only contain NA values in Observation_SNOMED_Microorganisms
Observation_SNOMED_Microorganisms_1 <- Filter(function(x)!all(is.na(x)), Observation_SNOMED_Microorganisms) ## this reduces the columns from 27 to 20

rm(Observation_SNOMED_Microorganisms)
## lets rename the concept_name column in Observation_SNOMED_Microorganisms_1 to organism_display_name as it shows the information about the type of microbe detected
Observation_SNOMED_Microorganisms_2 <- Observation_SNOMED_Microorganisms_1%>% rename(organism_display_name = concept_name)

rm(Observation_SNOMED_Microorganisms_1)

## I only want to keep the relevant columns within the data frame 
str(Observation_SNOMED_Microorganisms_2)

Observation_SNOMED_Microorganisms_3 <- subset(Observation_SNOMED_Microorganisms_2, select = c("person_id","observation_date","observation_datetime","visit_occurrence_id","unit_source_value","organism_display_name","concept_code","valid_start_date","valid_end_date"))

rm(Observation_SNOMED_Microorganisms_2)

## I want to rename the column concept_code in Observation_SNOMED_Microorganisms_2 to SNOMED_microorganism_code as these are the SNOMED codes for microorganisms
## I also want to rename valid_start_date and valid_end_date to valid_start_date_observation and valid_end_date_observation so these can be distinguished in future merges

Observation_SNOMED_Microorganisms_4 <- Observation_SNOMED_Microorganisms_3%>% rename(SNOMED_microorganism_code = concept_code,
                                                                                     valid_start_date_observation = valid_start_date,
                                                                                     valid_end_date_observation = valid_end_date)
rm(Observation_SNOMED_Microorganisms_3)

## I will also tidy up Specimen_joined_df_6
str(Specimen_joined_df_6)

## I only want to keep the relevant columns in df_7

Specimen_joined_df_7 <- subset(Specimen_joined_df_6, select = c("specimen_concept_id","person_id","observation_id","observation_date","specimen_date","specimen_id","specimen_type_concept_id","specimen_datetime","disease_status_concept_id","specimen_type_display","SNOMED_specimen_code","valid_start_date_specimen","valid_end_date_specimen"))

rm(Specimen_joined_df_6)

## I will now join Specimen_joined_df_7 with Observation_SNOMED_Microorganisms_4 table based on just person_id
Joined_df_1 <- left_join(Observation_SNOMED_Microorganisms_4, Specimen_joined_df_7, by=c("person_id" = "person_id"), relationship = "many-to-many") ## Compared to the resulting merged table in issue #45 which had 10629195 rows, this new join, which only contains specimen info where the date is less than the observation date, has 5936419 rows, this is 4692776 rows less than in issue #45
 


## I ran the script from issue #45 and then checked the number of unique person_ids = length(unique(Merged_df_1$person_id))
## results in [1] 1141

## Lets check how many  unique person_ids are present in Joined_df_1 
length(unique(Joined_df_1$person_id))
## results in [1] 1141 which is the same as merged table in #45 

##I want to just pull out the relevant columns from the Joined_df_1 that represent the Ramses Inpatient_microbiology table but also include the unit_source_value column from observation as this give useful information about bacterial growth
## In issue #45 I used the datetime columns from the observation and specimen tables, but inm the Ramses Inpatient_microbiology table the columns are called datetime but only contain the date info, so I am just going to use the date columns from OMOP observation and specimen

Joined_df_2 <- subset(Joined_df_1, select = c("person_id","specimen_id","specimen_type_display","specimen_date","unit_source_value","organism_display_name","SNOMED_microorganism_code","observation_id","observation_date.x"))

rm(Joined_df_1)
## lets rename these columns to match the Ramses Inpatient_microbiology table 

Joined_df_3 <- Joined_df_2%>% rename(patient_id = person_id,
                                     isolate_id = observation_id,
                                     isolation_datetime = observation_date.x,
                                     specimen_datetime = specimen_date)

## Lets add in the columns status, agent_display_name and rsi_code and fill them with NA as these have not been mapped to the OMOP data yet

Joined_df_3['status'] <- NA
Joined_df_3['agent_display_name'] <- NA
Joined_df_3['rsi_code'] <- NA

rm(Joined_df_2)

@AngharadGreen
Copy link

AngharadGreen commented Dec 18, 2024

I will now attempt to load the table Joined_df_3 into Ramses

library(Ramses)
library(ggplot2)
library(dplyr)

microbiology_data <- list()
microbiology_data$raw <- Joined_df_3
microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(
    organism_code = AMR::as.mo(if_else(
      organism_display_name == "No growth",
      NA_character_,
      organism_display_name)),
    agent_code = AMR::as.ab(agent_display_name)) %>% 
  mutate(organism_name = AMR::mo_name(organism_code),
         agent_name = AMR::ab_name(agent_code))

This resulted in the following output:

Image

> microbiology_data$raw <- microbiology_data$raw %>% 
+   mutate(specimen_type_code = case_when(
+     specimen_type_display == "Blood Culture" ~ 
+       "446131002", # Blood specimen obtained for blood culture
+     specimen_type_display == "Faeces" ~ 
+       "119339001", # Stool specimen
+     specimen_type_display == "MRSA Screen" ~ 
+       "697989009", # Anterior nares swab 
+     specimen_type_display == "Urine" ~ 
+       "122575003", # Urine specimen
+     TRUE ~ NA_character_
+   )) %>% 
+   left_join(transmute(reference_specimen_type,
+                       specimen_type_code = conceptId,
+                       specimen_type_name = pt_term))
Joining with `by = join_by(specimen_type_code)`

## Lets try this again but remove reference to Agent_name and also use SNOMED_microorganism_code to get the organism_name with the AMR package as this column contains the relevant SNOMED codes 
microbiology_data$raw <- Joined_df_3
microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(organism_name = AMR::mo_name(SNOMED_microorganism_code))

microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(specimen_type_code = case_when(
    specimen_type_display == "Blood Culture" ~ 
      "446131002", # Blood specimen obtained for blood culture
    specimen_type_display == "Faeces" ~ 
      "119339001", # Stool specimen
    specimen_type_display == "MRSA Screen" ~ 
      "697989009", # Anterior nares swab 
    specimen_type_display == "Urine" ~ 
      "122575003", # Urine specimen
    TRUE ~ NA_character_
  )) %>% 
  left_join(transmute(reference_specimen_type,
                      specimen_type_code = conceptId,
                      specimen_type_name = pt_term))


microbiology_data$specimens <- microbiology_data$raw %>% 
  transmute(specimen_id,
            patient_id,
            status = "available",
            specimen_datetime,
            specimen_type_code,
            specimen_type_name,
            specimen_type_display) %>% 
  distinct() # Removing duplicates created by multiple isolates and susceptibility testing

microbiology_data$isolates <- microbiology_data$raw %>% 
  transmute(isolate_id,
            specimen_id,
            patient_id,
            organism_code = SNOMED_microorganism_code,
            organism_name,
            organism_display_name,
            isolation_datetime) %>% 
  distinct() # Removing duplicates created by susceptibility testing

The two data frames specimens = microbiology_data$specimens and isolates = microbiology_data$isolates have now been created from the OMOP mapping

I now need to find the relevant data to create the data frame susceptibilities = microbiology_data$susceptibilities)

Information on RSI code:

S - Susceptible, standard dosing regimen: A microorganism is categorised as "Susceptible, standard dosing regimen", when there is a high likelihood of therapeutic success using a standard dosing regimen of the agent.
I - Susceptible, increased exposure*: A microorganism is categorised as "Susceptible, Increased exposure*" when there is a high likelihood of therapeutic success because exposure to the agent is increased by adjusting the dosing regimen or by its concentration at the site of infection.
R - Resistant: A microorganism is categorised as "Resistant" when there is a high likelihood of therapeutic failure even when there is increased exposure.

@AngharadGreen
Copy link

AngharadGreen commented Dec 18, 2024

  • We now have a data frame from OMOP where we have mapped 7 out of the 9 relevant columns for the Ramses Inpatient_microbiology table .
  • We now need to find the relevant information in OMOP that relates to agent_display_name and rsi_code
## I now want to identify where I can find information about agent display name and RSI code within the OMOP extract

## I will have a look at the drug_expsure table and identify how we can pull information about the drugs prescribed and map to the information we have in the Inpatient_microbiology table

## Lets read in the drug_exposure table
df_drug_exposure <- read_parquet("~/Ramses_extract_Nov2024/DRUG_EXPOSURE.parquet")

## I will carry out a join of the Joined_df_3 table with the drug_exposure table via the person_id 
Drug_Joined_df_3 <- left_join(Joined_df_3 , df_drug_exposure, by=c("patient_id" = "person_id")) ##this keeps causing R to crash

## I will look for a way to filter out just those person_id's that are present in the inpatient_microbiology table from the drug exposure table

@AngharadGreen
Copy link

AngharadGreen commented Jan 22, 2025

I used the following code to filter out just the antibiotics from the OMOP extract data

library(dplyr)
library(dbplyr)
library(arrow) ## to read in the parquet files

## Read in the required tables 
df_concept <- read_parquet("~/data/omop-metadata/data/concept.parquet")
df_concept_ancestor <- read_parquet("~/data/omop-metadata/data/concept_ancestor.parquet")
df_drug_exposure <- read_parquet("~/Ramses_extract_Nov2024/DRUG_EXPOSURE.parquet")

# Filter for concepts related to antibiotics using ATC code "J01"
antibiotics_concepts <- df_concept %>%
  filter(vocabulary_id == "ATC" & grepl("^J01", concept_code)) %>%
  select(concept_id, concept_name)

# Get descendants of antibiotics
antibiotic_descendants <- df_concept_ancestor %>%
  inner_join(df_concept, by = c("descendant_concept_id" = "concept_id")) %>%
  filter(ancestor_concept_id %in% antibiotics_concepts$concept_id) %>%
  select(descendant_concept_id, concept_name)

# Join with drug_exposure to get exposure records
drug_exposure_anti <- df_drug_exposure %>%
inner_join(antibiotic_descendants, by = c("drug_concept_id" = "descendant_concept_id")) %>%
 select(person_id, drug_exposure_start_date, drug_concept_id, concept_name)

## Write this output as a .csv file so it can be read in again
write.csv(drug_exposure_anti, "drug_exposure_anti.csv", row.names = FALSE)

I will now use this filtered antibiotic data and join it with the Joined_df_3 table

## I will carry out a join of the Joined_df_3 table with the drug_exposure_anti df via the person_id 
Drug_Joined_df_3 <- left_join(Joined_df_3 , drug_exposure_anti, by=c("patient_id" = "person_id")) ##this keeps causing R to crash

# I will now Sample a random subset of rows from Joined_df_3  (e.g., 1000 rows)
Joined_df_3_reduced <- Joined_df_3 %>%
  sample_n(1000)  # Adjust the number as needed 

## I will now re-run this join with the reduced dataframe and the drug_exposure_anti df
Drug_Joined_df_3 <- left_join(Joined_df_3_reduced , drug_exposure_anti, by=c("patient_id" = "person_id")) 

##This join worked with the reduced size, I will now tidy up this dataframe and change concept_name to agent display name 

Drug_Joined_df_4 <- subset(Drug_Joined_df_3, select = c("patient_id", "specimen_id", "specimen_type_display", "specimen_datetime", "unit_source_value","organism_display_name","SNOMED_microorganism_code","isolate_id", "isolation_datetime", "status","concept_name", "rsi_code"))
Drug_Joined_df_5 <- Drug_Joined_df_4%>% rename(agent_display_name = concept_name)

@AngharadGreen
Copy link

AngharadGreen commented Jan 22, 2025

I will now use the Ramses package and try to load in the Drug_Joined_df_5 table:

library(Ramses)
library(ggplot2)
library(dplyr)

microbiology_data <- list()
microbiology_data$raw <- Drug_Joined_df_5
microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(
    organism_code = AMR::as.mo(if_else(
      organism_display_name == "No growth",
      NA_character_,
      organism_display_name)),
    agent_code = AMR::as.ab(agent_display_name)) %>% 
  mutate(organism_name = AMR::mo_name(organism_code),
         agent_name = AMR::ab_name(agent_code))

This is the output message in the R console:

Image

  • This is progress as the Ramses package has found the information about the antibiotics pulled from OMOP and stored in the column agent_display_name
  • It does not seem to like Methicillin resistant Staphylococcus aureus - I am going to change that to MRSA within the data frame and see if that helps.
## Lets change to MRSA
Drug_Joined_df_5$organism_display_name <- ifelse(Drug_Joined_df_5$organism_display_name == "Methicillin resistant Staphylococcus aureus", "MRSA", Drug_Joined_df_5$organism_display_name)

## Lets rerun the Ramses command
microbiology_data <- list()
microbiology_data$raw <- Drug_Joined_df_5
microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(
    organism_code = AMR::as.mo(if_else(
      organism_display_name == "No growth",
      NA_character_,
      organism_display_name)),
    agent_code = AMR::as.ab(agent_display_name)) %>% 
  mutate(organism_name = AMR::mo_name(organism_code),
         agent_name = AMR::ab_name(agent_code))

Output message in the R console:
Image

microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(specimen_type_code = case_when(
    specimen_type_display == "Blood Culture" ~ 
      "446131002", # Blood specimen obtained for blood culture
    specimen_type_display == "Faeces" ~ 
      "119339001", # Stool specimen
    specimen_type_display == "MRSA Screen" ~ 
      "697989009", # Anterior nares swab 
    specimen_type_display == "Urine" ~ 
      "122575003", # Urine specimen
    TRUE ~ NA_character_
  )) %>% 
  left_join(transmute(reference_specimen_type,
                      specimen_type_code = conceptId,
                      specimen_type_name = pt_term))

Output message in the R console:
Image

  • This part of the code has worked as well

I will now proceed to follow the code that creates the three separate data frames:

microbiology_data$specimens <- microbiology_data$raw %>% 
  transmute(specimen_id,
            patient_id,
            status = "available",
            specimen_datetime,
            specimen_type_code,
            specimen_type_name,
            specimen_type_display) %>% 
  distinct() # Removing duplicates created by multiple isolates and susceptibility testing

microbiology_data$isolates <- microbiology_data$raw %>% 
  transmute(isolate_id,
            specimen_id,
            patient_id,
            organism_code,
            organism_name,
            organism_display_name,
            isolation_datetime) %>% 
  distinct() # Removing duplicates created by susceptibility testing

microbiology_data$susceptibilities <- microbiology_data$raw %>% 
  filter(!is.na(organism_code)) %>%  # Remove no growth
  transmute(isolate_id,
            specimen_id,
            patient_id,
            organism_code,
            organism_name,
            organism_display_name,
            agent_code,
            agent_name,
            agent_display_name,
            rsi_code,
            concept_code = NA_character_) %>% 
  distinct()

There were no error messages output with this code so i will now continue to validate the data:

validate_microbiology(specimens = microbiology_data$specimens,
                      isolates = microbiology_data$isolates,
                      susceptibilities = microbiology_data$susceptibilities)

Image

  • This error suggests there can only be unique values in the isolates dataset for isolate_id

  • The warning message suggests there can be no NA values in the agent_display_name for susceptibilities

  • In the Ramses inpatient_microbiology table there are repeated isolate_id's in the isoalte_id column and also NA values in the agent_display_name column, so I am going to explore what happens when I use this table in the analysis

@AngharadGreen
Copy link

AngharadGreen commented Jan 22, 2025

  • This is the resulting isolate table when the Ramses inpatient_microbiology data is used:
Image
  • I think the problem with the current microbiology data that has been pulled from OMOP is that the same isolate_id has been mapped to different organisms, I will look into why this has happened and get this sorted - I think it has happened when joining the Isolate information with the specimen information. I will explore the code used for this and try to ensure that each isolate_id/observation_id only matches to just one microorganism

@AngharadGreen
Copy link

AngharadGreen commented Jan 27, 2025

Updated code to ensure that only one organism_display_name is mapped to each unique observation_id/isolate_id

library(arrow) ## to read in the parquet files

## Read in the required tables 
df_observation <- read_parquet("~/Ramses_extract_Nov2024/OBSERVATION.parquet")
df_specimen <- read_parquet("~/Ramses_extract_Nov2024/SPECIMEN.parquet")
df_concept <- read_parquet("~/data/omop-metadata/data/concept.parquet")

library(tidyverse)## for the inner_join and filter functions

## I now want to filter the observation table to just show the microbial_isolates as described previously
Joined_concept_observation <- inner_join(df_observation, df_concept, by = c("observation_concept_id" = "concept_id"))

SNOMED_Organisms_Code <-read.csv("~/OMOP_RAMSES/SNOMED_Organisms.csv", header = TRUE, sep = ",")

Observation_SNOMED_Microorganisms <- filter(Joined_concept_observation, vocabulary_id == 'SNOMED',
                                            concept_code %in% SNOMED_Organisms_Code$SNOMED_Code)

## As these data frames are large, I will remove any older ones from the R environment as I go along 
rm(Joined_concept_observation)

## Join df_specimen with Observation_SNOMED_Microorganisms via person_id 

Specimen_Observ_df_1 <- inner_join(Observation_SNOMED_Microorganisms, df_specimen, by=c("person_id" = "person_id"), suffix = c("_Observ", "_Specimen"))

##Warning message:
##In inner_join(Observation_SNOMED_Microorganisms, df_specimen, by = c(person_id = "person_id")) :
##  Detected an unexpected many-to-many relationship between `x` and `y`.
##ℹ Row 1 of `x` matches multiple rows in `y`.
##ℹ Row 152 of `y` matches multiple rows in `x`.
##ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning

## This warning message is to be expected as many-to-many relationships are to be expected so will silence this for future joins where there will be many-to-many relationships

## The specimen date needs to be before the observation date as the observation date is the date a bacteria was isolated from a specimen, so a specimen would need to be collected before a bacteria is identified. 
## In the inpatient_microbiology table in Ramses the isolation_date (observation_date in OMOP) is always greater than the specimen_date

## I will now just filter to show the specimen information where the specimen_date is less than the observation_date - this was not done in #45 

Specimen_Observ_df_2 <- Specimen_Observ_df_1 %>%
  filter(specimen_date < observation_date) #this reduces the number of rows from 988252 to 536039

## As these data frames are large, I will remove any older ones from the R environment as I go along 
rm(Specimen_Observ_df_1)

## Join Specimen_Observ_df_2 with the concept table based on concept_id to retrieve the information about specimen type (the information from the observation and specimen concept tables will be joined so will use suffix = c("_Observ", "_Specimen"))
Specimen_Observ_df_3 <- inner_join(Specimen_Observ_df_2, df_concept, by=c("specimen_concept_id" = "concept_id"), suffix = c("_Observ", "_Specimen"))

rm(Specimen_Observ_df_2)

## In this join there are columns that just contain NA values so I want to remove these 
Specimen_Observ_df_4 <- Filter(function(x)!all(is.na(x)), Specimen_Observ_df_3) ## this reduces the columns from 50 to 36

rm(Specimen_Observ_df_3)

# I only want the information relevant to specimen so I will check what the values are in the domain_id table
unique(Specimen_Observ_df_4$domain_id_Specimen) ##[1] "Metadata" "Specimen"

## I now want to remove any rows related to metadata in this data frame 
Specimen_Observ_df_5 <- filter(Specimen_Observ_df_4, domain_id_Specimen == 'Specimen')## this reduces the number of rows from 535039 to 527611

rm(Specimen_Observ_df_4)

## I will now check if metadata occurs in the domain_id column of Observation_SNOMED_Microorganisms_6
unique(Specimen_Observ_df_5$domain_id_Specimen) ## [1] "Specimen

## I now want to rename the column called concept_name_Specimen in this join to specimen_type_display as information in this column shows type of specimen collected
## I also want to rename the column concept_code_Specimen to SNOMED_specimen_code as these are the SNOMED codes for specimen type

Specimen_Observ_df_6 <- Specimen_Observ_df_5%>% rename(specimen_type_display = concept_name_Specimen, 
                                                       SNOMED_specimen_code = concept_code_Specimen)
                                                       

rm(Specimen_Observ_df_5)

## Before doing any further joins I will tidy up the joined data frame

## lets rename the concept_name_Observ column to organism_display_name as it shows the information about the type of microbe detected
Specimen_Observ_df_7 <- Specimen_Observ_df_6%>% rename(organism_display_name = concept_name_Observ)

rm(Specimen_Observ_df_6)

## I only want to keep the relevant columns within the data frame 
colnames(Specimen_Observ_df_7)

Specimen_Observ_df_8 <- subset(Specimen_Observ_df_7, select = c("person_id","observation_date","observation_datetime", "observation_id", "observation_concept_id", "visit_occurrence_id","unit_source_value_Observ","organism_display_name","concept_code_Observ","valid_start_date_Observ","valid_end_date_Observ","specimen_date","specimen_id","specimen_concept_id","specimen_type_concept_id","specimen_datetime","specimen_type_display","SNOMED_specimen_code","valid_start_date_Specimen","valid_end_date_Specimen"))

rm(Specimen_Observ_df_7)

## I want to rename the column concept_code_Observ in Observation_SNOMED_Microorganisms_2 to SNOMED_microorganism_code as these are the SNOMED codes for microorganisms
Specimen_Observ_df_9 <- Specimen_Observ_df_8%>% rename(SNOMED_microorganism_code = concept_code_Observ)
                                                                                     
rm(Specimen_Observ_df_8)
## I now want to check that in this join each unique observation_id only matches to one organism_display_name

# Function to check if observation_ids map consistently to only one organism_display_name

check_replicates_consistency <- function(df, col1, col2) {
  # Group by the first column and check for unique values in the second column
  violations <- df %>%
    group_by(!!sym(col1)) %>%
    summarize(
      unique_values = n_distinct(!!sym(col2)),
      .groups = "drop"
    ) %>%
    filter(unique_values > 1) # Identify groups with more than one unique value
  
  # Return results
  if (nrow(violations) == 0) {
    return("All observation_ids map consistently to one organism_display_name")
  } else {
    return(list(
      message = "Some observation_ids map to different organism_display_names",
      violations = violations
    ))
  }
}

## lets run the function on the Specimen_Observ_df_9 table
check_replicates_consistency(Specimen_Observ_df_9, "observation_id", "organism_display_name") ## Results = [1] "All observation_ids map consistently to one organism_display_name"

  
##I want to just pull out the relevant columns from the Joined_df_1 that represent the Ramses Inpatient_microbiology table but also include the unit_source_value column from observation as this gives useful information about bacterial growth
## In issue #45 I used the datetime columns from the observation and specimen tables, but inm the Ramses Inpatient_microbiology table the columns are called datetime but only contain the date info, so I am just going to use the date columns from OMOP observation and specimen

colnames(Specimen_Observ_df_9)

##I want to just pull out the relevant columns from the Joined_df_1 that represent the Ramses Inpatient_microbiology table but also include the unit_source_value column from observation as this gives useful information about bacterial growth
## In issue #45 I used the datetime columns from the observation and specimen tables, but in the Ramses Inpatient_microbiology table the columns are called datetime but only contain the date info, so I am just going to use the date columns from OMOP observation and specimen

Specimen_Observ_df_10 <- subset(Specimen_Observ_df_9, select = c("person_id","specimen_id","specimen_type_display","specimen_date","unit_source_value_Observ","organism_display_name","SNOMED_microorganism_code","observation_id","observation_date"))

rm(Specimen_Observ_df_9)
## lets rename these columns to match the Ramses Inpatient_microbiology table 

Specimen_Observ_df_11 <- Specimen_Observ_df_10%>% rename(patient_id = person_id,
                                     isolate_id = observation_id,
                                     isolation_datetime = observation_date,
                                     specimen_datetime = specimen_date)

rm(Specimen_Observ_df_10)

## Run the check_replicates_consistency function again 

check_replicates_consistency(Specimen_Observ_df_11, "isolate_id", "organism_display_name") ## Results = [1] "All observation_ids map consistently to one organism_display_name"


## Lets add in the columns "status", "agent_display_name" and "rsi_code" and fill them with NA as these have not been mapped to the OMOP data yet

Specimen_Observ_df_11['status'] <- NA
Specimen_Observ_df_11['agent_display_name'] <- NA
Specimen_Observ_df_11['rsi_code'] <- NA

@AngharadGreen
Copy link

I will now join Specimen_Observ_df_11 with the drug_exposure_anti data frame

## Read in drug_exposure_anti data frame made previously 

drug_exposure_anti <- read.csv("drug_exposure_anti.csv") 

# I will now Sample a random subset of rows from Specimen_Observ_df_11  (e.g., 1000 rows)
Specimen_Observ_df_11_reduced <- Specimen_Observ_df_11 %>%
  sample_n(1000)  # Adjust the number as needed 

## I will now re-run this join with the reduced dataframe and the drug_exposure_anti df
Drug_Specimen_Observ_df_1 <- inner_join(Specimen_Observ_df_11_reduced , drug_exposure_anti, by=c("patient_id" = "person_id")) 

##This join worked with the reduced size, I will now tidy up this dataframe and change concept_name to agent display name 

Drug_Specimen_Observ_df_2 <- subset(Drug_Specimen_Observ_df_1, select = c("patient_id", "specimen_id", "specimen_type_display", "specimen_datetime", "unit_source_value_Observ","organism_display_name","SNOMED_microorganism_code","isolate_id", "isolation_datetime", "status","concept_name", "rsi_code"))
Drug_Specimen_Observ_df_3 <- Drug_Specimen_Observ_df_2%>% rename(agent_display_name = concept_name)

## Change Methicillin resistant Staphylococcus aureus to MRSA to stop error when loading into Ramses
Drug_Specimen_Observ_df_3$organism_display_name <- ifelse(Drug_Specimen_Observ_df_3$organism_display_name == "Methicillin resistant Staphylococcus aureus", "MRSA", Drug_Specimen_Observ_df_3$organism_display_name)

## As a new join has taken place, I will now re-run the check_replicates_consistency function again 

check_replicates_consistency(Drug_Specimen_Observ_df_3, "isolate_id", "organism_display_name") ## Results = [1] "All observation_ids map consistently to one organism_display_name"

This has produced a table that contains the information about isolate, organism name, specimen and drug

@AngharadGreen
Copy link

AngharadGreen commented Jan 27, 2025

I will now load this data frame into Ramses

library(Ramses)
library(ggplot2)
library(dplyr)

microbiology_data <- list()
microbiology_data$raw <- Drug_Specimen_Observ_df_3
microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(
    organism_code = AMR::as.mo(if_else(
      organism_display_name == "No growth",
      NA_character_,
      organism_display_name)),
    agent_code = AMR::as.ab(agent_display_name)) %>% 
  mutate(organism_name = AMR::mo_name(organism_code),
         agent_name = AMR::ab_name(agent_code))

Output:
Image

microbiology_data$raw <- microbiology_data$raw %>% 
  mutate(specimen_type_code = case_when(
    specimen_type_display == "Blood Culture" ~ 
      "446131002", # Blood specimen obtained for blood culture
    specimen_type_display == "Faeces" ~ 
      "119339001", # Stool specimen
    specimen_type_display == "MRSA Screen" ~ 
      "697989009", # Anterior nares swab 
    specimen_type_display == "Urine" ~ 
      "122575003", # Urine specimen
    TRUE ~ NA_character_
  )) %>% 
  left_join(transmute(reference_specimen_type,
                      specimen_type_code = conceptId,
                      specimen_type_name = pt_term))

Output: Joining with by = join_by(specimen_type_code)

## The three data frames can now be created.

## Specimens

microbiology_data$specimens <- microbiology_data$raw %>% 
  transmute(specimen_id,
            patient_id,
            status = "available",
            specimen_datetime,
            specimen_type_code,
            specimen_type_name,
            specimen_type_display) %>% 
  distinct() # Removing duplicates created by multiple isolates and susceptibility testing

## Isolates

microbiology_data$isolates <- microbiology_data$raw %>% 
  transmute(isolate_id,
            specimen_id,
            patient_id,
            organism_code,
            organism_name,
            organism_display_name,
            isolation_datetime) %>% 
  distinct() # Removing duplicates created by susceptibility testing

## Susceptibilities

microbiology_data$susceptibilities <- microbiology_data$raw %>% 
  filter(!is.na(organism_code)) %>%  # Remove no growth
  transmute(isolate_id,
            specimen_id,
            patient_id,
            organism_code,
            organism_name,
            organism_display_name,
            agent_code,
            agent_name,
            agent_display_name,
            rsi_code,
            concept_code = NA_character_) %>% 
  distinct()

## The data may now be validated, and loaded
validate_microbiology(specimens = microbiology_data$specimens,
                      isolates = microbiology_data$isolates,
                      susceptibilities = microbiology_data$susceptibilities)

Output: Error: The following variables must have unique values in the isolates dataset: isolate_id

The same output error has occurred, I now think this is due to there being multiple specimen_ids for each isolate_id:

View(microbiology_data$specimens)
View(microbiology_data$isolates)
View(microbiology_data$susceptibilities)  
View(microbiology_data$raw)

output for View(microbiology_data$isolates)
Image

I have run this function to check there are not multiple organism for each isolate_id:

check_replicates_consistency(microbiology_data$isolates, "isolate_id", "organism_display_name")
[1] "All observation_ids map consistently to one organism_display_name"

  • I have had a look at the Ramses inpatient_microbiology table and each isolate_id is only associated with one specimen_id number, whereas within the joined OMOP data each isolate_id can be associated with multiple specimen_id numbers .
  • I need to identify how to more specifically map each specimen_id number uniquely with a isoalte_id number.
  • The specimen and observation table were mapped only by person_id so if each person_id had multiple specimen_ids associated to it then these would have mapped to each of the unique observation_id associated with that patient_id

@AngharadGreen
Copy link

AngharadGreen commented Jan 27, 2025

I want to determine if there is any information related to antibiotic susceptibility within the measurement table

Discussions on this forum are useful to understand how microbiology lab data and susceptibility data can be stored within OMOP tables

## Exploring the concept table for terms that relate to susceptibility information
any(df_concept$concept_name == "Microbiology") # True

any(df_concept$concept_name == "Resistant") # True

any(df_concept$concept_name == "Sensitive") # True

any(df_concept$concept_name == "Susceptible") # True

any(df_concept$concept_name == "Intermediate") # True

any(df_concept$concept_name == "MIC") # True

any(df_concept$concept_name == "μg/mL") #NA

any(df_concept$concept_name == "AMR") #NA

any(df_concept$concept_name == "microgram per milliliter") #True

## I will now filter out the concept table based on the terms that resulted in a True output

values_to_filter <- c("Resistant", "Sensitive",  "Susceptible", "Intermediate", "MIC", "microgram per milliliter")
filtered_df_concept <- df_concept[df_concept$concept_name %in% values_to_filter, ] 

## I now want to search the OMOP tables to determine if the concept_ids for these terms can be found to understand how antibiotic susceptibility data is stored
values_to_search <- c(35806430, 45884178, 45881795, 36210475, 4043352, 4038110, 45878594, 4148441, 40343054, 4307105, 45769424, 8859)

result_measurement <- df_measurement[apply(df_measurement, 1, function(row) any(row %in% values_to_search)), ] 

# No values found
  • I have found this information from Athena that provides more details on concept codes related to susceptibility: https://athena.ohdsi.org/search-terms/terms/4032801
  • I will explore this more to see if these concept codes are within the measurement table in our OMOP extract.
  • There are 75 concept_ids related to susceptibility from Athena, these were read into R and the concept table was filtered based on the concept_id as follows:
## Read in the list of concept_ids that are related to antimicrobial susceptibility from Athena
Antimicrobial_susceptibility_codes <- read.csv("Antimicrobial_susceptibility_codes.csv")

## Read in concept table
df_concept <- read_parquet("~/data/omop-metadata/data/concept.parquet")

## Filter concept table based on Athena susceptibility concept_ids

Antimicrobial_susceptibility_concept <- filter(df_concept, 
                                               concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)

Out of the 75 cocnept_ids 43 were identified within the concept table
Output:
Image

unique(Antimicrobial_susceptibility_concept$domain_id)
[1] "Measurement" "Observation" "Specimen" "Drug" "Metadata"

I will now filter the other OMOP tables using these codes:


## Read in other OMOP tables to check if these concept_ids can be found anywhere else

df_measurement <- read_parquet("~/Ramses_extract_Nov2024/MEASUREMENT.parquet")
df_observation <- read_parquet("~/Ramses_extract_Nov2024/OBSERVATION.parquet")
df_specimen <- read_parquet("~/Ramses_extract_Nov2024/SPECIMEN.parquet")
df_condition_occurrence <- read_parquet("~/Ramses_extract_Nov2024/CONDITION_OCCURRENCE.parquet")
df_visit_occurrence <- read_parquet("~/Ramses_extract_Nov2024/VISIT_OCCURRENCE.parquet")
df_Procedure_occurrence <- read_parquet("~/Ramses_extract_Nov2024/PROCEDURE_OCCURRENCE.parquet")

## Filter concept table based on Athena susceptibility concept_ids


Antimicrobial_susceptibility_measurement <- filter(df_measurement,
                                                   measurement_concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)

Antimicrobial_susceptibility_observation <- filter(df_observation,
                                                   observation_concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)

Antimicrobial_susceptibility_specimen <- filter(df_specimen,
                                                   specimen_concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)

Antimicrobial_susceptibility_condition_occurrence <- filter(df_condition_occurrence,
                                                   condition_concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)

Antimicrobial_susceptibility_visit_occurrence <- filter(df_visit_occurrence,
                                                            visit_concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)
Antimicrobial_susceptibility_Procedure_occurrence <- filter(df_Procedure_occurrence,
                                                        procedure_concept_id %in% Antimicrobial_susceptibility_codes$Concept_ID)

Output:

Image

  • These concept_ids were not found in any table except for one in the specimen table =

Image

@AngharadGreen
Copy link

AngharadGreen commented Jan 31, 2025

Improving the mapping of the specimen table to the observation table

  • I will now look to see if I can increase the specificity of the mapping between the specimen and observation table
  • The specimen and observation table were mapped only by person_id so if each person_id had multiple specimen_ids associated to it then these would have mapped to each of the unique observation_id associated with that patient_id
  • I have found this table on Athena - https://athena.ohdsi.org/search-terms/terms?query=bacteria+&boosts&page=1
  • This provides all the concept_ids that are related to bacteria, I will search the current OMOP extract tables to check if these concept_ids are present, there are 7932 concept_ids to check
## Read in the list of Athen bacteria codes as a dataframe in R
Athena_bacteria_codes <- read.csv("Athena_Bacteria_codes.csv")

library(arrow) ## to read in the parquet files ## Read in the OMOP tables to check 

df_observation <- read_parquet("~/Ramses_extract_Nov2024/OBSERVATION.parquet")
df_specimen <- read_parquet("~/Ramses_extract_Nov2024/SPECIMEN.parquet")
df_concept <- read_parquet("~/data/omop-metadata/data/concept.parquet")
df_measurement <- read_parquet("~/Ramses_extract_Nov2024/MEASUREMENT.parquet")
df_condition_occurrence <- read_parquet("~/Ramses_extract_Nov2024/CONDITION_OCCURRENCE.parquet")
df_visit_occurrence <- read_parquet("~/Ramses_extract_Nov2024/VISIT_OCCURRENCE.parquet")
df_Procedure_occurrence <- read_parquet("~/Ramses_extract_Nov2024/PROCEDURE_OCCURRENCE.parquet")

# Filter each OMOP table based on the Athena bacteria codes 
Athena_bacteria_concept <- filter(df_concept, 
                                  concept_id %in% Athena_bacteria_codes$Id)

Athena_bacteria_observation <- filter(df_observation,
                                      observation_concept_id %in% Athena_bacteria_codes$Id)

Athena_bacteria_measurement <- filter(df_measurement,
                                      measurement_concept_id %in% Athena_bacteria_codes$Id)

Athena_bacteria_specimen <- filter(df_specimen,
                                   specimen_concept_id %in% Athena_bacteria_codes$Id)


Athena_bacteria_condition_occurence <- filter(df_condition_occurrence,
                                      condition_concept_id %in% Athena_bacteria_codes$Id)


Athena_bacteria_visit_occurence <- filter(df_visit_occurrence,
                                              visit_concept_id %in% Athena_bacteria_codes$Id)

Athena_bacteria_procedure_occurence <- filter(df_Procedure_occurrence,
                                          procedure_concept_id %in% Athena_bacteria_codes$Id)

Output:
Image

I ran the following R code to determine what columns are present when comparing observation to condition occurrence and specimen to condition occurrence:

> intersect(colnames(df_observation), colnames(df_condition_occurrence))
[1] "person_id"           "provider_id"         "visit_occurrence_id" "visit_detail_id"    
> intersect(colnames(df_specimen), colnames(df_condition_occurrence))
[1] "person_id"

Image

Key Relationships Between the Tables
Shared person_id → Both tables can be linked through person_id, allowing analysis of specimens collected from patients with specific conditions.
Shared visit_occurrence_id → If a specimen was collected during a hospital visit, it can be linked to a condition diagnosed in the same visit.
Temporal Relationship (specimen_datetime vs. condition_start_date) → If a specimen was collected around the time a condition was recorded, it may be related.
Concept Mapping (condition_concept_id ↔ specimen_concept_id) → Certain conditions are associated with specific specimen types (e.g., lung cancer with lung biopsy, sepsis with blood cultures).

  • I will now work to see if linking the three tables results in better mapping of observation_id to specimen_id

Lets explore the bacteria information that is stored within the condition_occurrence table:

> ## I now want to join the concept table to the Athena_bacteria_condition_occurrence data frame

> Joined_concept_Athena_bacteria_condition_occurence <- inner_join(Athena_bacteria_condition_occurence, df_concept, by = c("condition_concept_id" = "concept_id"))

> View(Joined_concept_Athena_bacteria_condition_occurence)

> unique(Joined_concept_Athena_bacteria_condition_occurence$concept_name)

 [1] "Pericarditis in bacterial diseases classified elsewhere"                                
 [2] "Other specified bacterial agents as the cause of diseases classified to other chapters" 
 [3] "Small bowel bacterial overgrowth syndrome"                                              
 [4] "Other bacterial infections of unspecified site"                                         
 [5] "Bacterial pneumonia, unspecified"                                                       
 [6] "Other specified bacterial intestinal infections"                                        
 [7] "Bacterial infection, unspecified"                                                       
 [8] "Bacteremia caused by Gram-negative bacteria"                                            
 [9] "Bacterial lower respiratory infection"                                                  
[10] "Acute bacterial otitis externa"                                                         
[11] "Infection due to Escherichia coli"                                                      
[12] "Extended spectrum beta-lactamase resistant bacterial infection"                         
[13] "Bacteremia due to Staphylococcus aureus"                                                
[14] "Bacterial arthritis"                                                                    
[15] "Bacterial arthritis of costotransverse joint"                                           
[16] "Bacterial peritonitis"                                                                  
[17] "Bacteremia"                                                                             
[18] "Bacterial infection caused by Klebsiella pneumoniae"                                    
[19] "Bacterial keratitis"                                                                    
[20] "Infection due to carbapenem resistant bacteria"                                         
[21] "Bacterial meningoencephalitis"                                                          
[22] "Bacteremia associated with intravascular line"                                          
[23] "Pneumonia due to other Gram-negative bacteria"                                          
[24] "Sepsis due to Gram negative bacteria"                                                   
[25] "Bacterial liver abscess"                                                                
[26] "Meningitis in bacterial diseases classified elsewhere"                                  
[27] "Bacteremia due to Methicillin resistant Staphylococcus aureus"                          
[28] "Bacteremia due to Salmonella"                                                           
[29] "Bacterial meningoencephalitis and meningomyelitis, not elsewhere classified"            
[30] "Bacteremia caused by Gram-positive bacteria"                                            
[31] "Infection due to ESBL bacteria"                                                         
[32] "Primary bacterial peritonitis"                                                          
[33] "Bacterial infection due to Pseudomonas"                                                 
[34] "Myositis in bacterial diseases classified elsewhere"                                    
[35] "Bacterial meningitis, unspecified"                                                      
[36] "Encephalitis, myelitis and encephalomyelitis in bacterial diseases classified elsewhere"
[37] "Other bacterial meningitis"                                                             
[38] "Bacterial infection due to Serratia"                                                    
[39] "Bacterial meningitis"                                                                   
[40] "Bacterial endocarditis"                                                                 
[41] "Bacterial pneumonia"  

  • There are 41 unique concept_names linked to bacteria in the condition_occurrence table, I am now going to try to link the observation and condition_occurrence tables

@AngharadGreen
Copy link

AngharadGreen commented Jan 31, 2025

I have noticed that in this OMOP extract there are tables for each of the OMOP tables called LINKS. I will explore these more:

> df_observation_links <- read_parquet("~/Ramses_extract_Nov2024/OBSERVATION_LINKS.parquet")
> View(df_observation_links)
> unique(df_observation_links$plugin_provenance)
[1] "flowsheet_custom_lists" "flowsheet_numeric"      "microbiology_isolates"  "problem_list"           "alcohol"                "smoking_socialfactor"  
[7] "resus_status"           "echo"                   "maternity_EDD"         

  • The information maybe relevant to understand what type of data is being recorded within the main OMOP tables - for example it stores "microbiology_isolates".
  • I will explore the other LINKS tables:
> unique(df_measurement_links$plugin_provenance)
[1] "flowsheet_custom_lists"  "flowsheet_numeric"       "flowsheet_scale_numeric" "labs_numeric"            "blood_pressure"         
> unique(df_condition_occurrence_links$plugin_provenance)
[1] "caboodle"         "echo"             "pregnancy_status"
> unique(df_specimen_links$plugin_provenance)
[1] "caboodle"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design, scoping and ToR work
Projects
None yet
Development

No branches or pull requests

3 participants