[Tufts] Step-by-Step GIS Vocabulary Package Usability Validation #386

Open
p-talapova opened this issue Feb 21, 2025 · 0 comments

Step 1: Define the validation objectives

  • What are we testing? (e.g., does the GIS Vocabulary cover all key environmental exposure concepts that I need?)
  • What decisions will the results support? (e.g., integrating GIS exposure data into OMOP analytics)

Step 2: Acquire GIS data

  • Download a GIS dataset (e.g., the CDC/ATSDR Environmental Justice Index, EJI). Consider complementary GIS sources (e.g., EPA air pollution data).
  • Ensure each record has a unique spatial identifier, e.g., GEOID (census tract), latitude/longitude, or ZIP code.
  • Preprocess the dataset, e.g., convert all records to a common spatial unit (ZIP+4 → census tract).
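
A common preprocessing problem is that tract GEOIDs lose their leading zero when a file is read as numbers. The sketch below assumes the usual Census layout for a tract GEOID (11 digits: 2 for state, 3 for county, 6 for tract). The function names `normalize_geoid` and `split_geoid` are hypothetical helpers, not part of any package.

```python
def normalize_geoid(raw) -> str:
    """Return an 11-character census tract GEOID, restoring a lost leading zero."""
    text = str(raw).strip()
    if text.endswith(".0"):  # value was read as a float, e.g. 1001020100.0
        text = text[:-2]
    # Assumption: a valid tract GEOID has 11 digits, or 10 if the leading zero was dropped.
    if not text.isdigit() or len(text) not in (10, 11):
        raise ValueError(f"not a tract GEOID: {raw!r}")
    return text.zfill(11)


def split_geoid(geoid: str) -> dict:
    """Split a normalized tract GEOID into state, county, and tract parts."""
    return {"statefp": geoid[:2], "countyfp": geoid[2:5], "tractce": geoid[5:]}
```

Storing the GEOID as text from the start avoids the problem entirely; the helper is only needed for files that were already read as numbers.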

Step 3: Identify patient records matching the GIS data

  • Extract patient location data from the OMOP location table (person_id, location_id, state, county, zip, latitude, longitude).
  • Match each patient location to a GIS record according to the data available:
      ◦ Direct match: GEOID (EJI) = location_source_value (OMOP).
      ◦ Census tract match: (state, county, zip) in OMOP = (STATEFP, COUNTYFP, GEOID) in EJI.
      ◦ ZIP only: crosswalk ZIP → census tract using geospatial reference datasets.
      ◦ Latitude/longitude only: use reverse geocoding.
  • Check address consistency over time: patients may move, so the location_history table (OMOP CDM v6.0) may be needed.
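
The matching rules in this step can be read as a cascade. The sketch below only decides which rule applies to a location row; it does not perform the crosswalk or the geocoding. The function name, the strategy labels, and the order in which the rules are tried are assumptions made for illustration.

```python
def choose_match_strategy(location: dict, eji_geoids: set) -> str:
    """Pick a matching rule for one row of the OMOP location table.

    `location` is a dict keyed by OMOP column names; `eji_geoids` is the set
    of GEOID strings present in the GIS dataset.
    """
    source_value = location.get("location_source_value")
    if source_value and source_value in eji_geoids:
        return "direct"  # GEOID = location_source_value
    if location.get("state") and location.get("county") and location.get("zip"):
        return "census_tract"  # compare against (STATEFP, COUNTYFP, GEOID)
    if location.get("zip"):
        return "zip_crosswalk"  # ZIP -> census tract via a reference dataset
    if location.get("latitude") is not None and location.get("longitude") is not None:
        return "reverse_geocode"
    return "unmatched"
```

Counting how many locations fall into each strategy, and how many remain unmatched, is a useful validation metric in its own right.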

Step 4: Map GIS variables to the OMOP GIS Vocabulary

  • Use an interim lookup table to map GIS variables to standard concept IDs (exposure_concept_id) from the GIS Ontology.
  • If applicable, map related units to standard concept IDs (unit_concept_id) from OHDSI Athena.
  • If a GIS variable has no concept_id, decide whether to add it to the GIS Vocabulary Package.
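
A minimal sketch of the interim lookup table, assuming it is held as a dictionary keyed by the GIS variable name. The EPL_PM entry uses the concept ID given in this issue's example; `HYPOTHETICAL_VAR` is a made-up variable name standing in for one that has no concept yet, and `map_variables` is a hypothetical helper.

```python
# Interim lookup: GIS source variable -> standard concept IDs.
LOOKUP = {
    "EPL_PM": {"exposure_concept_id": 2052498173, "unit_concept_id": None},
}


def map_variables(variables, lookup):
    """Split variables into those with a mapping and those still unmapped."""
    mapped, unmapped = {}, []
    for name in variables:
        entry = lookup.get(name)
        if entry is None:
            unmapped.append(name)  # candidate for addition to the vocabulary
        else:
            mapped[name] = entry
    return mapped, unmapped


mapped, unmapped = map_variables(["EPL_PM", "HYPOTHETICAL_VAR"], LOOKUP)
```

The `unmapped` list is the direct input to the decision in the last bullet: each entry is either added to the GIS Vocabulary Package or dropped from the analysis.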

Step 5: Populate the external_exposure table

  • Ensure that each patient in OMOP has a geospatial identifier that can be linked to GIS datasets.
  • For each matched patient-location pair, assign exposure_concept_id based on the mapped GIS variable.
  • Set exposure_start_date to the reference date of the GIS dataset.
  • Populate value_as_number and unit_concept_id.
  • Populate other fields if applicable.

| Field Name | Description | Data Example |
| --- | --- | --- |
| exposure_occurrence_id | Unique identifier for each exposure record | 123456 |
| location_id | Foreign key linking to the location table, indicating where exposure occurred | 789 |
| person_id | Foreign key linking to the person table, identifying the individual exposed | 100234 |
| cohort_definition_id | (Optional) Links to a defined cohort in research studies | 25 |
| exposure_concept_id | Standard OMOP concept_id representing the type of exposure | 2052498173 (Percentile Rank Of Annual Mean Days Above PM2.5 Regulatory Standard - 3-Year Average) |
| exposure_start_date | Date when the exposure event started | 2024-01-15 |
| exposure_end_date | Date when the exposure event ended (NULL if ongoing exposure) | NULL (ongoing) |
| exposure_type_concept_id | Concept ID defining the origin of the exposure record | 2052499258 (Government Data) |
| exposure_relationship_concept_id | Concept ID describing how exposure relates to the person | NULL |
| exposure_source_concept_id | Source-specific concept ID before standardization to OMOP | 90000001 |
| exposure_source_value | Raw exposure value from source data | "EPL_PM" |
| exposure_relationship_source_value | Raw value describing the exposure-person relationship | NULL |
| dose_unit_source_value | Source unit before standardization | NULL |
| quantity | Number of exposure occurrences (if applicable) | 1 |
| modifier_source_value | (Optional) Modifier describing the exposure type or intensity | NULL |
| operator_concept_id | Concept ID defining operator logic (e.g., <, >, =) | NULL |
| value_as_number | Numerical value of the exposure (e.g., concentration level) | 0.8503 |
| value_as_concept_id | Concept ID for categorical exposure values | NULL |
| unit_concept_id | Concept ID representing the measurement unit | NULL |
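
As a rough illustration of this step, the sketch below assembles one external_exposure row as a dictionary, using the field names and example values listed above. The helper `build_exposure_record` and its parameter names are hypothetical; fields that are not passed in are left as None (NULL), and loading the row into a database is out of scope.

```python
from datetime import date

# Field names of the external_exposure table as listed in this issue.
FIELDS = (
    "exposure_occurrence_id", "location_id", "person_id", "cohort_definition_id",
    "exposure_concept_id", "exposure_start_date", "exposure_end_date",
    "exposure_type_concept_id", "exposure_relationship_concept_id",
    "exposure_source_concept_id", "exposure_source_value",
    "exposure_relationship_source_value", "dose_unit_source_value", "quantity",
    "modifier_source_value", "operator_concept_id", "value_as_number",
    "value_as_concept_id", "unit_concept_id",
)


def build_exposure_record(**values) -> dict:
    """Return a full row with every field present; unspecified fields are None."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError(f"not external_exposure fields: {sorted(unknown)}")
    record = dict.fromkeys(FIELDS)  # every field starts as None (NULL)
    record.update(values)
    return record


row = build_exposure_record(
    exposure_occurrence_id=123456,
    person_id=100234,
    location_id=789,
    exposure_concept_id=2052498173,       # mapped GIS variable (Step 4)
    exposure_type_concept_id=2052499258,  # Government Data
    exposure_source_value="EPL_PM",
    exposure_start_date=date(2024, 1, 15),
    value_as_number=0.8503,
)
```

Rejecting unknown field names early catches typos in column names before any rows are written.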

Step 6: Analyze the impact of GIS exposures on health outcomes.
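
One simple first look, assuming exposures are available as a person_id → value_as_number mapping and the outcome as a set of person_ids, is to compare the share of persons with the outcome above and below an exposure threshold. This is a descriptive sketch only, not a causal analysis; the function name and the threshold are hypothetical.

```python
def outcome_rate_by_exposure(exposures: dict, persons_with_outcome: set, threshold: float) -> dict:
    """Share of persons with the outcome in the high- and low-exposure groups."""
    groups = {"high": [0, 0], "low": [0, 0]}  # [persons, persons with outcome]
    for person_id, value in exposures.items():
        key = "high" if value >= threshold else "low"
        groups[key][0] += 1
        if person_id in persons_with_outcome:
            groups[key][1] += 1
    # A group with no persons has no defined rate, so report None for it.
    return {k: (hits / n if n else None) for k, (n, hits) in groups.items()}
```

A real analysis would need to adjust for confounders and for how long each person lived at the matched location.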

p-talapova changed the title from "[Tufts] Step-by-Step GIS Vocabulary Package Usability Validation" to "[IN-PROG] [Tufts] Step-by-Step GIS Vocabulary Package Usability Validation" on Feb 28, 2025
p-talapova added the documentation label (Requires (re) writing of documentation, no coding.) on Feb 28, 2025
p-talapova self-assigned this on Feb 28, 2025