capture transportable metadata about cohort definition, as part of the cohort json #2940

gowthamrao · 2024-05-27T10:41:53Z

We have observed that recent vocabulary changes affect the operating characteristics of phenotype algorithms, even within the same data source. Traditionally, the performance of a phenotype algorithm was considered dependent only on the cohort definition and the data source tested. However, it is becoming apparent that the vocabulary version also plays a crucial role.

To manage this better, we propose including additional metadata in the cohort definition itself:

Include Vocabulary Version in Cohort Metadata:
- Implement metadata capture within the cohort JSON for the user to record the vocabulary version where the cohort definition was last updated and/or evaluated. This addition will help users track changes and understand the impact of vocabulary updates on their studies.
Standardize Metadata Framework:
- Develop a more generalized framework for metadata using name-value pairs in the JSON format. This should include:
  - Standard fields like vocabularyVersion, firstDevelopedDate, and lastUpdatedDate.
  - Extendable user-defined fields that can describe broader metadata aspects, such as:
    - Library cohort status (e.g., isLibraryCohort: true/false)
    - Peer review status (e.g., isPeerReviewed: true/false)
    - Approval status (e.g., isApproved: true/false)
    - Usage in specific studies (e.g., usedInStudy: Study A)
    - Descriptive text blobs providing additional context or notes.
    - Author(s) attribution
- Add a global hash signature ID that uniquely identifies the cohort JSON across Atlas instances. This hash should change whenever the core cohort definition logic changes.
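
To make the proposal above concrete, here is a minimal sketch of what such a metadata block and hash could look like. Everything in it is an assumption for illustration, not an existing Atlas or Circe feature: the `metadata` key, the `userDefined` sub-object, the example values, and the `definition_hash` helper are all hypothetical. The hash is computed over a canonical serialization of everything except the metadata, so editing metadata alone leaves it unchanged.

```python
import hashlib
import json


def definition_hash(cohort: dict, metadata_key: str = "metadata") -> str:
    """Hash the cohort JSON while ignoring the (hypothetical) metadata block."""
    # Keep only the core definition logic; metadata must not affect the hash.
    core = {k: v for k, v in cohort.items() if k != metadata_key}
    # Sorted keys and fixed separators give the same bytes on every instance.
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Toy cohort definition; the real expression has many more fields.
cohort = {
    "ConceptSets": [],
    "PrimaryCriteria": {"CriteriaList": []},
    "metadata": {  # hypothetical key, not part of the current cohort JSON
        "vocabularyVersion": "EXAMPLE-VERSION",  # illustrative value
        "firstDevelopedDate": "2024-01-01",      # illustrative value
        "lastUpdatedDate": "2024-05-27",         # illustrative value
        "userDefined": {                         # extendable name-value pairs
            "isLibraryCohort": True,
            "isPeerReviewed": False,
            "isApproved": False,
            "usedInStudy": "Study A",
            "authors": ["Example Author"],
            "notes": "Free-text context goes here.",
        },
    },
}

print(definition_hash(cohort))
```

Excluding the metadata block from the hash is one possible design choice; it lets review or approval status change without making the definition look like a different cohort.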

Some of this metadata is already captured in public and private phenotype libraries. However, it is becoming an attribute of the cohort definition that is captured only in the context of the library. If we can extend these attributes to be part of the cohort JSON, then we can:

Facilitate Metadata Transportability:
- Ensure that this metadata is structured in a way that allows it to be easily transported with the cohort JSON across different systems and studies, enhancing reproducibility and transparency.

This structured approach to metadata management will not only improve the fidelity of cohort definitions in the face of vocabulary changes but also enhance the overall utility and governance of cohorts in Atlas.

Discussed this idea with @dimshitc, Azza Shoaibi

This new metadata approach will make public and private libraries of cohort definitions easier to integrate. It also allows Atlas to take on a "librarian" role, curating definitions for reuse.

dimshitc · 2024-05-27T10:48:33Z

JSON can keep the vocabulary version the cohort was created with, and the vocabulary version of the latest update.
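
One way to read this suggestion is sketched below, assuming a hypothetical helper `stamp_vocabulary_version` and hypothetical field names `vocabularyVersionCreated` and `vocabularyVersionLastUpdated` (neither exists in the cohort JSON today). The creation version is written once and never overwritten, while the last-updated version is refreshed on every save.

```python
def stamp_vocabulary_version(metadata: dict, current_version: str) -> dict:
    """Return a copy of the metadata with vocabulary versions recorded."""
    updated = dict(metadata)  # shallow copy so the caller's dict is untouched
    # Set only on first save; later saves keep the original creation version.
    updated.setdefault("vocabularyVersionCreated", current_version)
    # Always reflects the vocabulary in use at the most recent save.
    updated["vocabularyVersionLastUpdated"] = current_version
    return updated


# Illustrative version labels, not real vocabulary releases.
first_save = stamp_vocabulary_version({}, "VERSION-A")
second_save = stamp_vocabulary_version(first_save, "VERSION-B")
print(second_save)
```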

gowthamrao · 2024-05-27T10:49:16Z

Note: a generalizable idea is that this "metadata" can replace other metadata-like elements in the cohort JSON, such as the "description" text box or "tags".

gowthamrao mentioned this issue May 27, 2024

maybe atlas could capture the metadata that OHDSI Phenotype library is trying to capture? OHDSI/PhenotypeLibrary#88

Open
