[ENH] Update info on assessment availability and following data model updates #123

Merged: 4 commits, Nov 13, 2023
6 changes: 6 additions & 0 deletions docs/cli.md
@@ -146,6 +146,12 @@ You could run the CLI as follows:
...
```

+## Upgrading to a newer version of the CLI
+Neurobagel is under active, early development and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph file. Breaking changes will be highlighted in the release notes!
+
+_If you have already created `.jsonld` files for your Neurobagel graph database using the CLI_,
+they can be quickly re-generated under the new data model by following the instructions [here](updating_dataset.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version.
+
## Development environment

To set up a development environment, please run
12 changes: 7 additions & 5 deletions docs/dictionaries.md
@@ -308,10 +308,10 @@ Possible heuristics:
### Assessment tool

For assessment tools like cognitive tests or rating scales,
-Neurobagel encodes whether the tool was successfully completed.
+Neurobagel encodes whether a subject has a value/score for _at least one_ item or subscale of the assessment.
Because assessment tools often have several subscales or items
that can be stored as separate columns in the tabular `participant.tsv` file,
-each assessment tool column gets **a minimum** of two annotations:
+each assessment tool column receives **a minimum** of two annotations:

- one to classify that the column `IsAbout` the generic category of assessment tools
- one to classify that the column `IsPartOf` the specific assessment tool
@@ -353,8 +353,8 @@ when instances of missing values are present (see also section [Missing values](
```

To determine whether a specific assessment tool is available for a given participant,
-we then combine all of the columns that were classified as `PartOf` that specific tool
-and then apply a simple `all()` heuristic to check that none of the columns
+we then consider all of the columns that were classified as `IsPartOf` that specific tool
+and then apply a simple `any()` heuristic to check that at least one column does not
contain any `MissingValues`.

For the above example, this would be:
@@ -363,13 +363,15 @@ For the above example, this would be:
|---------------|---------|---------|
| sub-01 | 2 | |
| sub-02 | 1 | 1 |
+| sub-03 | | |

Therefore:

| participant_id | updrs_available |
|---------------|-----------------|
-| sub-01 | False |
+| sub-01 | True |
| sub-02 | True |
+| sub-03 | False |
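
The updated `any()` heuristic can be illustrated with a minimal Python sketch. This is not the Neurobagel CLI implementation: the column names `updrs_1` and `updrs_2`, the `MISSING_VALUES` set, and the helper names are assumptions chosen for illustration (the real missing-value markers come from the data dictionary annotations).

```python
# Hypothetical missing-value markers; in practice these come from the
# `MissingValues` annotation in the data dictionary.
MISSING_VALUES = {"", None}

# Hypothetical column names for the two annotated subscale columns.
UPDRS_COLUMNS = ["updrs_1", "updrs_2"]


def has_value(cell):
    """Return True if the cell holds something other than a missing value."""
    return cell not in MISSING_VALUES


def tool_available(row, tool_columns):
    """Return True if at least one column of the tool has a non-missing value."""
    return any(has_value(row[col]) for col in tool_columns)


# Rows mirroring the example table above.
rows = {
    "sub-01": {"updrs_1": "2", "updrs_2": ""},
    "sub-02": {"updrs_1": "1", "updrs_2": "1"},
    "sub-03": {"updrs_1": "", "updrs_2": ""},
}

availability = {
    subject: tool_available(row, UPDRS_COLUMNS) for subject, row in rows.items()
}
print(availability)
```

Replacing `any` with `all` in `tool_available` would reproduce the previous behaviour, under which `sub-01` was marked as not having the assessment.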

## Missing values
Missing values are allowed for any phenotypic variable (column) that does not describe a participant or session identifier (e.g., columns like `participant_id` or `session_id`).
20 changes: 16 additions & 4 deletions docs/updating_dataset.md
@@ -1,17 +1,19 @@
# Updating a harmonized dataset

+## Following a change in my _dataset_
+
When using Neurobagel tools on a dataset that is still undergoing data collection, you may need to update the Neurobagel annotations and/or graph-ready data for the dataset when you want to add new subjects or measurements or to correct mistakes in prior data versions.

For any of the below types of changes, you will need to regenerate a graph-ready `.jsonld` file for the dataset which reflects the change.

-## If the phenotypic (tabular) data have changed
+### If the phenotypic (tabular) data have changed
If new variables have been added to the dataset such that there are new columns in the phenotypic TSV you previously annotated using Neurobagel's annotation tool, you will need to:

1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](annotation_tool.md)

2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) on your updated TSV and data dictionary

-## If only the imaging data have changed
+### If only the imaging data have changed
If the BIDS data for a dataset have changed without changes in the corresponding phenotypic TSV (e.g., if new modalities or scans have been acquired for a subject), you have two options:

- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](cli.md) on the updated BIDS directory.
@@ -23,11 +25,21 @@ OR

_When in doubt, rerun both CLI commands._

-## If only the subjects have changed
+### If only the subjects have changed
If subjects have been added to or removed from the dataset but the phenotypic TSV is otherwise unchanged (i.e., only new or removed rows, without changes to the available variables), you will need to:

- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary
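
The cases above can be summarized as a small decision helper. This is only a sketch of the documented rules, not part of any Neurobagel tool: the function name, its parameters, and the step labels `annotate`, `pheno`, and `bids` are hypothetical. It also assumes that "re-running the CLI" after a tabular change means both the `pheno` and `bids` steps, and that both are rerun when the phenotypic JSONLD is no longer available, in line with "when in doubt, rerun both CLI commands".

```python
def steps_to_rerun(new_columns=False, subjects_changed=False,
                   imaging_changed=False, have_pheno_jsonld=False):
    """Return the ordered workflow steps to repeat for a described change.

    All names here are illustrative labels, not CLI flags.
    """
    steps = []
    if new_columns:
        # New variables must be annotated before the CLI is rerun.
        steps.append("annotate")
    if new_columns or subjects_changed:
        # Assumption: a changed TSV means rerunning both CLI steps.
        steps += ["pheno", "bids"]
    elif imaging_changed:
        # Only the BIDS data changed: `bids` alone is enough if the
        # phenotypic JSONLD from the `pheno` step is still available.
        steps += ["bids"] if have_pheno_jsonld else ["pheno", "bids"]
    return steps


print(steps_to_rerun(new_columns=True))
print(steps_to_rerun(imaging_changed=True, have_pheno_jsonld=True))
print(steps_to_rerun(subjects_changed=True))
```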

+## Following a change in the _Neurobagel data model_
+
+As Neurobagel continues developing the data model, new tool releases may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph data file.
+Breaking changes will be highlighted in the release notes.
+
+_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph.
+This will ensure that if you use the latest version of the Neurobagel CLI to process new datasets (i.e., generate new `.jsonld` files) for your database, the resulting data will not have conflicts with existing data in the graph.
+
+Note that if upgrading to a newer version of the data model, **you should regenerate the `.jsonld` files for _all_ datasets in your existing graph**.

## Updating the graph database
-To allow easy (re-)uploading of the updated `.jsonld` for your dataset to a graph database, make a copy of it in a [central directory on your research data fileserver for storing local Neurobagel `jsonld` datasets](infrastructure.md#where-to-store-neurobagel-graph-ready-data).
+To allow easy (re-)uploading of the updated `.jsonld` for your dataset(s) to a graph database, make a copy of it in a [central directory on your research data fileserver for storing local Neurobagel `jsonld` datasets](infrastructure.md#where-to-store-neurobagel-graph-ready-data).
Then, follow the steps for [uploading/updating a dataset in the graph database](infrastructure.md#uploading-data-to-the-graph) (needs to be completed by a user with database write access).