Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Write derivatives metadata #885

Merged
merged 3 commits into from
Jan 18, 2021

Conversation

effigies
Copy link
Member

Closes #873.

@effigies effigies requested a review from oesteban January 11, 2021 21:06
@pep8speaks
Copy link

pep8speaks commented Jan 11, 2021

Hello @effigies, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 niworkflows.

Comment last updated at 2021-01-11 21:08:18 UTC

@effigies effigies force-pushed the enh/bids_derivatives branch from 850b256 to 8ffb6e1 Compare January 11, 2021 21:08
def write_bidsignore(deriv_dir):
bids_ignore = (
"*.html", "logs/", # Reports
"*_T1w.json", "*_T2w.json", "*_bold.json", # Outputs are not yet standardized
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't these standard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no standardization of QC measures at this point. These would be expected to apply to .nii.gz files, but there are none to apply to, so I suspect (but haven't checked) that the validator would get upset. Under normal circumstances, bold.json at least requires TaskName and RepetitionTime (or other timing information), which aren't present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would be expected to apply to .nii.gz files, but there are none to apply to

Perhaps BIDS-Derivatives are missing a mechanism to add metadata to a raw file? This could be hard to implement and presents the problem of how you decide the hierarchy when two derivative folders attempt to set the same metadata or if you attempt to shadow some existing metadata in the raw dataset. But for MRIQC and descriptive tools as such, it would be killer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels easy to declare, not sure about implement. @tyarkoni probably has the clearest perspective on how easy/painful that's likely to be.

Copy link

@tyarkoni tyarkoni Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the scope of the suggestion is... Just having a standalone utility that takes an arbitrary filename or BIDSFile and writes an entry to the JSON file seems trivial. Is that the idea?

If you want to do it via the BIDSLayout, so that the indexing automatically syncs with the new metadata (instead of having to re-index the whole project), that would still be pretty straightforward, I think, though I'd have to think about it some more to be sure. In essence you could just have a BIDSFile.update_metadata() method that sets new metadata once the file-writing operation has successfully completed. The SQLAlchemy query-building might be a bit of a pain in the ass, because you'd need to check if there are any new entities, and if so, create DB records for them, but I don't think there would be any serious difficulties.

Is either of those what you have in mind, or are you thinking of something else?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be hard to implement and presents the problem of how you decide the hierarchy when two derivative folders attempt to set the same metadata or if you attempt to shadow some existing metadata in the raw dataset

I don't think there's any ambiguity here, at least as I read/think about the spec. If you have two separate derivatives folders, they can have conflicting metadata; I don't see anything that implies any global harmonization. And if you have a chain of derivatives (or a derivative shadowing raw), then I think it's natural to expect later changes to always overwrite earlier ones (just like inheritance).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tyarkoni I'm having trouble parsing your post right now, so this is to clarify what I understood as Oscar's proposal:

bids/
    derivatives/mriqc/sub-01/anat/
        sub-01_T1w.json
    sub-01/anat/
        sub-01_T1w.nii.gz  <- MRIQC metadata applies to this data

Possibly this was entirely clear to you and I'm too tired to understand your points. Will re-read tomorrow.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@effigies that's what I understood as well; I'm just trying to figure out what the proposal would entail from a PyBIDS perspective, on the assumption that was the reason for tagging me. If this is just a question about the spec, I don't think I have any useful insight; at first blush, I don't think the spec needs to say anything more than it currently does in order to allow users to add arbitrary metadata to any file.

@oesteban
Copy link
Member

oesteban commented Jan 12, 2021 via email

@effigies
Copy link
Member Author

So if I understand, the position is that pulling in a derivative as a metadata overlay for the raw dataset should be technically doable, but that it should not be part of the spec. Hence it would be done as an app decision in PyBIDS.

If that's correct, I would suggest that we go ahead and use an overly-broad .bidsignore now, and pare it down as the BIDS validator implements derivatives and we can verify these edge cases.

@oesteban
Copy link
Member

So if I understand, the position is that pulling in a derivative as a metadata overlay for the raw dataset should be technically doable, but that it should not be part of the spec. Hence it would be done as an app decision in PyBIDS.

I am thinking on the reverse - BIDS-Derivatives has been mostly pushed forward by use-case implementations. There are already a couple of issues filed in the BIDS' GH repo regarding quality. The solution proposed above seems something we might want to propose to the broad community of BIDS to consider integrating.

So my question was (to you and to @tyarkoni), should we file a PR against the specs to propose this? I don't want to file yet another PR that people will not find useful, honestly. So if we do, I want to at least know that some other people are thinking along the same direction.

W.R.T. this PR, yes, I agree adding to .bidsignore and moving forward perhaps propagating to PyBIDS would work very well.

@effigies effigies merged commit 6db4467 into nipreps:master Jan 18, 2021
@effigies effigies deleted the enh/bids_derivatives branch January 18, 2021 14:42
@effigies
Copy link
Member Author

Okay, merged. Regarding a proposal to the spec, I think the ambiguity of conflicting metadata in derivatives would make it a very difficult thing to specify, rather than have apps that know how they want to interpret nested metadata.

That said, maybe after PET is merged people will have the brain space to think about a proposal like that?

@oesteban
Copy link
Member

But do you think it conceptually makes sense? At least to allow having metadata files that modify the raw-BIDS metadata, even if you don't anticipate how conflicts are resolved and let tools decide for themselves.

I do see the utility of this considering MRIQC and other past issues sent to the mailing list and the github repo later - but I have seen the utility of other ideas before and I'm kind of reluctant to submit something without a clear path for it into the specs.

@effigies
Copy link
Member Author

I think as currently discussed there's too much ambiguity or flexibility to be more than advice for tooling. We've talked elsewhere about documents outlining best (or at least existing) practices for tooling that don't quite reach the level of a thing that can be specified, and the recommendations for how to patch could certainly be done in something like that.

But I do see two use cases that might make it worth having a formal mechanism for "patching" metadata:

  1. QC, as here. There was previously some discussion of standardizing QC, so it's possible there's a BEP to be proposed.
  2. Selecting fieldmaps. In ABCD there were multiple fieldmaps of varying quality, and I believe our solution ended up being to patch the metadata to cause fMRIPrep to use the best quality fieldmap. I think you, @Shotgunosine or @ericearl might remember more clearly than I.

If a derivative dataset were to declare itself as an augmentation of the raw dataset's metadata, rather than a derivative dataset that can/should be interpreted as a standalone dataset, the problem might be well-defined. We could crawl derivatives/*/dataset_description.json for "DatasetType": "patch" (or some other metadata indicator) and let the dataset add to metadata or clobber data files. Regular derivatives would be left alone.

If we want to use MRIQC as an implementation case for this, then it might be worth promoting to a QC BEP so that the metadata being patched in is well-defined.

@ericearl
Copy link

@effigies on your point 2 above: The ABCD Study, like many fMRI studies, uses field maps and doesn't always perfectly collect a field map (or field map pair in this case) per fMRI series to correct. So you end up with, let's say, 3 field map pairs for 5 fMRI runs. Quality of the field map pairs in our ABCD-BIDS collection were judged by registering and averaging all field maps, then using an eta-squared metric to judge the "best" pair. This also corrected the data better than potentially using any bad run's field map (because correcting with a bad field map is usually worse than not using one at all). In our case we gave the field map pair that was "best" quality the IntendedFor field metadata filled in. While other field maps are present for data users to do with what they will, the IntendedFor field only got filled in for the best pair.

I haven't read the rest of this pull request, but I'm guessing now this wasn't the detail you wanted...

Are you just curious how fMRIPrep handles multiple field maps present? That is definitely more of an @Shotgunosine question than for me.

A QC BEP would be mighty nice to have. fmriprep has that nice output HTML report. DCAN Labs has the Executive Summary. I'm not as familiar with MRIQC outputs to say, but I know it's popularly used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Verify/update BIDS Derivatives compliance
5 participants