ENH: Write derivatives metadata #885

effigies · 2021-01-11T21:06:58Z

Closes #873.

pep8speaks · 2021-01-11T21:07:02Z

Hello @effigies, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 niworkflows.

Comment last updated at 2021-01-11 21:08:18 UTC

oesteban · 2021-01-11T21:17:31Z

mriqc/utils/bids.py

+def write_bidsignore(deriv_dir):
+    bids_ignore = (
+        "*.html", "logs/",  # Reports
+        "*_T1w.json", "*_T2w.json", "*_bold.json",  # Outputs are not yet standardized


aren't these standard?

There's no standardization of QC measures at this point. These would be expected to apply to .nii.gz files, but there are none to apply to, so I suspect (but haven't checked) that the validator would get upset. Under normal circumstances, bold.json at least requires TaskName and RepetitionTime (or other timing information), which aren't present.

These would be expected to apply to .nii.gz files, but there are none to apply to

Perhaps BIDS-Derivatives are missing a mechanism to add metadata to a raw file? This could be hard to implement and presents the problem of how you decide the hierarchy when two derivative folders attempt to set the same metadata or if you attempt to shadow some existing metadata in the raw dataset. But for MRIQC and descriptive tools as such, it would be killer.

This feels easy to declare, not sure about implement. @tyarkoni probably has the clearest perspective on how easy/painful that's likely to be.

Not sure what the scope of the suggestion is... Just having a standalone utility that takes an arbitrary filename or BIDSFile and writes an entry to the JSON file seems trivial. Is that the idea?

If you want to do it via the BIDSLayout, so that the indexing automatically syncs with the new metadata (instead of having to re-index the whole project), that would still be pretty straightforward, I think, though I'd have to think about it some more to be sure. In essence you could just have a BIDSFile.update_metadata() method that sets new metadata once the file-writing operation has successfully completed. The SQLAlchemy query-building might be a bit of a pain in the ass, because you'd need to check if there are any new entities, and if so, create DB records for them, but I don't think there would be any serious difficulties.

Is either of those what you have in mind, or are you thinking of something else?

This could be hard to implement and presents the problem of how you decide the hierarchy when two derivative folders attempt to set the same metadata or if you attempt to shadow some existing metadata in the raw dataset

I don't think there's any ambiguity here, at least as I read/think about the spec. If you have two separate derivatives folders, they can have conflicting metadata; I don't see anything that implies any global harmonization. And if you have a chain of derivatives (or a derivative shadowing raw), then I think it's natural to expect later changes to always overwrite earlier ones (just like inheritance).

@tyarkoni I'm having trouble parsing your post right now, so this is to clarify what I understood as Oscar's proposal:

bids/ derivatives/mriqc/sub-01/anat/ sub-01_T1w.json sub-01/anat/ sub-01_T1w.nii.gz <- MRIQC metadata applies to this data

Possibly this was entirely clear to you and I'm too tired to understand your points. Will re-read tomorrow.

@effigies that's what I understood as well; I'm just trying to figure out what the proposal would entail from a PyBIDS perspective, on the assumption that was the reason for tagging me. If this is just a question about the spec, I don't think I have any useful insight; at first blush, I don't think the spec needs to say anything more than it currently does in order to allow users to add arbitrary metadata to any file.

mriqc/utils/bids.py

oesteban · 2021-01-12T06:58:16Z

Well, it says explicitly that names of files must be unambiguous w.r.t. the raw dataset, and that's why they have to carry desc- if no other entity had been set. Also, not sure a stand-alone metadata file os acceptable. WDYT @effigies? Glad to see this is fairly doable from a pybids perspective. Thanks for the feedback!

…

On Tue, Jan 12, 2021, 04:18 Tal Yarkoni ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In mriqc/utils/bids.py <#885 (comment)>: > @@ -32,3 +35,62 @@ def collect_bids_data( ) return imaging_data + + +def write_bidsignore(deriv_dir): + bids_ignore = ( + "*.html", "logs/", # Reports + "*_T1w.json", "*_T2w.json", "*_bold.json", # Outputs are not yet standardized @effigies <https://github.com/effigies> that's what I understood as well; I'm just trying to figure out what the proposal would entail from a PyBIDS perspective, on the assumption that was the reason for tagging me. If this is just a question about the spec, I don't think I have any useful insight; at first blush, I don't think the spec needs to say anything more than it currently does in order to allow users to add arbitrary metadata to any file. — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#885 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAESDRXXFNY565O3RCVDSVLSZO5QLANCNFSM4V6FLZRA> .

effigies · 2021-01-13T20:09:25Z

So if I understand, the position is that pulling in a derivative as a metadata overlay for the raw dataset should be technically doable, but that it should not be part of the spec. Hence it would be done as an app decision in PyBIDS.

If that's correct, I would suggest that we go ahead and use an overly-broad .bidsignore now, and pare it down as the BIDS validator implements derivatives and we can verify these edge cases.

oesteban · 2021-01-18T10:02:02Z

So if I understand, the position is that pulling in a derivative as a metadata overlay for the raw dataset should be technically doable, but that it should not be part of the spec. Hence it would be done as an app decision in PyBIDS.

I am thinking on the reverse - BIDS-Derivatives has been mostly pushed forward by use-case implementations. There are already a couple of issues filed in the BIDS' GH repo regarding quality. The solution proposed above seems something we might want to propose to the broad community of BIDS to consider integrating.

So my question was (to you and to @tyarkoni), should we file a PR against the specs to propose this? I don't want to file yet another PR that people will not find useful, honestly. So if we do, I want to at least know that some other people are thinking along the same direction.

W.R.T. this PR, yes, I agree adding to .bidsignore and moving forward perhaps propagating to PyBIDS would work very well.

effigies · 2021-01-18T14:45:01Z

Okay, merged. Regarding a proposal to the spec, I think the ambiguity of conflicting metadata in derivatives would make it a very difficult thing to specify, rather than have apps that know how they want to interpret nested metadata.

That said, maybe after PET is merged people will have the brain space to think about a proposal like that?

oesteban · 2021-01-18T14:55:24Z

But do you think it conceptually makes sense? At least to allow having metadata files that modify the raw-BIDS metadata, even if you don't anticipate how conflicts are resolved and let tools decide for themselves.

I do see the utility of this considering MRIQC and other past issues sent to the mailing list and the github repo later - but I have seen the utility of other ideas before and I'm kind of reluctant to submit something without a clear path for it into the specs.

effigies · 2021-01-18T15:53:24Z

I think as currently discussed there's too much ambiguity or flexibility to be more than advice for tooling. We've talked elsewhere about documents outlining best (or at least existing) practices for tooling that don't quite reach the level of a thing that can be specified, and the recommendations for how to patch could certainly be done in something like that.

But I do see two use cases that might make it worth having a formal mechanism for "patching" metadata:

QC, as here. There was previously some discussion of standardizing QC, so it's possible there's a BEP to be proposed.
Selecting fieldmaps. In ABCD there were multiple fieldmaps of varying quality, and I believe our solution ended up being to patch the metadata to cause fMRIPrep to use the best quality fieldmap. I think you, @Shotgunosine or @ericearl might remember more clearly than I.

If a derivative dataset were to declare itself as an augmentation of the raw dataset's metadata, rather than a derivative dataset that can/should be interpreted as a standalone dataset, the problem might be well-defined. We could crawl derivatives/*/dataset_description.json for "DatasetType": "patch" (or some other metadata indicator) and let the dataset add to metadata or clobber data files. Regular derivatives would be left alone.

If we want to use MRIQC as an implementation case for this, then it might be worth promoting to a QC BEP so that the metadata being patched in is well-defined.

ericearl · 2021-01-19T21:50:39Z

@effigies on your point 2 above: The ABCD Study, like many fMRI studies, uses field maps and doesn't always perfectly collect a field map (or field map pair in this case) per fMRI series to correct. So you end up with, let's say, 3 field map pairs for 5 fMRI runs. Quality of the field map pairs in our ABCD-BIDS collection were judged by registering and averaging all field maps, then using an eta-squared metric to judge the "best" pair. This also corrected the data better than potentially using any bad run's field map (because correcting with a bad field map is usually worse than not using one at all). In our case we gave the field map pair that was "best" quality the IntendedFor field metadata filled in. While other field maps are present for data users to do with what they will, the IntendedFor field only got filled in for the best pair.

I haven't read the rest of this pull request, but I'm guessing now this wasn't the detail you wanted...

Are you just curious how fMRIPrep handles multiple field maps present? That is definitely more of an @Shotgunosine question than for me.

A QC BEP would be mighty nice to have. fmriprep has that nice output HTML report. DCAN Labs has the Executive Summary. I'm not as familiar with MRIQC outputs to say, but I know it's popularly used.

effigies requested a review from oesteban January 11, 2021 21:06

effigies added 3 commits January 11, 2021 16:08

ENH: Add tools to write bidsignore and dataset_description.json

b6b8ae7

ENH: Write derivatives metadata

238c6ea

CI: Expect .bidsignore, dataset_description.json

8ffb6e1

effigies force-pushed the enh/bids_derivatives branch from 850b256 to 8ffb6e1 Compare January 11, 2021 21:08

oesteban reviewed Jan 11, 2021

View reviewed changes

mriqc/utils/bids.py Show resolved Hide resolved

oesteban reviewed Jan 11, 2021

View reviewed changes

mriqc/utils/bids.py Show resolved Hide resolved

oesteban approved these changes Jan 11, 2021

View reviewed changes

effigies merged commit 6db4467 into nipreps:master Jan 18, 2021

effigies deleted the enh/bids_derivatives branch January 18, 2021 14:42

effigies mentioned this pull request Jun 1, 2021

Patch datasets bids-standard/bids-specification#814

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Write derivatives metadata #885

ENH: Write derivatives metadata #885

effigies commented Jan 11, 2021

pep8speaks commented Jan 11, 2021 •

edited

Loading

oesteban Jan 11, 2021

effigies Jan 11, 2021

oesteban Jan 11, 2021

effigies Jan 12, 2021

tyarkoni Jan 12, 2021 •

edited

Loading

tyarkoni Jan 12, 2021

effigies Jan 12, 2021

tyarkoni Jan 12, 2021

oesteban commented Jan 12, 2021 via email

effigies commented Jan 13, 2021

oesteban commented Jan 18, 2021

effigies commented Jan 18, 2021

oesteban commented Jan 18, 2021

effigies commented Jan 18, 2021

ericearl commented Jan 19, 2021

ENH: Write derivatives metadata #885

ENH: Write derivatives metadata #885

Conversation

effigies commented Jan 11, 2021

pep8speaks commented Jan 11, 2021 • edited Loading

Comment last updated at 2021-01-11 21:08:18 UTC

oesteban Jan 11, 2021

Choose a reason for hiding this comment

effigies Jan 11, 2021

Choose a reason for hiding this comment

oesteban Jan 11, 2021

Choose a reason for hiding this comment

effigies Jan 12, 2021

Choose a reason for hiding this comment

tyarkoni Jan 12, 2021 • edited Loading

Choose a reason for hiding this comment

tyarkoni Jan 12, 2021

Choose a reason for hiding this comment

effigies Jan 12, 2021

Choose a reason for hiding this comment

tyarkoni Jan 12, 2021

Choose a reason for hiding this comment

oesteban commented Jan 12, 2021 via email

effigies commented Jan 13, 2021

oesteban commented Jan 18, 2021

effigies commented Jan 18, 2021

oesteban commented Jan 18, 2021

effigies commented Jan 18, 2021

ericearl commented Jan 19, 2021

pep8speaks commented Jan 11, 2021 •

edited

Loading

tyarkoni Jan 12, 2021 •

edited

Loading