-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Write derivatives metadata #885
Conversation
Hello @effigies, Thank you for updating! Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, Comment last updated at 2021-01-11 21:08:18 UTC |
850b256
to
8ffb6e1
Compare
def write_bidsignore(deriv_dir): | ||
bids_ignore = ( | ||
"*.html", "logs/", # Reports | ||
"*_T1w.json", "*_T2w.json", "*_bold.json", # Outputs are not yet standardized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't these standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no standardization of QC measures at this point. These would be expected to apply to .nii.gz
files, but there are none to apply to, so I suspect (but haven't checked) that the validator would get upset. Under normal circumstances, bold.json
at least requires TaskName
and RepetitionTime
(or other timing information), which aren't present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would be expected to apply to
.nii.gz
files, but there are none to apply to
Perhaps BIDS-Derivatives are missing a mechanism to add metadata to a raw file? This could be hard to implement and presents the problem of how you decide the hierarchy when two derivative folders attempt to set the same metadata or if you attempt to shadow some existing metadata in the raw dataset. But for MRIQC and descriptive tools as such, it would be killer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels easy to declare, not sure about implement. @tyarkoni probably has the clearest perspective on how easy/painful that's likely to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the scope of the suggestion is... Just having a standalone utility that takes an arbitrary filename or BIDSFile
and writes an entry to the JSON file seems trivial. Is that the idea?
If you want to do it via the BIDSLayout
, so that the indexing automatically syncs with the new metadata (instead of having to re-index the whole project), that would still be pretty straightforward, I think, though I'd have to think about it some more to be sure. In essence you could just have a BIDSFile.update_metadata()
method that sets new metadata once the file-writing operation has successfully completed. The SQLAlchemy query-building might be a bit of a pain in the ass, because you'd need to check if there are any new entities, and if so, create DB records for them, but I don't think there would be any serious difficulties.
Is either of those what you have in mind, or are you thinking of something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be hard to implement and presents the problem of how you decide the hierarchy when two derivative folders attempt to set the same metadata or if you attempt to shadow some existing metadata in the raw dataset
I don't think there's any ambiguity here, at least as I read/think about the spec. If you have two separate derivatives folders, they can have conflicting metadata; I don't see anything that implies any global harmonization. And if you have a chain of derivatives (or a derivative shadowing raw), then I think it's natural to expect later changes to always overwrite earlier ones (just like inheritance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tyarkoni I'm having trouble parsing your post right now, so this is to clarify what I understood as Oscar's proposal:
bids/
derivatives/mriqc/sub-01/anat/
sub-01_T1w.json
sub-01/anat/
sub-01_T1w.nii.gz <- MRIQC metadata applies to this data
Possibly this was entirely clear to you and I'm too tired to understand your points. Will re-read tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@effigies that's what I understood as well; I'm just trying to figure out what the proposal would entail from a PyBIDS perspective, on the assumption that was the reason for tagging me. If this is just a question about the spec, I don't think I have any useful insight; at first blush, I don't think the spec needs to say anything more than it currently does in order to allow users to add arbitrary metadata to any file.
Well, it says explicitly that names of files must be unambiguous w.r.t. the
raw dataset, and that's why they have to carry desc- if no other entity had
been set. Also, not sure a stand-alone metadata file os acceptable. WDYT
@effigies?
Glad to see this is fairly doable from a pybids perspective.
Thanks for the feedback!
…On Tue, Jan 12, 2021, 04:18 Tal Yarkoni ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mriqc/utils/bids.py
<#885 (comment)>:
> @@ -32,3 +35,62 @@ def collect_bids_data(
)
return imaging_data
+
+
+def write_bidsignore(deriv_dir):
+ bids_ignore = (
+ "*.html", "logs/", # Reports
+ "*_T1w.json", "*_T2w.json", "*_bold.json", # Outputs are not yet standardized
@effigies <https://github.com/effigies> that's what I understood as well;
I'm just trying to figure out what the proposal would entail from a PyBIDS
perspective, on the assumption that was the reason for tagging me. If this
is just a question about the spec, I don't think I have any useful insight;
at first blush, I don't think the spec needs to say anything more than it
currently does in order to allow users to add arbitrary metadata to any
file.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#885 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAESDRXXFNY565O3RCVDSVLSZO5QLANCNFSM4V6FLZRA>
.
|
So if I understand, the position is that pulling in a derivative as a metadata overlay for the raw dataset should be technically doable, but that it should not be part of the spec. Hence it would be done as an app decision in PyBIDS. If that's correct, I would suggest that we go ahead and use an overly-broad |
I am thinking on the reverse - BIDS-Derivatives has been mostly pushed forward by use-case implementations. There are already a couple of issues filed in the BIDS' GH repo regarding quality. The solution proposed above seems something we might want to propose to the broad community of BIDS to consider integrating. So my question was (to you and to @tyarkoni), should we file a PR against the specs to propose this? I don't want to file yet another PR that people will not find useful, honestly. So if we do, I want to at least know that some other people are thinking along the same direction. W.R.T. this PR, yes, I agree adding to |
Okay, merged. Regarding a proposal to the spec, I think the ambiguity of conflicting metadata in derivatives would make it a very difficult thing to specify, rather than have apps that know how they want to interpret nested metadata. That said, maybe after PET is merged people will have the brain space to think about a proposal like that? |
But do you think it conceptually makes sense? At least to allow having metadata files that modify the raw-BIDS metadata, even if you don't anticipate how conflicts are resolved and let tools decide for themselves. I do see the utility of this considering MRIQC and other past issues sent to the mailing list and the github repo later - but I have seen the utility of other ideas before and I'm kind of reluctant to submit something without a clear path for it into the specs. |
I think as currently discussed there's too much ambiguity or flexibility to be more than advice for tooling. We've talked elsewhere about documents outlining best (or at least existing) practices for tooling that don't quite reach the level of a thing that can be specified, and the recommendations for how to patch could certainly be done in something like that. But I do see two use cases that might make it worth having a formal mechanism for "patching" metadata:
If a derivative dataset were to declare itself as an augmentation of the raw dataset's metadata, rather than a derivative dataset that can/should be interpreted as a standalone dataset, the problem might be well-defined. We could crawl If we want to use MRIQC as an implementation case for this, then it might be worth promoting to a QC BEP so that the metadata being patched in is well-defined. |
@effigies on your point 2 above: The ABCD Study, like many fMRI studies, uses field maps and doesn't always perfectly collect a field map (or field map pair in this case) per fMRI series to correct. So you end up with, let's say, 3 field map pairs for 5 fMRI runs. Quality of the field map pairs in our ABCD-BIDS collection were judged by registering and averaging all field maps, then using an eta-squared metric to judge the "best" pair. This also corrected the data better than potentially using any bad run's field map (because correcting with a bad field map is usually worse than not using one at all). In our case we gave the field map pair that was "best" quality the IntendedFor field metadata filled in. While other field maps are present for data users to do with what they will, the IntendedFor field only got filled in for the best pair. I haven't read the rest of this pull request, but I'm guessing now this wasn't the detail you wanted... Are you just curious how fMRIPrep handles multiple field maps present? That is definitely more of an @Shotgunosine question than for me. A QC BEP would be mighty nice to have. fmriprep has that nice output HTML report. DCAN Labs has the Executive Summary. I'm not as familiar with MRIQC outputs to say, but I know it's popularly used. |
Closes #873.