Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] add _proc-<label> to all modalities having _rec (anat, fmap, func, perf, and pet) #105

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

yarikoptic
Copy link
Collaborator

Initially I thought also to introduce it to behavioral data as well, since IMHO it potentially could be applicable whenever hardware might already be producing multiple files, e.g. where physio signals were minimally preprocessed (e.g. removing MR pulse effects etc). If anyone has an existing use case already - let me know or just push an additional commit for that portion.

Closes: #65

@yarikoptic yarikoptic added the enhancement New feature or request label Dec 12, 2018
The OPTIONAL `proc-<label>` key/value can be used to distinguish
between original and processed on the scanner data, e.g. `_proc-norm`
could be used for ["Normalized Pixel" field-bias corrected](https://github.com/nipy/heudiconv/issues/266#issuecomment-432662723) data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh -- just now spotted that added it a bit too late -- will move it right after initial mentioning in anat/

Copy link
Contributor

@chrisgorgo chrisgorgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide more language on how proc differs from rec. It would also be useful to revisit the existing examples that recommend using rec for differentiating on scanner motion corrected scans from others.

@chrisgorgo
Copy link
Contributor

I personally feel uneasy adding this new keyword to any data type it could possibly apply to when there is a use case only for structural scans. Without concrete use case for other data types it is hard to evaluate all angles and thus there is a risk of adding something that is suboptimal but will need to be kept for backward compatibility.

I would recommend being conservative and sticking only to enhancements driven by real-world use cases.

@yarikoptic
Copy link
Collaborator Author

I personally feel uneasy adding this new keyword to any data type it could possibly apply to when there is a use case only for structural scans. Without concrete use case for other data types it is hard to evaluate all angles and thus there is a risk of adding something that is suboptimal but will need to be kept for backward compatibility.

I would recommend being conservative and sticking only to enhancements driven by real-world use cases.

Here is some details about the use case which brought me to raise this issue/PR:

718211$ for d in *; do f=$(/bin/ls $d/| head -n 1); echo -n "$d  "; dcmdump $d/$f | grep 0008,0008; done | grep NORM
anat-scout_acq-64ch_ses-01_512x512.1  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM]             #  26, 5 ImageType
anat-scout_acq-64ch_ses-01_512x512.17  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM]             #  26, 5 ImageType
anat-T1w_ses-01_320x300.9  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM]             #  26, 5 ImageType
anat-T2w_ses-01_320x300.11  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM]             #  26, 5 ImageType
func_task-encodingFaces_acq-MB8_run-01_ses-01_936x936.2  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-encodingFaces_acq-MB8_run-02_ses-01_936x936.3  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-encodingFood_acq-MB8_run-01_ses-01_936x936.4  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-encodingFood_acq-MB8_run-02_ses-01_936x936.5  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-nbackFaces_acq-MB8_run-01_ses-01_936x936.18  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-nbackFaces_acq-MB8_run-02_ses-01_936x936.19  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-recognitionFaces_acq-MB8_run-01_ses-01_936x936.12  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-recognitionFaces_acq-MB8_run-02_ses-01_936x936.13  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-recognitionFood_acq-MB8_run-01_ses-01_936x936.14  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-recognitionFood_acq-MB8_run-02_ses-01_936x936.15  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType
func_task-rest_acq-MB8_run-01_ses-01_936x936.16  (0008,0008) CS [ORIGINAL\PRIMARY\M\ND\NORM\MOSAIC]      #  34, 6 ImageType

so NORM field was set not only for anatomy but also for functional images. So even though the "very" original image was not output, the one we obtained did undergo through normalization procedure in the scanner. BTW, I think I forgot to point to the explanation and pointers Chris Rorden (@neurolabusc ) provided: nipy/heudiconv#266 (comment) .

I will see if there will be some time on our scanner to see if scanning parameters for EPI and DWI allow output of both "original" and some preprocessed ("normalized) data. @kodiweera - may be you know from top of your head?

@yarikoptic
Copy link
Collaborator Author

@chrisfilo wrote (apparently can't quote directly from the review comment :-/):

Please provide more language on how proc differs from rec. It would also be useful to revisit the existing examples that recommend using rec for differentiating on scanner motion corrected scans from others.

Good point! Besides duplication in describing _rec- (with motion correction as an example) twice in the subsequent paragraphs (blame me for changes in 79e03e1, I guess I wanted to compress wording but forgot to remove the original one for _rec), we should indeed review it. So here is my take:

To me, _rec- is about "reconstruction" (k-space -> "physical"? space) algorithm, which could potentially differ. If in-scanner motion correction is done somehow during such transformation (big if), it is a valid information to place into _rec field. But otherwise I would expect _rec to contain information about actual reconstruction applied and not scanner post-processing. So why bother about having _rec? I do know some labs (e.g. Wandell) which I believe do collect k-space data and then do reconstruction by "themselves" (do you run into Brian any time to ask?). With all the "sparse sampling" and multi band, reconstruction question becomes really non-trivial and there could be multiple ways you could achieve final data, and thus _rec- field to discriminate. That is why it should be useful for hardcore MR ppl and physicists and should stay in the specification. Among openneuro datasets which we have as public datalad datasets on http://datasets.datalad.org/?dir=/openneuro I see only one which used _rec- with a label halfhalf, and it is our dataset ;) @snastase -- was "halfhalf" to depict spatial smoothing because there were no other field like _proc more appropriate to use at that time in BIDS?

"Motion correction" sounds IMHO more appropriately to be placed into _proc if it is done "post reconstruction", and the fact that we have listed it in there could as well be a mistake/shortcoming (unless it is truly related to reconstruction). @kodiweera , @neurolabusc - do you know how the motion correction in typically done by the scanner?
There is all those predictive "motion prediction" FOV adjustments I have heard about, but not sure if any is already implemented in the scanner, and then anyways it might not need to be depicted in the name since it would be just the feature of "original" data, and would not be related to reconstruction anyways per se.

So, may be, "motion correction" should be depicted in _proc actually, which I would be happy to adjust for in this PR.

@chrisgorgo
Copy link
Contributor

Regarding scope - I see the argument for anat and func, but not necessarily for fieldmaps. I would recommend holding off on adding the new keyword for those data types until we have a use case.

@yarikoptic
Copy link
Collaborator Author

Regarding scope - I see the argument for anat and func, but not necessarily for fieldmaps. I would recommend holding off on adding the new keyword for those data types until we have a use case.

how fieldmaps are special and avoid the doom of being preprocessed by the scanner? ;)

(git-annex)hopa:~/datalad/openneuro[master]git-annex
$> datalad -f '{path} {metadata[bids][ImageType]}' -c datalad.search.index-egrep-documenttype=files search --mode egrep path:.*fmap/.*.nii.gz bids.ImageType:NORM | sed -e "s,$PWD,,g"     
/ds001399/sub-MOCCAG001/ses-02/fmap/sub-MOCCAG001_ses-02_acq-multiband3p0MB2AP_epi.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']
/ds001399/sub-MOCCAG003/ses-01/fmap/sub-MOCCAG003_ses-01_acq-multiband2p4flatAP_epi.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']
/ds001399/sub-MOCCAG003/ses-01/fmap/sub-MOCCAG003_ses-01_acq-multiband3p0MB2PA_epi.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']
/ds001399/sub-MOCCAG003/ses-01/fmap/sub-MOCCAG003_ses-01_acq-multiband3p0MB2AP_epi.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']
/ds001399/sub-MOCCAG003/ses-01/fmap/sub-MOCCAG003_ses-01_acq-multiband2p4flatPA_epi.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']
/ds001399/sub-MOCCAG001/ses-02/fmap/sub-MOCCAG001_ses-02_acq-multiband3p0MB2PA_epi.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']

and found 'NORM' listed in locally acquired regular fieldmap's _magnitude*.json files some spin echo's used for topup fieldmap

@chrisgorgo
Copy link
Contributor

I see the snark is back. I would appreciate if someone else could take over handling this PR.

@chrisgorgo chrisgorgo dismissed their stale review December 13, 2018 23:36

out of resources to handle the PR

@yarikoptic
Copy link
Collaborator Author

just curious, what snark is back? ;-)

From The Free On-line Dictionary of Computing (05 January 2017) [foldoc]:

  snark
  
     [Lewis Carroll, via the Michigan Terminal System] 1. A system
     failure.  When a user's process bombed, the operator would get
     the message "Help, Help, Snark in MTS!"
  
     2. More generally, any kind of unexplained or threatening
     event on a computer (especially if it might be a boojum).
     Often used to refer to an event or a log file entry that might
     indicate an attempted security violation.  See {snivitz}.
  
     3. UUCP name of snark.thyrsus.com, home site of the Hacker
     {Jargon File} versions 2.*.*.
  
     [{Jargon File}]
  

@chrisfilo I don't want to be pushy, if you insist that better to leave fieldmap out of question - I could do that, and will not resist any longer. I personally just don't see a reason for that just to come back to it later when someone asks or starts sticking those values into other fields (silently). People are masters at coming up with workarounds ;-)

@sappelhoff sappelhoff changed the title ENH: add _proc-<label> to mri data [ENH] add _proc-<label> to mri data Apr 12, 2019
@yarikoptic
Copy link
Collaborator Author

@yarikoptic
Copy link
Collaborator Author

and now I am encountering desire to use _proc for some DWI (@dnkennedy use case):

$> for d in 501_26/1[345678]*; do dcmdump $d/*-000001-*.ima | grep SeriesDescrip; done
(0008,103e) LO [ep2d_diff_high35]                       #  16, 1 SeriesDescription
(0008,103e) LO [ep2d_diff_high35_ADC]                   #  20, 1 SeriesDescription
(0008,103e) LO [ep2d_diff_high35_TRACEW]                #  24, 1 SeriesDescription
(0008,103e) LO [ep2d_diff_high35_FA]                    #  20, 1 SeriesDescription
(0008,103e) LO [ep2d_diff_high35_ColFA]                 #  22, 1 SeriesDescription
(0008,103e) LO [ep2d_diff_high35_TENSOR]                #  24, 1 SeriesDescription

And AFAIK those are all derived data/estimates as processed by the scanner software. In BIDS for DWI ATM we do not have any field where they should go: freeform _acq- is there but it is really not to describe those transformations of reconstructed (from k-space) "raw" data. I also feel that _rec- should really be reserved for annotating "reconstruction" of data vs post-processing done at the device after the device (thus language actually should be adjusted).

FWIW I do not see many uses of _rec among openneuro datasets, and those which present are "debatable":
(git-annex)lena:~/datalad/openneuro[master]git
$> datalad -f '{path} {metadata[bids][ImageType]}' -c datalad.search.index-egrep-documenttype=files search --mode egrep 'path:.*/.*_rec-.*\.nii.gz'
/home/yoh/datalad/openneuro/ds000233/sub-rid000001/anat/sub-rid000001_rec-ehalfhalf_T1w.nii.gz N/A
/home/yoh/datalad/openneuro/ds000233/sub-rid000001/anat/sub-rid000001_rec-ehalfhalf_mod-T1w_defacemask.nii.gz N/A
...
/home/yoh/datalad/openneuro/ds001705/sub-000101/ses-baseline/pet/sub-000101_ses-baseline_rec-MLEM_pet.nii.gz N/A
/home/yoh/datalad/openneuro/ds001705/sub-000101/ses-displaced/pet/sub-000101_ses-displaced_rec-MLEM_pet.nii.gz N/A
...
/home/yoh/datalad/openneuro/ds001740/derivatives/sub-pilote2/me_func/sub-pilote2_task-convers_run-03_rec-arithmsum_bold.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'MB', 'TE1', 'ND', 'MOSAIC']
/home/yoh/datalad/openneuro/ds001740/derivatives/sub-pilote2/me_func/sub-pilote2_task-convers_run-03_rec-t2sweighted_bold.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'MB', 'TE1', 'ND', 'MOSAIC']
...
/home/yoh/datalad/openneuro/ds002041/sub-2001/pet/sub-2001_task-rest_acq-fallypride_rec-acdyn_pet_bpnd.nii.gz N/A
/home/yoh/datalad/openneuro/ds002041/sub-2001/pet/sub-2001_task-rest_acq-fallypride_rec-acdyn_pet_bpnd_space-MNI152.nii.gz N/A
...
/home/yoh/datalad/openneuro/ds002156/sub-23638/ses-04/anat/sub-23638_ses-04_acq-mprage_rec-ORIG_run-1_T1w.nii.gz ['ORIGINAL', 'PRIMARY', 'OTHER']
/home/yoh/datalad/openneuro/ds002276/sub-PILOT/ses-01/func/sub-PILOT_ses-01_task-StrangerThingsS01E01_rec-magnitude_run-01_echo-1_sbref.nii.gz ['ORIGINAL', 'PRIMARY', 'M', 'TE1', 'ND', 'MOSAIC']
/home/yoh/datalad/openneuro/ds002276/sub-PILOT/ses-01/func/sub-PILOT_ses-01_task-StrangerThingsS01E01_rec-phase_run-01_echo-1_sbref.nii.gz ['ORIGINAL', 'PRIMARY', 'P', 'TE1', 'ND', 'MOSAIC']
...

Dear @bids-standard/bep_leads (since I bet you have similar use cases for other modalities), @bids-standard/derivatives (since it might relate to annotating derived data), @bids-standard/steering -- please provide feedback. Also @francopestilli , since you worked a lot with DWI, what do you think/how do you annotate "derived on the scanner" func/, dwi/ and other volumes? Should we keep "abusing" _rec and _acq or use more descriptive/generic _proc?

@yarikoptic
Copy link
Collaborator Author

thinking about it - probably aforementioned dwi processed files, shouldn't be a mere _proc or _rec -- they are really derived data, not mere "slightly preprocessed original data", so probably shouldn't even have the _dwi suffix, but rather get a dedicated ones (_fa, etc). Decision should be "aligned" with "common derivatives" approach in #265

@mateuszpawlik
Copy link
Contributor

I'd like to contribute with my application. I'm converting older datasets to BIDS. Unfortunately, the Prescan Normalize filter on the scanner wasn't used consistently and it seems not always all data was exported. This resulted in some subjects having both normalized and not normalized functional images, and some having one of them. I want to document that but I couldn't make up my mind whether to put this information in the file name, or the json file, or keep only normalized images when exist and not normalized otherwise. Fortunately, I found this PR. Could anyone report if the _proc will be added to the specification or not and what to use until then? I'd like to be consistent for the datasets I'm working with.
Disclaimer: I'm not an mri person.

@octomike
Copy link

I'm also handling a number of images from our Siemens scanner right now, wondering what to do with the NORM image type and how to distinguish them from raw/non-normalized images.

Can anyone share if there are new insights into the issue? Otherwise, I would really like to see this merged :)

@CPernet
Copy link
Collaborator

CPernet commented Apr 22, 2022

@mateuszpawlik @octomike for those cases you could use the rec- option.
from the sepc: 'the OPTIONAL rec- key/value can be used to distinguish different reconstruction algorithms'

  • the prescan normalize (image type NORM\ND) is a normalization filter applied at the reconstruction stage (but @yarikoptic will argue not from k-space so it's _proc)
  • the gradient distortion correction (image type NORM\DIS2D or image type NORM\DIS3D)) is a correction also done at reconstruction (although can also be done after)

example: sub-0158_ses-baseline_acq-MPRAGE_rec-normdistcorr_T1w.nii

@CPernet
Copy link
Collaborator

CPernet commented Apr 22, 2022

a more general comment @yarikoptic is should anything coming out of the scanner be seen as raw (which seems to be the consensus, including those filtered, corrected images and I store them using _rec) or should this be under derivatives (for which _proc works nicely)

@oesteban
Copy link
Collaborator

should anything coming out of the scanner be seen as raw

FWIW in one of the latest BIDS workshops I attended, this is the least inappropriate definition we could come up with for what raw means.

In other contexts, whatever the scanner spits out is referred to as unprocessed (despite the fact that lots of processing happens within the scanner).

Regarding how this applies to this case I have no idea, but by reading Cyril's comment, it seems that _rec is the right call.

@CPernet
Copy link
Collaborator

CPernet commented Apr 22, 2022

least inappropriate oh my :-)

@poldrack
Copy link

poldrack commented Apr 22, 2022 via email

@effigies
Copy link
Collaborator

Seems like a good time to make a decision, IMO. We've clearly been living without this, and @CPernet offers a solution that does not involve adding more entities to file names.

Is rec-<label> (e.g., rec-Norm for normalized on-scanner, or rec-NormMC for normalized and motion corrected on-scanner) acceptable even if not entirely satisfying? Or is there a compelling use case that makes this use of rec-<label> untenable?

If we do make this change, we could note in the rec- definition that relevant on-scanner processing MAY be indicated in this entity.

@neurolabusc
Copy link

This is related to dcm2niix issue 597. In particular, see comments from @mharms. If users want to see explicit support for dcm2niix they will need to provide a sample DICOM series where one image has this switched on and the other has it switched off.

@CPernet
Copy link
Collaborator

CPernet commented Apr 22, 2022

@neurolabusc I will push soon phantom dicom files with different image types (been easter break here)

=== Do not change lines below ===
{
 "chain": [
  "e58542b95bdb7551638ee0e2bf337583cec55c07"
 ],
 "cmd": "git grep -l 'reconstruction: opt' | xargs sed -e 's,^\\( *\\)\\(reconstruction: optional\\),\\1\\2\\n\\1processing: optional,g' -i ",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "src"
}
^^^ Do not change lines above ^^^
@yarikoptic
Copy link
Collaborator Author

@octomike well -- the issue seems to be not technical but social. As a quick workaround, you can just follow the statements above to (ab)use _rec- for anything done in the scanner (similarly to _acq- typically abused to encapsulate any difference in acquisition).

Meanwhile we would have **only** MEG data (not even EEG) to be capable of carrying `_proc-` for some reason
❯ git grep -B8 'processing: '
src/schema/README.md-  datatypes:
src/schema/README.md-    - meg
src/schema/README.md-  entities:
src/schema/README.md-    subject: required
src/schema/README.md-    session: optional
src/schema/README.md-    task: required
src/schema/README.md-    acquisition: optional
src/schema/README.md-    run: optional
src/schema/README.md:    processing: optional
--
src/schema/README.md-  datatypes:
src/schema/README.md-    - meg
src/schema/README.md-  entities:
src/schema/README.md-    subject: required
src/schema/README.md-    session: optional
src/schema/README.md-    task: required
src/schema/README.md-    acquisition: optional
src/schema/README.md-    run: optional
src/schema/README.md:    processing: optional
--
src/schema/rules/files/raw/channels.yaml-
src/schema/rules/files/raw/channels.yaml-# MEG has an additional entity available
src/schema/rules/files/raw/channels.yaml-channels__meg:
src/schema/rules/files/raw/channels.yaml-  $ref: rules.files.raw.channels.channels
src/schema/rules/files/raw/channels.yaml-  datatypes:
src/schema/rules/files/raw/channels.yaml-    - meg
src/schema/rules/files/raw/channels.yaml-  entities:
src/schema/rules/files/raw/channels.yaml-    $ref: rules.files.raw.channels.channels.entities
src/schema/rules/files/raw/channels.yaml:    processing: optional
--
src/schema/rules/files/raw/meg.yaml-  datatypes:
src/schema/rules/files/raw/meg.yaml-    - meg
src/schema/rules/files/raw/meg.yaml-  entities:
src/schema/rules/files/raw/meg.yaml-    subject: required
src/schema/rules/files/raw/meg.yaml-    session: optional
src/schema/rules/files/raw/meg.yaml-    task: required
src/schema/rules/files/raw/meg.yaml-    acquisition: optional
src/schema/rules/files/raw/meg.yaml-    run: optional
src/schema/rules/files/raw/meg.yaml:    processing: optional
--
src/schema/rules/files/raw/task.yaml-    ceagent: optional
src/schema/rules/files/raw/task.yaml-
src/schema/rules/files/raw/task.yaml-timeseries__meg:
src/schema/rules/files/raw/task.yaml-  $ref: rules.files.raw.task.timeseries
src/schema/rules/files/raw/task.yaml-  datatypes:
src/schema/rules/files/raw/task.yaml-    - meg
src/schema/rules/files/raw/task.yaml-  entities:
src/schema/rules/files/raw/task.yaml-    $ref: rules.files.raw.task.timeseries.entities
src/schema/rules/files/raw/task.yaml:    processing: optional

While at it - coming up with potential future complimentary cases -- if ever ultrasound is supported (some work seems to be done in the scope of BEP025 - MIDS) there might also clear separation between _rec (how image reconstructed from whatever domain data instrumentally acquired in - frequencies, coils, whatnot) and _proc (how it processed after brought into the output data domain - either it is an image or channel). Moreover, for EEG/MEG, some hardware might (if not already, don't know) be fancy to produce some 3D reconstruction via inverse problem solution (e.g. LORETA etc) of which there is a good number of approaches (min norm, beamforming, etc) which are different conceptually on how then reconstructed image is processed (the point of _proc-) before output .

So, altogether, I think it would be of benefit to have clearly separated _rec and _proc and neither clump them into a single entity for MRI, nor claim that _proc in MEG (only) is like _rec in MRI. So I have updated this PR once again - may be opinions have changed since then. I also made MEG channels less special ;)

@nicholst
Copy link
Collaborator

nicholst commented Dec 17, 2022

Agree with @yarikoptic that there is value to distinct _rec and _proc tags and that there shouldn't be different interpretations of each for different modalities. I definitely like how @yarikoptic distiguished them, which I would paraphrase

  • _rec: variant of method for transforming raw data into output domain
  • _proc: fundamental processing applied to data in output domain

The only thing that is perhaps missing in the current PR is an elaboration of what is 'fundamental', i.e. how we make it clear that ideally, generally post-acquistion manipulations are the domain of BIDS derivaties, but it is understood some manipulations are so common and that data without those manipulations are regarded as having little or reduced value and hence these "processing" manipulations deserve to be part of a core BIDS instance.

One other idea: Should we add references to _proc in PET to suggest how they could be used? (e.g. different filters applied to a ramp-recon'd image). Maybe even MRI (though I can't think of any use cases myself).

@yarikoptic
Copy link
Collaborator Author

FWIW, while thinking about "references" @nicholst mentioned, decided to look at what _rec we see already in openneuro datasets.

I might have not the most recent set of datasets, but here is what I saw
(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro[master]
$> find -iname *_rec-* | sed -e 's,.*\(_rec-[^_]*\)_.*,\1,g' | sort | uniq -c | sort -n
      2 _rec-t2sweighted
      4 _rec-arithmsum
      6 _rec-ORIG
     24 _rec-deface
     36 _rec-ehalfhalf
     64 _rec-phase
     68 _rec-DynTOF
     72 _rec-1
     72 _rec-2
    101 _rec-acdyn
    152 _rec-dico7Tad2grpbold7Tad
    152 _rec-dico7Tad2grpbold7TadBrainMask
    152 _rec-dico7Tad2grpbold7TadBrainMaskNLBrainMask
    152 _rec-dico7Tad2grpbold7TadNL
    152 _rec-dico7Tad2grpbold7TadNLWarp
    152 _rec-XFMdico7Tad2grpbold7Tad
    230 _rec-swi
    342 _rec-complex
    342 _rec-mag
    514 _rec-NORM
    764 _rec-magnitude
   1215 _rec-SCIC
   1224 _rec-dico

which

(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds001293[master]git
$> ls ./sub-21/ses-r30/func/*run-02*nii.gz
./sub-21/ses-r30/func/sub-21_ses-r30_task-orientation_rec-dico_run-02_bold.nii.gz@
./sub-21/ses-r30/func/sub-21_ses-r30_task-orientation_run-02_bold.nii.gz@

@mih -- do you recall what _rec-dico was used to depict ? (git grep dico is silent)

@yarikoptic
Copy link
Collaborator Author

The only thing that is perhaps missing in the current PR is an elaboration of what is 'fundamental', ...

I would have said "any postprocessing performed by/on hardware providing "raw data" for BIDS and a selected set of operations done after: at the moment only defacing`. But I do not want to open discussion here about "defacing" to be needed or not to be explicitly annotated in BIDS file names ;-)

@oesteban
Copy link
Collaborator

One question that I can't find in your above examples (e.g., #105 (comment)), @yarikoptic:

  • This is because you want to encode BOTH versions of the file within one dataset, right? Meaning you want to have under anat/, say, the NORM and the ORIGINAL variants of that particular subject, session, and T1-weighted. Is that correct?

Otherwise, this will not solve the problem because, even in a dataset where some subjects have the normalization and some don't, in both cases if the alternative is not available then you should not use _proc- because there's no need for disambiguation.

However, I totally see your point, so perhaps, BIDS should have some way to say that if images have undergone some processing by the scanner, this should be stated with the relevant metadata entries in the corresponding sidecar jsons.

Taking again the above example where every subject of the BIDS dataset has only one T1w, but sometimes they have been normalized and others they haven't, I would say that each T1w will necessarily have a json file with a field indicating whether it was normalized or not. Marking this with the entity would be problematic (e.g., I'm thinking of fMRIPrep, which doesn't really care about proc, but it could be useful to maybe drop / or include some T1ws in case of multiplicity under the same anat/ folder).

This is to say, although the changes proposed in this PR look fine to me, I believe it only partially addresses the problem and the examples above don't seem to offer a real-case scenario where proc would help disambiguate two sibling NIfTI filenames.

@oesteban
Copy link
Collaborator

TL;DR - entities should not encode metadata, only disambiguate names

@oesteban
Copy link
Collaborator

The only thing that is perhaps missing in the current PR is an elaboration of what is 'fundamental'

I agree with Yarik on his answer to this. However, if that processing is so fundamental (and defacing is another good example) that it either voids the value of the original data or the original data cannot be shared, then that primary data will not be present in the particular folder as a sibling file, and hence no annotation should be done in the filename.

However, it would be very valuable that these annotations (the NORM dicom tag, or the fact that the image has been defaced) are made highly recommended or mandatory to be present in the metadata associated with the file.

@nicholst
Copy link
Collaborator

Responding to @yarikoptic ...

  • SCIC: At least from the one article, it seems like an integrated option ("We analyzed the ... uniform phantom taken with and without SCIC") so _rec- feels right.
  • PET: acdyn,acstat, nacdyn, nacstat, the 2x2 cross of attenuation correction yes/no and static/dynamic: Statistic/dynamic is a fundamental data/acquisition aspect, and AFAIK attenuation is usually integrated into the reconstruction. So again _rec- feels right.
  • PET: "postreconstruction filtering" is a thing in PET, if maybe not so common in brain PET, and would be clearly be a _proc- thing.

@oesteban
Copy link
Collaborator

oesteban commented Dec 20, 2022

This discussion reminds me of that around dir-<label>, where @chrisgorgo was very strong on that <label> should not encode the metadata itself (hence, he was proposing using just indexes) because there's clearly a situation where you could attempt to encode the information in the filename, and have a file with dir-AP and then json sidecar with "PhaseEncodingDirection": "j" (or more confusingly, "PhaseEncodingDirection": "i-").

At the moment, I didn't understand why Chris was so strong in his concerns. After seeing how that <label> has been unintendedly misused over time, I now have no question that he was absolutely right, and an index would've been sufficient and clearer.

This is to say: _proc- should not be OPTIONAL, but rather REQUIRED iff there is a sibling file from which to disambiguate.

@robertoostenveld
Copy link
Collaborator

for continuous wave NIRS, I think that proc would apply if light intensities have been measured and initially stored in the data file, and the data file gets later converted to HbO/HbR concentrations.

@yarikoptic
Copy link
Collaborator Author

@oesteban

TL;DR - entities should not encode metadata, only disambiguate names

is this a principle of BIDS? I don't think so, or at least I failed to find it. If you think it should be -- it would be worth a separate PR/discussion and possibly even some kind of a vote since I think it is too fundamental to just be "sneaked in" in some PR with a limited set of eyes.

Since it is not a principle -- we should not assume it is! Why I think it shouldn't be: From my point of view, the beauty of BIDS is that it allows both machines and humans to read those datasets, and specifically for humans to quickly grasp the specifics of the data, without resorting to look into each and every .json file. To improve such "human readability" many people (myself included) prefer to use entities to depict specifics of the data even if no disambiguation is needed. E.g. I constantly see people using _acq- entity to encode some information about acquisition sequence. Yes, that code might be in-congruent with actual (meta)data, but since we have them all as unregulated <label>, I take them all more as a "hint" on what data is intended to be, rather than 100%-certain metadata.

FWIW -- a list of _acq entities I found in openneuro. Indeed many are "too cryptic" to be "grasped" without intimate knowledge of the dataset, but some do give a hint on the (possibly different from another file) specifics of the file:
      2 _acq-16
      2 _acq-CAP
      2 _acq-drawing01
      2 _acq-drawing02
      2 _acq-drawing03
      2 _acq-FA15
      2 _acq-FA20
      2 _acq-FA30
      2 _acq-florbetapir
      2 _acq-FullExample
      2 _acq-LPA
      2 _acq-mpragePURE
      2 _acq-NAS
      2 _acq-render01
      2 _acq-render02
      2 _acq-RPA
      2 _acq-T807AV1451
      2 _acq-te16
      3 _acq-14
      3 _acq-voxelsize-112
      4 _acq-1Daccel2shot
      4 _acq-b1000n3r22x22x22peAPA
      4 _acq-b1000n40r22x22x22peAPP
      4 _acq-D1S28
      4 _acq-D1S38
      4 _acq-D2S24
      4 _acq-FA25
      4 _acq-GD71
      4 _acq-HighResHippo
      5 _acq-0
      5 _acq-645
      5 _acq-D1S22
      5 _acq-D1S33
      5 _acq-D1S41
      5 _acq-D2S26
      5 _acq-D4S10
      5 _acq-D4S2
      5 _acq-D4S20
      5 _acq-D4S22
      5 _acq-D4S24
      5 _acq-D4S6
      5 _acq-D4S7
      5 _acq-D4S8
      5 _acq-noaccel1shot
      5 _acq-r09x09x09
      6 _acq-D3S20
      6 _acq-Ds2
      6 _acq-fspgr
      6 _acq-VariableTE
      7 _acq-b2465n90r25x25x25peAPP
      8 _acq-1400
      8 _acq-15
      8 _acq-b1000n29r21x21x22peAPP
      8 _acq-b2000n56r22x22x22peAPP
      8 _acq-c
      8 _acq-D1S1
      8 _acq-D3S18
      8 _acq-motion
      8 _acq-nomotion
      8 _acq-UNI
      9 _acq-D3S19
      9 _acq-sub0001
      9 _acq-sub0002
      9 _acq-sub0003
      9 _acq-sub0004
      9 _acq-sub0005
      9 _acq-sub0006
      9 _acq-sub0007
      9 _acq-sub0008
      9 _acq-sub0009
      9 _acq-sub0010
      9 _acq-v4
     10 _acq-D1S23
     10 _acq-D2S23
     10 _acq-space
     11 _acq-13
     11 _acq-D2S25
     11 _acq-D3S16
     11 _acq-v1
     11 _acq-v2
     12 _acq-b1000n32r09x09x25peAPP
     12 _acq-b1000n96r09x09x25peAPP
     12 _acq-b1500n96r09x09x25peAPP
     12 _acq-b2000n96r09x09x25peAPP
     12 _acq-b2500n96r09x09x25peAPP
     12 _acq-frfse
     12 _acq-fse
     12 _acq-MT
     12 _acq-NoDef
     12 _acq-post
     12 _acq-pre
     12 _acq-t2
     13 _acq-1
     14 _acq-D3S14
     14 _acq-D3S15
     14 _acq-inv1
     14 _acq-inv2
     14 _acq-multiband2p4tiltAP
     14 _acq-multiband2p4tiltPA
     14 _acq-multiband3p0MB2AP
     14 _acq-multiband3p0MB2PA
     14 _acq-multiband3p0tiltAP
     14 _acq-multiband3p0tiltPA
     14 _acq-uni
     16 _acq-b1000n40r25x25x25peAPP
     16 _acq-b2000n56r25x25x25peAPP
     16 _acq-D3S17
     16 _acq-SagittalMPRAGE
     18 _acq-scoring
     19 _acq-D2S22
     19 _acq-multiband3p0flatAP
     19 _acq-multiband3p0flatPA
     20 _acq-2d
     21 _acq-multiband2p4flatAP
     21 _acq-multiband2p4flatPA
     22 _acq-cf0PA
     22 _acq-cf1AP
     22 _acq-cf1PA
     24 _acq-concat
     25 _acq-D1S19
     25 _acq-D1S20
     25 _acq-spc
     26 _acq-sms3ip22mm1020GLM
     27 _acq-1Daccel1shot
     28 _acq-b2465n96r25x25x25peAPP
     28 _acq-topup3
     32 _acq-ADNI
     32 _acq-b1000n32r19x19x22peAPP
     32 _acq-cf2AP
     32 _acq-cf2PA
     32 _acq-PHILLIPS
     33 _acq-SB2mm
     33 _acq-SB3pt3mm
     35 _acq-D3S13
     36 _acq-b1000n3r25x25x25peAPP
     36 _acq-b1000n4r09x09x25peAPA
     36 _acq-cf0AP
     36 _acq-D3S12
     36 _acq-EPI2
     36 _acq-EPI3
     36 _acq-HDgrid
     38 _acq-D2S21
     38 _acq-EPI1
     40 _acq-ShortInv
     42 _acq-b1000n30r25x25x25peAPP
     42 _acq-fatsat
     44 _acq-dbsoff
     44 _acq-dbson
     44 _acq-SaggitalNsig1Mm1
     44 _acq-Sagittal3d
     46 _acq-Nsig1Mm1
     48 _acq-12
     48 _acq-b0n0r25x25x25peAPP
     48 _acq-D1S18
     48 _acq-mb6
     49 _acq-axial
     49 _acq-ge
     50 _acq-D3S5
     51 _acq-D3S11
     52 _acq-localizer
     52 _acq-voxelsize112
     52 _acq-voxelsize222
     52 _acq-voxelsize333
     52 _acq-voxelsize555
     53 _acq-D2S20
     56 _acq-b1000n96r19x19x22peAPP
     56 _acq-b1500n96r19x19x22peAPP
     56 _acq-b2000n96r19x19x22peAPP
     56 _acq-b2500n96r19x19x22peAPP
     56 _acq-filt
     58 _acq-D3S9
     58 _acq-MP2R
     60 _acq-D3S3
     60 _acq-dlpfcactive
     60 _acq-dlpfcsham
     60 _acq-flairpmcoff
     60 _acq-flairpmcon
     60 _acq-m1active
     60 _acq-m1sham
     60 _acq-pha
     60 _acq-ppcactive
     60 _acq-ppcsham
     61 _acq-5
     62 _acq-MB9
     64 _acq-moldON
     66 _acq-MB12
     66 _acq-MB2
     66 _acq-MB3PA
     66 _acq-MB4
     66 _acq-MB6
     66 _acq-MB8
     67 _acq-9
     67 _acq-MB3
     68 _acq-moldOFF
     70 _acq-soccer
     72 _acq-AxialNsig2Mm1
     72 _acq-b1000n3r06x06x25peAPA
     72 _acq-D3S10
     72 _acq-D3S7
     76 _acq-t2starpmcoff
     76 _acq-t2starpmcon
     78 _acq-8
     80 _acq-coronal
     82 _acq-te14
     84 _acq-4
     84 _acq-D2S18
     85 _acq-D2S19
     89 _acq-prefrontal
     92 _acq-b0
     92 _acq-b0n0r25x25x25peAPA
     96 _acq-autozshim
     96 _acq-autozshimTE30
     96 _acq-autozshimTE40
     96 _acq-autozshimTE50
     96 _acq-b1000n3r19x19x22peAPA
     96 _acq-b3000n96r25x25x25peAPP
     96 _acq-manualzshim
     96 _acq-manualzshimTE30
     96 _acq-manualzshimTE40
     96 _acq-manualzshimTE50
     96 _acq-nozshim
     96 _acq-nozshimTE30
     96 _acq-nozshimTE40
     96 _acq-nozshimTE50
    100 _acq-3dCUBE
    100 _acq-D1S17
    101 _acq-fallypride
    107 _acq-D3S8
    108 _acq-memp2rage
    110 _acq-b1000n32r25x25x25peAPP
    112 _acq-UTE
    116 _acq-FieldMap
    116 _acq-FLASH
    116 _acq-TSE
    120 _acq-GD72
    120 _acq-seepi
    122 _acq-3
    126 _acq-02
    126 _acq-FSPGR
    127 _acq-sagittal
    128 _acq-1mm
    128 _acq-revpe
    131 _acq-D3S6
    132 _acq-diff
    132 _acq-t1tirmpmcoff
    132 _acq-t1tirmpmcon
    132 _acq-t2tsepmcoff
    132 _acq-t2tsepmcon
    133 _acq-7
    135 _acq-D1S16
    136 _acq-2
    136 _acq-fmri
    140 _acq-calibration
    140 _acq-crosstalk
    144 _acq-epi
    146 _acq-11
    146 _acq-sbref
    150 _acq-25mm
    150 _acq-tse
    155 _acq-D3S4
    161 _acq-Mb4Mm2Tr1600
    161 _acq-Mb4Mm3Tr550
    161 _acq-r10x10x10
    162 _acq-6
    172 _acq-D2S16
    178 _acq-fullbrain
    179 _acq-D3S2
    182 _acq-b2500n96r25x25x25peAPP
    186 _acq-biasBody
    186 _acq-biasReceive
    191 _acq-D2S17
    192 _acq-seAP
    192 _acq-sePA
    192 _acq-sequential
    192 _acq-zref
    197 _acq-se
    198 _acq-lr
    198 _acq-rl
    204 _acq-FM
    204 _acq-te15
    206 _acq-AxialNsig1Mm1
    209 _acq-b1000n96r25x25x25peAPP
    209 _acq-b1500n96r25x25x25peAPP
    216 _acq-ASL
    216 _acq-FSEMS
    216 _acq-spinecho
    216 _acq-spinechopf68
    217 _acq-b2000n96r25x25x25peAPP
    220 _acq-mpragepmcoff
    220 _acq-mpragepmcon
    222 _acq-MP2RAGE
    224 _acq-phase
    226 _acq-D1S15
    228 _acq-pixar
    230 _acq-GEEPI
    230 _acq-GEMS
    230 _acq-SEEPI
    232 _acq-bang
    236 _acq-bold
    237 _acq-10
    272 _acq-render
    280 _acq-breathhold
    284 _acq-mag
    296 _acq-01
    299 _acq-D1S14
    302 _acq-2dADNI2
    303 _acq-Mm2
    307 _acq-CMRR
    312 _acq-GD79
    314 _acq-07
    321 _acq-4mmFA48
    334 _acq-D2S15
    335 _acq-D2S5
    343 _acq-D2S3
    348 _acq-b1000n3r25x25x25peAPA
    352 _acq-recovery
    352 _acq-SE
    354 _acq-GD33
    364 _acq-FSE
    367 _acq-D2S14
    372 _acq-cbsfl
    378 _acq-lowres
    381 _acq-D2S9
    396 _acq-b
    401 _acq-D1S4
    406 _acq-mprage
    407 _acq-D2S7
    412 _acq-D1S6
    413 _acq-task
    424 _acq-dwi
    441 _acq-topup
    473 _acq-pasat
    480 _acq-mb
    484 _acq-CUBE
    484 _acq-D1S10
    504 _acq-raw
    507 _acq-D2S12
    533 _acq-singleband
    540 _acq-Fs2
    560 _acq-SB
    562 _acq-D1S12
    564 _acq-headmotion1
    567 _acq-topup1
    567 _acq-topup2
    569 _acq-D1S8
    572 _acq-gre
    575 _acq-D1S13
    588 _acq-headmotion2
    592 _acq-standard
    602 _acq-a
    616 _acq-GD31
    635 _acq-D2S13
    657 _acq-Mb4Mm27Tr700
    688 _acq-baseline
    690 _acq-MGE
    698 _acq-dico
    704 _acq-stroop
    762 _acq-D2S10
    784 _acq-SEfmap02
    796 _acq-SEfmap01
    807 _acq-SeqMm3Tr2000
    813 _acq-D2S11
    840 _acq-MB
    848 _acq-seeg
    856 _acq-rest
    876 _acq-func
    908 _acq-SEfmapBOLDpost
    908 _acq-SEfmapBOLDpre
    908 _acq-SEfmapDWI
    924 _acq-fMRI
    999 _acq-mp2rage
   1054 _acq-T1w
   1055 _acq-D1S11
   1056 _acq-MToff
   1056 _acq-MTon
   1067 _acq-ecog
   1133 _acq-MPrageHiRes
   1214 _acq-highres
   1240 _acq-D2S8
   1245 _acq-D2S6
   1307 _acq-MPRAGE
   1320 _acq-D2S4
   1451 _acq-D2S2
   1547 _acq-D1S9
   1626 _acq-D1S7
   1664 _acq-D1S5
   1803 _acq-D1S3
   1841 _acq-AP
   1858 _acq-RARE
   1863 _acq-combecho
   1880 _acq-clinical
   2028 _acq-D1S2
   2212 _acq-b1000n40r21x21x22peAPP
   2216 _acq-b2000n56r21x21x22peAPP
   2220 _acq-b1000n3r21x21x22peAPA
   2609 _acq-multiband
   3580 _acq-ap
   3584 _acq-pa
   3722 _acq-GEfmap
   6902 _acq-EPI
   7006 _acq-mb4PA
   7038 _acq-mb4AP
   7842 _acq-PA
  11298 _acq-normal
  15268 _acq-mb3
  60744 _acq-seq

Then for BIDS conversion/processing tools it would be much easier to just add seemingly applicable entity (e.g. _proc- for NORMed somehow images) without first checking that there would be no other ambiguous file without _proc.

NB related - inspired by BIDS in our dandiarchive.org DANDI layout, our dandi organize command was using metadata entities only if needed to disambiguate. Recently we added option to explicitly specify entities to use to 1. be able to incrementally add data without needing to rename other files which now would need to be "disambiguated", 2. work only on a subset of files (we deal with some large datasets at times), 3. simply the desire to e.g. have _ses- entity always (even if all of the files have the same session per subject).

This is to say: _proc- should not be OPTIONAL, but rather REQUIRED iff there is a sibling file from which to disambiguate.

many entities (acq, part, dir...) are already used only "iff there is a sibling file from which to disambiguate". And multiple of them are available to choose the best fitting disambiguation entity. I do not think we could anyhow code up exhaustively the "REQUIRED" notion you are suggesting, although advisories could be verbalized, pretty much like @nicholst made for dir vs proc above.

@yarikoptic
Copy link
Collaborator Author

Otherwise, this will not solve the problem because, even in a dataset where some subjects have the normalization and some don't, in both cases if the alternative is not available then you should not use _proc- because there's no need for disambiguation.

oh, although it would anger bids-validator I think I might even ask people to add _proc to those subjects anatomicals which have ONLY normalized. Because otherwise such difference of preprocessing of anatomicals across subjects would go unnoticed, and then we might see nature paper like "X party members have smoother/bigger/smaller amygdala" due to apparently a confound of technician liking to give his party buddies a "smoother looking brain" or just due to a pure virtue of chance.

@oesteban
Copy link
Collaborator

is this a principle of BIDS?

I'm sure you are going to disagree, but yes, my story about the phase encoding direction is actually the example for this in the specification:

In cases where an entity and a metadata field convey similar contextual information, the presence of an entity should not be used as a replacement for the corresponding metadata field. For instance, in echo-planar imaging MRI, the dir-<label> entity MAY be used to distinguish files with different phase-encoding directions, but the file's PhaseEncodingDirection MUST be specified as metadata.

The fact that users encode metadata in entities (and I agree there are many examples in OpenNeuro) should not lead us to promote those practices from the spec itself. We can discuss about what of this can/should be enforced by the validator, but that's a different story. Recently, I stumbled upon a dataset that has fieldmaps calculated with fMRIPrep's fieldmapless under the raw's fmap/ folder. This modification opens the door to have those fieldmaps with a _proc-fmriprep in them and we circle back to @nicholst's argument about fundamentals.

Finally, I think you are confusing readability with understandability. For a human, it is always more readable sub-01_T1w.nii.gz than sub-01_proc-Norm_T1w.nii.gz. True, for an expert, the latter is way more informative and is one grep away from grep Norm sub-01_T1w.json.

Like it proved true for _dir-, I'm quite confident this is adding unnecessary flexibility with will certainly lead to "saving" relevant metadata annotations in the corresponding JSON files. Like for _dir- this should come with an explicit statement that whatever the entity _proc- encodes, should be clearly described in the metadata. If that is true, then there's no point in accepting "lazy" labels, and rather use indexes (e.g., 0 -> Norm + defacing; 1 -> NormMethod2 + defacing)

For machines I don't think your use case is compelling. As a software developer, I prefer to trust the metadata files which are more likely to have been generated by, e.g., dcm2niix than filenames that can be manipulated by humans.

In my opinion, above making it nice for humans, BIDS' goal is to make it harder for humans to commit mistakes when sharing their data. To me, and although you haven't explicitly confirmed that _proc- is proposed to be used in any context even in the absence of sibling files, this looks like opening more doors to the misusage of BIDS.

@yarikoptic
Copy link
Collaborator Author

we really need some kind of a bidscon-2023 to get some proper discussion on this aspect. But I think

  • it is orthogonal to this particular PR, unless you think we should say something specific about _proc, but:
  • we shouldn't really make some entities "very special" in how they are treated, so the "principles" should apply generally and thus I would prefer not to say anything "special" about _proc.
  • in your example @oesteban it talks about replacement, which is not what I am advocating. "MAY be used in case X" doesn't imply that it "SHOULD NOT be used in case Y", so example is not really about this.

@oesteban
Copy link
Collaborator

  • it is orthogonal to this particular PR

I don't know how you define the basis functions, but IMHO there's no misalignment between my comments and the spirit of this PR. They are colinear and in seemingly opposing directions.

  • in your example @oesteban it talks about replacement, which is not what I am advocating.

That is exactly the problem (I imagine by "your example" you are actually referring to the BIDS specs excerpt I copy-pasted, right?)

I haven't seen any language in this PR stating something along those lines:

The proc-<label> entity MAY be used to distinguish files with different processing provenance [...], but such processing annotations MUST be specified as metadata, e.g., "ProcessingSteps": ["Norm", "Defacing"].

Obviously, if you don't give those annotations the level of necessary metadata, then you don't have to bother about the sidecar json. But if they are not so important, why do we bother to allow a new entity?

Nonetheless, and since you asked for a "common principle", there's little room for interpretation of the first sentence:

In cases where an entity and a metadata field convey similar contextual information, the presence of an entity should not be used as a replacement for the corresponding metadata field.

To me, this is a clear statement that metadata cannot be omitted if an entity encodes similar contextual information. This is an unclear way of stating that entities should not encode metadata.

Finally, you haven't commented on my counter-argument of readability. For my daughter, who is a human and is starting to read, it is much easier to read sub-01_T1w.nii.gz than sub-01_proc-Norm_T1w.nii.gz.

All of this is important because if proc- (or rec-, which I know is already in the spec) can be used without disambiguation purposes, it opens more and more windows to the scenario where two different researchers with good knowledge of the BIDS standard and starting from the same DICOM dataset end up having two very different BIDS datasets. Again, you're going to tell me that bijection is not a principle of BIDS, but I hope we can agree that ideally, DICOM-BIDS conversions should be as bijective as possible. These changes will add challenges to reproducibility of secondary uses of data.

Moreover, from the software perspective, there's no answer to my example with fieldmaps generated by fMRIPrep. If proc- is allowed, researchers can do all kinds of tricks to stick heavily derived data into a BIDS (raw) tree. If we accept this as is, what is "fundamental" will be contingent on the individual intention.

@yarikoptic
Copy link
Collaborator Author

NB Shameless plug: I recalled about this PR as the 2nd oldest in the PRs https://github.com/con folks are associated with, as listed in a section of a report I have produced with https://github.com/con/solidation . This PR was initiated 5 years ago!!!

At first, given the old age of this PR and it being just me carrying this battle, I thought to just close, but then decided to continue our discussion because people recently asked again about derived files from the scanner to be provided (nipy/heudiconv#503 and more recent ReproNim/reproin#58 (comment)) and I neither like to abuse _rec- entity (since not about reconstruction), nor _desc- (since only for derived, not raw, BIDS as not "coming from hardware"), nor I want to add those files to .bidsignore. So -- let's keep the conversation going.

I will not talk about a scare of people adding _norm- whenever it is not needed (a single _T1w.nii.gz) since that argument applies to any entity and we already had adjusted wording recently https://github.com/bids-standard/bids-specification/pull/1392/files to accent on "distinctiveness" while preparing file names.

I reflected on @oesteban suggestion to add

but such processing annotations MUST be specified as metadata, e.g., "ProcessingSteps": ["Norm", "Defacing"].

and started to create a patch to add ProcessingSteps metadata field requirement
commit daedd8e9b2b1421b2aa294c311fa4c288f41641b
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Mon Mar 13 15:42:29 2023 -0400

    WIP

diff --git a/src/schema/objects/entities.yaml b/src/schema/objects/entities.yaml
index 8a992ff9..8bf103ab 100644
--- a/src/schema/objects/entities.yaml
+++ b/src/schema/objects/entities.yaml
@@ -223,6 +223,14 @@ processing:
     which some installations impose to be run on raw data because of active
     shielding software corrections before the MEG data can actually be
     exploited.
+
+    This entity represents the `"ProcessingSteps"` metadata field.
+    Therefore, if the `proc-<label>` entity is present in a filename,
+    `"ProcessingSteps"` MUST be defined in the associated metadata.
+    The order of the steps listed is not guaranteed to correspond to order
+    of steps performed by the hardware.
+    Please note that the `<label>` does not need to match the actual value of the field.
+
   type: string
   format: label
 reconstruction:
but then looked at which entities have (or not) "corresponds to metadata" statement
❯ grep -E '(^[a-z]+:|represents)' src/schema/objects/entities.yaml
acquisition:
atlas:
ceagent:
    This entity represents the `"ContrastBolusIngredient"` metadata field.
chunk:
density:
    This entity represents the `"Density"` metadata field.
description:
direction:
    This entity represents the `"PhaseEncodingDirection"` metadata field.
echo:
    This entity represents the `"EchoTime"` metadata field.
flip:
    This entity represents the `"FlipAngle"` metadata field.
hemisphere:
inversion:
    This entity represents the `"InversionTime` metadata field.
label:
modality:
mtransfer:
    This entity represents the `"MTState"` metadata field.
part:
processing:
reconstruction:
recording:
resolution:
    This entity represents the `"Resolution"` metadata field.
run:
sample:
session:
space:
split:
stain:
    This entity represents the `"SampleStaining"` metadata field.
subject:
task:
tracer:
    This entity represents the `"TracerName"` metadata field.

to realize that in near all (if not all) those we are talking about very specific value of metadata, e.g. resolution or various times etc, which might not be properly represented in file name as labeles (due to e.g. use of .). The same about PhaseEncodingDirection which was brought up as an argument -- value is important to decide to how to do fieldmap correction.
Here, as in reconstruction (AKA _rec-), acquisition (AKA _acq-) etc. we are talking about having an arbitrary label to distinguish conflicting files along that specific (reconstruction, acquisition, processing, ...) dimension without explicitly giving semantic/usable value for further analysis. Sure thing eventually someone could (and should) suggest to use some ontology to describe values for potential "ProcessingSteps", and that is when it would be useful to add it. Or may be it would be born while working on PROV support (#487 BEP028). May be someone could even suggest right away to use ImageType from DICOM right away , but I don't think that _proc- should be limiting to DICOM (as it already is not). Also, again, _proc- is already there for another modality.

So, to summarize, IMHO

  • _proc is already there and used in other modalities
  • _proc is semantically different from _rec in case of MRI and there is number of use cases/users asking for something "along its intention"
  • addition of "ProcessingSteps" to metadata would add no value whatsoever since the actual value there would not be really adding some semantic beyond distinguishing as based on label in the filename
  • not every entity carries a label which immediately warrants addition of sidecar metadata field

@yarikoptic
Copy link
Collaborator Author

Re "ProcessingSteps" metadata field: during meeting on BIDS Derivatives in Copenhagen 2023 we progressed forward (see summary) on formalization of "bids derivatives" but also capturing "descriptions" for _desc entities labels within descriptions.tsv. _proc here is very similar in purpose IMHO to desc, and thus likely could benefit as well from something like processings.tsv?

Overall it also hints on present plurality worth analysis/formalization, e.g. in relationship to generalizing BIDS to facilitate flexible layout (bids-standard/bids-2-devel#54): for an entity in BIDS we might expect either

  • a sidecar metadata field(s) (e.g. _echo- requires EchoTime). Inheritance could be used to absorb common values, but rarely done so unless tools automate that (like heudiconv does for top level task-*.json files)
  • .tsv file describing the entity and its metadata fields (sub - participants.tsv, sample - samples.tsv, etc) . Typically common across subject/sessions and defined on top, but inheritance also applies etc.

Do we have cases for both?

@octomike
Copy link

I'm a little scared to ask, but.. any decision about this on the horizon? I still keep _proc-norm hoping it'll be valid at some point :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BEP: Make proc filename keyword optional for other (T1, EPI, etc) modalities