[ENH] improve management of "intended_for" #151

Remi-Gau · 2021-02-16T11:14:13Z

Anything to do with files that may have an IntendedFor metadata field is done after the dataset has been indexed.
The intended_for field of file X contains the full paths of all existing target files that X is intended for.
Each target file in the IntendedFor list of file X gets an informed_by field containing the full path of X.

This does not include a way to query the intended_for or informed_by fields.
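The linking step described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR. It assumes a hypothetical flat index (`files`, with `path` and `meta` fields) instead of the real BIDS structure, resolves IntendedFor entries relative to a hypothetical subject folder `sub_dir`, and treats a target as "existing" if it is present in the index.

```matlab
% Hypothetical flat index: one entry per file with its full path and its
% decoded JSON sidecar (NOT the actual bids-matlab BIDS structure).
sub_dir = '/data/asl004/sub-Sub1';
files = struct( ...
  'path', {[sub_dir '/fmap/sub-Sub1_dir-pa_m0scan.nii.gz'], ...
           [sub_dir '/perf/sub-Sub1_asl.nii.gz'], ...
           [sub_dir '/perf/sub-Sub1_m0scan.nii.gz']}, ...
  'meta', {struct('IntendedFor', 'perf/sub-Sub1_m0scan.nii.gz'), ...
           struct(), ...
           struct('IntendedFor', {{'perf/sub-Sub1_asl.nii.gz'; ...
                                   'perf/sub-Sub1_missing.nii.gz'}})});

all_paths = {files.path};
[files.intended_for] = deal({});
[files.informed_by] = deal({});

% Second pass, run only once every file has been indexed
for i = 1:numel(files)
  if ~isfield(files(i).meta, 'IntendedFor')
    continue
  end
  targets = files(i).meta.IntendedFor;
  if ischar(targets)
    targets = {targets};  % a single JSON string decodes to a char vector
  end
  for j = 1:numel(targets)
    target = [sub_dir '/' targets{j}];
    idx = find(strcmp(all_paths, target), 1);
    if isempty(idx)
      continue  % target is not in the index: skip it
    end
    files(i).intended_for{end + 1, 1} = target;         % X -> target
    files(idx).informed_by{end + 1, 1} = files(i).path;  % target -> X
  end
end
```

Running the link only after indexing is complete means every target is already in the index when IntendedFor is resolved, whatever order the folders were visited in.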

For reviewers:

This is more on the implementation side, but in terms of "output" I am wondering what would be the most useful for users. Ideally, this approach would then be extended to events and other "associated files".

Should informed_by (pybids uses this, I think) replace the dependencies field? Which one is more "intuitive"?

See the example below:

BIDS = bids.layout(fullfile(pth_bids_example, 'asl004'));

>> BIDS.subjects.fmap

ans = 

  struct with fields:

        filename: 'sub-Sub1_dir-pa_m0scan.nii.gz'
             ext: '.nii.gz'
          suffix: 'm0scan'
        entities: [1×1 struct]
            meta: [1×1 struct]
    intended_for: {'/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/perf/sub-Sub1_m0scan.nii.gz'}


>> BIDS.subjects.perf(1)

ans = 

  struct with fields:

        filename: 'sub-Sub1_asl.nii.gz'
             ext: '.nii.gz'
          suffix: 'asl'
        entities: [1×1 struct]
            meta: [1×1 struct]
    dependencies: [1×1 struct]
     informed_by: [1×1 struct]
    intended_for: []

>> BIDS.subjects.perf(1).informed_by.perf

ans =

    '/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/perf/sub-Sub1_m0scan.nii.gz'

>> BIDS.subjects.perf(4)

ans = 

  struct with fields:

        filename: 'sub-Sub1_m0scan.nii.gz'
             ext: '.nii.gz'
          suffix: 'm0scan'
        entities: [1×1 struct]
            meta: [1×1 struct]
    dependencies: ''
     informed_by: [1×1 struct]
    intended_for: {'/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/perf/sub-Sub1_asl.nii.gz'}


>> BIDS.subjects.perf(4).informed_by.fmap

ans =

    '/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/fmap/sub-Sub1_dir-pa_m0scan.nii.gz'

codecov · 2021-02-16T11:27:37Z

Codecov Report

Merging #151 (39ddbcb) into dev (a38ea5c) will not change coverage.
The diff coverage is 0.00%.

@@          Coverage Diff          @@
##             dev    #151   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files         22      25    +3     
  Lines        892     920   +28     
=====================================
- Misses       892     920   +28

Flag	Coverage Δ
unittests	`0.00% <0.00%> (ø)`

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
+bids/+internal/get_metadata.m	`0.00% <ø> (ø)`
+bids/+internal/return_file_index.m	`0.00% <0.00%> (ø)`
+bids/+internal/return_file_info.m	`0.00% <0.00%> (ø)`
+bids/+internal/return_subject_index.m	`0.00% <0.00%> (ø)`
+bids/layout.m	`0.00% <0.00%> (ø)`
+bids/query.m	`0.00% <0.00%> (ø)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a38ea5c...39ddbcb.

+bids/+internal/get_metadata.m

nbeliy · 2021-02-16T12:11:32Z

+bids/layout.m

+                                        pth,  ...
+                                        ['^' strrep(filename, ['.' ext], '\.json') '$']);
+    if ~isempty(tsv_file)
+      structure.meta = bids.util.jsondecode(tsv_file);


If both the TSV and the JSON files are read into memory, it would be worth checking their consistency, i.e. whether all columns in the TSV are defined in the JSON.

Yeah, not sure about that, especially if we drop reading the files during indexing (see below).
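A minimal sketch of such a consistency check, assuming the TSV header and the JSON sidecar have already been read into memory (here they are inline strings so the example runs on its own). Treating `onset` and `duration` as always documented is an assumption made for an events file, since the BIDS specification itself defines those columns. `jsondecode` needs MATLAB R2016b or later, or Octave 7 or later.

```matlab
% Inline stand-ins for the contents of a hypothetical events.tsv and its
% JSON sidecar, so that the sketch runs without any files
tsv_text = sprintf('onset\tduration\ttrial_type\tresponse_time\n1.0\t0.5\tgo\t0.31\n');
json_text = '{"trial_type": {"Description": "Condition of the trial"}}';

% Column names come from the header (first line) of the TSV
tsv_lines = regexp(tsv_text, '\n', 'split');
columns = regexp(tsv_lines{1}, '\t', 'split');

% Top-level keys of the sidecar describe the columns
sidecar = jsondecode(json_text);
described = fieldnames(sidecar)';

% Assumption: onset and duration are defined by the specification itself
spec_defined = {'onset', 'duration'};

undocumented = setdiff(columns, [described, spec_defined]);
if ~isempty(undocumented)
  warning('TSV columns missing from the JSON sidecar: %s', ...
          strjoin(undocumented, ', '));
end
```

Note that `jsondecode` turns JSON keys that are not valid MATLAB identifiers into valid field names, so column names would need the same conversion before being compared.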

It is a question of design. My opinion (purely subjective):
For layout, metadata is needed only for parsing IntendedFor fields.
For query, returning the full structure and returning just the path to the JSON file are equivalent: retrieving the full structure is trivial using bids.util.jsondecode, hence there is no need to return the full structure.
But what could be useful is to query on individual fields of meta, e.g. "show me all files with repetition time == 0.03". Then the contents of meta would become important for query. But it is difficult to implement (especially if the queried values are numbers and not strings).

Whether to load all metadata into memory or to read the given file each time we look into it is an open question. One saves time, the other memory. For me it is better to read each time: layout and query will be called once, before processing the images, which will dominate computation time. The JSON files are small, so users will only see a slowdown on huge datasets (and there any solution will be slow).

For indexing, I can't comment because of my lack of MATLAB knowledge.

It is a question of design. My opinion (purely subjective):
😉

For layout, metadata is needed only for parsing IntendedFor fields.
For query, returning the full structure and returning just the path to the JSON file are equivalent: retrieving the full structure is trivial using bids.util.jsondecode, hence there is no need to return the full structure.

Better to use get_metadata, which should take care of the inheritance principle, but I see what you mean.

But what could be useful is to query on individual fields of meta, e.g. "show me all files with repetition time == 0.03". Then the contents of meta would become important for query. But it is difficult to implement (especially if the queried values are numbers and not strings).

There are some code "ruins" in bids.query about this, so I will leave them there for now, though I agree that doing this in query could be painful.
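Querying on a metadata value could look roughly like the loop below. It is a sketch, not what bids.query does: `meta` stands in for hypothetical per-file decoded sidecars, and the absolute tolerance `tol` is an arbitrary choice that avoids relying on exact floating-point equality for numeric fields such as RepetitionTime.

```matlab
% Hypothetical decoded sidecars, one per file
filenames = {'sub-01_T1w.nii.gz', 'sub-01_task-rest_bold.nii.gz', ...
             'sub-01_phasediff.nii.gz'};
meta = {struct('RepetitionTime', 0.03), ...
        struct('RepetitionTime', 2), ...
        struct('EchoTime1', 0.03)};

field = 'RepetitionTime';
target = 0.03;
tol = 1e-9;  % arbitrary absolute tolerance instead of exact ==

keep = false(size(meta));
for i = 1:numel(meta)
  if ~isfield(meta{i}, field)
    continue  % files without this metadata never match
  end
  value = meta{i}.(field);
  if isnumeric(value) && isscalar(value)
    keep(i) = abs(value - target) < tol;
  end
end

matches = filenames(keep);
```

String-valued fields would need a separate comparison branch, which is part of what makes a generic metadata query awkward.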

Whether to load all metadata into memory or to read the given file each time we look into it is an open question. One saves time, the other memory. For me it is better to read each time: layout and query will be called once, before processing the images, which will dominate computation time. The JSON files are small, so users will only see a slowdown on huge datasets (and there any solution will be slow).

I agree that the bottleneck in our pipeline is definitely not JSON metadata reading. And when moving to big datasets, we might have to go a completely different way anyway.
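One possible middle ground between the two options is to read each sidecar lazily and keep the decoded result in a cache, so a file is only parsed the first time it is needed. The sketch below uses `containers.Map` keyed on the JSON path and writes a throwaway file with `tempname` so that it runs on its own; it is a hypothetical illustration of the trade-off, not code from this PR.

```matlab
% Write a throwaway sidecar so that the sketch runs on its own
json_file = [tempname '.json'];
fid = fopen(json_file, 'w');
fprintf(fid, '{"RepetitionTime": 2, "TaskName": "rest"}');
fclose(fid);

% Cache of decoded sidecars, keyed on the JSON path
cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
n_reads = 0;

for k = 1:3  % e.g. three queries touching the same file
  if ~isKey(cache, json_file)
    cache(json_file) = jsondecode(fileread(json_file));  % decode on first access only
    n_reads = n_reads + 1;
  end
  meta = cache(json_file);
end

delete(json_file);
```

Memory use then grows only with the sidecars that are actually queried, rather than with the whole dataset.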

+bids/layout.m

nbeliy · 2021-02-16T12:35:04Z

+bids/query.m

@@ -257,7 +257,7 @@

            case 'dependencies'
              if isfield(d(k), 'dependencies')
-                result{end + 1} = d(k).dependencies;
+                result{end + 1, 1} = d(k).dependencies;


In general, query returns an array of strings, except for dependencies, in which case it returns an array of structures.

Yeah, metadata is also handled that way.

If you ask for info about a single file, then you get a structure.

If you ask for metadata or dependencies of several files, you can get structures that have different shapes, so you can't concatenate them into something prettier and easier to handle. 😭
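The concatenation problem can be reproduced in a few lines. The dependency structures below are hypothetical (`sbref` is an illustrative field name; `bval` and `bvec` mirror the DWI case): concatenating them into a struct array fails because their fields differ, while a column cell array, as built by `result{end + 1, 1}` in the diff above, accepts them.

```matlab
% Dependencies of two hypothetical files, with different fields
dep_bold = struct('sbref', 'sub-01_task-rest_sbref.nii.gz');
dep_dwi = struct('bval', 'sub-01_dwi.bval', 'bvec', 'sub-01_dwi.bvec');

% A struct array requires every element to have the same fields
try
  deps = [dep_bold; dep_dwi];
  concatenated = true;
catch
  concatenated = false;
end

% A column cell array holds structures of any shape
result = {};
result{end + 1, 1} = dep_bold;
result{end + 1, 1} = dep_dwi;
```

The price is that callers must index with braces (`result{2}.bvec`) and check fields per element.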

nbeliy

Not a fan of putting each dependency in its own field, but I can see the logic behind it.

Remi-Gau · 2021-02-18T11:58:57Z

Not a fan of putting each dependency in its own field, but I can see the logic behind it.

Yeah, I agree that this is a bit cumbersome. If we find a more elegant solution down the line, let's see how to improve it.

nbeliy · 2021-02-18T12:56:32Z

Sorry for reopening the request. There are two issues that I encountered during testing:

1) It does not take into account qMRI fieldmaps (TB1EPI, etc.).
2) The JSON list in IntendedFor is decoded into a cell array and not a structure, hence it skips the elseif on line 527:

ans =

  6x1 cell array

    {'anat/sub-HC1278_echo-*_acq-T1w_part-phase_MPM.nii.gz'  }
    {'anat/sub-HC1278_echo-*_acq-T1w_part-mag_MPM.nii.gz'    }
    {'anat/sub-HC1278_echo-*_acq-MToff_part-mag_MPM.nii.gz'  }
    {'anat/sub-HC1278_echo-*_acq-MToff_part-phase_MPM.nii.gz'}
    {'anat/sub-HC1278_echo-*_acq-MTon_part-mag_MPM.nii.gz'   }
    {'anat/sub-HC1278_echo-*_acq-MTon_part-phase_MPM.nii.gz' }

Proposed solution (not sure whether it will break something else):
1a) Search for IntendedFor in each of the JSON files
1b) Search for IntendedFor in each file of the fmap modality; not sure how it will work for coordsystem and m0
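A sketch of how the decoded list above could be resolved, under stated assumptions. IntendedFor may decode either to a char vector (single JSON string) or to a cell array (JSON list), so it is normalized to a cell first. The `echo-*` wildcard entries are then turned into anchored regular expressions and matched against `subject_files`, a hypothetical list of paths relative to the subject folder. This is an illustration, not code from this PR.

```matlab
% IntendedFor as jsondecode returns it: a JSON list becomes a cell array
% (a single JSON string would become a char vector instead)
intended_for = jsondecode(['["anat/sub-HC1278_echo-*_acq-MTon_part-mag_MPM.nii.gz",' ...
                           ' "anat/sub-HC1278_echo-*_acq-MTon_part-phase_MPM.nii.gz"]']);
if ischar(intended_for)
  intended_for = {intended_for};
end

% Hypothetical list of files indexed for this subject, relative to its folder
subject_files = {'anat/sub-HC1278_echo-1_acq-MTon_part-mag_MPM.nii.gz', ...
                 'anat/sub-HC1278_echo-2_acq-MTon_part-mag_MPM.nii.gz', ...
                 'anat/sub-HC1278_echo-1_acq-MTon_part-phase_MPM.nii.gz', ...
                 'anat/sub-HC1278_echo-1_acq-MToff_part-mag_MPM.nii.gz'};

targets = {};
for i = 1:numel(intended_for)
  % Escape literal dots, then turn the '*' wildcard into '.*'
  pattern = strrep(intended_for{i}, '.', '\.');
  pattern = ['^' strrep(pattern, '*', '.*') '$'];
  hits = cellfun(@(f) ~isempty(regexp(f, pattern, 'once')), subject_files);
  targets = [targets; subject_files(hits)'];
end
```

Here the two patterns resolve to the two MTon magnitude echoes and the MTon phase echo, while the MToff file is left out.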

Remi-Gau · 2021-02-18T18:00:15Z

@all-contributors please add @nbeliy for review

allcontributors · 2021-02-18T18:00:23Z

@Remi-Gau

I've put up a pull request to add @nbeliy! 🎉

Remi-Gau added 10 commits February 15, 2021 18:59

make query returns dependencies as structure

65b630e

handle dwi dependencies

d543a17

refactor handling of tsv files

9bf711a

update help section

28819b3

change query output size for metadata and depedency

aa0d469

refactor tsv handling

0b8e560

create functions to return indices in BIDS structure

fc25126

refactor intended_for field

f3119bf

add comments

0b9f538

linting

5287329

Remi-Gau requested review from gllmflndn, HenkMutsaerts, nbeliy and tiborauer February 16, 2021 11:24

nbeliy reviewed Feb 16, 2021

+bids/+internal/get_metadata.m (outdated)

This was linked to issues Feb 16, 2021

Dealing with intendedFor and "file dependencies" #130

Closed

generalize usage of intended_for field where applicable and dependency sub-structure #143

Closed

nbeliy reviewed Feb 16, 2021

Remi-Gau added 3 commits February 16, 2021 15:51

improve intended_for fullpath generation

786cab3

remove reading tsv content for most files at the subject level

3a9ca75

improve help section

39ddbcb

nbeliy approved these changes Feb 18, 2021

Remi-Gau merged commit 87416fd into bids-standard:dev Feb 18, 2021

Remi-Gau mentioned this pull request Feb 18, 2021

Intended for does not cover qMRI cases #157

Open

allcontributors bot mentioned this pull request Feb 18, 2021

docs: add nbeliy as a contributor #159

Merged

Remi-Gau deleted the remi-intended_for branch February 26, 2021 16:12