[ENH] improve management of "intended_for" #151

Merged
merged 13 commits into from
Feb 18, 2021

@Remi-Gau Remi-Gau commented Feb 16, 2021

  • everything involving files that may carry an IntendedFor metadata field is now handled after the dataset has been indexed
  • the intended_for field of file X contains the full path of every existing target file that X is actually intended for
  • each target file in the IntendedFor list of file X gets an informed_by field containing the full path of X

Does not include a way to query the intended_for or informed_by field.
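
The linking described in the list above could be sketched roughly as follows. This is only an illustration on a hand-made struct array, not the code of this PR: the `files` array, its `path` and `intended_for_raw` fields and the `root` folder are hypothetical, IntendedFor entries are assumed to be relative to the subject folder, and a flat `informed_by` cell is used for brevity where the PR uses a structure with one field per datatype.

```matlab
% Hypothetical miniature "index": two files, one of which declares IntendedFor.
root = '/data/asl004';                       % made-up dataset root
subject_dir = fullfile(root, 'sub-Sub1');

files = struct( ...
    'path', {fullfile(subject_dir, 'perf', 'sub-Sub1_m0scan.nii.gz'), ...
             fullfile(subject_dir, 'perf', 'sub-Sub1_asl.nii.gz')}, ...
    'intended_for_raw', {{'perf/sub-Sub1_asl.nii.gz'}, {}}, ...
    'intended_for', {{}, {}}, ...
    'informed_by', {{}, {}});

all_paths = {files.path};

% Second pass, run only once every file is known to the index.
for i = 1:numel(files)
  for j = 1:numel(files(i).intended_for_raw)
    target = fullfile(subject_dir, files(i).intended_for_raw{j});
    idx = find(strcmp(all_paths, target), 1);
    if isempty(idx)
      continue   % the target does not exist in the dataset: ignore it
    end
    files(i).intended_for{end + 1, 1} = target;          % X -> target
    files(idx).informed_by{end + 1, 1} = files(i).path;  % target -> X
  end
end
```

Doing this after indexing means a target can be found regardless of the order in which the files were listed.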


For reviewers:

This is more on the implementation side, but in terms of "output" I am wondering what would be the most useful for users. Ideally, this approach would then be extended to events and other "associated files".

  • should informed_by (pybids uses this I think) replace the dependencies field? Which one is more "intuitive"?

See example below

BIDS = bids.layout(fullfile(pth_bids_example, 'asl004'));

>> BIDS.subjects.fmap

ans = 

  struct with fields:

        filename: 'sub-Sub1_dir-pa_m0scan.nii.gz'
             ext: '.nii.gz'
          suffix: 'm0scan'
        entities: [1×1 struct]
            meta: [1×1 struct]
    intended_for: {'/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/perf/sub-Sub1_m0scan.nii.gz'}


>> BIDS.subjects.perf(1)

ans = 

  struct with fields:

        filename: 'sub-Sub1_asl.nii.gz'
             ext: '.nii.gz'
          suffix: 'asl'
        entities: [1×1 struct]
            meta: [1×1 struct]
    dependencies: [1×1 struct]
     informed_by: [1×1 struct]
    intended_for: []

>> BIDS.subjects.perf(1).informed_by.perf

ans =

    '/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/perf/sub-Sub1_m0scan.nii.gz'

>> BIDS.subjects.perf(4)

ans = 

  struct with fields:

        filename: 'sub-Sub1_m0scan.nii.gz'
             ext: '.nii.gz'
          suffix: 'm0scan'
        entities: [1×1 struct]
            meta: [1×1 struct]
    dependencies: ''
     informed_by: [1×1 struct]
    intended_for: {'/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/perf/sub-Sub1_asl.nii.gz'}


>> BIDS.subjects.perf(4).informed_by.fmap

ans =

    '/home/remi/github/BIDS-matlab/tests/bids-examples/asl004/sub-Sub1/fmap/sub-Sub1_dir-pa_m0scan.nii.gz'

@codecov
codecov bot commented Feb 16, 2021

Codecov Report

Merging #151 (39ddbcb) into dev (a38ea5c) will not change coverage.
The diff coverage is 0.00%.

@@          Coverage Diff          @@
##             dev    #151   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files         22      25    +3     
  Lines        892     920   +28     
=====================================
- Misses       892     920   +28     
Flag        Coverage Δ
unittests   0.00% <0.00%> (ø)

Impacted Files                           Coverage Δ
+bids/+internal/get_metadata.m           0.00% <ø> (ø)
+bids/+internal/return_file_index.m      0.00% <0.00%> (ø)
+bids/+internal/return_file_info.m       0.00% <0.00%> (ø)
+bids/+internal/return_subject_index.m   0.00% <0.00%> (ø)
+bids/layout.m                           0.00% <0.00%> (ø)
+bids/query.m                            0.00% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a38ea5c...39ddbcb.

pth, ...
['^' strrep(filename, ['.' ext], '\.json') '$']);
if ~isempty(tsv_file)
structure.meta = bids.util.jsondecode(tsv_file);
Collaborator

If both the TSV and the JSON are read into memory, it would be worth checking their consistency, i.e. whether all columns in the TSV are defined in the JSON.
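
A minimal sketch of such a consistency check, assuming the TSV header line and the sidecar text are already in memory. The column names and sidecar content are invented for the example, and the built-in `jsondecode` (MATLAB R2016b or later, Octave 7 or later) stands in for `bids.util.jsondecode`.

```matlab
% Made-up TSV header and JSON sidecar, both already read into memory.
tsv_header = sprintf('onset\tduration\ttrial_type\tresponse_time');
sidecar = jsondecode(['{"trial_type": {"Description": "condition"}, ' ...
                      '"response_time": {"Units": "s"}}']);

columns = strsplit(tsv_header, sprintf('\t'));   % column names of the TSV
described = fieldnames(sidecar);                 % columns the JSON documents

% Columns present in the TSV but not described in the JSON.
undescribed = setdiff(columns, described);

if ~isempty(undescribed)
  warning('Columns without a JSON description: %s', ...
          strjoin(undescribed(:)', ', '));
end
```

Whether an undescribed column should be an error, a warning or nothing at all is a separate design decision; the sketch only reports the difference.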

Collaborator Author

Yeah, not sure about that, especially if we drop reading the files during indexing (see below).

Collaborator

It is a question of design. My opinion (purely subjectif):
For layout, metadata is needed only for parsing the IntendedFor fields.
For query, returning the full structure and returning just the path to the JSON file are equivalent: retrieving the full structure is trivial with bids.util.jsondecode, hence there is no need to return the full structure.
But what could be useful is to query on individual fields of meta, e.g. "show me all files with repetition time == 0.03". Then the contents of meta become important for query. But it is difficult to implement (especially if the queried values are numbers and not strings).

Whether to load all the metadata into memory or to read a given file each time we look into it is an open question. One saves time, the other memory. For me it is better to read each time: layout and query will be called once, before the image processing that will dominate the computation time. The JSON files are small, so users will see a slow-down only in huge datasets (but there any solution will be slow).

For indexing, I can't comment because of my lack of MATLAB knowledge.
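
To illustrate why querying on metadata values is awkward, here is a rough sketch of a filter that handles both numeric and string values. It is not how `bids.query` works; `meta`, `names`, `field`, `wanted` and `tol` are hypothetical names, and the tolerance is an arbitrary choice for comparing floating-point numbers.

```matlab
% Hypothetical metadata of three files, already decoded into structures.
meta = {struct('RepetitionTime', 0.03, 'EchoTime', 0.005), ...
        struct('RepetitionTime', 2), ...
        struct('EchoTime', 0.03)};
names = {'a.nii.gz', 'b.nii.gz', 'c.nii.gz'};

field = 'RepetitionTime';
wanted = 0.03;
tol = 1e-9;   % numbers are compared within a tolerance, not with ==

match = false(size(meta));
for i = 1:numel(meta)
  if ~isfield(meta{i}, field)
    continue   % this file does not define the queried field
  end
  value = meta{i}.(field);
  if isnumeric(wanted)
    match(i) = isnumeric(value) && isscalar(value) && ...
               abs(value - wanted) <= tol;
  else
    match(i) = ischar(value) && strcmp(value, wanted);
  end
end

selected = names(match);
```

Every file needs its metadata available for such a filter, which is where the choice between keeping metadata in memory and reading it on demand starts to matter.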

Collaborator Author

> It is a question of design. My opinion (purely subjectif):

😉

> For layout, metadata is needed only for parsing the IntendedFor fields.
> For query, returning the full structure and returning just the path to the JSON file are equivalent: retrieving the full structure is trivial with bids.util.jsondecode, hence there is no need to return the full structure.

Better to use get_metadata, which should take care of the inheritance principle, but I see what you mean.

> But what could be useful is to query on individual fields of meta, e.g. "show me all files with repetition time == 0.03". Then the contents of meta become important for query. But it is difficult to implement (especially if the queried values are numbers and not strings).

There are some code "ruins" in bids.query about this, so I will leave them there for now, though I agree that doing this on query could be painful.

> Whether to load all the metadata into memory or to read a given file each time we look into it is an open question. One saves time, the other memory. For me it is better to read each time: layout and query will be called once, before the image processing that will dominate the computation time. The JSON files are small, so users will see a slow-down only in huge datasets (but there any solution will be slow).

I agree that the bottleneck in our pipeline is definitely not reading JSON metadata. And when moving to big datasets, we might have to go a completely different way anyway.

@@ -257,7 +257,7 @@

   case 'dependencies'
     if isfield(d(k), 'dependencies')
-      result{end + 1} = d(k).dependencies;
+      result{end + 1, 1} = d(k).dependencies;
Collaborator

In general, query returns an array of strings, except for dependencies, in which case it returns an array of structures.

Collaborator Author

Yeah metadata is also handled that way.

If you ask info about a single file then you get a structure.

If you ask for metadata or dependencies about several files, you can get structures that have different shapes, so you can't concatenate them into something prettier and easier to handle. 😭
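
A small sketch of the limitation described above, with two invented dependency structures whose field names differ. Structures with different sets of fields cannot be concatenated into one struct array, so a cell array is the fallback; growing it with `{end + 1, 1}` keeps it a column.

```matlab
% Two hypothetical "dependencies" structures with different fields.
dep_a = struct('perf', 'sub-01_asl.nii.gz');
dep_b = struct('fmap', 'sub-01_m0scan.nii.gz', 'perf', '');

% Direct concatenation fails because the field names do not match.
concatenated = true;
try
  both = [dep_a; dep_b];
catch
  concatenated = false;
end

% A cell array can hold them side by side, whatever their shape.
result = {};
result{end + 1, 1} = dep_a;
result{end + 1, 1} = dep_b;
```

The caller then has to loop over `result` and inspect each structure individually, which is the cumbersome part.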

Collaborator

@nbeliy nbeliy left a comment

Not a fan of putting each dependency in its own field, but I can see the logic behind it.

@Remi-Gau
Collaborator Author

> Not a fan of putting each dependency in its own field, but I can see the logic behind it.

Yeah, I agree that this is a bit cumbersome. If we find a more elegant solution down the line, let's see how to improve it.

@Remi-Gau Remi-Gau merged commit 87416fd into bids-standard:dev Feb 18, 2021
@nbeliy
Collaborator

nbeliy commented Feb 18, 2021

Sorry for reopening the request. There are two issues that I encountered during the tests:

  1. qMRI fieldmaps (TB1EPI, etc.) are not taken into account.
  2. The JSON list in IntendedFor is decoded into a cell array and not a structure, so it bypasses the elseif on line 527:
ans =

  6x1 cell array

    {'anat/sub-HC1278_echo-*_acq-T1w_part-phase_MPM.nii.gz'  }
    {'anat/sub-HC1278_echo-*_acq-T1w_part-mag_MPM.nii.gz'    }
    {'anat/sub-HC1278_echo-*_acq-MToff_part-mag_MPM.nii.gz'  }
    {'anat/sub-HC1278_echo-*_acq-MToff_part-phase_MPM.nii.gz'}
    {'anat/sub-HC1278_echo-*_acq-MTon_part-mag_MPM.nii.gz'   }
    {'anat/sub-HC1278_echo-*_acq-MTon_part-phase_MPM.nii.gz' }

Proposed solutions (not sure whether they would break something else):
1a) Search for IntendedFor in every JSON file.
1b) Search for IntendedFor in each file of the fmap modality; not sure how that would work for coordsystem and m0.
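
Regarding the second issue, one way to make downstream code independent of how IntendedFor was decoded is to normalise it to a column cell array of character vectors first. This is a sketch under assumptions, not the fix applied in the repository: it uses the built-in `jsondecode` (MATLAB R2016b or later, Octave 7 or later) in place of `bids.util.jsondecode`, the file names are invented, and it does not expand wildcards such as `echo-*`.

```matlab
% IntendedFor can be a single string or a list of strings in the JSON.
single  = jsondecode('{"IntendedFor": "anat/sub-01_T1w.nii.gz"}');
several = jsondecode(['{"IntendedFor": ["anat/sub-01_T1w.nii.gz", ' ...
                      '"func/sub-01_bold.nii.gz"]}']);

decoded = {single, several};
lists = cell(size(decoded));

for i = 1:numel(decoded)
  intended_for = decoded{i}.IntendedFor;
  if ischar(intended_for)
    intended_for = {intended_for};   % single string: wrap it in a cell
  end
  lists{i} = intended_for(:);        % always a column cell array
end
```

After this step the same loop over `lists{i}` works for both cases, so no separate branch per decoded type is needed.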

@Remi-Gau
Collaborator Author

@all-contributors please add @nbeliy for review

@allcontributors
Contributor

@Remi-Gau

I've put up a pull request to add @nbeliy! 🎉

@Remi-Gau Remi-Gau deleted the remi-intended_for branch February 26, 2021 16:12