-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] improve management of "intended_for" #151
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #151 +/- ##
=====================================
Coverage 0.00% 0.00%
=====================================
Files 22 25 +3
Lines 892 920 +28
=====================================
- Misses 892 920 +28
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pth, ... | ||
['^' strrep(filename, ['.' ext], '\.json') '$']); | ||
if ~isempty(tsv_file) | ||
structure.meta = bids.util.jsondecode(tsv_file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both tsv and json are readed into memory, it would be worth to check their consistency -- ie if all columns in tsv are defined in json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah not sure about especially if we drop reading the files during indexing (see below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is question of design. My opinion (purely subjectif):
For layout metadata is needed only for parcing IntendedFor fields
For query returning full structure and just path to json file are equivalent, retrieving full stucture is trivial using bids.util.jsondecode
, hence no need to return full structure.
But, what can be useful is to query on individual fields of meta, i.e. "show me all files with repetition time == 0.03". Then contains of meta will become important for query. But it is difficult to implement (espetualy if queried values are numbers and not strings).
The question if load in memory all meta, or will read given file each time we look in it -- is open. One economise time, other memory. For me it is better to read each time -- layout and query will be called once, and before processing images which will dominate calculation time. The json files are small, so user will see slow-down only in huge datasets (but there any solution will be slow).
For indexing, I can't comment beacause of lack of matlab knowelege
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is question of design. My opinion (purely subjectif):
😉
For layout metadata is needed only for parcing IntendedFor fields
For query returning full structure and just path to json file are equivalent, retrieving full stucture is trivial usingbids.util.jsondecode
, hence no need to return full structure.
Better to use get_metadata
that should take care of the inheritance principle but I see what you mean.
But, what can be useful is to query on individual fields of meta, i.e. "show me all files with repetition time == 0.03". Then contains of meta will become important for query. But it is difficult to implement (espetualy if queried values are numbers and not strings).
There is some code "ruins" in bids.query
about this: so I will leave that there for though I agree that doing this on query could be painful.
The question if load in memory all meta, or will read given file each time we look in it -- is open. One economise time, other memory. For me it is better to read each time -- layout and query will be called once, and before processing images which will dominate calculation time. The json files are small, so user will see slow-down only in huge datasets (but there any solution will be slow).
I agree that the bottle neck in our pipeline is definitely not json metadata reading. And when moving to big data set, we might have to go a completely different way anyway.
@@ -257,7 +257,7 @@ | |||
|
|||
case 'dependencies' | |||
if isfield(d(k), 'dependencies') | |||
result{end + 1} = d(k).dependencies; | |||
result{end + 1, 1} = d(k).dependencies; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general query returns a array of strings, except for dependencies, in which case it will return an array of structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah metadata is also handled that way.
If you ask info about a single file then you get a structure.
If you ask for metada or dependencies about several files, you can get structures that have different shapes so you can't concatenate them in something prettier and easier to handle. 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fun of making each dependency in its own field, but I can see the logic behind.
yeah I agree that this is a bit cumbersome, if we find a more elegant solution down the line let's see how to improve. |
Sorry to reopening the request. There two issues that I encountered during the tests:
Proposition of solution (not sure if it will broke something else): |
@all-contributors please add @nbeliy for review |
I've put up a pull request to add @nbeliy! 🎉 |
intendedFor
metadata field is done after the index of the datasetintended_for
field of file X contains the fullpath of all the existing target files that X is actually intended forintendedFor
list of file X get aninformed_by
field with the fullpath for XDoes not include a way to query the
intended_for
orinformed_by
field.For reviewers:
This is more on the implementation side, but in terms of "output" I am wondering what would the most useful for users. Ideally, this approach would then be extended to
events
and other "associated files".informed_by
(pybids uses this I think) replace thedependencies
field? Which one is more "intuitive"?See example below