Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Proposal for model-<modelname> directory #1280

Closed
wants to merge 42 commits into from

Conversation

arokem
Copy link
Collaborator

@arokem arokem commented Sep 12, 2022

As an auxilliary to work being done on BEP16 (DWI derivatives), we are proposing a change to the common derivatives specification to allow the creation of a sub-directory into which model parameters will be saved, together with a models.tsv metadata file to describe all the different models that are fit within a pipeline.

EDIT BY @oesteban
We arrived to this proposal as a way of partially address the issues identified by @Lestropie (see bids-standard/bids-bep016#50), building from his proposal n. 3.

@codecov
Copy link

codecov bot commented Sep 12, 2022

Codecov Report

Patch and project coverage have no change.

Comparison is base (8bc376e) 87.90% compared to head (104abd5) 87.90%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1280   +/-   ##
=======================================
  Coverage   87.90%   87.90%           
=======================================
  Files          14       14           
  Lines        1273     1273           
=======================================
  Hits         1119     1119           
  Misses        154      154           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@arokem
Copy link
Collaborator Author

arokem commented Sep 12, 2022

See bids-standard/bids-bep016#50 for relevant context.

Specifically bids-standard/bids-bep016#50 (comment)

@effigies
Copy link
Collaborator

I haven't really integrated this whole proposal yet, but I want to jot down some quick notes about the direction we took in FitLins, which is at least superficially quite different. Not sure if it can apply here at all...

In FitLins, we have hierarchical models (e.g., run-level -> subject-level -> dataset-level), such that you could potentially have multiple dataset-level models that build off the results in a common subject-level model. Instead of nesting models more deeply inside modalities, I've been pulling them to the top:

fitlins_derivatives/
    dataset_description.json
    node-run/
        sub-01/
        sub-02/
        ...
    node-subject/
        sub-01/
        sub-02/
        ...
    node-group/
        group_results

I imagined the node-* becoming derivative datasets in their own right, and then fitlins_derivatives becoming a super-dataset (a concept still to be defined). This would give the flexibility to, say, distribute first-level results as a standalone dataset for image-based meta-analysis.

@effigies
Copy link
Collaborator

@arokem If you want to fix up the markdown, use npm install remark; npx remark <markdown-file> -o.

@arokem
Copy link
Collaborator Author

arokem commented Sep 12, 2022

Thanks for the quick comments here @effigies! Are these proposals really in conflict? I can imagine having a model-<modelname> directory within every one of the node- levels that you specified. Seems complementary to me.

@arokem
Copy link
Collaborator Author

arokem commented Sep 12, 2022

@arokem If you want to fix up the markdown, use npm install remark; npx remark -o.

Thanks! Unfortunately, working in codespaces because UT's wifi won't let me pull stuff down from GitHub :-/

I'll let @oesteban do the rest of these fixes :-)

@oesteban
Copy link
Collaborator

@effigies

$ npm install remark; npx remark src/05-derivatives/01-introduction.md -o src/05-derivatives/01-introduction.md
npm WARN saveError ENOENT: no such file or directory, open '/Users/oesteban/workspace/bids-specification/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/oesteban/workspace/bids-specification/package.json'
npm WARN bids-specification No description
npm WARN bids-specification No repository field.
npm WARN bids-specification No README data
npm WARN bids-specification No license field.

+ remark@14.0.2
updated 1 package and audited 94 packages in 0.764s

42 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

npx: installed 55 in 2.831s
command not found: remark

@oesteban
Copy link
Collaborator

the npx remark command doesn't seem to work

@effigies
Copy link
Collaborator

I ran it. Apparently it doesn't fix everything, though...

@PeerHerholz
Copy link
Member

I was wondering how this fits with the computational model BEP (if it's related/needs to)?

@oesteban
Copy link
Collaborator

I was wondering how this fits with the computational model BEP (if it's related/needs to)?

As per that BEP: BIDS Computational Model files store the mathematical and computational descriptions of
computational models as well as simulation results.
Based on that I'd say the only overlap would be the simulation results bit, and not entirely because this PR shies away from specifying models themselves. I believe that BEP has a very large overlap with the BIDS-statsmodel.

Instead, we want this PR to gauge the interest/position of the BIDS community in the possibility that we add the new model-<model_id>/ level to the directory hierarchy of derivatives.

There are some deeply related aspects of this PR that we intendedly left for the future:

  • Multi-datatype models (e.g., a model that generates derivatives from both func/ and dwi/ resources). We were entertaining ideas like allowing a folder func+dwi/ to exist, or have a more general multi/ possibility. In any case, we did not want to get into that issue here, but will come up almost immediately after this one is decided.
  • The actual specification of models. There's a pretty much developed proposal, very comprehensive, in BEP016 -- thanks to the persistence, attention to detail, and overall titanic effort of @Lestropie.

Those are critical elements that affect this PR, so it's good to have them in mind (and it's a likely possibility that a majority of people push towards having one or both issues covered here too, should it be going towards an accepted decision).

It'd be great to have critical reviews from @rwblair and @tsalo, and anyone else involved in the BIDS schema feat. Does this complicate things on that front?

Historically, this PR comes from bids-standard/bids-bep016#50. It does not itself fully resolve the inheritance issue described there, but it basically evolves from Option 3 there, which indirectly resolves the problem to the extent it is necessary within BEP016.

@oesteban oesteban changed the title WIP: Proposal for model-<modelname> directory ENH: Proposal for model-<modelname> directory Sep 13, 2022
@oesteban
Copy link
Collaborator

@arokem @Lestropie I have allowed myself to remove the WIP label from the title. I have made some edits to use macros and make the text more consistent with the rest of the spec.

For a render of this PR at this point, go to https://output.circle-artifacts.com/output/job/ecd7481c-1064-41ad-a51b-e362395503e4/artifacts/0/dev_docs/01-introduction.html

Copy link
Collaborator Author

@arokem arokem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will be json files, right?

src/05-derivatives/01-introduction.md Outdated Show resolved Hide resolved
src/05-derivatives/01-introduction.md Outdated Show resolved Hide resolved
Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial thoughts:

  1. The schema doesn't have mechanism for the additional folder layer you're proposing, so we'd need to extend it. I think it would be helpful to have a general concept associated with this level beyond "models". If this is a layer the community wants to use for models, then presumably it will be applicable to other types of derivatives.
  2. I think you need a models suffix in schema/objects/suffixes.yaml, as well as rules for it- perhaps those could go in schema/rules/tabular_metadata.yaml.

src/05-derivatives/01-introduction.md Outdated Show resolved Hide resolved
arokem added a commit to arokem/bids-specification that referenced this pull request Sep 13, 2022
Following up on bids-standard#1280, this PR differentiates between model parameters
and model-derived parameters. This distinction arises from considerations
described in more detail in BEP16. For example, the distinction
between the 6 elements of the tensor that are fit in the process of
DTI, and secondary calculations of FA and MD from the eigenvalues
of the tensor.
arokem added a commit to arokem/bids-specification that referenced this pull request Sep 13, 2022
Following up on bids-standard#1280, this PR differentiates between model parameters
and model-derived parameters. This distinction arises from considerations
described in more detail in BEP16. For example, the distinction
between the 6 elements of the tensor that are fit in the process of
DTI, and secondary calculations of FA and MD from the eigenvalues
of the tensor.
arokem added a commit to arokem/bids-specification that referenced this pull request Sep 13, 2022
Following up on bids-standard#1280, this PR differentiates between model parameters
and model-derived parameters. This distinction arises from considerations
described in more detail in BEP16. For example, the distinction
between the 6 elements of the tensor that are fit in the process of
DTI, and secondary calculations of FA and MD from the eigenvalues
of the tensor.
Copy link
Collaborator Author

@arokem arokem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the suffix model or models?

@@ -669,6 +669,17 @@ meg:
description: |
Unprocessed MEG data stored in the native file format of the MEG instrument
with which the data was collected.
models:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be model in singular?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, the file is called _models.tsv.

As opposed to that, there will be a new model entry in this file in the context of #1282.

Comment on lines 79 to 80
"sub-<label>_models.tsv": "",
"sub-<label>_models.json": ""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, in singular?

Suggested change
"sub-<label>_models.tsv": "",
"sub-<label>_models.json": ""
"sub-<label>_model.tsv": "",
"sub-<label>_model.json": ""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't think so, see above)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OIC. I got confused. The description makes it sound like there is one models.tsv file for the dataset. Here, we introduce the notion that different subjects (And different datatypes?) can have their own sub-<label>_models.tsv file (and <datatype>_models.tsv is also allowable?). We might want to describe this before providing an example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, when I added the sub-<>_ prefix I actually disallowed a /models.tsv implicitly. I'll send some changes to make this explicit.

@nbeliy
Copy link

nbeliy commented Sep 14, 2022

Dear all, by chance I stumbled on this BEP, and as I'm working on various BIDS-related software, I wanted to put my 5 cents.

My concern is sub-folders like dwi/model-DTI may confuse BIDS tools (and make difficult to develop new) as it deviates from standard BIDS naming convention (sub-<>_<entities>_<suffix>.<extension>).
From implementation point of view we will need to add a lot of tests into dataset parcing in order to index such files.

I would propose to put models into separate data-type folder together with the sidecar json and summury tsv file:

sub-abc/ses-def/models/sub-abc_ses-def_mod-DTI_model.d
sub-abc/ses-def/models/sub-abc_ses-def_mod-DTI_model.json
sub-abc/ses-def/models/sub-abc_ses-def_models.tsv
sub-abc/ses-def/models/sub-abc_ses-def_models.json

This way the model data is introduced exactly the same way as any other data file in the dataset, simplifying parsing.
The d extension is here just to point it's a folder.
With carefull selection of entities (for ex for parameter adjustments) you can see which folder correspond to what model:

sub-abc_mod-DTI_par1-00_par2-00_model.d
sub-abc_mod-DTI_par1-00_par2-01_model.d
sub-abc_mod-DTI_par1-00_par2-02_model.d

With convention that 00 means nominal/default value, and non-zero index the corresponding parameter adjustment.
Or alternatively using one entity for different parameter sets with meaningful naming.

For the group level models models folder can be moved into root directory:

root/
    models/acq-DTI_model.d
    sub-abc/

Cheers,
Nikita

@oesteban
Copy link
Collaborator

My concern is sub-folders like dwi/model-DTI may confuse BIDS tools (and make difficult to develop new) as it deviates from standard BIDS naming convention (sub-<>_<entities>_<suffix>.<extension>).

I don't think there's anything in this PR that is not supported by the current specifications. It's true that the BIDS Validator may require some extra heuristic, but that's more a problem of being able to formalize BIDS rules into a schema that can be used directly by the validator.

Since this only affects BIDS-Derivatives (and that's clearly stated at a couple of places in this PR) I don't think this would break any existing software either, basically because there's no specification to break at this point.

Finally, I'll also mention that this model-<label>/ folder is basically specified like sub-<label> + participants.tsv file or the ses-<label>/ folder. So it doesn't seem to break any existing conventions for folder names.

I would propose to put models into separate data-type folder together with the sidecar json and summury tsv file:

I think these suggestions fit better in #1282. Can you check that out?

Thanks a lot for the input, happy to clarify further if the argument is not sufficiently convincing.

@nbeliy
Copy link

nbeliy commented Sep 14, 2022

Since this only affects BIDS-Derivatives (and that's clearly stated at a couple of places in this PR) I don't think this would break any existing software either, basically because there's no specification to break at this point.

Indeed, the derivatives in BIDS are not regulated.
But even so, deviating from BIDS style of names can provide difficulties to use such derivatives in the analysis scripts.
If you take and example of bids-matlab (free publicity!),
it allow to parse files even if they are not in the official BIDS schema.
But the dwi/model-DTI folder will be ignores as it doesn't starts with sub-.

Imagine you want a script that compare different models inside the same dataset.
If these models folders are supported, then you just need to query all files with suffix model,
and you have all of them.

In opposite, you need to do recursive search for the folders containing model-<name>,
and then parse the subject/session names retrieve corresponding images.
It's feasible, but requires additional code.

I'm speaking less from human and more from machine perspective.
Bids software know how to deal with bids-formatted names and structure,
so any deviations will add just more complexity.

I think these suggestions fit better in #1282. Can you check that out?

My point was about model-<name> sub-folder, not about the name of suffix.
The example I give can be changed in suffixes, entities and extensions.
But I fight for data-type folder and sub- prefix)

@oesteban
Copy link
Collaborator

But even so, deviating from BIDS style

How this proposal deviates from BIDS style specifically?

If you take and example of bids-matlab (free publicity!),
it allow to parse files even if they are not in the official BIDS schema.
But the dwi/model-DTI folder will be ignores as it doesn't starts with sub-.

BIDS should remain agnostic to the decisions made by the software developer. This idea comes from the fact that scientists use folders typically to cluster the outputs of important models in their analysis. They typically group by subject first, and that's why sub-<label> is at the top. It seemed logical to offer the opportunity to encode models this way. That said, you don't need to have a model directory if you don't want (the same way you may not have the session folder).

Imagine you want a script that compare different models inside the same dataset.
If these models folders are supported, then you just need to query all files with suffix model,
and you have all of them.

Human: goes to the _models.tsv file and sees all the models with relevant info.
Software: e.g., pybids could do this tomorrow with a simple edit of the bids schema inside:

>>> layout.get_models(datatype="func")
[ list of all model identifiers under datatype func ]

@arokem arokem force-pushed the derivatives-model-directory branch from 2c7ebd4 to 104abd5 Compare March 17, 2023 17:50
@arokem
Copy link
Collaborator Author

arokem commented Mar 17, 2023

For the time being, I have rebased this on top of current master to facilitate more discussion of this proposal.

@tsalo : regarding your previous comment, should I add a new file called "models.yaml" in here and try to follow the example of the existing files in that folder?

@arokem
Copy link
Collaborator Author

arokem commented Mar 30, 2023

Update on the status of this PR: I brought this for discussion with the steering group. It met with some skepticism, particularly as to how this would apply to other modalities that we have not considered in our proposal here. I suggest that we close this PR, and that we try to deal with the issues that this PR proposes to address through the introduction of suffixes, rather than this rather extensive change to the spec and to the schema. I guess that different models can be also thought of as different pipelines, and thus separated from each other at that level for the purpose of ease of navigation. Closing this PR would open up the path to make progress on the connectivity derivatives BEPs through the addition of suffixes that described model-fit parameters (i.e., _mfp) and model-derived parameters (i.e., _mdp), e.g., as discussed in the current draft of BEP16.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants