Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH]: Add support for subject(s) file for junifer run #182

Merged
merged 19 commits into from
Oct 10, 2023

Conversation

synchon
Copy link
Member

@synchon synchon commented Aug 24, 2023

Are you requiring a new dataset or marker?

  • I understand this is not a marker or dataset request

Which feature do you want to include?

I think in most data processing projects it is desirable to be able to select specific subsets of all subjects. Sometimes one just wants to get all available data, but sometimes one may want only specific subsets (like for example only unrelated subjects or subjects matched on some other variable). In other cases one may want to just preprocess a specific subset for some initial exploration or testing before getting the full sample.

How do you imagine this integrated in junifer?

Add a subject parameter to in-built data grabbers. Ideally in the full pipeline one can add a list of subjects to the yaml directly, or for example by providing a "subject.txt" file that lists all desired subjects. Something like that.

Do you have a sample code that implements this outside of junifer?

No response

Anything else to say?

No response

@LeSasse LeSasse added enhancement New feature or request triage New issues waiting to be reviewed labels Feb 8, 2023
@fraimondo
Copy link
Contributor

This was thought already:

# TODO: If len == 1, check if its a file, then parse elements from file

The only thing we need to define is the file format.

  • CSV?
  • TSV?
  • both depending on the extension?
  • With headers?
  • without headers?

Main issue is elements as tuples.

For example, using HCP1200 datagrabber, we can set in the datagrabber task="REST1" and then the iterator of the datagrabber will go through all the element tuples, but only for the specified task:

(sub-000001, REST1, LR)
(sub-000001, REST1, RL)
(sub-000002, REST1, LR)
(sub-000002, REST1, RL)
...

Setting the elements on the CLI or the YAML is just a way of bypassing the iterator. So in this case, the .CSV file with the elements to process should contain all the keys of the element ("subject", "task", "phase_encoding").

Possible solutions:

  • Modify the datagrabber so we can also get an iterator using a list, in which some of the keys of the element can be left out. Something like a filter function:
def filter(self, selection) -> Iterator:

then these lines in the run function:

with datagrabber_object:
if elements is not None:
for t_element in elements:
mc.fit(datagrabber_object[t_element])
else:
for t_element in datagrabber_object:
mc.fit(datagrabber_object[t_element])

would change for something like:

 with datagrabber_object: 
     if elements is not None: 
         for t_element in datagrabber_object.filter(elements): 
             mc.fit(datagrabber_object[t_element]) 
     else: 
         for t_element in datagrabber_object: 
             mc.fit(datagrabber_object[t_element]) 
  • Ask the user to provide the list all elements with all keys in it.

@fraimondo fraimondo added question Further information is requested and removed triage New issues waiting to be reviewed labels Mar 31, 2023
@fraimondo fraimondo added this to the 0.0.3 (alpha 2) milestone Mar 31, 2023
@LeSasse
Copy link
Collaborator Author

LeSasse commented Mar 31, 2023

What you say makes sense, but I really only mean like setting one parameter of the element, i.e. mostly for subjects, to a list. The rest of the element can still be constructed. So for example, as a user I dont want to construct all elements, but just provide a list of subjects, for example of the HCP in a txt file. Junifer can then construct the elements, for example by using the default values for all other parameters (i.e. [LR, RL] and [REST1..., LAST_TASK]), without me having to make an iterator/file over all elements myself. The reasoning is that subject lists can be quite long and are therefore more difficult to set in the YAML file than other parameters. Having said that, making a csv file of all elements is also easy enough, but in my opinion puts more burden on the user (also in terms of data discovery).

@synchon
Copy link
Member

synchon commented Jul 19, 2023

Passing only the subject IDs makes sense to me as this will make it simple for users and also enable one to filter what the DataGrabber iterates over in combination with other parameters that one DataGrabber provides, for example,types, tasks and so on. From the implementation perspective, we already have a good way as @fraimondo shows.

@fraimondo
Copy link
Contributor

Then let's move on that way.

@synchon synchon added WIP Work in progress dataset Issues or pull requests related to datagrabbers or datasets and removed question Further information is requested labels Jul 19, 2023
@github-actions
Copy link

github-actions bot commented Aug 24, 2023

PR Preview Action v1.4.4
Preview removed because the pull request was closed.
2023-10-10 13:10 UTC

@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #182 (8666562) into main (22ca06f) will increase coverage by 0.01%.
Report is 3 commits behind head on main.
The diff coverage is 96.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #182      +/-   ##
==========================================
+ Coverage   93.03%   93.04%   +0.01%     
==========================================
  Files          84       84              
  Lines        3718     3739      +21     
  Branches      724      733       +9     
==========================================
+ Hits         3459     3479      +20     
  Misses        161      161              
- Partials       98       99       +1     
Flag Coverage Δ
docs 100.00% <ø> (ø)
junifer 93.04% <96.42%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
junifer/api/cli.py 71.28% <100.00%> (+2.13%) ⬆️
junifer/api/functions.py 96.23% <100.00%> (ø)
junifer/datagrabber/base.py 98.36% <93.75%> (-1.64%) ⬇️

@synchon synchon changed the title [ENH]: Add subject parameter to in-built datagrabbers [ENH]: Add support for subject(s) file for junifer run Sep 6, 2023
@synchon synchon requested a review from fraimondo September 6, 2023 12:07
@synchon synchon added ready Pull request is ready for review and merge and removed WIP Work in progress labels Sep 6, 2023
junifer/api/cli.py Outdated Show resolved Hide resolved
junifer/api/cli.py Outdated Show resolved Hide resolved
@synchon synchon force-pushed the update/allow-element-via-file branch from 5714000 to 17703ff Compare October 6, 2023 10:09
@synchon synchon requested a review from fraimondo October 6, 2023 10:10
@synchon synchon requested a review from fraimondo October 10, 2023 11:03
@synchon synchon merged commit a509a23 into main Oct 10, 2023
20 checks passed
@synchon synchon deleted the update/allow-element-via-file branch October 10, 2023 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dataset Issues or pull requests related to datagrabbers or datasets enhancement New feature or request ready Pull request is ready for review and merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants