[ENH]: Add support for subject(s) file for `junifer run` #182

synchon · 2023-08-24T17:20:41Z

Are you requiring a new dataset or marker?

I understand this is not a marker or dataset request

Which feature do you want to include?

I think in most data processing projects it is desirable to be able to select specific subsets of all subjects. Sometimes one just wants to get all available data, but sometimes one may want only specific subsets (like for example only unrelated subjects or subjects matched on some other variable). In other cases one may want to just preprocess a specific subset for some initial exploration or testing before getting the full sample.

How do you imagine this integrated in junifer?

Add a subject parameter to in-built data grabbers. Ideally in the full pipeline one can add a list of subjects to the yaml directly, or for example by providing a "subject.txt" file that lists all desired subjects. Something like that.

Do you have a sample code that implements this outside of junifer?

No response

Anything else to say?

No response

fraimondo · 2023-03-31T08:00:19Z

This was thought already:

junifer/junifer/api/cli.py

Line 54 in ace98da

# TODO: If len == 1, check if its a file, then parse elements from file

The only thing we need to define is the file format.

CSV?
TSV?
both depending on the extension?
With headers?
without headers?

Main issue is elements as tuples.

For example, using HCP1200 datagrabber, we can set in the datagrabber task="REST1" and then the iterator of the datagrabber will go through all the element tuples, but only for the specified task:

(sub-000001, REST1, LR)
(sub-000001, REST1, RL)
(sub-000002, REST1, LR)
(sub-000002, REST1, RL)
...

Setting the elements on the CLI or the YAML is just a way of bypassing the iterator. So in this case, the .CSV file with the elements to process should contain all the keys of the element ("subject", "task", "phase_encoding").

Possible solutions:

Modify the datagrabber so we can also get an iterator using a list, in which some of the keys of the element can be left out. Something like a filter function:

def filter(self, selection) -> Iterator:

then these lines in the run function:

junifer/junifer/api/functions.py

Lines 165 to 172 in ace98da

    
           with datagrabber_object: 
        
               if elements is not None: 
        
                   for t_element in elements: 
        
                       mc.fit(datagrabber_object[t_element]) 
        
               else: 
        
                   for t_element in datagrabber_object: 
        
                       mc.fit(datagrabber_object[t_element])

would change for something like:

 with datagrabber_object: 
     if elements is not None: 
         for t_element in datagrabber_object.filter(elements): 
             mc.fit(datagrabber_object[t_element]) 
     else: 
         for t_element in datagrabber_object: 
             mc.fit(datagrabber_object[t_element])

Ask the user to provide the list all elements with all keys in it.

LeSasse · 2023-03-31T08:16:07Z

What you say makes sense, but I really only mean like setting one parameter of the element, i.e. mostly for subjects, to a list. The rest of the element can still be constructed. So for example, as a user I dont want to construct all elements, but just provide a list of subjects, for example of the HCP in a txt file. Junifer can then construct the elements, for example by using the default values for all other parameters (i.e. [LR, RL] and [REST1..., LAST_TASK]), without me having to make an iterator/file over all elements myself. The reasoning is that subject lists can be quite long and are therefore more difficult to set in the YAML file than other parameters. Having said that, making a csv file of all elements is also easy enough, but in my opinion puts more burden on the user (also in terms of data discovery).

synchon · 2023-07-19T04:51:52Z

Passing only the subject IDs makes sense to me as this will make it simple for users and also enable one to filter what the DataGrabber iterates over in combination with other parameters that one DataGrabber provides, for example,types, tasks and so on. From the implementation perspective, we already have a good way as @fraimondo shows.

fraimondo · 2023-07-19T06:50:35Z

Then let's move on that way.

github-actions · 2023-08-24T17:26:57Z

PR Preview Action v1.4.4
Preview removed because the pull request was closed.
2023-10-10 13:10 UTC

codecov · 2023-09-06T11:45:56Z

Codecov Report

Merging #182 (8666562) into main (22ca06f) will increase coverage by 0.01%.
Report is 3 commits behind head on main.
The diff coverage is 96.42%.

@@            Coverage Diff             @@
##             main     #182      +/-   ##
==========================================
+ Coverage   93.03%   93.04%   +0.01%     
==========================================
  Files          84       84              
  Lines        3718     3739      +21     
  Branches      724      733       +9     
==========================================
+ Hits         3459     3479      +20     
  Misses        161      161              
- Partials       98       99       +1

Flag	Coverage Δ
docs	`100.00% <ø> (ø)`
junifer	`93.04% <96.42%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
junifer/api/cli.py	`71.28% <100.00%> (+2.13%)`	⬆️
junifer/api/functions.py	`96.23% <100.00%> (ø)`
junifer/datagrabber/base.py	`98.36% <93.75%> (-1.64%)`	⬇️

junifer/api/cli.py

…lection

…g pandas to read csv

…ements()

junifer/api/tests/test_cli.py

…arsing elements from file

LeSasse added enhancement New feature or request triage New issues waiting to be reviewed labels Feb 8, 2023

LeSasse assigned fraimondo and synchon Feb 8, 2023

fraimondo added question Further information is requested and removed triage New issues waiting to be reviewed labels Mar 31, 2023

fraimondo added this to the 0.0.3 (alpha 2) milestone Mar 31, 2023

synchon added WIP Work in progress dataset Issues or pull requests related to datagrabbers or datasets and removed question Further information is requested labels Jul 19, 2023

synchon modified the milestones: 0.0.3 (alpha 2), 0.0.4 (alpha 3) Jul 21, 2023

synchon changed the title ~~[ENH]: Add subject parameter to in-built datagrabbers~~ [ENH]: Add support for subject(s) file for junifer run Sep 6, 2023

synchon requested a review from fraimondo September 6, 2023 12:07

synchon added ready Pull request is ready for review and merge and removed WIP Work in progress labels Sep 6, 2023

fraimondo requested changes Oct 4, 2023

View reviewed changes

junifer/api/cli.py Outdated Show resolved Hide resolved

junifer/api/cli.py Outdated Show resolved Hide resolved

synchon added 7 commits October 6, 2023 12:06

update: add support for reading elements from file

9075c93

update: add test for reading elements from file

66ea53a

update: add filter() method for BaseDataGrabber

517a1c5

update: add tests for BaseDataGrabber.filter()

744b36a

update: adapt code to allow element filtering while fitting MarkerCol…

3c79b64

…lection

chore: improve type annotation and docstring in CLI functions

96d32a0

chore: improve commentary in CLI functions

9e011f2

synchon added 9 commits October 6, 2023 12:06

chore: improve log messages in CLI functions

f19f110

chore: improve docstring in CLI function tests

4aa1d49

chore: improve type annotation and docstring in BaseDataGrabber

e823da4

fix: correct element for test_run_single_element_with_preprocessing()

7e24f46

docs: add info about specifying elements via a file for junifer run

dc95cb0

chore: add changelog 182.enh

2d24214

update: make api.cli._parse_elements_file() return list of tuple usin…

8c757dc

…g pandas to read csv

update: slight refactor in element parsing logic in api.cli._parse_el…

430ea63

…ements()

chore: add edge cases for test_run_using_element_file()

17703ff

synchon force-pushed the update/allow-element-via-file branch from 5714000 to 17703ff Compare October 6, 2023 10:09

synchon requested a review from fraimondo October 6, 2023 10:10

fraimondo requested changes Oct 10, 2023

View reviewed changes

junifer/api/tests/test_cli.py Show resolved Hide resolved

synchon added 3 commits October 10, 2023 13:01

fix: remove trailing whitespace from cell entries in dataframe when p…

569782f

…arsing elements from file

update: add test for multi-element access via element file

123d815

chore: update docstrings for tests in test_cli.py

8666562

synchon requested a review from fraimondo October 10, 2023 11:03

fraimondo approved these changes Oct 10, 2023

View reviewed changes

synchon merged commit a509a23 into main Oct 10, 2023
20 checks passed

synchon deleted the update/allow-element-via-file branch October 10, 2023 13:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH]: Add support for subject(s) file for `junifer run` #182

[ENH]: Add support for subject(s) file for `junifer run` #182

synchon commented Aug 24, 2023

fraimondo commented Mar 31, 2023

LeSasse commented Mar 31, 2023

synchon commented Jul 19, 2023

fraimondo commented Jul 19, 2023

github-actions bot commented Aug 24, 2023 •

edited

Loading

codecov bot commented Sep 6, 2023 •

edited

Loading

[ENH]: Add support for subject(s) file for junifer run #182

[ENH]: Add support for subject(s) file for junifer run #182

Conversation

synchon commented Aug 24, 2023

Are you requiring a new dataset or marker?

Which feature do you want to include?

How do you imagine this integrated in junifer?

Do you have a sample code that implements this outside of junifer?

Anything else to say?

fraimondo commented Mar 31, 2023

LeSasse commented Mar 31, 2023

synchon commented Jul 19, 2023

fraimondo commented Jul 19, 2023

github-actions bot commented Aug 24, 2023 • edited Loading

codecov bot commented Sep 6, 2023 • edited Loading

Codecov Report

[ENH]: Add support for subject(s) file for `junifer run` #182

[ENH]: Add support for subject(s) file for `junifer run` #182

github-actions bot commented Aug 24, 2023 •

edited

Loading

codecov bot commented Sep 6, 2023 •

edited

Loading