[ENH]: Add support for subject(s) file for junifer run
#182
This was thought already: Line 54 in ace98da
The only thing we need to define is the file format.
Main issue is elements as tuples. For example, using HCP1200 datagrabber, we can set in the datagrabber
Setting the elements on the CLI or in the YAML is just a way of bypassing the iterator. So in this case, the CSV file with the elements to process should contain all the keys of the element ( Possible solutions:
then these lines in the run function: junifer/junifer/api/functions.py Lines 165 to 172 in ace98da
would change to something like:

    with datagrabber_object:
        if elements is not None:
            for t_element in datagrabber_object.filter(elements):
                mc.fit(datagrabber_object[t_element])
        else:
            for t_element in datagrabber_object:
                mc.fit(datagrabber_object[t_element])
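To make the proposed `filter` call concrete, here is a minimal sketch of how it could behave when elements are tuples. `ToyDataGrabber`, `run_selected`, the element layout `(subject, task, phase_encoding)` and the subject IDs are all hypothetical stand-ins for illustration; this is not junifer's actual datagrabber API.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Assumed element layout for illustration: (subject, task, phase_encoding)
Element = Tuple[str, str, str]


class ToyDataGrabber:
    """Hypothetical stand-in for a datagrabber whose elements are tuples."""

    def __init__(self, elements: Iterable[Element]) -> None:
        self._elements = list(elements)

    def __enter__(self) -> "ToyDataGrabber":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        return None

    def __iter__(self) -> Iterator[Element]:
        return iter(self._elements)

    def __getitem__(self, element: Element) -> dict:
        # A real datagrabber would return file paths; here we echo the keys.
        subject, task, phase_encoding = element
        return {"subject": subject, "task": task, "phase_encoding": phase_encoding}

    def filter(self, selection: Iterable[Element]) -> Iterator[Element]:
        """Yield only the available elements that appear in `selection`."""
        wanted = set(selection)
        for element in self:
            if element in wanted:
                yield element


def run_selected(datagrabber: ToyDataGrabber,
                 elements: Optional[Iterable[Element]] = None) -> list:
    """Mirror the loop above, collecting results instead of calling mc.fit."""
    processed = []
    with datagrabber:
        iterator = datagrabber if elements is None else datagrabber.filter(elements)
        for t_element in iterator:
            processed.append(datagrabber[t_element])
    return processed
```

With this shape, requested elements that the datagrabber does not know about are silently skipped; whether that should raise an error instead is a design choice left open here.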
What you say makes sense, but I really only mean setting one parameter of the element, mostly the subject, to a list. The rest of the element can still be constructed. For example, as a user I don't want to construct all elements, but just provide a list of subjects, say of the HCP, in a txt file. Junifer can then construct the elements, for example by using the default values for all other parameters (i.e. [LR, RL] and [REST1..., LAST_TASK]), without me having to make an iterator/file over all elements myself. The reasoning is that subject lists can be quite long and are therefore more difficult to set in the YAML file than other parameters. Having said that, making a CSV file of all elements is also easy enough, but in my opinion it puts more burden on the user (also in terms of data discovery).
Passing only the subject IDs makes sense to me as this will make it simple for users and also enable one to filter what the
Then let's move on that way.
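A rough sketch of the agreed direction: read subject IDs from a plain text file and let the tool build the full elements from defaults for the remaining keys. The function names `parse_subjects`, `read_subjects` and `expand_subjects`, the one-ID-per-line file format, and the example task and phase-encoding values are assumptions for illustration only; they are not taken from junifer.

```python
from itertools import product
from pathlib import Path
from typing import List, Sequence, Tuple, Union


def parse_subjects(text: str) -> List[str]:
    """Return one subject ID per non-empty line (assumed file format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_subjects(path: Union[str, Path]) -> List[str]:
    """Read subject IDs from a text file such as 'subject.txt'."""
    return parse_subjects(Path(path).read_text())


def expand_subjects(
    subjects: Sequence[str],
    tasks: Sequence[str],
    phase_encodings: Sequence[str],
) -> List[Tuple[str, str, str]]:
    """Build every (subject, task, phase_encoding) element from the defaults."""
    return list(product(subjects, tasks, phase_encodings))


# Illustrative defaults only; the real lists would come from the datagrabber.
DEFAULT_TASKS = ["REST1", "REST2"]
DEFAULT_PHASE_ENCODINGS = ["LR", "RL"]

elements = expand_subjects(
    parse_subjects("sub-01\nsub-02\n"), DEFAULT_TASKS, DEFAULT_PHASE_ENCODINGS
)
```

The user then only maintains the subject list, and the expanded tuples can be handed to whatever element filtering the datagrabber provides.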
Codecov Report
    @@            Coverage Diff             @@
    ##             main     #182      +/-   ##
    ==========================================
    + Coverage   93.03%   93.04%   +0.01%
    ==========================================
      Files          84       84
      Lines        3718     3739      +21
      Branches      724      733       +9
    ==========================================
    + Hits         3459     3479      +20
      Misses        161      161
    - Partials       98       99       +1
Are you requiring a new dataset or marker?
Which feature do you want to include?
I think in most data processing projects it is desirable to be able to select specific subsets of all subjects. Sometimes one just wants to get all available data, but sometimes one may want only specific subsets (for example, only unrelated subjects or subjects matched on some other variable). In other cases one may want to preprocess just a specific subset for some initial exploration or testing before getting the full sample.
How do you imagine this integrated in junifer?
Add a subject parameter to the in-built data grabbers. Ideally, in the full pipeline one can add a list of subjects to the YAML directly, or provide a "subject.txt" file that lists all desired subjects. Something like that.
Do you have a sample code that implements this outside of junifer?
No response
Anything else to say?
No response