Make AFQDataset work with different input formats and make indexable #105

richford · 2022-01-07T17:52:32Z

Resolves #98
This PR:

Makes AFQDataset initialization more general so that users can provide X, y, groups, feature_names, group_names, subjects, sessions, and classes explicitly on init.
Creates a new static method, from_files(), that takes all of the filenames accepted by the current AFQDataset init method and returns an AFQDataset object with all of the data read in.
Makes AFQDataset more array-like by adding __getitem__ and __len__ methods and a shape attribute. This means users can do things like dataset[10:20], and AFQDatasets can be used as input to sklearn functions like train_test_split.
Adds a lot of detail and doctests to AFQDataset's docstring.
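
To illustrate the pattern described in the list above, here is a minimal sketch of an indexable dataset container. It is not the AFQDataset implementation: the class name MiniDataset, the from_file() helper, the CSV layout, and the placeholder subject IDs are all assumptions made up for this example, and it uses plain Python lists rather than arrays.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MiniDataset:
    """Hypothetical stand-in for an indexable dataset (not AFQDataset)."""

    X: list[list[float]]              # one row of features per sample
    y: list[float] | None = None      # optional targets (None if unsupervised)
    subjects: list[str] | None = None

    def __post_init__(self) -> None:
        # Generate placeholder subject IDs when none are provided.
        if self.subjects is None:
            self.subjects = [f"subject_{i}" for i in range(len(self.X))]

    @staticmethod
    def from_file(path: str) -> "MiniDataset":
        """Read a CSV laid out as: subject, target, feature values (assumed layout)."""
        with open(path, newline="") as f:
            rows = list(csv.reader(f))[1:]  # skip the header row
        return MiniDataset(
            X=[[float(v) for v in row[2:]] for row in rows],
            y=[float(row[1]) for row in rows],
            subjects=[row[0] for row in rows],
        )

    def __len__(self) -> int:
        return len(self.X)

    @property
    def shape(self) -> tuple[int, int]:
        return (len(self.X), len(self.X[0]) if self.X else 0)

    def __getitem__(self, key: int | slice | Sequence[int]) -> "MiniDataset":
        # Normalize every supported key type to a list of row positions.
        if isinstance(key, slice):
            idx = list(range(*key.indices(len(self))))
        elif isinstance(key, int):
            idx = [range(len(self))[key]]  # resolves negatives, raises IndexError if out of range
        else:
            idx = list(key)
        # Slice every per-sample field together so rows stay aligned.
        return MiniDataset(
            X=[self.X[i] for i in idx],
            y=None if self.y is None else [self.y[i] for i in idx],
            subjects=[self.subjects[i] for i in idx],
        )


dataset = MiniDataset(X=[[float(i), 2.0 * i] for i in range(30)])
subset = dataset[10:20]
print(len(subset), subset.shape)
```

Returning a new container from __getitem__, rather than a bare row, is what keeps features, targets, and subject IDs aligned after slicing.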

Note that I chose a different solution to dataset splitting than the one I proposed in #98. I think making the datasets indexable and therefore interoperable with scikit-learn's already existing, performant, and robust model selection routines is a much better solution than rolling our own split method.
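
As a rough illustration of the scikit-learn routine referred to above, the sketch below splits aligned arrays with train_test_split. The data are synthetic and the subject IDs are invented; plain NumPy arrays stand in for a dataset object here, since this example does not assume anything about the AFQDataset constructor.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))    # 20 samples, 4 features (synthetic)
y = np.arange(20) % 2           # synthetic binary targets
subjects = np.array([f"sub-{i:02d}" for i in range(20)])  # invented IDs

# Every indexable of the same length is split along its first axis with
# the same shuffled indices, so rows stay matched across X, y, and subjects.
X_train, X_test, y_train, y_test, subj_train, subj_test = train_test_split(
    X, y, subjects, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

Because the split is driven by row indices, any object that supports len() and indexing along its first axis can take part, which is the property the indexable dataset relies on.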

This doesn't resolve the need for imputation methods, which we hint at in #98. If we still want those, I now think we should open a separate issue and PR for them.


arokem

Overall, looks great! I think that a sphinx gallery example of using train_test_split with some real data would be helpful. But we can take that on in a separate PR.

afqinsight/datasets.py

afqinsight/tests/test_datasets.py


richford added 3 commits January 6, 2022 16:04

Make AFQDataset a more general initialization object and move old ini…

12ab534

…t to from_files static method

Make AFQDataset indexable and fill in docstring/doctests

3e8190d

Update .gitignore

84607f7

richford added the enhancement and documentation labels Jan 7, 2022

richford requested a review from arokem January 7, 2022 17:52

richford added 4 commits January 7, 2022 10:34

Add shape, len, and index test for AFQDataset

64c7917

BF: slice subjects only if they are not None

585589a

Automatically generate subject list if not provided to AFQDataset

86ea43b

Increase coverage of AFQDataset

da3b6de

arokem reviewed Jan 9, 2022

afqinsight/datasets.py (outdated, resolved)

afqinsight/tests/test_datasets.py (outdated, resolved)

richford added 4 commits January 11, 2022 11:48

Remove obsolete cast to float and use different repr for unsupervised…

debffb0

… problems

Fix codacity issues

add6b5d

Use try/except for y index error when computing repr

6c8cb73

Explicitly cast y to float before inserting NaNs for testing

75ca906

arokem merged commit 9836dfb into main Jan 12, 2022
