-
Notifications
You must be signed in to change notification settings - Fork 10
Add fit, fit_transform, transform, predict, and score methods #119
Conversation
Overall, LGTM. The test failures probably have to do with API changes in new sklearn versions. I wonder whether we'd like to further automate train/test(/validation?) splitting in the imputation scenarios. Also wondering how this would play with sklearn Pipeline objects. |
I don't think this will play well with the pipeline objects since the API is different. Whereas sklearn For train/test split, I was thinking of something like dataset = AFQDataset.from_files(...)
dataset_train, dataset_test = train_test_split(dataset)
imputer = SimpleImputer()
imputer = dataset_train.fit(imputer)
dataset_train = dataset_train.transform(imputer)
dataset_test = dataset_test.transform(imputer) Since we can already train/test split the datasets themselves. But I'm certainly open to other suggestions for this workflow. |
This all makes sense. We'd still need to implement our own |
I'm not sure I understand. The second line works as is right now. Do you mean that I should write a wrapper function that implements all of lines 2-6 in my example? |
OK OK. I think that I understand. The AFQDataset ducktypes the array interface, so sklearn's |
Yeah, I need to do that. Also, I'm wondering what should be returned by the Also, I need to add tests. |
Ugh, it seems that the sklearn API does not want us to build an object that has both One proposed solution is to prepend dataset = AFQDataset.from_files(...)
dataset_train, dataset_test = train_test_split(dataset)
imputer = SimpleImputer()
imputer = dataset_train.model_fit(imputer)
dataset_train = dataset_train.model_transform(imputer)
dataset_test = dataset_test.model_transform(imputer) I'll make that change now but welcome other suggested solutions. |
This API makes sense to me. It's actually sensible that sklearn is preventing us from mimicking the estimator API, because it needs to be clear that Regarding the output of |
59bfbb7
to
f54c4c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small questions/comments
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
…into enh/fit-on-dataset
I'm running into some version incompatibilities between sklearn and skopt. Running with sklearn 0.23.2 (which is what I had in the env when I fetched this branch), I get:
The after installing AFQ-Insight with
@richford : what versions of skopt, sklearn (other things?) do you have in your env? |
OK - installing into a clean env works. I don't exactly understand how, but I now get skopt 0.9.0 (not yet on pypi?? https://pypi.org/project/scikit-opt/#history). Where is this coming from? Is this installed from github in one of the AFQ-Insight dependencies? I don't see it in groupyr. At any rate, should we pin these versions here? |
We now require sklearn >= 1.0.0 (see e.g. this line), but you're right that there's no version requirement for skopt. We only get the skopt requirement through groupyr, it's not required in AFQ-Insight. Actually, I think a good solution would be to use
in groupyr and move skopt to an optional install in groupyr. I think for now, we can merge this (assuming everything else looks good to you) and then open a groupyr issue about skopt inclusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of really small things related to the examples. I think that we can punt most of these to another PR and go ahead and merge this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what it would take to make neurocombat_sklearn fully sklearn-compatible. I remember looking into this at some point and not having fun.
I'm not sure what it would take exactly but I know they'd at least have to deprecate the positional arguments and rename the |
Co-authored-by: Ariel Rokem <arokem@gmail.com>
I think this may be good to go. I started a new PR for HBN and downloading issues/enhancements. |
...to AFQDataset
Resolves #118