-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fit, fit_transform, transform, predict, and score methods #119
Conversation
Overall, LGTM. The test failures probably have to do with API changes in new sklearn versions. I wonder whether we'd like to further automate train/test(/validation?) splitting in the imputation scenarios. Also wondering how this would play with sklearn Pipeline objects. |
I don't think this will play well with the pipeline objects since the API is different. Whereas sklearn For train/test split, I was thinking of something like dataset = AFQDataset.from_files(...)
dataset_train, dataset_test = train_test_split(dataset)
imputer = SimpleImputer()
imputer = dataset_train.fit(imputer)
dataset_train = dataset_train.transform(imputer)
dataset_test = dataset_test.transform(imputer) Since we can already train/test split the datasets themselves. But I'm certainly open to other suggestions for this workflow. |
This all makes sense. We'd still need to implement our own |
I'm not sure I understand. The second line works as is right now. Do you mean that I should write a wrapper function that implements all of lines 2-6 in my example? |
OK OK. I think that I understand. The AFQDataset ducktypes the array interface, so sklearn's |
Yeah, I need to do that. Also, I'm wondering what should be returned by the Also, I need to add tests. |
Ugh, it seems that the sklearn API does not want us to build an object that has both One proposed solution is to prepend dataset = AFQDataset.from_files(...)
dataset_train, dataset_test = train_test_split(dataset)
imputer = SimpleImputer()
imputer = dataset_train.model_fit(imputer)
dataset_train = dataset_train.model_transform(imputer)
dataset_test = dataset_test.model_transform(imputer) I'll make that change now but welcome other suggested solutions. |
This API makes sense to me. It's actually sensible that sklearn is preventing us from mimicking the estimator API, because it needs to be clear that Regarding the output of |
59bfbb7
to
f54c4c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small questions/comments
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Co-authored-by: Ariel Rokem <arokem@gmail.com>
…into enh/fit-on-dataset
I'm running into some version incompatibilities between sklearn and skopt. Running with sklearn 0.23.2 (which is what I had in the env when I fetched this branch), I get:
The after installing AFQ-Insight with
@richford : what versions of skopt, sklearn (other things?) do you have in your env? |
OK - installing into a clean env works. I don't exactly understand how, but I now get skopt 0.9.0 (not yet on pypi?? https://pypi.org/project/scikit-opt/#history). Where is this coming from? Is this installed from github in one of the AFQ-Insight dependencies? I don't see it in groupyr. At any rate, should we pin these versions here? |
We now require sklearn >= 1.0.0 (see e.g. this line), but you're right that there's no version requirement for skopt. We only get the skopt requirement through groupyr, it's not required in AFQ-Insight. Actually, I think a good solution would be to use
in groupyr and move skopt to an optional install in groupyr. I think for now, we can merge this (assuming everything else looks good to you) and then open a groupyr issue about skopt inclusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of really small things related to the examples. I think that we can punt most of these to another PR and go ahead and merge this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what it would take to make neurocombat_sklearn fully sklearn-compatible. I remember looking into this at some point and not having fun.
I'm not sure what it would take exactly but I know they'd at least have to deprecate the positional arguments and rename the |
Co-authored-by: Ariel Rokem <arokem@gmail.com>
I think this may be good to go. I started a new PR for HBN and downloading issues/enhancements. |
...to AFQDataset
Resolves #118