Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Method inverse_transform() signature #79

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RahulDubey391
Copy link

No description provided.

@RahulDubey391
Copy link
Author

Hi, I have added the inverse_transform() class methods for the BaseTransformer. What I understood is the method from Sklearn library is overloaded with different signature.

I need more help to understand if my assumption is correct. Also I see all the other preprocessing classes needs to have this method added.

@sfc-gh-thoyt
Copy link
Collaborator

@RahulDubey391 Thank you for your efforts. The main thing missing is the implementation of inverse_transform using snowpark dataframes.

Our codebase has a system for autogenerating scikit-learn wrapper classes that allow users to execute fits/transforms/etc with snowpark dataframes or pandas dataframes. For inverse_transform we would like to implement it in this way, and add it to wrapper classes if the underlying scikit-learn base estimator has an inverse_transform method. Take a look at the codegen/ directory for the templates and autogen logic. The autogenerated code is built via bazel.

We realize this could be daunting, but we'd like to support you if you are still interested in contributing. We could greatly improve the contribution guide in CONTRIBUTING.md.

The other thing about inverse_transform is that when using snowpark dataframes for transformations, the input columns are retained by default. The user may want to discard them and can set drop_input_cols=True. But by default since the input columns are retained, there is limited utility for inverse_transform.

Hope this makes sense, let us know if you have questions.

@RahulDubey391
Copy link
Author

@RahulDubey391 Thank you for your efforts. The main thing missing is the implementation of inverse_transform using snowpark dataframes.

Our codebase has a system for autogenerating scikit-learn wrapper classes that allow users to execute fits/transforms/etc with snowpark dataframes or pandas dataframes. For inverse_transform we would like to implement it in this way, and add it to wrapper classes if the underlying scikit-learn base estimator has an inverse_transform method. Take a look at the codegen/ directory for the templates and autogen logic. The autogenerated code is built via bazel.

We realize this could be daunting, but we'd like to support you if you are still interested in contributing. We could greatly improve the contribution guide in CONTRIBUTING.md.

The other thing about inverse_transform is that when using snowpark dataframes for transformations, the input columns are retained by default. The user may want to discard them and can set drop_input_cols=True. But by default since the input columns are retained, there is limited utility for inverse_transform.

Hope this makes sense, let us know if you have questions.

Thanks a lot @sfc-gh-thoyt for the guidance. Yes I am still interested in contributing and would like to proceed further to explore codegen. I'll go through it and will raise doubts if any comes up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants