Discretize Table
#143
Comments
@robmeth: Please add a comment explaining why you marked this as blocked (in today's final stand-up meeting you mentioned the failing pandasEqualsTest...).
@robmeth Use the ordinal encoding. It transforms the data and returns the bin index of each value rather than a sparse matrix that encodes the bins. This will also resolve the problem with the tests. See #327 (comment)
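To illustrate the difference between the two encodings, here is a minimal sketch that calls scikit-learn's `KBinsDiscretizer` directly. The sample data and the choice of `strategy="uniform"` are assumptions made for the illustration, not taken from the PR.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

# Made-up sample data: one numeric column with six rows.
data = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# encode="onehot" returns a SciPy sparse matrix with one column per bin.
onehot = KBinsDiscretizer(
    n_bins=3, encode="onehot", strategy="uniform"
).fit_transform(data)

# encode="ordinal" returns a dense array holding the bin index of each value.
ordinal = KBinsDiscretizer(
    n_bins=3, encode="ordinal", strategy="uniform"
).fit_transform(data)

print(sparse.issparse(onehot), onehot.shape)
print(ordinal.ravel())
```

The ordinal result keeps the original shape of the column, so it can be written back into a table and compared with ordinary equality checks.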
Closes #143.

### Summary of Changes

* Added a class `Discretizer` in `safeds.data.tabular.transformation` that wraps the [`KBinsDiscretizer` of `scikit-learn`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)
* Made the class a subclass of `TableTransformer`
* The `__init__` for now only has a parameter `number_of_bins` to control how many bins are created
* If `number_of_bins` is less than 2, it raises a `ValueError`
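The shape of such a wrapper could look roughly like the sketch below. It is an illustrative stand-in, not the Safe-DS implementation: it operates on a pandas `DataFrame` instead of a Safe-DS `Table`, it does not subclass `TableTransformer`, and the default of 5 bins, the `fit`/`transform` signatures, and `strategy="uniform"` are all assumptions.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer


class Discretizer:
    """Illustrative stand-in (not the Safe-DS class) built on a pandas DataFrame."""

    def __init__(self, number_of_bins: int = 5) -> None:
        # Fewer than two bins would not discretize anything meaningfully.
        if number_of_bins < 2:
            raise ValueError("Parameter 'number_of_bins' must be at least 2.")
        self._number_of_bins = number_of_bins
        self._wrapped = None
        self._column_names = None

    def fit(self, data: pd.DataFrame, column_names: list) -> "Discretizer":
        # Ordinal encoding yields bin indices instead of a sparse matrix.
        wrapped = KBinsDiscretizer(
            n_bins=self._number_of_bins, encode="ordinal", strategy="uniform"
        )
        wrapped.fit(data[column_names].to_numpy())
        self._wrapped = wrapped
        self._column_names = list(column_names)
        return self

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        if self._wrapped is None:
            raise RuntimeError("The discretizer has not been fitted yet.")
        # Work on a copy so the input frame stays unchanged.
        result = data.copy()
        columns = data[self._column_names].to_numpy()
        result[self._column_names] = self._wrapped.transform(columns)
        return result


# Hypothetical usage with made-up data.
table = pd.DataFrame(
    {"age": [3.0, 17.5, 25.0, 41.0, 64.0, 90.0], "name": list("abcdef")}
)
discretized = Discretizer(number_of_bins=3).fit(table, ["age"]).transform(table)
print(discretized)
```

Validating `number_of_bins` in `__init__` surfaces the error when the transformer is created rather than later during fitting.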
## [0.15.0](v0.14.0...v0.15.0) (2023-07-13)

### Features

* Add copy method for tables ([#405](#405)) ([72e87f0](72e87f0)), closes [#275](#275)
* add gaussian noise to image ([#430](#430)) ([925a505](925a505)), closes [#381](#381)
* add schema conversions when adding new rows to a table and schema conversion when creating a new table ([#432](#432)) ([6e9ff69](6e9ff69)), closes [#404](#404) [#322](#322) [#127](#127) [#322](#322) [#127](#127)
* add test for empty tables for the method `Table.sort_rows` ([#431](#431)) ([f94b768](f94b768)), closes [#402](#402)
* added color adjustment feature ([#409](#409)) ([2cbee36](2cbee36)), closes [#380](#380)
* added test_repr table tests ([#410](#410)) ([cb77790](cb77790)), closes [#349](#349)
* discretize table ([#327](#327)) ([5e3da8d](5e3da8d)), closes [#143](#143)
* Improve error handling of TaggedTable ([#450](#450)) ([c5da544](c5da544)), closes [#150](#150)
* Maintain tagging in methods inherited from `Table` class ([#332](#332)) ([bc73a6c](bc73a6c)), closes [#58](#58)
* new error class `OutOfBoundsError` ([#438](#438)) ([1f37e4a](1f37e4a)), closes [#262](#262)
* rename several `Table` methods for consistency ([#445](#445)) ([9954986](9954986)), closes [#439](#439)
* suggest similar columns if column gets accessed that doesnt exist ([#385](#385)) ([6a097a4](6a097a4)), closes [#203](#203)

### Bug Fixes

* added the missing ids in parameterized tests ([#412](#412)) ([dab6419](dab6419)), closes [#362](#362)
* don't warn if `Imputer` transforms column without missing values ([#448](#448)) ([f0cb6a5](f0cb6a5))
* Warnings raised by underlying seaborn and numpy libraries ([#425](#425)) ([c4143af](c4143af)), closes [#357](#357)
🎉 This issue has been resolved in version 0.15.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Is your feature request related to a problem?
Discretization means replacing a continuous variable with a variable that takes only a finite number of values. This is a preprocessing step that we should support.
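As a small illustration of the idea, the sketch below maps continuous values to equal-width bin indices in plain Python. The function name and the sample values are hypothetical, and equal-width binning is only one of several possible strategies.

```python
def equal_width_bin_indices(values, number_of_bins):
    """Map each value to the index of an equal-width bin (0 .. number_of_bins - 1)."""
    if number_of_bins < 2:
        raise ValueError("number_of_bins must be at least 2")
    low, high = min(values), max(values)
    if low == high:
        # A constant column cannot be split, so everything lands in bin 0.
        return [0 for _ in values]
    width = (high - low) / number_of_bins
    # The maximum sits exactly on the last edge, so clamp it into the last bin.
    return [min(int((v - low) / width), number_of_bins - 1) for v in values]


# Made-up continuous values reduced to three discrete bin indices.
ages = [3.0, 17.5, 25.0, 41.0, 64.0, 90.0]
bin_indices = equal_width_bin_indices(ages, 3)
print(bin_indices)
```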
Desired solution
* Add a class `Discretizer` in `safeds.data.tabular.transformation` that wraps the `KBinsDiscretizer` of `scikit-learn`
* Make the class a subclass of `TableTransformer`
* The `__init__` should for now only have a parameter `number_of_bins` to control how many bins are created
* If `number_of_bins` is less than 2, raise a `ValueError`