Discretize Table #143

Closed
lars-reimann opened this issue Apr 1, 2023 · 3 comments · Fixed by #327

@lars-reimann
Member

lars-reimann commented Apr 1, 2023

Is your feature request related to a problem?

Discretization replaces a continuous variable with a variable that takes only a finite number of values. This is a preprocessing step that we should support.
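To make the idea concrete, here is a minimal pure-Python sketch of equal-width binning. The function name `equal_width_bins` and the sample data are illustrative assumptions, and equal-width binning is only one of several possible binning strategies, not necessarily the one the library ends up using.

```python
def equal_width_bins(values, number_of_bins):
    """Map each value to the index of the equal-width bin it falls into.

    Hypothetical helper for illustration only; not part of any library.
    """
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        return [0 for _ in values]
    # The maximum would compute to index `number_of_bins`,
    # so clamp it into the last bin.
    return [min(int((v - low) / width), number_of_bins - 1) for v in values]


ages = [3.0, 17.5, 25.0, 41.0, 63.0]
print(equal_width_bins(ages, 3))  # [0, 0, 1, 1, 2]
```

The continuous `ages` column is replaced by bin indices, so it can only take three distinct values afterwards.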

Desired solution

  • Add a class `Discretizer` in `safeds.data.tabular.transformation` that wraps the `KBinsDiscretizer` of scikit-learn
  • Make the class a subclass of `TableTransformer`
  • The `__init__` should for now only have a parameter `number_of_bins` to control how many bins are created
  • If `number_of_bins` is less than 2, raise a `ValueError`
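The requirements above could be sketched roughly as follows. This is not the Safe-DS implementation: it works on a pandas `DataFrame` instead of a Safe-DS table, it does not subclass `TableTransformer`, and the `fit`/`transform` signatures are assumptions. The `encode="ordinal"` and `strategy="uniform"` arguments are also choices made for this sketch so that the result is a dense array with predictable bin edges.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer


class Discretizer:
    """Hypothetical sketch of a wrapper around scikit-learn's KBinsDiscretizer."""

    def __init__(self, number_of_bins: int) -> None:
        if number_of_bins < 2:
            raise ValueError("Parameter 'number_of_bins' must be at least 2.")
        self._number_of_bins = number_of_bins
        self._wrapped = None
        self._column_names = None

    def fit(self, data: pd.DataFrame, column_names: list) -> "Discretizer":
        # Assumption: ordinal encoding and uniform bin widths (see lead-in).
        self._wrapped = KBinsDiscretizer(
            n_bins=self._number_of_bins,
            encode="ordinal",
            strategy="uniform",
        )
        self._wrapped.fit(data[column_names])
        self._column_names = list(column_names)
        return self

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        if self._wrapped is None:
            raise RuntimeError("The transformer has not been fitted yet.")
        # Work on a copy so the input frame stays unchanged.
        result = data.copy()
        result[self._column_names] = self._wrapped.transform(data[self._column_names])
        return result


df = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0]})
print(Discretizer(2).fit(df, ["a"]).transform(df)["a"].tolist())  # [0.0, 0.0, 1.0, 1.0]
```

Validating `number_of_bins` in `__init__` surfaces a bad argument immediately rather than at fit time.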
@github-project-automation github-project-automation bot moved this to Backlog in Library Apr 1, 2023
@zzril zzril moved this from Backlog to Todo in Library May 19, 2023
@robmeth robmeth self-assigned this May 19, 2023
@robmeth robmeth moved this from Todo to In Progress in Library May 19, 2023
@robmeth robmeth linked a pull request May 26, 2023 that will close this issue
@robmeth robmeth moved this from In Progress to 🧱 Blocked in Library Jun 9, 2023
@robmeth robmeth moved this from 🧱 Blocked to In Progress in Library Jun 9, 2023
@robmeth robmeth moved this from In Progress to 🧱 Blocked in Library Jun 9, 2023
@guenterk

guenterk commented Jun 9, 2023

@robmeth: Please add a comment explaining why you marked this as blocked (in today's final stand-up meeting you mentioned the failing pandasEqualsTest...).

@Marsmaennchen221
Contributor

@robmeth Use the ordinal encoding. It transforms the data and returns the bin index for each value rather than a one-hot encoded bin as a sparse matrix. This will also resolve the problem with the tests. See #327 (comment)
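The difference between the two encodings can be seen in a small standalone example. The input array and the `strategy="uniform"` setting are assumptions chosen to keep the output easy to follow; only the `encode` argument is the point here.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Default encoding: one-hot, returned as a SciPy sparse matrix
# with one column per bin.
one_hot = KBinsDiscretizer(n_bins=2, encode="onehot", strategy="uniform").fit_transform(X)

# Ordinal encoding: a dense array holding the bin index of each value.
ordinal = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform").fit_transform(X)

print(sparse.issparse(one_hot), one_hot.shape)  # True (4, 2)
print(ordinal.ravel().tolist())                 # [0.0, 0.0, 1.0, 1.0]
```

With the ordinal encoding the output keeps the shape of the input, so it can be written back into the original columns directly.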

@robmeth robmeth moved this from 🧱 Blocked to In Progress in Library Jun 23, 2023
@robmeth robmeth moved this from In Progress to Ready for Review in Library Jun 23, 2023
robmeth added a commit that referenced this issue Jul 7, 2023
Closes #143.

### Summary of Changes

* Added a class `Discretizer` in `safeds.data.tabular.transformation`
that wraps the [`KBinsDiscretizer` of
`scikit-learn`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)
* Made the class a subclass of `TableTransformer`
* The `__init__` for now only has a parameter `number_of_bins` to
control how many bins are created
* If `number_of_bins` is less than 2, it raises a `ValueError`
@github-project-automation github-project-automation bot moved this from Ready for Review to ✔️ Done in Library Jul 7, 2023
lars-reimann pushed a commit that referenced this issue Jul 13, 2023
## [0.15.0](v0.14.0...v0.15.0) (2023-07-13)

### Features

* Add copy method for tables ([#405](#405)) ([72e87f0](72e87f0)), closes [#275](#275)
* add gaussian noise to image ([#430](#430)) ([925a505](925a505)), closes [#381](#381)
* add schema conversions when adding new rows to a table and schema conversion when creating a new table ([#432](#432)) ([6e9ff69](6e9ff69)), closes [#404](#404) [#322](#322) [#127](#127) [#322](#322) [#127](#127)
* add test for empty tables for the method `Table.sort_rows` ([#431](#431)) ([f94b768](f94b768)), closes [#402](#402)
* added color adjustment feature ([#409](#409)) ([2cbee36](2cbee36)), closes [#380](#380)
* added test_repr table tests ([#410](#410)) ([cb77790](cb77790)), closes [#349](#349)
* discretize table ([#327](#327)) ([5e3da8d](5e3da8d)), closes [#143](#143)
* Improve error handling of TaggedTable ([#450](#450)) ([c5da544](c5da544)), closes [#150](#150)
* Maintain tagging in methods inherited from `Table` class ([#332](#332)) ([bc73a6c](bc73a6c)), closes [#58](#58)
* new error class `OutOfBoundsError` ([#438](#438)) ([1f37e4a](1f37e4a)), closes [#262](#262)
* rename several `Table` methods for consistency ([#445](#445)) ([9954986](9954986)), closes [#439](#439)
* suggest similar columns if column gets accessed that doesnt exist ([#385](#385)) ([6a097a4](6a097a4)), closes [#203](#203)

### Bug Fixes

* added the missing ids in parameterized tests ([#412](#412)) ([dab6419](dab6419)), closes [#362](#362)
* don't warn if `Imputer` transforms column without missing values ([#448](#448)) ([f0cb6a5](f0cb6a5))
* Warnings raised by underlying seaborn and numpy libraries  ([#425](#425)) ([c4143af](c4143af)), closes [#357](#357)
@lars-reimann
Member Author

🎉 This issue has been resolved in version 0.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Jul 13, 2023