Discretize: rounding problem #6876
Rounding was introduced in df34d90. I think we could (and probably should) simply add …
Decimal binning is not guaranteed to give the exact number of intervals specified by the user; instead, it returns the closest match across different possible "nice" thresholds. If rounding + unique decreases the number of intervals for a certain bin width, the method may choose another (smaller) width or return a smaller number of bins; both are OK. If you wish, change this (don't forget to add a simple test).
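A minimal sketch of the idea described in the comment above, assuming that "nice" widths have the form {1, 2, 5} × 10^k and that cut points are rounded to the width's own precision and then passed through `np.unique`. The function name `nice_thresholds` and the candidate widths are hypothetical; this is an illustration, not Orange's actual implementation. The interval count is measured *after* rounding and de-duplication, so a width that loses intervals simply scores worse and another width may win.

```python
import math

import numpy as np


def nice_thresholds(values, n_bins):
    """Hypothetical sketch of decimal ("nice") binning, not Orange's code.

    Tries widths of the form {1, 2, 5} * 10**k and returns the cut points
    whose number of intervals is closest to n_bins (fewer is possible).
    """
    lo, hi = float(np.min(values)), float(np.max(values))
    if hi <= lo or n_bins < 2:
        return np.array([])
    base = math.floor(math.log10((hi - lo) / n_bins))
    best = None
    for exp in (base - 1, base, base + 1):
        for mantissa in (1, 2, 5):
            width = mantissa * 10.0 ** exp
            # Multiples of the width that fall inside [lo, hi].
            ks = np.arange(math.ceil(lo / width), math.floor(hi / width) + 1)
            # Round to the decimals this width needs, then drop duplicates.
            decimals = max(0, -exp)
            cuts = np.unique(np.round(ks * width, decimals))
            # Keep only interior cut points.
            cuts = cuts[(cuts > lo) & (cuts < hi)]
            # Compare the interval count after rounding + unique.
            diff = abs(len(cuts) + 1 - n_bins)
            if best is None or diff < best[0]:
                best = (diff, cuts)
    return best[1]


print(nice_thresholds(np.array([0.0, 10.0]), 5))
```

For values spanning 0 to 10 with five requested bins, width 2 gives exactly five intervals, so the cut points are 2, 4, 6 and 8.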
What's wrong?
Applying PCA to the Titanic dataset and then discretizing its output rounds the values strangely during discretization. This results in multiple values with the same "name".
This is the workflow I have (I have also included the .ows):
On the left is the Data Table showing the results of the PCA (pay attention to the PC7 attribute). On the right are the results of the Discretize widget: the discretized PC7 attribute has been rounded strangely, and multiple PC7 values share the same "name" (highlighted).
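The symptom can be reproduced outside Orange with a small sketch, assuming the labels are produced by formatting each cut point to a fixed number of decimals. The helper `interval_labels`, its label format ("< a", "a - b", "≥ b") and the cut-point values are hypothetical, chosen only to mimic a low-variance component such as PC7. Cut points that are distinct but closer together than the displayed precision collapse into identical names; rounding them to that precision and applying `np.unique` first (the "rounding + unique" approach mentioned in the comment) gives fewer intervals, but unique names.

```python
import numpy as np


def interval_labels(cuts, decimals):
    """Hypothetical helper: name the intervals defined by sorted cut points.

    The label format is an assumption for illustration, not necessarily
    the exact format the Discretize widget uses.
    """
    text = [f"{c:.{decimals}f}" for c in cuts]
    labels = [f"< {text[0]}"]
    labels += [f"{a} - {b}" for a, b in zip(text, text[1:])]
    labels.append(f"≥ {text[-1]}")
    return labels


# Made-up cut points that are distinct but closer than 2 decimals can show.
cuts = np.array([0.0012, 0.0021, 0.0034, 0.0051])
print(interval_labels(cuts, 2))
# ['< 0.00', '0.00 - 0.00', '0.00 - 0.00', '0.00 - 0.01', '≥ 0.01']

# Round to the displayed precision and de-duplicate before labelling:
# fewer intervals, but every label is unique.
deduped = np.unique(np.round(cuts, 2))
print(interval_labels(deduped, 2))
# ['< 0.00', '0.00 - 0.01', '≥ 0.01']
```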
How can we reproduce the problem?
Zip of the workflow:
discretize_bug.zip
To reproduce the problem, set the PCA components to 8 in the provided workflow.
What's your environment?