Improve ClusterBasedNormalizer performance #336
Labels: feature request (Request for a new feature)
npatki added the feature request label and removed the internal label on Jun 10, 2022.
npatki changed the title from "Improve BayesGMMTransformer performance" to "Improve ClusterBasedNormalizer performance" on Jun 10, 2022.
The BayesGMMTransformer should be experimented with to improve its performance. Its current parameters (the weight_threshold and the default values passed to the BayesianGaussianMixture) should be varied, and new default values should be chosen based on the results.

The code can also be sped up. The reverse_transform method is already much quicker than the other two methods, and fit spends almost all of its time fitting the BayesianGaussianMixture, which is unavoidable. Instead, the biggest gains can be achieved by improving the transform method, specifically these lines:

RDT/rdt/transformers/numerical.py, lines 625 to 632 in 6b07fee

These lines account for the majority of the transform runtime, so any improvement to them would significantly speed up the whole process.
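A minimal sketch of how the parameter experiment could be run, using scikit-learn's BayesianGaussianMixture directly rather than the transformer. It assumes that the transformer discards mixture components whose weights_ are at or below weight_threshold. The candidate grids, max_clusters=10, the 80/20 split and the held-out log-likelihood metric are illustrative choices, not the transformer's current defaults.

```python
import time

import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def sweep(data, weight_thresholds, concentrations, max_clusters=10, seed=0):
    """Fit one BayesianGaussianMixture per concentration prior and report
    fit time, held-out log-likelihood and kept components per threshold."""
    rng = np.random.default_rng(seed)
    values = rng.permutation(np.asarray(data, dtype=float)).reshape(-1, 1)
    split = int(0.8 * len(values))
    train, test = values[:split], values[split:]

    results = []
    for concentration in concentrations:
        bgm = BayesianGaussianMixture(
            n_components=max_clusters,
            weight_concentration_prior_type='dirichlet_process',
            weight_concentration_prior=concentration,
            n_init=1,
            random_state=seed,
        )
        start = time.perf_counter()
        bgm.fit(train)
        fit_seconds = time.perf_counter() - start
        # Mean log-likelihood per held-out sample under the full mixture
        # (it does not depend on weight_threshold).
        heldout_loglik = bgm.score(test)
        for threshold in weight_thresholds:
            # Assumption: components with weight <= threshold are dropped.
            kept = int((bgm.weights_ > threshold).sum())
            results.append({
                'weight_concentration_prior': concentration,
                'weight_threshold': threshold,
                'fit_seconds': fit_seconds,
                'heldout_loglik': heldout_loglik,
                'kept_components': kept,
            })
    return results


if __name__ == '__main__':
    # Hypothetical bimodal column used only to exercise the sweep.
    demo_rng = np.random.default_rng(42)
    demo = np.concatenate([demo_rng.normal(-5, 1, 500), demo_rng.normal(5, 1, 500)])
    for row in sweep(demo, [0.005, 0.05], [0.001, 1.0]):
        print(row)
```

Fitting once per concentration prior and then applying every threshold to the same fitted weights keeps the sweep cheap, since the fit is the expensive step and the threshold only filters its output.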
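A minimal sketch of one way the transform hot spot could be vectorized. The issue does not quote lines 625 to 632, so this assumes they pick a mixture component for each row with a Python loop over a per-row random choice weighted by that row's component probabilities. The function names, the 1e-6 smoothing and the scale=4.0 normalization factor are hypothetical. The vectorized version swaps the per-row choice for inverse-CDF sampling: one uniform draw per row compared against the row's cumulative probabilities.

```python
import numpy as np


def select_components_loop(probs, rng):
    """Reference version: one weighted random choice per row (slow)."""
    n_rows, n_components = probs.shape
    selected = np.empty(n_rows, dtype=int)
    for i in range(n_rows):
        p = probs[i] + 1e-6  # avoid all-zero rows
        p = p / p.sum()
        selected[i] = rng.choice(n_components, p=p)
    return selected


def select_components_vectorized(probs, rng):
    """Same per-row distribution as the loop, computed in array operations:
    draw one uniform number per row and count how many cumulative
    probabilities fall below it (inverse-CDF sampling)."""
    p = probs + 1e-6
    p = p / p.sum(axis=1, keepdims=True)
    cumulative = p.cumsum(axis=1)
    draws = rng.random((len(p), 1))
    selected = (draws > cumulative).sum(axis=1)
    # Guard against round-off leaving the last cumulative value below 1.
    return np.minimum(selected, p.shape[1] - 1)


def normalize(values, means, stds, selected, scale=4.0):
    """Scale each value by the mean/std of its selected component.
    The scale factor is an assumption for illustration."""
    return (values - means[selected]) / (scale * stds[selected])
```

Because the vectorized version consumes random numbers differently, it samples from the same per-row distribution but will not reproduce the loop's exact output for a given seed, so any tests that pin exact transformed values would need updating.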