help diagnosing performance issues #19

Open
jonathanstrong opened this issue Feb 27, 2021 · 3 comments

@jonathanstrong

Hello,

Thanks for releasing this library! I really like the design of the API.

I was wondering if you had any ideas to help me track down some performance issues I'm facing. I'm using Pca but seeing relatively long runtimes and high memory usage. I saved the training data as an .npy file and ran it through sklearn's PCA, which was far faster (under 5 s, versus minutes with Pca). I also tried the linfa Rust implementation and got good performance.

My data is f64 with shape (17105, 900), which I wouldn't expect to be too large. I'm using the openblas feature and no others (default-features = false).
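For a rough sense of scale, the std-only Rust arithmetic below computes the raw storage for a 17105 x 900 f64 matrix. The second figure is hypothetical: it shows what a full 17105 x 17105 U factor would cost if an SVD routine computed one instead of the thin 17105 x 900 factor. The thread doesn't say whether Pca does this, so treat it only as one thing worth checking when memory use looks unexpectedly high. The function name is illustrative.

```rust
use std::mem::size_of;

/// Bytes needed to store a dense `rows` x `cols` matrix of f64 values.
fn f64_matrix_bytes(rows: usize, cols: usize) -> usize {
    rows * cols * size_of::<f64>()
}

fn main() {
    let (m, n) = (17_105, 900);
    // The input itself; a thin SVD's U factor (m x n) has the same size.
    let input = f64_matrix_bytes(m, n);
    // Hypothetical: a full (not thin) SVD's U factor is m x m.
    let full_u = f64_matrix_bytes(m, m);
    println!("input / thin U: {:.1} MB", input as f64 / 1e6);
    println!("full U:         {:.1} MB", full_u as f64 / 1e6);
}
```

This prints about 123.2 MB for the input and 2340.6 MB (roughly 2.3 GB) for a full U.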

I'm not mentioning the other libraries to criticize this one; the gap just confused me, because when I compared your code with theirs I didn't see anything that would explain such a big difference. Do you have any instinct for what could be going on?

@msk
Contributor

msk commented Feb 27, 2021

Could you try RandomizedPca? sklearn uses the full SVD (the algorithm Pca implements) for a small input (< 500x500), but uses a randomized truncated SVD, like RandomizedPca, for larger inputs.
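This thread doesn't show how RandomizedPca works internally, so the std-only Rust sketch below is a simplified stand-in for the idea behind truncated methods: power iteration from a random starting vector, which recovers only the leading principal direction at O(mn) cost per iteration, instead of the O(mn^2) of a full SVD on an m x n matrix (m >= n). A randomized truncated SVD, as in sklearn, generalizes this by iterating on a random block of k vectors, orthonormalizing them, and then taking an SVD of the small projected matrix; the sketch keeps k = 1 and skips that last step. The function names and the toy LCG are hypothetical.

```rust
/// Minimal linear congruential generator so the example needs no crates.
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random value in [-1, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 / (1u64 << 53) as f64) * 2.0 - 1.0
    }
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

/// Leading principal direction and its variance via power iteration on X^T X,
/// applied as X^T (X v) so the n x n covariance matrix is never formed.
/// Assumes `x` (rows = samples) is already column-centered.
fn leading_component(x: &[Vec<f64>], iters: usize, seed: u64) -> (Vec<f64>, f64) {
    let n = x[0].len();
    let mut rng = Lcg(seed);
    // Random starting vector.
    let mut v: Vec<f64> = (0..n).map(|_| rng.next_f64()).collect();
    for _ in 0..iters {
        // u = X v (one entry per sample).
        let u: Vec<f64> = x.iter().map(|row| dot(row, &v)).collect();
        // w = X^T u.
        let mut w = vec![0.0; n];
        for (row, &ui) in x.iter().zip(&u) {
            for (wj, &xij) in w.iter_mut().zip(row) {
                *wj += xij * ui;
            }
        }
        // Renormalize so the iterate stays unit length.
        let norm = dot(&w, &w).sqrt();
        v = w.into_iter().map(|wj| wj / norm).collect();
    }
    // Variance along v: ||X v||^2 / (m - 1).
    let xv_sq: f64 = x.iter().map(|row| dot(row, &v).powi(2)).sum();
    (v, xv_sq / (x.len() as f64 - 1.0))
}

fn main() {
    // Centered toy data whose dominant direction is the first axis.
    let x = vec![
        vec![3.0, 0.0],
        vec![-3.0, 0.0],
        vec![0.0, 1.0],
        vec![0.0, -1.0],
    ];
    let (v, var) = leading_component(&x, 100, 42);
    println!("direction = {:?}, variance = {:.4}", v, var);
}
```

On this toy input the direction converges to (1, 0) up to sign, with variance 18 / 3 = 6.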

@jonathanstrong
Author

Yes, RandomizedPca is much better, thank you.

A somewhat related performance question: I was trying to modify FastIca to compute only n_components components instead of ncols, but I couldn't figure it out. Do you have a sense of whether that would be an easy or a hard fix in the current codebase? That is, is it simply a matter of slicing the arrays in the right places, or would it require a more substantial refactor?

@msk
Contributor

msk commented Mar 10, 2021

Slicing the array should suffice.
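The FastIca code itself isn't shown in this thread, so the following std-only Rust sketch illustrates the slicing idea with the deflationary FastICA variant rather than the library's implementation. Because deflation estimates components one at a time, limiting the outer loop to n_components instead of the number of columns is the only change needed. It assumes the input has already been whitened. In the symmetric (parallel) variant, the analogous change is keeping only the first n_components rows of the whitening matrix, so the unmixing matrix becomes n_components x n_components. The function and helper names are hypothetical, and the fixed starting vectors stand in for the random initialization a real implementation would use.

```rust
/// Deflationary FastICA on already-whitened data (rows = samples), estimating
/// only `n_components` unmixing vectors instead of one per column.
/// Nonlinearity g = tanh; returns an n_components x n_features matrix.
fn fast_ica_deflation(z: &[Vec<f64>], n_components: usize, max_iter: usize, tol: f64) -> Vec<Vec<f64>> {
    let n = z[0].len();
    assert!(n_components <= n, "cannot extract more components than columns");
    let m = z.len() as f64;
    let mut rows: Vec<Vec<f64>> = Vec::with_capacity(n_components);
    // The outer loop runs n_components times, not n (= ncols) times.
    for p in 0..n_components {
        // Deterministic starting vector for reproducibility (usually random).
        let mut w: Vec<f64> = (0..n).map(|j| 1.0 + ((p + j) % 3) as f64).collect();
        normalize(&mut w);
        for _ in 0..max_iter {
            // Fixed-point update: w+ = E[z g(w.z)] - E[g'(w.z)] w.
            let mut w_new = vec![0.0; n];
            let mut mean_dg = 0.0;
            for row in z {
                let g = dot(&w, row).tanh();
                mean_dg += (1.0 - g * g) / m;
                for (a, &zi) in w_new.iter_mut().zip(row) {
                    *a += zi * g / m;
                }
            }
            for (a, &wi) in w_new.iter_mut().zip(&w) {
                *a -= mean_dg * wi;
            }
            // Deflation: remove projections onto components already found.
            for prev in &rows {
                let proj = dot(&w_new, prev);
                for (a, &b) in w_new.iter_mut().zip(prev) {
                    *a -= proj * b;
                }
            }
            normalize(&mut w_new);
            // Converged when the direction stops changing (up to sign).
            let converged = (dot(&w_new, &w).abs() - 1.0).abs() < tol;
            w = w_new;
            if converged {
                break;
            }
        }
        rows.push(w);
    }
    rows
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

/// Scales `v` to unit Euclidean length in place.
fn normalize(v: &mut [f64]) {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    for x in v.iter_mut() {
        *x /= norm;
    }
}

/// Deterministic toy data with three columns (not actually whitened).
fn toy_data(m: usize) -> Vec<Vec<f64>> {
    (0..m)
        .map(|i| {
            let t = i as f64;
            vec![(t * 0.7).sin(), (t * 1.3).cos(), (t * 0.37) % 2.0 - 1.0]
        })
        .collect()
}

fn main() {
    let z = toy_data(500);
    let w = fast_ica_deflation(&z, 2, 200, 1e-6);
    println!("{} components of length {}", w.len(), w[0].len());
}
```

The example extracts 2 unmixing vectors from 3-column data; the rows come out unit length and mutually orthogonal because of the deflation step.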
