help diagnosing performance issues #19

jonathanstrong · 2021-02-27T10:33:15Z

hello,

thanks for releasing this library! I really like the design of the api.

I was wondering if you had any ideas to help me figure out some performance issues I am facing. I am using Pca but experiencing relatively long runtimes and high memory usage. I saved the data I was training on as an npy file and ran it through sklearn PCA, which was very quick (under 5s, vs minutes with Pca here). I also tried the linfa rust implementation and got good performance.

The size of my (f64) data is (17105, 900), which shouldn't be too bad, I wouldn't think. I am using the openblas feature (and not any others -- default-features = false).

I'm not mentioning the other libraries to criticize this one, it just confused me, as I looked at your code compared to the others and didn't see anything that would explain why I was experiencing such a big difference. Do you have any instinct for what could be going on?


msk · 2021-02-27T16:32:09Z

Could you try RandomizedPca? sklearn uses the full SVD (the algorithm Pca implements) for a small input (< 500x500), but uses a randomized truncated SVD, like RandomizedPca, for larger inputs.
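
For reference, the solver choice described above can be sketched as a small function. This is a simplified paraphrase of how sklearn's `svd_solver='auto'` behaved around version 0.24, written from memory rather than copied from sklearn; the function name `choose_svd_solver` is made up for illustration, the `'mle'` case is left out, and `n_components=50` is a hypothetical value since the thread does not say how many components were requested.

```python
def choose_svd_solver(n_samples, n_features, n_components):
    """Approximate sklearn's 'auto' solver policy (assumption: ~v0.24 behavior)."""
    # Small inputs: the exact, full SVD is cheap enough.
    if max(n_samples, n_features) <= 500:
        return "full"
    # Large input with comparatively few components: randomized truncated SVD.
    if 1 <= n_components < 0.8 * min(n_samples, n_features):
        return "randomized"
    # Otherwise fall back to the full SVD.
    return "full"


# Shape from this issue, with a hypothetical component count.
solver_for_issue = choose_svd_solver(17105, 900, 50)
solver_for_small = choose_svd_solver(400, 300, 10)
print(solver_for_issue, solver_for_small)
```

Under this policy the (17105, 900) input would go to the randomized solver for any modest number of components, which would be consistent with the runtime gap reported above.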

jonathanstrong · 2021-03-03T22:30:04Z

yes RandomizedPca is much better, thank you.

somewhat related performance question: I was trying to modify FastIca to only compute n_components components instead of ncols, but I could not figure it out. do you have any sense of whether that should be an easy fix or a hard fix in the current codebase? what I mean is, is it simply a matter of slicing the arrays in all the right places, or is some more substantial refactor required?

msk · 2021-03-10T16:34:42Z

Slicing the array should suffice.
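
A minimal sketch of what "slicing" could mean here, assuming the usual FastICA layout where the data is whitened first and the unmixing step then runs in the reduced space. It is not taken from this library's FastIca; the helper names `truncate_whitening` and `project` and the toy matrix are hypothetical, and plain lists stand in for real arrays.

```python
def truncate_whitening(whitening, n_components):
    """Keep only the first n_components rows of a whitening matrix."""
    if not 0 < n_components <= len(whitening):
        raise ValueError("n_components must be between 1 and the number of rows")
    return [row[:] for row in whitening[:n_components]]


def project(whitening, sample):
    """Apply the (possibly truncated) whitening matrix to one sample."""
    return [sum(w * x for w, x in zip(row, sample)) for row in whitening]


# Toy 3x3 whitening matrix (illustrative values only).
whitening = [
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.25],
]
reduced = truncate_whitening(whitening, 2)      # now 2x3
projected = project(reduced, [2.0, 4.0, 8.0])   # one value per kept component
print(len(reduced), projected)
```

With the whitening matrix cut down to n_components rows, everything computed after it (the whitened data and the square unmixing matrix) takes the smaller dimension as well, which is why slicing in one place can be enough.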
