[QST] Methods + ideas behind further speeding up ML algorithms - Linear Regression, PCA, KNN #1009

danielhanchen opened this issue May 23, 2022 · 0 comments
Hey there!
First, fabulous work on speeding up Sklearn! I'm the author of Hyperlearn (https://github.com/danielhanchen/hyperlearn), where I also investigated how to speed up ML algorithms on CPUs (all the way from SSE4 to AVX-512) and on GPUs. I was previously on NVIDIA's RAPIDS team, mainly responsible for the 2000x faster TSNE and 50% less memory for SVD, and I was working on sparse randomized SVD, fast NMF, etc.

I was trying to understand the main methods behind some of the algorithms.

1. Linear regression. Tall-skinny matrices can be dramatically sped up using the normal equations b = inv(X.T @ X) @ (X.T @ y). Sklearn sadly just uses the slow SVD-based gelsd routine, although that does guarantee stability. From my investigation, an approximate recipe for fast linear regression on tall-skinny matrices is:
```
XTX = X.T @ X           # ssyrk(X)
XTy = X.T @ y           # sgemm(X.T, y) / sgemv(X.T, y)
U   = spotrf(XTX)       # Cholesky factor: XTX = U.T @ U
z   = strtrs(U.T, XTy)  # forward solve:  U.T @ z = XTy
b   = strtrs(U, z)      # back solve:     U @ b = z
```

Cholesky can fail on ill-conditioned X.T @ X, so adding Ridge regularization (solving X.T @ X + lambda*I instead) helps. This approach is probably close to the minimum number of FLOPs needed to find a solution, and some workspace memory can be saved as well.

However, in my experiments, pivoted Cholesky (spstrf) can recover a solution even for ill-conditioned systems, albeit with a higher condition number. The pivoting is much trickier to handle, though.

Is this approximately what Intel's version is using to speed up linear regression?
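For reference, here is a minimal NumPy/SciPy sketch of the normal-equations + Cholesky approach above, with a small ridge term as a safeguard. The lambda value and the function name are purely illustrative, not something sklearn or Intel's extension actually uses:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ridge_cholesky_lstsq(X, y, lam=1e-6):
    """Solve (X.T @ X + lam * I) @ b = X.T @ y via Cholesky (fast for tall-skinny X)."""
    XTX = X.T @ X                             # ideally an ssyrk/dsyrk call (one triangle)
    XTy = X.T @ y
    XTX[np.diag_indices_from(XTX)] += lam     # ridge term guards against ill-conditioning
    c, low = cho_factor(XTX)                  # LAPACK potrf
    return cho_solve((c, low), XTy)           # two triangular solves (trtrs-style)

# quick check: compare ridge_cholesky_lstsq(X, y) against np.linalg.lstsq(X, y)[0]
```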

2. PCA. PCA normally requires forming (X - mean(X)) first. Surprisingly, however, if you use syevd / syevr to find the eigenvectors, you can bypass copying the matrix X for mean removal entirely: expanding the formula gives a simple matrix trick that saves O(NP) memory. [N = rows, P = columns]
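To make the trick concrete, here is one way it can be written (my own sketch of the idea, not necessarily the exact Hyperlearn implementation): since (X - 1 m^T).T @ (X - 1 m^T) = X.T @ X - N * m @ m.T, where m is the vector of column means, the P x P scatter matrix can be formed without ever materializing a centered copy of X:

```python
import numpy as np
from scipy.linalg import eigh

def pca_without_centering_copy(X, k):
    """PCA via eigh on the P x P scatter matrix, never materializing X - mean(X)."""
    n, p = X.shape
    m = X.mean(axis=0)
    # (X - 1 m^T).T @ (X - 1 m^T) == X.T @ X - n * outer(m, m)  ->  no O(N*P) copy needed
    S = X.T @ X - n * np.outer(m, m)
    vals, vecs = eigh(S)                      # syevd / syevr under the hood
    order = np.argsort(vals)[::-1][:k]
    components = vecs[:, order].T             # top-k principal axes
    explained_variance = vals[order] / (n - 1)
    return components, explained_variance
```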

3. PCA (syevd vs syevr). Funnily enough, SciPy's syevd / syevr comparison came out of one of my old Hyperlearn results ("EIGH very very slow --> suggesting an easy fix", scipy/scipy#9212). A long time ago I noted there that large matrices should use syevd for superior performance, while syevr is better for smaller matrices. We found a heuristic for selecting which of the two to use.
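As an illustration of such a selection heuristic (the size cutoff below is a made-up placeholder, not the one we actually derived), SciPy now exposes the LAPACK driver choice directly:

```python
from scipy.linalg import eigh

def smart_eigh(S, size_cutoff=2000):
    """Pick the symmetric eigensolver driver based on matrix size.

    size_cutoff is purely illustrative; a real heuristic depends on the
    hardware, workspace memory and how much of the spectrum is needed.
    """
    driver = "evd" if S.shape[0] >= size_cutoff else "evr"   # syevd vs syevr
    return eigh(S, driver=driver)
```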

4. KNN. Nearest neighbors is actually quite interesting to speed up on CPUs. Pairwise distances for Euclidean and Cosine similarity / distance can easily be made faster with some matrix algebra: both reduce to computing X @ X.T, for which you can again use ssyrk.
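For concreteness, a sketch of the standard matrix-algebra expansion (a plain matmul stands in for the explicit syrk call; only one triangle of the Gram matrix is really needed):

```python
import numpy as np

def pairwise_sq_euclidean(X):
    """||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 * x_i . x_j, from one Gram matrix."""
    G = X @ X.T                          # the ssyrk-able part
    sq_norms = np.diag(G)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * G
    np.maximum(D2, 0.0, out=D2)          # clamp tiny negatives from round-off
    return D2

def pairwise_cosine_similarity(X):
    """Cosine similarity reduces to the same Gram matrix, just row-normalized."""
    G = X @ X.T
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)
```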

5. Randomized SVD. Not sure if Intel's package has dramatically sped this one up. We spent a lot of time in the weeds here. We found that by replacing the QR range finder (you can even toggle the computation of R off) with a stable LU decomposition, then using a stable Cholesky step plus some other sneaky tricks, huge speedups were possible with little loss in accuracy. Likewise, Randomized PCA on sparse data is entirely possible using the PCA trick I described above.
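To give a rough flavour of the LU + Cholesky idea (this is my own sketch of that family of range finders, not the exact Hyperlearn code; the Cholesky-QR step can still break down for very ill-conditioned sketches):

```python
import numpy as np
from scipy.linalg import lu, cholesky, solve_triangular, svd

def randomized_svd_lu(A, k, oversample=10, n_iter=4, seed=0):
    """Randomized SVD using LU for the power iterations and Cholesky-QR at the end."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    Y = A @ rng.standard_normal((p, k + oversample))
    for _ in range(n_iter):
        Y, _ = lu(Y, permute_l=True)           # cheap LU re-conditioning instead of QR
        Z, _ = lu(A.T @ Y, permute_l=True)
        Y = A @ Z
    # Cholesky-QR: Q = Y @ inv(R) with R = chol(Y.T @ Y)
    R = cholesky(Y.T @ Y, lower=False)
    Q = solve_triangular(R, Y.T, trans='T', lower=False).T
    B = Q.T @ A                                # small (k + oversample) x P matrix
    Uh, s, Vt = svd(B, full_matrices=False)
    return Q @ Uh[:, :k], s[:k], Vt[:k]
```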

Anyway, I was just curious about the details behind the algorithms Intel has devised :)) All my code is available for use - though now I'm running a startup, Moonshot (www.moonshotai.org), where our goal is to make all software faster!

Thanks for your time!

@napetrov added the enhancement label on Jan 8, 2023