reduce memory_footprint for sparse PCA transform (#5964)
The sparse PCA still densified `X` during the transform step, which largely defeats the purpose of a sparse PCA. However,
 ```
precomputed_mean_impact = self.mean_ @ self.components_.T
mean_impact = cp.ones((X.shape[0], 1)) @ precomputed_mean_impact.reshape(1, -1)
X_transformed = X.dot(self.components_.T) -mean_impact
```
is the same as
```
X = X - self.mean_
X_transformed = X.dot(self.components_.T)
```
The new implementation is faster (mainly because it no longer relies on CuPy's `toarray()`) and uses far less memory.
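
The equivalence claimed above can be checked on the CPU. The following is a minimal sketch, not the cuML code: it assumes NumPy and SciPy as stand-ins for CuPy, and the toy shapes and the names `mean`, `components`, `dense_result` and `sparse_result` are hypothetical. It mirrors the two snippets and confirms that they give the same result while the second route never densifies `X`.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 samples, 5 features, about 30% non-zero entries.
X = sp.random(6, 5, density=0.3, format="csr", random_state=0, dtype=np.float64)
mean = np.asarray(X.mean(axis=0)).ravel()    # stand-in for mean_, shape (5,)
components = rng.standard_normal((2, 5))     # stand-in for components_, shape (2, 5)

# Dense route: subtracting the mean needs a dense (n_samples, n_features) copy of X.
dense_result = (X.toarray() - mean) @ components.T

# Sparse route: project the sparse X first, then subtract the projected mean.
precomputed_mean_impact = mean @ components.T                      # shape (2,)
mean_impact = np.ones((X.shape[0], 1)) @ precomputed_mean_impact.reshape(1, -1)
sparse_result = X.dot(components.T) - mean_impact                  # dense (6, 2) result

# Both routes agree up to floating-point rounding.
assert np.allclose(dense_result, sparse_result)
```

In this sketch the only dense arrays on the sparse route have shape `(n_samples, n_components)`, which is typically much smaller than the `(n_samples, n_features)` copy the dense route creates.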

Authors:
  - Severin Dicks (https://github.com/Intron7)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5964
Intron7 authored Jul 28, 2024
1 parent a8fda19 commit d4535d2
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions python/cuml/cuml/decomposition/pca.pyx
@@ -1,5 +1,5 @@
#
-# Copyright (c) 2019-2023, NVIDIA CORPORATION.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -632,8 +632,9 @@ class PCA(UniversalBase,
self.components_ *= cp.sqrt(self.n_samples_ - 1)
self.components_ /= self.singular_values_.reshape((-1, 1))

-X = X - self.mean_
-X_transformed = X.dot(self.components_.T)
+precomputed_mean_impact = self.mean_ @ self.components_.T
+mean_impact = cp.ones((X.shape[0], 1)) @ precomputed_mean_impact.reshape(1, -1)
+X_transformed = X.dot(self.components_.T) -mean_impact

if self.whiten:
self.components_ *= self.singular_values_.reshape((-1, 1))