
Commit ceeef70

ENH refactor NMF and add CD solver
1 parent b9284a7 commit ceeef70

File tree

11 files changed: +20616 -366


.gitattributes

+1
@@ -5,6 +5,7 @@
 /sklearn/cluster/_k_means.c -diff
 /sklearn/datasets/_svmlight_format.c -diff
 /sklearn/decomposition/_online_lda.c -diff
+/sklearn/decomposition/cdnmf_fast.c -diff
 /sklearn/ensemble/_gradient_boosting.c -diff
 /sklearn/feature_extraction/_hashing.c -diff
 /sklearn/linear_model/cd_fast.c -diff

benchmarks/bench_plot_nmf.py

+9 -15

@@ -16,7 +16,7 @@
 from sklearn.externals.six.moves import xrange


-def alt_nnmf(V, r, max_iter=1000, tol=1e-3, R=None):
+def alt_nnmf(V, r, max_iter=1000, tol=1e-3, init='random'):
     '''
     A, S = nnmf(X, r, tol=1e-3, R=None)

@@ -33,8 +33,8 @@ def alt_nnmf(V, r, max_iter=1000, tol=1e-3, R=None):
     tol : double
         tolerance threshold for early exit (when the update factor is within
         tol of 1., the function exits)
-    R : integer, optional
-        random seed
+    init : string
+        Method used to initialize the procedure.

     Returns
     -------
@@ -52,12 +52,7 @@ def alt_nnmf(V, r, max_iter=1000, tol=1e-3, R=None):
     # Nomenclature in the function follows Lee & Seung
     eps = 1e-5
     n, m = V.shape
-    if R == "svd":
-        W, H = _initialize_nmf(V, r)
-    elif R is None:
-        R = np.random.mtrand._rand
-        W = np.abs(R.standard_normal((n, r)))
-        H = np.abs(R.standard_normal((r, m)))
+    W, H = _initialize_nmf(V, r, init, random_state=0)

     for i in xrange(max_iter):
         updateH = np.dot(W.T, V) / (np.dot(np.dot(W.T, W), H) + eps)
@@ -78,17 +73,15 @@ def report(error, time):


 def benchmark(samples_range, features_range, rank=50, tolerance=1e-5):
-    it = 0
     timeset = defaultdict(lambda: [])
     err = defaultdict(lambda: [])

-    max_it = len(samples_range) * len(features_range)
     for n_samples in samples_range:
         for n_features in features_range:
             print("%2d samples, %2d features" % (n_samples, n_features))
             print('=======================')
             X = np.abs(make_low_rank_matrix(n_samples, n_features,
-                       effective_rank=rank, tail_strength=0.2))
+                                            effective_rank=rank, tail_strength=0.2))

             gc.collect()
             print("benchmarking nndsvd-nmf: ")
@@ -122,7 +115,7 @@ def benchmark(samples_range, features_range, rank=50, tolerance=1e-5):
             gc.collect()
             print("benchmarking random-nmf")
             tstart = time()
-            m = NMF(n_components=30, init=None, max_iter=1000,
+            m = NMF(n_components=30, init='random', max_iter=1000,
                     tol=tolerance).fit(X)
             tend = time() - tstart
             timeset['random-nmf'].append(tend)
@@ -132,7 +125,7 @@ def benchmark(samples_range, features_range, rank=50, tolerance=1e-5):
             gc.collect()
             print("benchmarking alt-random-nmf")
             tstart = time()
-            W, H = alt_nnmf(X, r=30, R=None, tol=tolerance)
+            W, H = alt_nnmf(X, r=30, init='random', tol=tolerance)
             tend = time() - tstart
             timeset['alt-random-nmf'].append(tend)
             err['alt-random-nmf'].append(np.linalg.norm(X - np.dot(W, H)))
@@ -151,7 +144,8 @@ def benchmark(samples_range, features_range, rank=50, tolerance=1e-5):
     timeset, err = benchmark(samples_range, features_range)

     for i, results in enumerate((timeset, err)):
-        fig = plt.figure('scikit-learn Non-Negative Matrix Factorization benchmark results')
+        fig = plt.figure('scikit-learn Non-Negative Matrix Factorization '
+                         'benchmark results')
         ax = fig.gca(projection='3d')
         for c, (label, timings) in zip('rbgcm', sorted(results.iteritems())):
             X, Y = np.meshgrid(samples_range, features_range)
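The ``updateH`` expression in the hunk above is half of the Lee & Seung multiplicative-update scheme that ``alt_nnmf`` iterates, and which the benchmark compares against scikit-learn's solvers. For reference, here is a minimal self-contained sketch of both updates; it mirrors the benchmark's variable names but is an illustrative reimplementation, not the benchmark code itself:

```python
import numpy as np

def multiplicative_nmf(V, r, max_iter=1000, eps=1e-5, random_state=0):
    # Random non-negative initialization, as with init='random' above.
    rng = np.random.RandomState(random_state)
    n, m = V.shape
    W = np.abs(rng.standard_normal((n, r)))
    H = np.abs(rng.standard_normal((r, m)))
    for _ in range(max_iter):
        # Lee & Seung multiplicative updates: non-negativity is preserved
        # by construction, and eps guards against division by zero.
        H *= np.dot(W.T, V) / (np.dot(np.dot(W.T, W), H) + eps)
        W *= np.dot(V, H.T) / (np.dot(W, np.dot(H, H.T)) + eps)
    return W, H
```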

doc/modules/decomposition.rst

+37 -11

@@ -656,12 +656,12 @@ into two matrices :math:`W` and :math:`H` of non-negative elements,
 by optimizing for the squared Frobenius norm:

 .. math::
-    \arg\min_{W,H} ||X - WH||^2 = \sum_{i,j} X_{ij} - {WH}_{ij}
+    \arg\min_{W,H} \frac{1}{2} ||X - WH||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {WH}_{ij})^2

-This norm is an obvious extension of the Euclidean norm to matrices.
-(Other optimization objectives have been suggested in the NMF literature,
-in particular Kullback-Leibler divergence,
-but these are not currently implemented.)
+This norm is an obvious extension of the Euclidean norm to matrices. (Other
+optimization objectives have been suggested in the NMF literature, in
+particular Kullback-Leibler divergence, but these are not currently
+implemented.)

 Unlike :class:`PCA`, the representation of a vector is obtained in an additive
 fashion, by superimposing the components, without subtracting. Such additive
@@ -695,13 +695,34 @@ the mean of all elements of the data), and NNDSVDar (in which the zeros are set
 to random perturbations less than the mean of the data divided by 100) are
 recommended in the dense case.

-:class:`NMF` can also be initialized with random non-negative matrices, by
-passing an integer seed or a ``RandomState`` to :attr:`init`.
+:class:`NMF` can also be initialized with correctly scaled random non-negative
+matrices by setting :attr:`init="random"`. An integer seed or a
+``RandomState`` can also be passed to :attr:`random_state` to control
+reproducibility.

-In :class:`NMF`, sparseness can be enforced by setting the attribute
-:attr:`sparseness` to ``"data"`` or ``"components"``. Sparse components lead to
-localized features, and sparse data leads to a more efficient representation of
-the data.
+In :class:`NMF`, L1 and L2 priors can be added to the loss function in order
+to regularize the model. The L2 prior uses the Frobenius norm, while the L1
+prior uses an elementwise L1 norm. As in :class:`ElasticNet`, we control the
+combination of L1 and L2 with the :attr:`l1_ratio` (:math:`\rho`) parameter,
+and the intensity of the regularization with the :attr:`alpha`
+(:math:`\alpha`) parameter. The prior terms are then:
+
+.. math::
+    \alpha \rho ||W||_1 + \alpha \rho ||H||_1
+    + \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2
+    + \frac{\alpha(1-\rho)}{2} ||H||_{Fro}^2
+
+and the regularized objective function is:
+
+.. math::
+    \frac{1}{2} ||X - WH||_{Fro}^2
+    + \alpha \rho ||W||_1 + \alpha \rho ||H||_1
+    + \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2
+    + \frac{\alpha(1-\rho)}{2} ||H||_{Fro}^2
+
+:class:`NMF` regularizes both W and H. The public function
+:func:`non_negative_factorization` allows finer control through the
+:attr:`regularization` attribute, and may regularize only W, only H, or both.

 .. topic:: Examples:

@@ -727,6 +748,11 @@ the data.
     <http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf>`_
     C. Boutsidis, E. Gallopoulos, 2008

+  * `"Fast local algorithms for large scale nonnegative matrix and tensor
+    factorizations."
+    <http://www.bsp.brain.riken.jp/publications/2009/Cichocki-Phan-IEICE_col.pdf>`_
+    A. Cichocki, P. Anh-Huy, 2009
+

 .. _LatentDirichletAllocation:
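A minimal usage sketch of the regularized objective documented above. It assumes only the parameters this commit adds to ``NMF`` (``solver``, ``alpha``, ``l1_ratio``); the data and penalty values are illustrative:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative data matrix.
X = np.abs(np.random.RandomState(0).randn(100, 20))

# l1_ratio (rho) balances the L1 and L2 priors and alpha scales them;
# l1_ratio=0.5 mixes both, exactly as in the formulas above.
model = NMF(n_components=5, solver='cd', alpha=0.1, l1_ratio=0.5,
            random_state=0)
W = model.fit_transform(X)
H = model.components_
print(np.linalg.norm(X - np.dot(W, H)))  # reconstruction error
```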

doc/whats_new.rst

+10 -1

@@ -52,6 +52,16 @@ New features
      datasets. By `Danny Sullivan`_ and `Tom Dupre la Tour`_.
      (`#4738 <https://github.com/scikit-learn/scikit-learn/pull/4738>`_)

+   - The new solver ``cd`` implements a Coordinate Descent in
+     :class:`decomposition.NMF`. The previous solver, based on Projected
+     Gradient, is still available by setting the new parameter ``solver`` to
+     ``pg``, but is deprecated and will be removed in 0.19, along with
+     :class:`decomposition.ProjectedGradientNMF` and the parameters
+     ``sparseness``, ``eta``, ``beta`` and ``nls_max_iter``. New parameters
+     ``alpha`` and ``l1_ratio`` control L1 and L2 regularization, and
+     ``shuffle`` adds a shuffling step in the ``cd`` solver.
+     By `Tom Dupre la Tour`_ and `Mathieu Blondel`_.
+
 Enhancements
 ............
    - :class:`manifold.TSNE` now supports approximate optimization via the
@@ -192,7 +202,6 @@ Enhancements
    - Added ``sample_weight`` support to :class:`linear_model.LogisticRegression` for
      the ``lbfgs``, ``newton-cg``, and ``sag`` solvers. By `Valentin Stolbunov`_.

-
 Bug fixes
 .........

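To make the changelog entry concrete, here is a toy NumPy sketch of the HALS-style coordinate descent (Cichocki and Phan, 2009, referenced above) on which the new ``cd`` solver is based. The actual solver is implemented in Cython (the ``cdnmf_fast`` extension registered in ``.gitattributes`` above) and also handles the ``alpha``/``l1_ratio`` penalties and the ``shuffle`` option; this sketch covers only the plain, unpenalized update:

```python
import numpy as np

def hals_update(W, HHt, XHt):
    # Update W in place, one component (column) at a time, for the
    # objective 0.5 * ||X - WH||_Fro^2 with H held fixed.
    for t in range(W.shape[1]):
        # Gradient of the objective with respect to W[:, t].
        grad = np.dot(W, HHt[:, t]) - XHt[:, t]
        if HHt[t, t] > 0:
            # Exact minimizer along this coordinate block, projected
            # onto the non-negative orthant.
            W[:, t] = np.maximum(W[:, t] - grad / HHt[t, t], 0.)
    return W

def nmf_cd(X, n_components, n_iter=200, random_state=0):
    rng = np.random.RandomState(random_state)
    n, m = X.shape
    W = np.abs(rng.standard_normal((n, n_components)))
    H = np.abs(rng.standard_normal((n_components, m)))
    for _ in range(n_iter):
        W = hals_update(W, np.dot(H, H.T), np.dot(X, H.T))
        # Updating H is the symmetric problem on the transposed data.
        H = hals_update(H.T, np.dot(W.T, W), np.dot(X.T, W)).T
    return W, H
```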

examples/decomposition/plot_faces_decomposition.py

+1 -2

@@ -71,8 +71,7 @@ def plot_gallery(title, images, n_col=n_col, n_row=n_row):
      True),

     ('Non-negative components - NMF',
-     decomposition.NMF(n_components=n_components, init='nndsvda', beta=5.0,
-                       tol=5e-3, sparseness='components'),
+     decomposition.NMF(n_components=n_components, init='nndsvda', tol=5e-3),
      False),

     ('Independent components - FastICA',
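Since ``sparseness='components'`` and ``beta`` are gone from this example, component sparsity would now be expressed through the new penalty parameters instead. A rough, untuned sketch of a configuration in that spirit (the values are illustrative, not what the faces example ships with):

```python
from sklearn import decomposition

# l1_ratio=1 makes the penalty pure L1, which drives entries of W and H
# toward zero, i.e. sparser components; alpha sets the penalty strength.
estimator = decomposition.NMF(n_components=6, init='nndsvda', tol=5e-3,
                              alpha=0.1, l1_ratio=1.0)
```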
