NMF is a Python program that applies one of several nonnegative matrix factorization (NMF) algorithms to a dataset for clustering.
Currently, this program supports:
- Multiplicative Updates (MU) [1]
- Alternating Least Squares (ALS) [2]
- Alternating Nonnegative Least Squares with Active Set (ANLS-AS) [3]
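To make the first two methods concrete, here is a minimal NumPy sketch of the textbook MU and ALS iterations for approximating a nonnegative matrix `V` by `W @ H`. It is not taken from this program's source: the function names `nmf_mu` and `nmf_als`, the random initialization, the default iteration counts, and the small `eps` guard are all assumptions made for illustration.

```python
import numpy as np


def nmf_mu(V, k, num_iters=200, eps=1e-10, seed=0):
    """Multiplicative updates (Lee and Seung) for the Frobenius-norm objective.

    Hypothetical helper: name, defaults, and initialization are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(num_iters):
        # Element-wise updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_als(V, k, num_iters=50, seed=0):
    """Basic ALS: unconstrained least squares, then clip negatives to zero.

    Hypothetical helper: name, defaults, and initialization are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = np.zeros((k, n))
    for _ in range(num_iters):
        # Solve W H = V for H in the least-squares sense, then project.
        H = np.maximum(np.linalg.lstsq(W, V, rcond=None)[0], 0)
        # Solve H^T W^T = V^T for W^T in the least-squares sense, then project.
        W = np.maximum(np.linalg.lstsq(H.T, V.T, rcond=None)[0].T, 0)
    return W, H
```

For clustering, a common convention is to treat the rows of `H` as clusters over the features and to read off the largest entries in each row, though this program may organize its output differently.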
Experimental results on the headline_text column of abcnews-date-test.csv
Multiplicative Updates (MU):
Alternating Least Squares (ALS):
Usage: main.py [-h] -f FILENAME -c COL_NAME -m {sklearn,all,als,anls,mu} [-d DATA_FRAC] [-r RANDOM_SAMPLE] [-n NUM_MAX_FEATURE] [-s CLUSTER_SIZE] [-k NUM_CLUSTERS] [-i NUM_ITERS] [-p PRINT_ENABLED]
Required arguments:
- -f FILENAME, --filename FILENAME
  the input file name
- -c COL_NAME, --col_name COL_NAME
  the column of the input CSV file to use for nonnegative matrix factorization
- -m {sklearn,all,als,anls_as,mu}, --method {sklearn,all,als,anls_as,mu}
  the NMF method to apply
Optional arguments:
- -h, --help
  show this help message and exit
- -d DATA_FRAC, --data_frac DATA_FRAC
  the fraction of the data to use
- -r RANDOM_SAMPLE, --random_sample RANDOM_SAMPLE
  if set to False, disables random sampling of the data
- -n NUM_MAX_FEATURE, --num_max_feature NUM_MAX_FEATURE
  the maximum number of features to be discovered in the dataset
- -s CLUSTER_SIZE, --cluster_size CLUSTER_SIZE
  the number of features in each cluster
- -k NUM_CLUSTERS, --num_clusters NUM_CLUSTERS
  the number of clusters to be discovered
- -i NUM_ITERS, --num_iters NUM_ITERS
  the number of iterations to run an NMF algorithm
- -p PRINT_ENABLED, --print_enabled PRINT_ENABLED
  if True, output print statements
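The options above map naturally onto `argparse`. The following is a hedged sketch of how such a parser could be declared, not the actual contents of main.py: the helper names `str2bool` and `build_parser`, the argument types, and the absence of defaults are assumptions, since the help text does not state them. The method choice is spelled `anls_as` as in the argument description; the usage line spells it `anls`.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings (hypothetical helper)."""
    text = value.strip().lower()
    if text in ("true", "t", "yes", "1"):
        return True
    if text in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented flags (types are assumptions)."""
    parser = argparse.ArgumentParser(prog="main.py")
    required = parser.add_argument_group("required arguments")
    required.add_argument("-f", "--filename", required=True,
                          help="the input file name")
    required.add_argument("-c", "--col_name", required=True,
                          help="the column of the input CSV file to factorize")
    required.add_argument("-m", "--method", required=True,
                          choices=["sklearn", "all", "als", "anls_as", "mu"],
                          help="the NMF method to apply")
    # Defaults are left unset (None) because the help text does not give them.
    parser.add_argument("-d", "--data_frac", type=float,
                        help="the fraction of the data to use")
    parser.add_argument("-r", "--random_sample", type=str2bool,
                        help="if set to False, disables random sampling")
    parser.add_argument("-n", "--num_max_feature", type=int,
                        help="the maximum number of features to discover")
    parser.add_argument("-s", "--cluster_size", type=int,
                        help="the number of features in each cluster")
    parser.add_argument("-k", "--num_clusters", type=int,
                        help="the number of clusters to discover")
    parser.add_argument("-i", "--num_iters", type=int,
                        help="the number of iterations to run")
    parser.add_argument("-p", "--print_enabled", type=str2bool,
                        help="if True, output print statements")
    return parser


if __name__ == "__main__":
    # Example arguments taken from the experiment named in this README.
    args = build_parser().parse_args(
        ["-f", "abcnews-date-test.csv", "-c", "headline_text", "-m", "mu"]
    )
    print(args)
```

Parsing booleans through a small converter rather than `type=bool` avoids the pitfall that `bool("False")` evaluates to `True`.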
References:
Algorithms for Non-negative Matrix Factorization by D. Lee and H. Seung,
https://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf
Algorithms and applications for approximate nonnegative matrix factorization by M. Berry et al.,
https://www.sciencedirect.com/science/article/pii/S0167947306004191
Non-negative Matrix Factorization Based on Alternating Non-negativity Constrained Least Squares and Active Set Method by H. Kim and H. Park,
https://www.cc.gatech.edu/~hpark/papers/simax-nmf.pdf
Fast Nonnegative Matrix Factorization: An Active-Set-Like Method and Comparisons by J. Kim and H. Park,
https://www.cc.gatech.edu/~hpark/papers/SISC_082117RR_Kim_Park.pdf