Python Library for Generalised Non-Negative Matrix Factorisation (GNMF)
A Python implementation of GNMF method introduced in article by Lekinwala and Bhushan in an article "Generalised Non-Negative Matrix Factorisation for Air Pollution Source Apportionment".
An article describing the library is submitted to SoftwareX (Lekinwala and Bhushan, 2022).
pyGNMF
can be installed from pypi using the command pip install pyGNMF
Due to size restrictions of the GitHub, dataset used in Lekinwala and Bhushan (2022) is not made available, however, a smaller dataset with 20 samples, 6 columns and with 3 underlying factor is included in the test
folder.
Upon installation, the test files can be downloaded by cloning the git repository using git clone https://github.com/niravl/pyGNMF
. The test folder contains 3 files,
IllustrativeExample.mat
: Test matrix discussed in the article.test_data_generation.py
: Python script to generate theIllustrativeExample.mat
file, changes can be made to change the dimensions of the dataset.test_running_method.py
: Python script which uses theIllustrativeExample.mat
file to test different functions of thepyGNMF
module.
GNMF factorises matrix
The covariance matrix (
- Covariance Matrix as Identity: In this case, GNMF method with multiplicative update is same as NMF method proposed by Lee and Seung, 1999.
- Covariance Matrix is Diagonal Matrix: In this case, GNMF method with multiplicative update is same as LS-NMF method (Wang et. al., 2006).
- Covariance Matrix with elements correlated along rows or columns: In this case, GNMF method with multiplicative update is same as glsNMF method (Plis et. al. 2011).
- Covariance Matrix is Dense Matrix: Lekinwala and Bhushan, 2022 proposes the GNMF method with multiplicative and projected gradient updates.
Following are some class as part of the module.
-
gnmf_multiplicative_update
: There are four functions as part of this class,-
update_F
: This function is used for the update of F. -
update_G
: This function is used for the update of G. -
objective_function
: This function is used to compute the value of objective function after each iteration. -
running_method
: This function is used to run the GNMF method with multiplicative update. A user can call the function in the following way and pass required inputs. Following are the details about the inputs,
Usage:
G_updated, F_updated, objective_function = pyGNMF.gnmf_multiplicative_update.running_method( X_matrix, covariance, option=('row_stacked', 'column_stacked'), G_init='random', F_init='random', num_fact=None, num_init=1, max_iter=500000, tolerance=1e-06, conv_typ=('absolute', 'relative'), conv_num=3)
where,
-
X_matrix
(required): Matrix to factorise intonum_fact
factors. -
covariance
(required): Covariance (matrix of size$nm\times nm$ ) between different the elements of$X_{n\times m}$ Matrix. -
option=('row_stacked', 'column_stacked')
(required): Option to specify how the elements of the covariance matrix ($nm\times nm$ ) are stacked. If the covariance corresponds to row-stacked elements of$X$ matrix, 'row_stacked' is used else 'column_stacked' is used. -
G_init='random'
(required): Non-negative initial guess for$G$ of size$n\times p$ . Ifrandom
keyword is used,G_init
is generated randomly internally. -
F_init='random'
(required): Non-negative initial guess for$F$ of size$p\times m$ . Ifrandom
keyword is used,F_init
is generated randomly internally. -
num_fact=None
($p$ , required): A critical parameter for the GNMF to work i.e., the number of factors for$X$ matrix. -
num_init=1
(optional): Each method can be initialised multiple times depending on value of this parameter. Default value is 1. -
max_iter=500000
(optional): Each initialisation of a method under consideration will run till convergence or till the maximum number of iterations defined by value if this parameter. Default value is 500000. -
tolerance=1e-06
(optional): Each initialisation of a method under consideration will run till convergence or till the maximum number of iterations. This parameter defines the tolerance value for the convergence. Default value is 1e-6. -
conv_typ=('absolute', 'relative')
(optional): Each initialisation of a method under consideration will run till convergence or maximum number of iterations. This parameter defines the type of convergence i.e.,absolute
difference in the objective function value orrelative
difference in the objective function value. Default value isrelative
. -
conv_num=3
(optional): Each initialisation of a method under consideration will run till convergence or till maximum number of iterations. Itf a method converges,conv_num
parameter defines the minimum number of iteration which are required to have convergence criteria below tolerance. Default value is 3.
-
-
gnmf_projected_gradient
: There are four functions as part of the class,-
update_F
: This function is used for the update of F. -
update_G
: This function is used for the update of G. -
objective_function
: This function is used to compute the value of objective function -
running_method
: This function is used to run the method under consideration. Following are the inputs required.
Usage:
G_updated, F_updated, objective_function = gnmf_projected_gradient.running_method( X_matrix, covariance, G_init='random', F_init='random', beta=0.1, sigma=0.0001, alpha_init_G=1, alpha_init_F=1, option=('row_stacked', 'column_stacked'), num_fact=None, num_init=1, max_iter=500000, tolerance=1e-06, conv_typ=('absolute', 'relative'), conv_num=3)
where,
-
X_matrix
(required): Matrix to factorise -
covariance
(required): Covariance (matrix of size$nm\times nm$ ) between different the elements of$X_{n\times m}$ Matrix. -
G_init='random'
(required): Non-negative Initial guess for$G$ of size$n\times p$ . Ifrandom
keyword is used,G_init
is generated randomly internally. -
F_init='random'
(required): Non-negative Initial guess for$F$ of size$p\times m$ . Ifrandom
keyword is used,F_init
is generated randomly internally. -
option=('row_stacked', 'column_stacked')
(required):Option to specify how the elements of the covariance matrix ($nm\times nm$ ) are stacked. If the covariance corresponds to row-stacked elements of$X$ matrix, 'row_stacked' is used else 'column_stacked' is used. -
beta = 0.1
(optional):$\beta$ value used to reduce the value of initial step-length ($\alpha$ ) while search for$\alpha$ to achieve sufficient decrease. Default value is 0.1. -
sigma=0.0001
(optional): User-defined parameter used in sufficient decrease -
alpha_init_G=1
: Initial step-length for the update of$G$ . Default value is 1. -
alpha_init_F=1
: Initial step-length for the update oF$F$ . -
num_fact=None
($p$ , required): A critical parameter for the GNMF to work i.e., the number of factors for$X$ matrix. -
num_init=1
(optional): Each method can be initialised multiple times depending on this parameter. Default value is 1. -
max_iter=500000
(optional): Each initialisation of a method under consideration will run till convergence or till the maximum number of iterations defined by this parameters. Default value is 500000. -
tolerance=1e-06
(optional): Each initialisation of a method under consideration will run till convergence or till the maximum number of iterations. This parameter defines the tolerance value for the convergence. -
conv_typ=('absolute', 'relative')
(optional): Each initialisation of a method under consideration will run till convergence. This parameter defines the type of convergence i.e.,absolute
difference in the objective function value orrelative
difference in the objective function value. Default value isrelative
. -
conv_num=3
(optional): Each initialisation of a method under consideration will run till convergence.conv_num
parameter is used to declare convergence only if theabsolute
orrelative
difference is less than tolerance value for some iterations. Default value is 3.
-
-
G_updated, F_updated, objective_function = nmf_multiplicative_update
: There are two functions as part of the class,-
objective_function
: This function is used to compute the value of objective function -
running_method
: This function is used to run the method under consideration. Following are the inputs required.
Usage:
nmf_multiplicative_update.running_method(X_matrix, G_init='random', F_init='random', num_fact=None, num_init=1, max_iter=500000, tolerance=1e-06, conv_typ='relative', conv_num=3)
where,
-
X_matrix
(required): Matrix to factorise -
G_init='random'
(required): Non-negative Initial guess for$G$ of size$n\times p$ . Ifrandom
keyword is used,G_init
is generated randomly internally. -
F_init='random'
(required): Non-negative Initial guess for$F$ of size$p\times m$ . Ifrandom
keyword is used,F_init
is generated randomly internally. -
num_fact=None
($p$ , required): A critical parameter for the GNMF to work i.e., the number of factors for$X$ matrix. -
num_init=1
(optional): Each method can be initialised multiple times depending on this parameter. Default value is 1. -
max_iter=500000
(optional): Each initialisation of a method under consideration will run till convergence or till the maximum number of iterations defined by this parameters. Default value is 500000. -
tolerance=1e-06
(optional): Each initialisation of a method under consideration will run till convergence or till the maximum number of iterations. This parameter defines the tolerance value for the convergence. -
conv_typ=('absolute', 'relative')
(optional): Each initialisation of a method under consideration will run till convergence. This parameter defines the type of convergence i.e.,absolute
difference in the objective function value orrelative
difference in the objective function value. Default value isrelative
. -
conv_num=3
(optional): Each initialisation of a method under consideration will run till convergence.conv_num
parameter is used to declare convergence only if theabsolute
orrelative
difference is less than tolerance value for some iterations. Default value is 3.
-
[1] Nirav L. Lekinwala and Mani Bhushan, Generalised non-negative matrix factorisation for air pollution source apportionment Science of The Total Environment (2022), 156294, ISSN 0048-9697, https://doi.org/10.1016/j.scitotenv.2022.156294.
[2] Nirav L. Lekinwala and Mani Bhushan,
pyGNMF
: A python library for implementation of generalised non-negative matrix factorisation method
SoftwareX, 2022, (Submitted)
SSRN Preprint Article Available
[3] Lee, D. and Seung, H. Learning the parts of objects by non-negative matrix factorization Nature (1999), 401(6755):788–791. https://doi.org/10.1038/44565
[4] Wang, G., Kossenkov, A. V., and Ochs, M. F. LS-NMF: A modified non-negative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics (2006), 7(1):175. https://doi.org/10.1186/1471-2105-7-175
[5] Plis, S.M., Potluru, V.K., Lane, T. et al. Correlated Noise: How it Breaks NMF, and What to Do About it J Sign Process Syst. (2011), 65, 351–359. https://doi.org/10.1007/s11265-010-0511-8