BigClust is a nonsmooth-optimization-based clustering algorithm for solving the minimum sum-of-squares clustering (MSSC) problem in very large-scale and big data sets. BigClust consists of two algorithms: an incremental algorithm solves the clustering problem globally, and at each iteration of this algorithm the stochastic limited memory bundle algorithm (SLMBA) solves both the clustering and the auxiliary clustering problems from different starting points. Because of the incremental approach, BigClust solves not only the k-partition problem but also all intermediate l-partition problems, where l = 1,…,k-1.
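To make the two objectives mentioned above concrete, the following is a minimal Python sketch, not the Fortran code in objfun.f95. It assumes the standard nonsmooth MSSC formulation (mean over data points of the squared Euclidean distance to the nearest center) and the usual auxiliary function of incremental nonsmooth clustering, in which one new center is added while the previous centers stay fixed. The function names and the 1/m scaling are illustrative assumptions and may differ from what BigClust uses internally.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(x: Point, a: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def cluster_function(centers: List[Point], data: List[Point]) -> float:
    """Assumed MSSC objective: mean squared distance to the nearest center."""
    return sum(min(sq_dist(c, a) for c in centers) for a in data) / len(data)


def auxiliary_cluster_function(
    y: Point, prev_centers: List[Point], data: List[Point]
) -> float:
    """Assumed auxiliary objective: previous centers are fixed and only the
    candidate new center y varies."""
    total = 0.0
    for a in data:
        # Squared distance from a to its nearest already-computed center.
        r = min(sq_dist(c, a) for c in prev_centers)
        # The point switches to y only if y is closer than that center.
        total += min(r, sq_dist(y, a))
    return total / len(data)


# Toy data: two well-separated pairs of points in the plane.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

f1 = cluster_function([(5.0, 1.0)], data)                # 1-partition
f2 = cluster_function([(0.0, 1.0), (10.0, 1.0)], data)   # 2-partition
f_aux = auxiliary_cluster_function((10.0, 1.0), [(0.0, 1.0)], data)
print(f1, f2, f_aux)
```

In this toy case the auxiliary value with the first center fixed at (0, 1) and the candidate at (10, 1) equals the 2-partition objective. This illustrates why a minimizer of the auxiliary problem is a reasonable starting point for the full l-partition problem in an incremental scheme.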
-
bigclust.f95
- Main program for the clustering software.
-
initbigclust.f95
- Initialization of clustering parameters and SLMBA. Includes modules:
- initclust - Initialization of parameters for clustering.
- initslmb - Initialization of SLMBA.
-
clusteringmod.f95
- Subroutines for clustering software.
-
slmb.f95
- SLMBA - Stochastic limited memory bundle algorithm.
-
objfun.f95
- Computation of the cluster function and subgradient values.
-
subpro.f95
- Subprograms for SLMBA.
-
parameters.f95
- Parameters. Includes modules:
- r_precision - Precision for reals,
- param - Parameters,
- exe_time - Execution time.
-
Makefile
- makefile: requires a Fortran compiler (gfortran) to be installed.
To use the code:
- Modify initbigclust.f95 as needed. At a minimum, select the data set and give the number of data points, the number of features, and the maximum number of clusters "nclust".
- Build the program by typing "make". The Makefile uses gfortran by default.
- Finally, run the program by typing "./bigclust".
The algorithm returns a text file with the clustering function values, the Dunn and Davies-Bouldin validity indices, and the elapsed CPU times for up to nclust clusters. In addition, it returns a separate text file with the final cluster centers for nclust clusters and the solutions to all intermediate l-clustering problems with l = 1,...,nclust-1.
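As a rough illustration of one of the reported validity indices, the sketch below computes the Davies-Bouldin index in its textbook form: the average, over clusters, of the worst-case ratio of summed within-cluster scatter to the distance between centers. This is an assumption about the definition only; the Fortran implementation in BigClust may use a different variant (for example, squared distances), and the function and variable names here are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclid(x: Point, y: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def davies_bouldin(
    data: List[Point], labels: List[int], centers: List[Point]
) -> float:
    """Textbook Davies-Bouldin index (lower is better).
    Requires at least two clusters with distinct centers."""
    k = len(centers)
    sums = [0.0] * k
    counts = [0] * k
    for a, lab in zip(data, labels):
        sums[lab] += euclid(a, centers[lab])
        counts[lab] += 1
    # Scatter: mean distance from a cluster's points to its center.
    scatter = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = 0.0
    for i in range(k):
        # Worst (largest) similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclid(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data: two compact, well-separated clusters.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 1.0), (10.0, 1.0)]
db = davies_bouldin(data, labels, centers)
print(db)
```

Smaller values indicate more compact and better separated clusters, so the index can be compared across the l = 1,...,nclust solutions that the incremental approach produces (the index itself is only defined for two or more clusters).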
-
BigClust and SLMBA:
- N. Karmitsa, V.-P. Eronen, M.M. Mäkelä, T. Pahikkala, A. Airola, "Stochastic limited memory bundle algorithm for clustering in big data", 2024
-
Clustering:
- A. Bagirov, N. Karmitsa, S. Taheri, "Partitional Clustering via Nonsmooth Optimization", Springer, 2020.
-
LMBM:
- N. Haarala, K. Miettinen, M.M. Mäkelä, "Globally Convergent Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization", Mathematical Programming, Vol. 109, No. 1, pp. 181-205, 2007.
- M. Haarala, K. Miettinen, M.M. Mäkelä, "New Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization", Optimization Methods and Software, Vol. 19, No. 6, pp. 673-692, 2004.
-
Nonsmooth optimization:
- A. Bagirov, N. Karmitsa, M.M. Mäkelä, "Introduction to Nonsmooth Optimization: Theory, Practice and Software", Springer, 2014.
The work was financially supported by the Research Council of Finland (Project Nos. 345804 and 345805), led by Antti Airola and Tapio Pahikkala.