Stochastic nonsmooth optimization based clustering algorithm

napsu/bigClust


BigClust - Stochastic Nonsmooth Optimization based Incremental Clustering Software (version 0.1)

BigClust is a nonsmooth optimization based clustering algorithm for solving the minimum sum-of-squares clustering (MSSC) problem in very large-scale and big data sets. BigClust combines two algorithms: an incremental algorithm solves the clustering problem globally, and at each iteration of this algorithm the stochastic limited memory bundle algorithm (SLMBA) solves both the clustering problem and the auxiliary clustering problem from different starting points. Because of the incremental approach, BigClust solves not only the k-partition problem but also all intermediate l-partition problems with l = 1,…,k-1.
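
For orientation, the MSSC problem and the auxiliary clustering problem are commonly written as below. This is a sketch of the usual formulation in the incremental clustering literature, not a transcription of the BigClust source: the notation (data points a_i in R^n, i = 1,…,m, cluster centers x_j, and the 1/m scaling) is an assumption introduced here.

```latex
% k-partition (clustering) problem: choose k centers minimizing the
% mean squared distance from each data point to its nearest center.
\min_{x_1,\dots,x_k \in \mathbb{R}^n} \;
  f_k(x_1,\dots,x_k) = \frac{1}{m} \sum_{i=1}^{m}
  \min_{j=1,\dots,k} \| x_j - a_i \|^2

% Auxiliary clustering problem: with the centers x_1,\dots,x_{l-1} of the
% (l-1)-partition problem fixed, look for one new center y.
\bar{f}_l(y) = \frac{1}{m} \sum_{i=1}^{m}
  \min\left\{ d_{l-1}^{i},\; \| y - a_i \|^2 \right\},
\qquad
d_{l-1}^{i} = \min_{j=1,\dots,l-1} \| x_j - a_i \|^2
```

Both functions are nonsmooth because of the inner minimum, which is why a bundle-type method such as SLMBA is applied to them. In incremental schemes of this kind, minimizers of the auxiliary function typically serve as starting points for the new center in the l-partition problem.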

Files included

  • bigclust.f95

    • Main program for the clustering software.
  • initbigclust.f95

    • Initialization of clustering parameters and SLMBA. Includes modules:
      • initclust - Initialization of parameters for clustering.
      • initslmb - Initialization of SLMBA.
  • clusteringmod.f95

    • Subroutines for clustering software.
  • slmb.f95

    • SLMBA - Stochastic limited memory bundle algorithm.
  • objfun.f95

    • Computation of the cluster function and subgradient values.
  • subpro.f95

    • Subprograms for SLMBA.
  • parameters.f95

    • Parameters. Includes modules:
      • r_precision - Precision for reals,
      • param - Parameters,
      • exe_time - Execution time.
  • Makefile

    • Makefile for building the software; requires a Fortran compiler (gfortran) to be installed.

Installation and usage

To use the code:

  1. Modify initbigclust.f95 as needed. At a minimum, select the data set and give the number of data points, the number of features, and the maximum number of clusters "nclust".
  2. Build the program by typing "make". The Makefile uses gfortran by default.
  3. Run the program by typing "./bigclust".

The algorithm returns a txt-file with the clustering function values, the Dunn and Davies-Bouldin validity indices, and the elapsed CPU times for up to nclust clusters. In addition, it returns a separate txt-file with the final cluster centers for nclust clusters and the solutions to all intermediate l-clustering problems with l = 1,...,nclust-1.
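
To make the reported quantities concrete, the following is a minimal Python sketch of how a clustering function value and the two validity indices could be computed for a given set of centers. It is an illustration only, written in Python rather than the package's Fortran: all function names are hypothetical, the 1/m scaling of the clustering function is an assumption, and the Dunn and Davies-Bouldin indices have several variants, so the definitions used here (closest-pair separation, largest cluster diameter, mean distance to the center as scatter) may differ from those in BigClust. The sketch assumes at least two clusters and no empty clusters.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def cluster_function(points, centers):
    """Mean squared distance to the nearest center (1/m scaling assumed)."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index: lower values indicate better separated clusters."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        # Scatter of cluster j: mean distance of its members to the center.
        scatter.append(sum(math.sqrt(sq_dist(p, centers[j])) for p in members)
                       / len(members))
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j])
                     / math.sqrt(sq_dist(centers[i], centers[j]))
                     for j in range(k) if j != i)
    return total / k


def dunn(points, labels, k):
    """Dunn index: smallest between-cluster distance over largest diameter."""
    clusters = [[p for p, lab in zip(points, labels) if lab == j]
                for j in range(k)]
    min_sep = min(math.sqrt(sq_dist(p, q))
                  for i in range(k) for j in range(i + 1, k)
                  for p in clusters[i] for q in clusters[j])
    max_diam = max(math.sqrt(sq_dist(p, q))
                   for c in clusters for p in c for q in c)
    return min_sep / max_diam


# Toy data: two well separated pairs of points and their centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
labels = assign(points, centers)

print(cluster_function(points, centers))
print(davies_bouldin(points, centers, labels))
print(dunn(points, labels, len(centers)))
```

The pairwise loops in `dunn` are quadratic in the number of points, so this form is only suitable for small examples, not for data sets of the size BigClust targets.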

Acknowledgements

The work was financially supported by the Research Council of Finland projects (Project Nos. 345804 and 345805) led by Antti Airola and Tapio Pahikkala.
