GSoC Report

Dattatreya Mohapatra edited this page Aug 12, 2018 · 5 revisions

Hi, this wiki page lists all my contributions to Boost.uBLAS as part of my Google Summer of Code (GSoC) 2018 project, "Add statistics and machine learning functions to Boost.uBLAS".

Basic statistical functions

  1. Create a new matrix unary operation class that returns a vector. This is required to support axis-based methods. Commits - 802cd43, 189b6ed, 493dd9c
  2. min and max, with axis support. Commits - 802cd43
  3. sum, with axis support. Commits - db1245c
  4. mean, with axis support. Commits - c4950ef, 5feb08d
  5. mode, with axis support. Commits - 5058c80
  6. median, with axis support. Commits - 962b6e9
  7. variance, with axis support. Commits - dcb664e
  8. vector and matrix covariance. Commits - fc1a290, a8a722d
  9. Tests. Commits - 390eb3c, c6ee3fc
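To illustrate what "axis support" means for a reduction such as sum or mean, here is a minimal sketch using plain `std::vector` rather than uBLAS types. The names `Matrix`, `sum_axis` and `mean_axis` are hypothetical and are not the API added in the commits above. The axis convention (axis 0 gives one result per column, axis 1 gives one result per row) is an assumption borrowed from NumPy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stand-in (hypothetical; not a uBLAS type).
using Matrix = std::vector<std::vector<double>>;

// Sum along an axis, returning a vector.
// axis == 0: collapse the rows, one result per column.
// axis == 1: collapse the columns, one result per row.
std::vector<double> sum_axis(const Matrix& m, int axis) {
    assert(axis == 0 || axis == 1);
    const std::size_t rows = m.size();
    const std::size_t cols = rows ? m[0].size() : 0;
    std::vector<double> out(axis == 0 ? cols : rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        assert(m[i].size() == cols);  // all rows must have the same length
        for (std::size_t j = 0; j < cols; ++j) {
            out[axis == 0 ? j : i] += m[i][j];
        }
    }
    return out;
}

// Mean along an axis: the axis sum divided by the number of collapsed elements.
std::vector<double> mean_axis(const Matrix& m, int axis) {
    assert(!m.empty() && !m[0].empty());
    std::vector<double> out = sum_axis(m, axis);
    const double n = static_cast<double>(axis == 0 ? m.size() : m[0].size());
    for (double& v : out) {
        v /= n;
    }
    return out;
}
```

For the 2x3 matrix `{{1, 2, 3}, {4, 5, 6}}`, `sum_axis(m, 0)` gives `{5, 7, 9}` and `mean_axis(m, 1)` gives `{2, 5}`.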

Histogram

  • Prototype and implementation with fixed number of bin-edges. Commits - 74479b1, 9cd0afd
  • Support for custom bin-edges. Commits - 9760c42
  • Add tests and fix bugs. Commits - 2119007, e0b80a4
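The difference between a fixed number of bins and custom bin edges can be sketched as follows. This is an illustration under assumptions, not the code from the commits above: `uniform_edges` and `histogram` are hypothetical names, and the binning convention (half-open bins, with the last bin also including its right edge, and out-of-range values dropped) is assumed to be NumPy-like.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Build bins + 1 equally spaced edges covering [lo, hi].
std::vector<double> uniform_edges(double lo, double hi, std::size_t bins) {
    assert(bins > 0 && lo < hi);
    std::vector<double> edges(bins + 1);
    for (std::size_t i = 0; i <= bins; ++i) {
        edges[i] = lo + (hi - lo) * static_cast<double>(i) / static_cast<double>(bins);
    }
    edges.back() = hi;  // avoid rounding drift on the last edge
    return edges;
}

// Count values into bins given sorted edges e[0] <= e[1] <= ... <= e[k].
// Bin i covers [e[i], e[i+1]); the last bin also includes e[k].
// Values outside [e[0], e[k]] (and NaNs) are ignored.
std::vector<std::size_t> histogram(const std::vector<double>& data,
                                   const std::vector<double>& edges) {
    assert(edges.size() >= 2);
    assert(std::is_sorted(edges.begin(), edges.end()));
    std::vector<std::size_t> counts(edges.size() - 1, 0);
    for (double x : data) {
        if (!(x >= edges.front() && x <= edges.back())) {
            continue;
        }
        // First edge strictly greater than x; the bin index is one before it.
        auto it = std::upper_bound(edges.begin(), edges.end(), x);
        std::size_t bin = static_cast<std::size_t>(it - edges.begin()) - 1;
        if (bin == counts.size()) {
            --bin;  // x equals the last edge, so it belongs to the last bin
        }
        ++counts[bin];
    }
    return counts;
}
```

With this split, the fixed-bin case is just `histogram(data, uniform_edges(lo, hi, bins))`, while the custom case passes user-supplied edges directly.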

KMeans [code]

  • Prototype and basic implementation with random initialization. Commits - 5467b96, 165b789, ac8b466
  • Add naive kmeans algorithm class. Commits - 6f3dbec, 77d9d7f
  • Add support for kmeans++ initialization. Commits - 157e554
  • Add support for Bradley-Fayyad initialization. Commits - a7d6f30
  • Add tests and fix bugs. Commits - 7b2346c, 4949971
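As a rough illustration of the kmeans++ initialization mentioned above, the sketch below picks the first centroid uniformly at random and each later centroid with probability proportional to its squared distance from the nearest centroid chosen so far. It uses only the standard library. `Point`, `squared_distance` and `kmeans_pp_init` are hypothetical names, not the interface of the KMeans class in this project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;  // hypothetical stand-in for a uBLAS vector

double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// k-means++ seeding: returns k initial centroids drawn from `data`.
std::vector<Point> kmeans_pp_init(const std::vector<Point>& data,
                                  std::size_t k, std::mt19937& rng) {
    assert(k > 0 && k <= data.size());
    std::vector<Point> centroids;
    centroids.reserve(k);

    // First centroid: uniform over all points.
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    centroids.push_back(data[pick(rng)]);

    // d2[i] holds the squared distance from point i to its nearest centroid.
    std::vector<double> d2(data.size(), std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        double total = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            d2[i] = std::min(d2[i], squared_distance(data[i], centroids.back()));
            total += d2[i];
        }
        std::size_t next = 0;
        if (total > 0.0) {
            // Sample an index with probability proportional to d2[i].
            std::discrete_distribution<std::size_t> weighted(d2.begin(), d2.end());
            next = weighted(rng);
        } else {
            // Every point coincides with a centroid; fall back to uniform.
            next = pick(rng);
        }
        centroids.push_back(data[next]);
    }
    return centroids;
}
```

Because points far from the existing centroids are favoured, the seeds tend to be spread out, which usually gives Lloyd's iterations a better starting point than purely random initialization.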

Principal Component Analysis (PCA) [code]

  • Integrate GSoC 2015 eigensolver. Commits - 1d56042
  • Prototype and implementation. Commits - 205921d, 1105256
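The core idea of PCA, finding the direction of largest variance as the leading eigenvector of the covariance matrix, can be sketched as below. Note the assumptions: this sketch uses plain power iteration instead of the GSoC 2015 eigensolver that the project integrates, it only recovers the first component, and `Matrix`, `covariance` and `first_component` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows are observations

// Sample covariance matrix of the columns of `data` (divides by n - 1).
Matrix covariance(const Matrix& data) {
    const std::size_t n = data.size();
    assert(n > 1);
    const std::size_t d = data[0].size();
    std::vector<double> mean(d, 0.0);
    for (const auto& row : data) {
        assert(row.size() == d);
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j];
    }
    for (double& m : mean) m /= static_cast<double>(n);

    Matrix cov(d, std::vector<double>(d, 0.0));
    for (const auto& row : data) {
        for (std::size_t a = 0; a < d; ++a) {
            for (std::size_t b = 0; b < d; ++b) {
                cov[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]);
            }
        }
    }
    for (auto& r : cov) {
        for (double& v : r) v /= static_cast<double>(n - 1);
    }
    return cov;
}

// Leading eigenvector of a symmetric matrix by power iteration.
// Caveat: the all-ones start vector fails if it is orthogonal to that eigenvector.
std::vector<double> first_component(const Matrix& cov, int iterations = 200) {
    const std::size_t d = cov.size();
    assert(d > 0);
    std::vector<double> v(d, 1.0 / std::sqrt(static_cast<double>(d)));
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(d, 0.0);
        for (std::size_t a = 0; a < d; ++a) {
            for (std::size_t b = 0; b < d; ++b) w[a] += cov[a][b] * v[b];
        }
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;  // zero matrix or orthogonal start vector
        for (std::size_t a = 0; a < d; ++a) v[a] = w[a] / norm;
    }
    return v;
}
```

For points lying on the line y = 2x, the returned unit vector is proportional to (1, 2). A full PCA would compute all eigenpairs and project the centered data onto the leading ones.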

Gaussian Mixture Model (GMM) [code]

  • Prototype and basic implementation with random initialization. Commits - f35e899, 0af3817
  • Implement expectation-maximization algorithm for GMMs. Commits - 4c60526
  • Add tests. Commits - 0af3817, 7c98262
  • [TODO] Some tests are still failing for GMM due to imperfect training with random initialization. Adding KMeans initialization to GMM may correct this, as a better initial set of parameters would be chosen. This would require reworking the GMM class to support initialization policies as a template parameter, similar to the KMeans class.
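For readers unfamiliar with expectation-maximization, the sketch below shows one EM iteration for a one-dimensional Gaussian mixture. It is a simplified illustration, not the GMM class from the commits above: `Component`, `normal_pdf` and `em_step` are hypothetical names, the data are scalar, and the small variance floor of `1e-6` is an assumption added to keep components from collapsing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One mixture component of a 1-D Gaussian mixture (hypothetical type).
struct Component {
    double weight;
    double mean;
    double variance;
};

double normal_pdf(double x, double mean, double variance) {
    const double pi = 3.14159265358979323846;
    const double diff = x - mean;
    return std::exp(-0.5 * diff * diff / variance) / std::sqrt(2.0 * pi * variance);
}

// One EM iteration; updates `comps` in place.
void em_step(const std::vector<double>& x, std::vector<Component>& comps) {
    const std::size_t n = x.size();
    const std::size_t k = comps.size();
    assert(n > 0 && k > 0);

    // E-step: resp[i][j] is the posterior probability that point i
    // was generated by component j.
    std::vector<std::vector<double>> resp(n, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        double total = 0.0;
        for (std::size_t j = 0; j < k; ++j) {
            resp[i][j] = comps[j].weight *
                         normal_pdf(x[i], comps[j].mean, comps[j].variance);
            total += resp[i][j];
        }
        assert(total > 0.0);  // can fail if every density underflows
        for (std::size_t j = 0; j < k; ++j) resp[i][j] /= total;
    }

    // M-step: re-estimate each component from its weighted points.
    for (std::size_t j = 0; j < k; ++j) {
        double nj = 0.0, sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            nj += resp[i][j];
            sum += resp[i][j] * x[i];
        }
        assert(nj > 0.0);  // a component that owns no points cannot be updated
        const double mean = sum / nj;
        double sq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            sq += resp[i][j] * (x[i] - mean) * (x[i] - mean);
        }
        comps[j].weight = nj / static_cast<double>(n);
        comps[j].mean = mean;
        comps[j].variance = sq / nj + 1e-6;  // variance floor (assumption)
    }
}
```

EM only converges to a local optimum, so the result depends on the starting components. When the initial means are poorly placed, several components can settle on the same cluster, which is consistent with the sensitivity to random initialization described in the TODO.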