JuML (Juelich Machine Learning library) is a parallel, high-performance software library that simplifies large-scale data analysis on common HPC setups. It supports common machine learning algorithms such as Gaussian Naive Bayes, K-Means, and neural networks, bundled with a rich set of convenience functions for tasks like data normalization. JuML provides high-level APIs for C++ and Python that run computations on both native CPUs and accelerators such as GPGPUs.
JuML requires a number of mandatory software packages and has additional optional dependencies. The build process, described below, automatically checks for their presence but does not install them. Make sure everything is set up in advance:
Mandatory:
- cmake
- arrayfire
- mpi
- hdf5
Optional:
- gtest
- cuda
- opencl
- doxygen
- python
- numpy
- mpi4py
- swig
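Before configuring, a quick look at which command-line tools are on the PATH can save a failed CMake run. The following is a minimal sketch, not part of JuML: the function name check_tool is hypothetical, and mpicc and h5cc are only the usual names of the MPI and HDF5 compiler wrappers, which may differ on your system. ArrayFire ships as a library without a command-line tool, so it is not covered here; CMake's own check remains authoritative.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Returns non-zero if the command cannot be found.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed executable names for the mandatory packages (adjust as needed).
missing=0
for tool in cmake mpicc h5cc; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing mandatory tool(s) missing"
```

Optional tools such as doxygen, python or swig can be added to the loop in the same way.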
JuML is built using CMake. To build it, run the following in JuML's main directory:
mkdir build && cd build && cmake .. && make
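The one-line build can also be split into separate configure and build steps, which makes it easier to see where a failure occurs. The sketch below is an illustration only: the helper names run and build_juml and the DRY_RUN variable are hypothetical, and the -S/-B options assume CMake 3.13 or newer. It is meant to be run from JuML's main directory.

```shell
#!/bin/sh
# Print a command, then execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Configure and build in a separate directory (default: build).
# Stops at the first step that fails.
build_juml() {
    build_dir="${1:-build}"
    run mkdir -p "$build_dir" &&
    run cmake -S . -B "$build_dir" &&
    run cmake --build "$build_dir"
}
```

Calling build_juml with no argument uses the build directory from the command above; setting DRY_RUN=1 beforehand only prints the commands.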
The code is documented in the source headers in Doxygen format. A browsable HTML version can be generated with doxygen:
make doc
Please refer to the index.html in the build/doc directory. Further information can be obtained here.
Testing requires a built JuML (see above). Once the library is built, run the test suite with:
make vtest
Individual tests for modules can be run using:
ctest -R <TEST_NAME>
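Since ctest -R takes a regular expression, several tests can be selected in one call by joining their names with "|". A small sketch of this, where the helper name ctest_regex is hypothetical and the test names are placeholders rather than actual JuML tests:

```shell
#!/bin/sh
# Join all arguments into one regular expression, e.g. "A|B|C".
# The subshell keeps the IFS change local to this function.
ctest_regex() {
    ( IFS='|'; echo "$*" )
}

# Example with placeholder names; prints the resulting pattern.
ctest_regex TEST_A TEST_B
```

The pattern can then be passed on as ctest -R "$(ctest_regex TEST_A TEST_B)". Adding -N lists the matching tests without running them.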
For more in-depth information about CMake's testing capabilities, refer to CMake Testing.
JuML is published under the liberal terms of the BSD License. Although the BSD License does not require you to share modifications you make to the source code, you are encouraged and invited to contribute them back to the community, preferably via a GitHub fork.