Needed for python utils like data_generator
and scatter_plot
.
pip install -r py_utils/requirements.txt
Unit testing for C++.
git clone https://github.com/catchorg/Catch2.git
This script can be used to generate random datasets. If attribute -k
is specified the dataset contains clusterized points
python3 py_utils/data_generator.py -n 1000 -d 3 -k 4 -min 0 -max 10 -o datasets/3Dpoints.csv
This script can be used to display results (reading csv output files).
python3 py_utils/scatter_plot.py -f 3Dpoints.csv -d 3
These are the commands that can be used to compile GPU Kmeans.
Notice: cmake is required.
mkdir -p build
cd build
cmake ..
cmake --build .
This will build the target gpukmeans
to build/src/bin/gpukmeans
.
This code has is provided in scripts/build.sh
so that (from the root of the project) you can run ./scripts/build.sh
to build gpukmeans
.
The target unit_kernels
is excluded from ALL_TARGETS
therefore it has to be compiled explicitly.
cmake --build . --target unit_kernels
Use the script scripts/run_tests.sh
to build and run unit tests.
The executable gpukmeans
reads inputs from stdin
or from csv file. The directory datasets
contains some examples of data input.
./build/src/bin/gpukmeans --help
gpukmeans is an implementation of the K-means algorithm that uses a GPU
Usage:
gpukmeans [OPTION...]
-h, --help Print usage
-d, --n-dimensions arg Number of dimensions of a point
-n, --n-samples arg Number of points
-k, --n-clusters arg Number of clusters
-m, --maxiter arg Maximum number of iterations
-o, --out-file arg Output filename
-i, --in-file arg Input filename
-r, --runs arg Number of k-means runs (default: 1)
-s, --seed arg Seed for centroids generator
-t, --tolerance arg Tolerance to declare convergence
Example:
./build/src/bin/gpukmeans -d 64 -n 1797 -k 10 -m 300 -o res.csv -i ./datasets/N1797_D64_digits-sklearn.csv -t 0.0001
Runs gpukmeans
on the dataset N1797_D64_digits-sklearn.csv
considering: 64 dimensions, 1797 points, 10 clusters, 300 maximum iterations, tolerance 1e-4, and output the results to res.csv
.