Description
The following snippet:
```python
import numpy as np
import sklearn

device = "cpu"
# device = "gpu:0"

from sklearnex import patch_sklearn
patch_sklearn()
sklearn.set_config(target_offload=f"{device}")

from sklearn.cluster import KMeans

seed = 123
rng = np.random.default_rng(seed)
n_samples = 50_000_000
dim = 14
n_clusters = 127
data = rng.random((n_samples, dim), dtype=np.float32)
init = rng.random((n_clusters, dim), dtype=np.float32)

kmeans = KMeans(n_clusters=n_clusters, algorithm="lloyd", init=init,
                max_iter=100, tol=0, n_init=1)
%time kmeans.fit(data)
```
shows, for `device="cpu"`:
```
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
CPU times: user 8min 24s, sys: 4.31 s, total: 8min 28s
Wall time: 8.76 s
```

(on a CPU with 224 cores)
With `device="gpu:0"` (running on a Max Series GPU) it is very slow: it has been running for several minutes now and is still not finished. On 100x less data it completes in about 4.5 s; extrapolating from that, the wall time would be almost an hour.
With the implementation provided in the sklearn-numba-dpex project, we show that this amount of data can run in less than 10 s on a Max Series GPU as well.
Environment:
- Linux 5.15 kernel
- conda installation of scikit-learn-intelex and dpcpp-cpp-rt (installed with a `-c` conda channel)
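For completeness, a sketch of how such an environment might be created. The channel name is an assumption (the report does not state which `-c` channel was used), so treat this as illustrative only:

```shell
# Hypothetical setup; channel name "intel" is an assumption, not from the report.
conda create -n sklex-gpu -c intel scikit-learn-intelex dpcpp-cpp-rt
conda activate sklex-gpu
```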