
KMeans is slow on gpu #1444

Open
@fcharras

Description


The following snippet

import numpy as np
import sklearn

device = "cpu"
# device = "gpu:0"

# Patch scikit-learn before importing any estimator so the sklearnex
# implementations are picked up, then select the offload target.
from sklearnex import patch_sklearn
patch_sklearn()
sklearn.set_config(target_offload=f"{device}")
from sklearn.cluster import KMeans

seed = 123
rng = np.random.default_rng(seed)

n_samples = 50_000_000
dim = 14
n_clusters = 127

data = rng.random((n_samples, dim), dtype=np.float32)
init = rng.random((n_clusters, dim), dtype=np.float32)

kmeans = KMeans(n_clusters=n_clusters, algorithm="lloyd", init=init, max_iter=100, tol=0, n_init=1)
%time kmeans.fit(data)  # %time is an IPython magic; run inside IPython/Jupyter

shows for device="cpu":

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
CPU times: user 8min 24s, sys: 4.31 s, total: 8min 28s
Wall time: 8.76 s

(CPU with 224 cores)
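
For completeness, %time is an IPython magic; when running the snippet as a plain script, a minimal timing wrapper along these lines (a sketch reusing the kmeans and data objects defined above) measures the same wall time:

import time

t0 = time.perf_counter()
kmeans.fit(data)
print(f"fit wall time: {time.perf_counter() - t0:.2f} s")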

When device="gpu:0" (running on a Max Series GPU), it is very slow: it has been running for several minutes now and is still not finished. On 100x less data it completes in about 4.5 s; extrapolating from that, the wall time would be almost an hour.
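
As a quick sanity check (a sketch, assuming dpctl is installed alongside dpcpp-cpp-rt), the SYCL devices visible to the runtime can be listed to confirm that the Max Series GPU is actually exposed as the "gpu:0" target:

# List the SYCL devices visible to the runtime; the device selected by
# target_offload="gpu:0" should appear in this list.
import dpctl

for d in dpctl.get_devices():
    print(d.backend, d.device_type, d.name)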

We show with the implementation provided in the sklearn-numba-dpex project that this amount of data can run in less than 10 s on a Max Series GPU as well.

Environment:

  • Linux 5.15 kernel
  • conda installation of scikit-learn-intelex and dpcpp-cpp-rt (installed from a conda channel via -c)
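
To make this easier to reproduce, the exact package versions can be captured as well (a sketch using importlib.metadata; note that dpcpp-cpp-rt is a native runtime and may not expose Python package metadata):

from importlib.metadata import PackageNotFoundError, version

for pkg in ("scikit-learn-intelex", "scikit-learn", "dpctl", "numpy", "dpcpp-cpp-rt"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed as a Python package")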

Metadata

Labels: enhancement (New feature or request)