Description
The following snippet:
```python
import numpy as np
import sklearn

device = "cpu"
# device = "gpu:0"

from sklearnex import patch_sklearn
patch_sklearn()
sklearn.set_config(target_offload=f"{device}")

from sklearn.cluster import KMeans

seed = 123
rng = np.random.default_rng(seed)
n_samples = 50_000_000
dim = 14
n_clusters = 127
data = rng.random((n_samples, dim), dtype=np.float32)
init = rng.random((n_clusters, dim), dtype=np.float32)

kmeans = KMeans(n_clusters=n_clusters, algorithm="lloyd", init=init,
                max_iter=100, tol=0, n_init=1)
%time kmeans.fit(data)
```
shows, for `device="cpu"`:
```
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
CPU times: user 8min 24s, sys: 4.31 s, total: 8min 28s
Wall time: 8.76 s
```

(on a CPU with 224 cores)
With `device="gpu:0"` (running on a Max Series GPU) it is very slow: it has been running for several minutes now and is still not finished. On 100x less data it completes in about 4.5 s; extrapolating from that, the wall time would be almost an hour.
With the implementation provided in the sklearn-numba-dpex project, we show that this amount of data can run in less than 10 s on a Max Series GPU as well.
Environment:
- Linux 5.15 kernel
- conda installation of scikit-learn-intelex and dpcpp-cpp-rt (installed with a `-c` conda channel)
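For completeness, a sketch of how such an environment might be created. The channel name is an assumption (the report does not state which `-c` channel was used), so treat this as illustrative only:

```shell
# Hypothetical setup; channel name "intel" is an assumption, not from the report.
conda create -n sklex-gpu -c intel scikit-learn-intelex dpcpp-cpp-rt
conda activate sklex-gpu
```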