Description
Version: numba_0.20.0dev3 and main
The following three dpctl calls (1, 2, 3) have a huge wall time on the Edge Devcloud: py-spy measures 10 to 30 ms per call (see the speedscope report).
On the devcloud this adds about 80 seconds to the k-means benchmark, which is expected to take about 10 seconds.
I didn't see the issue on a local machine, but the small remaining overhead that we reported may come from these calls.
@oleksandr-pavlyk I'm not sure whether this should be considered unreasonable usage in numba_dpex (should those calls be expected to take that long, and therefore be cached?) or a bug in dpctl.
I've experimented with caching the values and can confirm that caching those 3 calls completely removes the overhead.
Regarding the scope of the cache, I'll check whether the following hotfix is enough: storing those values in a WeakKeyDictionary keyed by val and usm_mem, and wrapping the SyclDevice(device) call in an lru_cache. If so, I'll monkey-patch it in sklearn_numba_dpex in the meantime.