This repository has been archived by the owner on Jun 12, 2024. It is now read-only.

Support parallel evaluation on CUDA systems #6

Open · 1 task
mdekstrand opened this issue Dec 8, 2022 · 5 comments
@mdekstrand (Member)
When I attempt to run parallel batch recommendation on a CUDA-enabled system, it fails with a CUDA initialization error in the worker process:

```
Traceback (most recent call last):
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/lenskit/util/parallel.py", line 130, in _mp_invoke_worker
    return __work_func(model, *args)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/lenskit/batch/_recommend.py", line 19, in _recommend_user
    res = algo.recommend(user, n, candidates)
  File "/home/MICHAELEKSTRAND/LensKit/lenskit-implicit/lenskit_implicit/implicit.py", line 69, in recommend
    recs, scores = self.delegate.recommend(uid, matrix, N=i_n)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/implicit/gpu/matrix_factorization_base.py", line 87, in recommend
    ids, scores = self.knn.topk(
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/implicit/gpu/matrix_factorization_base.py", line 122, in knn
    self._knn = implicit.gpu.KnnQuery()
  File "_cuda.pyx", line 47, in implicit.gpu._cuda.KnnQuery.__cinit__
RuntimeError: cublas error: CUBLAS_STATUS_NOT_INITIALIZED (/tmp/pip-req-build-b0ax806a/implicit/gpu/knn.cu:87)
```

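For context, the run is invoked roughly as in the sketch below. This is not a verified reproducer; the ALS wrapper name exported by lenskit_implicit, the n_jobs option on batch.recommend, and the `train`/`test` ratings frames are assumptions about the setup rather than details from the traceback.

```python
# Rough sketch of the parallel batch-recommend call that hits the error.
# Assumed pieces: the ALS wrapper from lenskit_implicit, the n_jobs option
# on batch.recommend, and `train`/`test` ratings frames prepared elsewhere.
from lenskit import batch
from lenskit_implicit import ALS

algo = ALS(64)                    # delegates to implicit's GPU ALS when CUDA is available
algo.fit(train)                   # the model is fit in the parent process
users = test["user"].unique()
recs = batch.recommend(algo, users, 10, n_jobs=4)   # spawned workers raise the CUBLAS error
```
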
Tagging @benfred in case he has any insight here.

Things to test

  • Deserializing in a new process with no concurrency involved (see the sketch below)
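
A minimal sketch of that check, assuming the fitted algorithm pickles cleanly; `_child` and `check_deserialize_in_child` are hypothetical names, not existing LensKit helpers:

```python
import multiprocessing as mp
import pickle

def _child(payload, user, q):
    # Fresh spawned process: deserialize the model and make exactly one
    # recommendation call, with no pool and no sibling workers.
    algo = pickle.loads(payload)
    recs = algo.recommend(user, 10)
    q.put(len(recs))

def check_deserialize_in_child(algo, user):
    # _child must be defined at module level so the spawned process can import it.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=_child, args=(pickle.dumps(algo), user, q))
    p.start()
    print("child produced", q.get(), "recommendations")
    p.join()
```
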
@benfred commented Dec 10, 2022

The line where it's failing is initializing cuBLAS: `CHECK_CUBLAS(cublasCreate(&blas_handle));`

What's the GPU memory usage like when running this? According to https://discuss.pytorch.org/t/cuda-error-cublas-status-not-initialized-when-calling-cublascreate-handle/125450, this call can fail if there is insufficient GPU memory available.
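
One way to watch that from inside the workers is to log free GPU memory right before the recommend call; this sketch assumes the pynvml package (nvidia-ml-py) is installed and queries device 0:

```python
import pynvml

def log_gpu_memory(tag=""):
    # Query device 0's memory through NVML; values are reported in bytes.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"[{tag}] GPU mem used {info.used / 2**20:.0f} MiB of {info.total / 2**20:.0f} MiB")
    pynvml.nvmlShutdown()
```
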

@benfred commented Dec 20, 2022

It might also be worth trying out `multiprocessing.set_start_method("spawn")`: https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing
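
For reference, a minimal version of that suggestion; the start method has to be set once in the main module, before any workers are created:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # Must run before any pools or worker processes exist; "spawn" avoids
    # forking a process that has already initialized CUDA.
    mp.set_start_method("spawn")
    # ... run the batch evaluation here ...
```
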

@mdekstrand (Member, Author)

@benfred Already doing that :) (although through a more indirect method — process pools are set up with a custom context that inherits from multiprocessing's SpawnContext).
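
Roughly the pattern being described, as a sketch rather than LensKit's actual code: a custom context subclassing the stdlib spawn context, handed to the process pool.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing.context import SpawnContext

class LKSpawnContext(SpawnContext):
    # Hypothetical subclass; the real context customizes how workers and
    # queues are set up, but still uses spawn semantics under the hood.
    pass

def make_pool(n_workers):
    # ProcessPoolExecutor accepts any multiprocessing context via mp_context.
    return ProcessPoolExecutor(n_workers, mp_context=LKSpawnContext())
```
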

mdekstrand modified the milestones: 0.14.0, 0.15.0 (Nov 3, 2023)
@mdekstrand (Member, Author)

I'm going to go ahead and cut an initial release without this fix, so the package is released with (rough) 0.14 parity before I work on the more substantial LensKit 0.15 changes.

@mdekstrand (Member, Author)

It's currently running correctly in my perf-monitor project, so this might be fixed.
