This repository has been archived by the owner on Jun 12, 2024. It is now read-only.

Support parallel evaluation on CUDA systems #6

Open · 1 task
mdekstrand opened this issue Dec 8, 2022 · 5 comments
@mdekstrand (Member)
When I attempt to run parallel batch recommendation on a CUDA-enabled system, it fails with a CUDA initialization error in the worker process:

```
Traceback (most recent call last):
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/lenskit/util/parallel.py", line 130, in _mp_invoke_worker
    return __work_func(model, *args)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/lenskit/batch/_recommend.py", line 19, in _recommend_user
    res = algo.recommend(user, n, candidates)
  File "/home/MICHAELEKSTRAND/LensKit/lenskit-implicit/lenskit_implicit/implicit.py", line 69, in recommend
    recs, scores = self.delegate.recommend(uid, matrix, N=i_n)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/implicit/gpu/matrix_factorization_base.py", line 87, in recommend
    ids, scores = self.knn.topk(
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/implicit/gpu/matrix_factorization_base.py", line 122, in knn
    self._knn = implicit.gpu.KnnQuery()
  File "_cuda.pyx", line 47, in implicit.gpu._cuda.KnnQuery.__cinit__
RuntimeError: cublas error: CUBLAS_STATUS_NOT_INITIALIZED (/tmp/pip-req-build-b0ax806a/implicit/gpu/knn.cu:87)
```

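For context, the run is invoked roughly as in the sketch below. This is not a verified reproducer; the ALS wrapper name exported by lenskit_implicit, the n_jobs option on batch.recommend, and the `train`/`test` ratings frames are assumptions about the setup rather than details from the traceback.

```python
# Rough sketch of the parallel batch-recommend call that hits the error.
# Assumed pieces: the ALS wrapper from lenskit_implicit, the n_jobs option
# on batch.recommend, and `train`/`test` ratings frames prepared elsewhere.
from lenskit import batch
from lenskit_implicit import ALS

algo = ALS(64)                    # delegates to implicit's GPU ALS when CUDA is available
algo.fit(train)                   # the model is fit in the parent process
users = test["user"].unique()
recs = batch.recommend(algo, users, 10, n_jobs=4)   # spawned workers raise the CUBLAS error
```
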
Tagging @benfred in case he has any insight here.

Things to test

  • Deserializing in a new process with no concurrency involved (see the sketch below)
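
A minimal sketch of that check, assuming the fitted algorithm pickles cleanly; `_child` and `check_deserialize_in_child` are hypothetical names, not existing LensKit helpers:

```python
import multiprocessing as mp
import pickle

def _child(payload, user, q):
    # Fresh spawned process: deserialize the model and make exactly one
    # recommendation call, with no pool and no sibling workers.
    algo = pickle.loads(payload)
    recs = algo.recommend(user, 10)
    q.put(len(recs))

def check_deserialize_in_child(algo, user):
    # _child must be defined at module level so the spawned process can import it.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=_child, args=(pickle.dumps(algo), user, q))
    p.start()
    print("child produced", q.get(), "recommendations")
    p.join()
```
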
@benfred commented Dec 10, 2022

The line where it's failing is initializing cuBLAS: `CHECK_CUBLAS(cublasCreate(&blas_handle));`

What's the GPU memory usage like when running this? According to https://discuss.pytorch.org/t/cuda-error-cublas-status-not-initialized-when-calling-cublascreate-handle/125450, this call can fail if there is insufficient GPU memory available.
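
One way to watch that from inside the workers is to log free GPU memory right before the recommend call; this sketch assumes the pynvml package (nvidia-ml-py) is installed and queries device 0:

```python
import pynvml

def log_gpu_memory(tag=""):
    # Query device 0's memory through NVML; values are reported in bytes.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"[{tag}] GPU mem used {info.used / 2**20:.0f} MiB of {info.total / 2**20:.0f} MiB")
    pynvml.nvmlShutdown()
```
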

@benfred commented Dec 20, 2022

It might also be worth trying out `multiprocessing.set_start_method("spawn")`: https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing
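
For reference, a minimal version of that suggestion; the start method has to be set once in the main module, before any workers are created:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # Must run before any pools or worker processes exist; "spawn" avoids
    # forking a process that has already initialized CUDA.
    mp.set_start_method("spawn")
    # ... run the batch evaluation here ...
```
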

@mdekstrand (Member, Author)

@benfred Already doing that :) (although through a more indirect method — process pools are set up with a custom context that inherits from multiprocessing's SpawnContext).
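
Roughly the pattern being described, as a sketch rather than LensKit's actual code: a custom context subclassing the stdlib spawn context, handed to the process pool.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing.context import SpawnContext

class LKSpawnContext(SpawnContext):
    # Hypothetical subclass; the real context customizes how workers and
    # queues are set up, but still uses spawn semantics under the hood.
    pass

def make_pool(n_workers):
    # ProcessPoolExecutor accepts any multiprocessing context via mp_context.
    return ProcessPoolExecutor(n_workers, mp_context=LKSpawnContext())
```
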

mdekstrand modified the milestones: 0.14.0, 0.15.0 (Nov 3, 2023)
@mdekstrand (Member, Author)

I'm going to go ahead and cut an initial release without this fix, so the package is released with (rough) 0.14 parity before I work on the more substantial LensKit 0.15 changes.

@mdekstrand (Member, Author)

It's currently running correctly in my perf-monitor project, so this might be fixed.
