[FEA] Import cudf on non-CUDA enabled machine #3661
This will be extremely challenging, as cuDF Python uses Cython extensions that load `libcuda.so` at import time. That being said, there's nothing preventing you from installing the CUDA drivers and the CUDA toolkit on a machine without GPUs, which would make all of this work. I imagine this is pretty cumbersome, but making everything work without loading `libcuda.so` would be a much bigger lift. I'm going to close this as out of scope; my thoughts are that it would be significantly easier to decouple the Dask meta object type from the partition type than to not load the CUDA libraries in cuDF.
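For anyone who wants to see the failure mode concretely, here is a minimal sketch using only the standard library that attempts the same driver-library load that cuDF's compiled extensions trigger at import:

```python
import ctypes

# On a machine without the NVIDIA driver, loading libcuda fails with an
# OSError; anything whose extension modules link against it (as cuDF's
# Cython modules do) will then fail to import.
try:
    ctypes.CDLL("libcuda.so.1")
    print("CUDA driver library is loadable")
except OSError as exc:
    print(f"CUDA driver library not available: {exc}")
```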
I can't do this from userspace though, right? This requires system administrator privileges? To be clear, my motivation here is to allow users on non-GPU devices (like a MacBook) to drive RAPIDS work on the cloud.
The CUDA toolkit is userspace.
I understand that, but from my perspective a much easier means to that end is to decouple the Dask meta objects to allow them to use pandas with cudf partitions. You're going to have the same problems with CuPy, PyTorch, etc. when you go to try to use their GPU objects.
I'm not sure that decoupling meta will solve this problem, for example if we want to call …

The meta problem doesn't bother me that much. We can lie to dask dataframe and give it a pandas dataframe as a meta object even when the underlying data will be cudf dataframes, and I suspect that things will be OK most of the time.
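For illustration, a minimal sketch of the "lie to dask dataframe" idea; the file names and columns are made up, and it assumes workers with GPUs where cudf can actually be imported:

```python
import pandas as pd
import dask.dataframe as dd
from dask import delayed

@delayed
def load_partition(path):
    # cudf is imported only inside the task, on a GPU worker.
    import cudf
    return cudf.read_csv(path)

# Describe the schema with an empty pandas DataFrame, even though the real
# partitions will be cudf DataFrames.
meta = pd.DataFrame({"x": pd.Series(dtype="int64"),
                     "y": pd.Series(dtype="float64")})
parts = [load_partition(f"part-{i}.csv") for i in range(4)]
ddf = dd.from_delayed(parts, meta=meta)
```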
I'm also not saying that we need to be able to import cudf. I'm just trying to figure out what the best way is to do this.
Yeah, I understand the motivation and agree that it's the right user experience. Given we're going to be supporting more than the pandas API in the future, I don't think we can rely purely on pandas to do meta calculations for us, unfortunately. From my perspective, the easiest path forward is for us to somehow break our dependency on the driver library (libcuda.so), which is MUCH harder to get installed on a system without GPUs, and then expect the user to install the CUDA toolkit, which is userspace (there are conda packages). Then, for zero-sized cudf objects there shouldn't be any kernel launches, so we could get away with not needing a GPU present for them.
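To make the zero-sized idea concrete, a sketch of what that could look like, assuming a hypothetical future cudf whose import no longer requires libcuda.so:

```python
import cudf  # assumes import itself no longer needs libcuda.so

# A zero-row frame carries the schema but no data, so operations on it
# shouldn't need to launch any CUDA kernels.
meta = cudf.DataFrame({"x": cudf.Series([], dtype="int64")})
renamed = meta.rename(columns={"x": "y"})
print(renamed.dtypes)
```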
Another way of solving this is to run some local code remotely (dask/distributed#4003).
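In that spirit, a rough sketch of computing meta on a remote GPU worker so the local machine never imports cudf; the scheduler address is a placeholder:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder cluster address

def make_meta():
    # Runs on a GPU worker, where importing cudf is safe.
    import cudf
    return cudf.DataFrame({"x": cudf.Series([], dtype="int64")}).to_pandas()

# The local machine only ever sees the pandas result.
meta = client.submit(make_meta).result()
```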
This should be fixable today since nothing links to libcuda.
Do we load that while running …?
I think it's only the … that pulls this in. We could defer import of the … module.
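One way to express that deferral, a sketch using a module-level `__getattr__` (PEP 562) in a package's `__init__.py`; `gpu_bindings` is a hypothetical name standing in for whichever extension module pulls in the driver library:

```python
import importlib

def __getattr__(name):
    if name == "gpu_bindings":
        # Import the GPU-touching module only on first access, so that
        # `import cudf` alone doesn't load libcuda.
        module = importlib.import_module(".gpu_bindings", __name__)
        globals()[name] = module  # cache to avoid repeated lookups
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```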
That sounds like a bug in our CMake then, because we're definitely using the …
Scratch that, not a CMake bug, we totally link it: …
Running … Edit: I wonder if it's getting linked statically for me somewhere.
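For anyone reproducing this check, a sketch that shells out to `ldd` from Python; the library path is an assumption and should be replaced with wherever libcudf.so lives in your environment:

```python
import subprocess

# List the dynamic dependencies of libcudf and look for the driver library.
out = subprocess.run(
    ["ldd", "/usr/local/lib/libcudf.so"],  # placeholder path
    capture_output=True, text=True, check=True,
).stdout
print([line for line in out.splitlines() if "libcuda" in line])
```

Note that `ldd` only reports dynamic dependencies, so a statically linked libcuda would not show up here, which is consistent with the speculation above.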
It may not get linked if nothing is used from it.
I would like to be able to `import cudf` and refer to cudf methods and classes on a machine that does not have a GPU. This is particularly useful when using cudf with Dask. While my local machine may not have an attached GPU, my Dask workers may. When using Dask with cudf I need to be able to refer to cudf functions so that I can place them into a task graph, but I don't actually need to run them locally.
This fails today with an import error saying that we can't import `libcuda.so.1`. Is it feasible to make importing cudf robust to these errors?