Getting "cudaMallocAsync not supported with this CUDA driver/runtime version" #7636

pedastrian57 · 2023-02-01T07:23:09Z

pedastrian57
Feb 1, 2023

I am trying to run Spark 3.2.2 with RAPIDS, using rapids-4-spark_2.12-22.10.0.jar, in local mode.

But I am getting the following error:

  ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
  ai.rapids.cudf.CudfException: RMM failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni-release-3-cuda11/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/cuda_async_memory_resource.hpp:90: cudaMallocAsync not supported with this CUDA driver/runtime version
      at ai.rapids.cudf.Rmm.initializeInternal(Native Method)
      at ai.rapids.cudf.Rmm.initialize(Rmm.java:119)
      at com.nvidia.spark.rapids.GpuDeviceManager$.initializeRmm(GpuDeviceManager.scala:296)
      at com.nvidia.spark.rapids.GpuDeviceManager$.initializeMemory(GpuDeviceManager.scala:328)
      at com.nvidia.spark.rapids.GpuDeviceManager$.initializeGpuAndMemory(GpuDeviceManager.scala:137)
      at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:258)

I am using CUDA version 11.6 with:

NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.6

Can someone help?

Answered by abellina

Feb 2, 2023


pedastrian57 · 2023-02-01T07:24:27Z

pedastrian57
Feb 1, 2023
Author

I can provide any other necessary information if required.

2 replies

abellina Feb 1, 2023
Collaborator

@pedastrian57, I am sorry you are running into this. The plugin should automatically pick a different allocator if it detects that the hardware/driver cannot support cudaMallocAsync. In your case, the driver and the runtime (which is statically linked) look to be just fine. A few more pieces of information would help:

What OS are you running?
Can you provide the full output of nvidia-smi? We are mainly interested in what GPU(s) you have to make sure they are at compute capability 6.0 or above.
Can you provide the full command you use to start the plugin? Also, please double-check that you are not using the cuDF jar directly. In the past, we required the cuDF jar to be submitted to Spark in addition to the rapids-4-spark jar, but with the version you are using the cuDF jar should not be used.
It would be good to check that libcuda.so points to the right thing. It likely is, but here's how I would check. In my case it points to 520.61.05:

  $ ldconfig -p |grep libcuda.so
  libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
  libcuda.so.1 (libc6) => /usr/lib/i386-linux-gnu/libcuda.so.1
  libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so
  libcuda.so (libc6) => /usr/lib/i386-linux-gnu/libcuda.so

  $ ls -las  /usr/lib/x86_64-linux-gnu/|grep libcuda.so
     0 lrwxrwxrwx   1 root root        12 Sep 29 04:22 libcuda.so -> libcuda.so.1
     0 lrwxrwxrwx   1 root root        20 Sep 29 04:22 libcuda.so.1 -> libcuda.so.520.61.05
 25672 -rw-r--r--   1 root root  26284256 Sep 29 00:56 libcuda.so.520.61.05
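
The same check can be scripted. This is only a minimal sketch: `libcuda_version` is a hypothetical helper name, and the library path is the one from the listing above, which may well differ on other distributions. It follows the symlink chain with `readlink -f` and prints the version suffix, which can then be compared with the "Driver Version" reported by nvidia-smi.

```shell
# Print the driver version encoded in a versioned libcuda file name,
# for example libcuda.so.520.61.05 gives 520.61.05.
# (libcuda_version is a hypothetical helper, not part of any NVIDIA tool.)
libcuda_version() {
  file="${1##*/}"                      # drop the directory part
  printf '%s\n' "${file#libcuda.so.}"  # drop the "libcuda.so." prefix
}

# Follow the chain libcuda.so -> libcuda.so.1 -> libcuda.so.<version>.
# ASSUMPTION: this path comes from the listing above; adjust it for your system.
lib=/usr/lib/x86_64-linux-gnu/libcuda.so
if [ -e "$lib" ]; then
  libcuda_version "$(readlink -f "$lib")"
fi
```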

pedastrian57 Feb 1, 2023
Author

Sure

We are using NixOS.
Full command:
  spark-shell --master local[*] \
    --conf spark.executor.memory=1G \
    --conf spark.executor.cores=4 \
    --driver-memory 1g \
    --conf spark.yarn.executor.memoryOverhead=600 \
    --conf spark.locality.wait=0s \
    --conf spark.sql.files.maxPartitionBytes=1G \
    --conf spark.plugins=com.nvidia.spark.SQLPlugin \
    --conf spark.executor.resource.gpu.amount=1 \
    --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
    --conf spark.rapids.sql.explain=ALL \
    --jars rapids-4-spark_2.12-22.10.0.jar \
    --files /getGpusResources.sh
@abellina

abellina · 2023-02-02T15:26:35Z

abellina
Feb 2, 2023
Collaborator

@pedastrian57 I am not sure we have seen this before; honestly, this is the first time I have seen the plugin used with vGPU. I took a look at some internal threads on this, and it looks like a vGPU setup needs Unified Memory enabled for this allocator (cudaMallocAsync) to work. If you are able to, you could try this: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#enabling-unified-memory-vgpu.

In our project we normally fall back to another allocator if we detect that the driver/kernel are too old. In this case we detected that everything was fine, based on version numbers, but then the async feature failed to start up. We should be able to make this less painful and retry with our fallback allocator, but I am kind of glad it broke, since we've learned something new here.

If you are not able to try the Unified Memory suggestion, please set --conf spark.rapids.memory.gpu.pool=ARENA, which switches the pool to our arena allocator. The arena allocator is more likely to run out of memory because of GPU memory fragmentation, which is why we default to cudaMallocAsync (--conf spark.rapids.memory.gpu.pool=ASYNC): the async allocator can defragment itself.
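
As an illustration of the ARENA workaround, here is a minimal sketch that only prints a trimmed-down version of the spark-shell command from earlier in this thread with the pool setting added. `spark_shell_cmd` is a hypothetical helper, the other options from the original command are left out for brevity, and `local[*]` is quoted here so the shell does not try to expand it as a glob.

```shell
# Print (do not run) a spark-shell command with the GPU memory pool overridden.
# spark_shell_cmd is a hypothetical helper used for illustration only.
spark_shell_cmd() {
  pool="$1"   # ARENA, ASYNC, or NONE: the values that appear in this thread
  printf '%s\n' "spark-shell --master 'local[*]' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.memory.gpu.pool=$pool \
--jars rapids-4-spark_2.12-22.10.0.jar"
}

# Show the command for the arena allocator.
spark_shell_cmd ARENA
```

Once the printed command looks right, it can be pasted into a terminal together with the remaining options from the original command.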

1 reply

abellina Feb 2, 2023
Collaborator

Filed #7649 for us to improve this startup issue.

pedastrian57 · 2023-02-02T17:22:26Z

pedastrian57
Feb 2, 2023
Author

@abellina Thanks for the reply. With the arena allocator, it's working in local mode now.

0 replies

pedastrian57 · 2023-03-24T12:01:46Z

pedastrian57
Mar 24, 2023
Author

@abellina
Since our last discussion, we have solved the issues with YARN, and we can now run our application on YARN.
But after we started benchmarking this setup, we ran into some issues, which I will describe below with the necessary screenshots.
Our spark-submit command:
  spark-submit --master yarn --deploy-mode client \
    --conf spark.executor.memory=100G \
    --conf spark.executor.cores=30 \
    --num-executors=3 \
    --conf spark.rapids.sql.input.ParquetScan=false \
    --conf spark.rapids.sql.format.parquet.reader.type=MULTITHREADED \
    --conf spark.rapids.memory.gpu.pool=NONE \
    --conf spark.task.cpus=1 \
    --driver-memory 30g \
    --conf spark.yarn.executor.memoryOverhead=600 \
    --conf spark.locality.wait=0s \
    --conf spark.sql.files.maxPartitionBytes=1G \
    --conf spark.plugins=com.nvidia.spark.SQLPlugin \
    --conf spark.executor.resource.gpu.discoveryScript=/bin/getGpusResources.sh \
    --conf spark.rapids.sql.concurrentGpuTasks=1 \
    --conf spark.rapids.sql.multiThreadedRead.numThreads=30 \
    --conf spark.rapids.memory.gpu.debug=STDOUT \
    --conf spark.rapids.sql.explain=ALL \
    --conf spark.executor.resource.gpu.amount=1 \
    --conf spark.task.resource.gpu.amount=0.03 \
    --jars /rapids-4-spark_2.12-22.10.0.jar \
    --queue root.default \
    --conf spark.executor.extraLibraryPath=$LD_LIBRARY_PATH \
    --packages org.apache.spark:spark-hive_2.12:3.2.2,org.apache.spark:spark-hive-thriftserver_2.12:3.2.2 \
    /OurApp-with-dependencies.jar
The issues we are facing:
We are seeing some speedups, but scan performance is mostly quite low, even for the queries where we do get speedups.
I tried --conf spark.rapids.sql.input.ParquetScan=false, but it has no effect: I can see that the GPU Parquet scan operator is still being used.
I have also tried spark.rapids.sql.format.parquet.reader.type=MULTITHREADED and COALESCING, but there is no considerable change.
If you want, I can create a separate GitHub discussion thread too.
Thank you in advance.

4 replies

jlowe Mar 24, 2023
Maintainer

Since this is not related to the cudaMallocAsync issue, it would be best to put this in a separate discussion if you don't mind.

Regarding disabling Parquet reads, you can try setting spark.rapids.sql.format.parquet.read.enabled=false. I'll look into the issue with spark.rapids.sql.input.ParquetScan and see if there's a bug there.

pedastrian57 Mar 24, 2023
Author

@jlowe Sure, I will create another discussion thread for this and try what you have recommended.

abellina Mar 24, 2023
Collaborator

Just as an aside, @pedastrian57 and I have been going back and forth a bit on the RAPIDS-GoAi Slack. I've asked him to change some of the configs: specifically, the number of CPU threads (much lower, since this is 4 executors slicing a single A100), the RAPIDS metrics level to DEBUG (I think the slowness in the scan is probably just the semaphore), and the pool to ARENA. He's going to try a few things and get us some more info, either here or via Slack.
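
To make the semaphore point concrete, here is a rough back-of-the-envelope sketch using the numbers from the spark-submit command above (30 executor cores, spark.task.cpus=1, one GPU per executor, spark.task.resource.gpu.amount=0.03, spark.rapids.sql.concurrentGpuTasks=1). `task_slots` is a hypothetical helper, and the scheduling rule it encodes is an assumption about how Spark derives task slots for fractional task GPU amounts, not something confirmed in this thread.

```shell
# Rough estimate of how many tasks one executor runs at the same time.
# ASSUMPTION: Spark uses the smaller of the CPU slot count and the GPU slot
# count, and a fractional task GPU amount allows floor(1 / amount) tasks per GPU.
# Arguments: executor cores, task cpus, GPUs per executor, task GPU amount (<= 1).
task_slots() {
  awk -v cores="$1" -v tcpus="$2" -v gpus="$3" -v tgpu="$4" 'BEGIN {
    cpu_slots = int(cores / tcpus)
    gpu_slots = gpus * int(1 / tgpu)
    print (cpu_slots < gpu_slots ? cpu_slots : gpu_slots)
  }'
}

slots="$(task_slots 30 1 1 0.03)"
gpu_permits=1   # value of spark.rapids.sql.concurrentGpuTasks in the command above
echo "concurrent tasks per executor: $slots"
echo "tasks allowed on the GPU at once: $gpu_permits"
```

Under these assumptions, 30 tasks run concurrently per executor while only one of them may hold the GPU at a time, so time spent waiting for the GPU can show up as slow scans.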

jlowe Mar 24, 2023
Maintainer

I believe spark.rapids.sql.input.ParquetScan has no effect because ParquetScan is not being used in your particular query. ParquetScan is a Spark Scan that is used as part of the datasource V2 API in Spark. By default, Parquet and ORC use datasource V1, which means ParquetScan is not used, and that's why spark.rapids.sql.input.ParquetScan does not have any effect. (Both the V1 and V2 APIs eventually end up using the same GPU reader code, but ParquetScan is only tied to datasource V2.) You could verify this by removing parquet from the value of spark.sql.sources.useV1SourceList (e.g., by setting it to an empty string); I would then expect spark.rapids.sql.input.ParquetScan to take effect.

I'd recommend sticking with spark.rapids.sql.format.parquet.read.enabled for controlling Parquet reads, as that should work regardless of whether datasource V1 or V2 is being used. I'll file an issue to make the docs more clear about the scan controls and point to the format configs.
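
The two options can be sketched as command-line arguments. This is a sketch under assumptions: `remove_source` is a hypothetical helper, and the default V1 source list shown is what I believe Spark 3.2 ships with, so it is worth confirming on your own cluster. Instead of emptying the list entirely, the helper drops only parquet so the other formats stay on datasource V1.

```shell
# Drop one entry from a comma-separated list such as spark.sql.sources.useV1SourceList.
# remove_source is a hypothetical helper used for illustration only.
remove_source() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -Fvx "$2" | paste -sd, -
}

# ASSUMPTION: default value for Spark 3.2; check it on your cluster before relying on it.
v1_default="avro,csv,json,kafka,orc,parquet,text"

# Option 1 (the recommended one): disable GPU Parquet reads regardless of V1/V2.
printf '%s\n' "--conf spark.rapids.sql.format.parquet.read.enabled=false"

# Option 2 (to verify the explanation): move Parquet to datasource V2 so that
# spark.rapids.sql.input.ParquetScan=false can take effect.
printf '%s\n' "--conf spark.sql.sources.useV1SourceList=$(remove_source "$v1_default" parquet)"
```

Either printed argument can be appended to the spark-submit command; option 2 is mainly useful as a diagnostic.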
