Symptom:
If I took
Environment:
My Spark configuration:
I'm guessing it means you don't have any GPUs available to schedule.
You can find instructions on setting up a standalone cluster here:
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#spark-standalone-cluster
One of the main things is making sure each worker is configured to advertise its GPU resources, e.g. in conf/spark-env.sh:
SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 -Dspark.worker.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh"
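The discoveryScript in that setting is run on each worker and must print the GPUs it finds as a single JSON object with a resource name and a list of addresses (Spark's distribution includes an example getGpusResources.sh for this). As a rough illustration only, here is a minimal sketch of such a script, assuming nvidia-smi is on the worker's PATH; the gpu_indices_to_json helper is a hypothetical name, not part of Spark or the RAPIDS plugin.

```shell
#!/usr/bin/env bash
# Minimal GPU discovery script sketch (not the official getGpusResources.sh).
# Spark expects one JSON object on stdout: {"name": "gpu", "addresses": [...]}.

# Hypothetical helper: read one GPU index per line on stdin and
# print the JSON object Spark expects.
gpu_indices_to_json() {
  local addrs="" idx
  while IFS= read -r idx; do
    idx="${idx//[[:space:]]/}"          # drop stray spaces / carriage returns
    [ -z "$idx" ] && continue           # skip blank lines
    addrs="${addrs:+$addrs, }\"$idx\""  # append as a quoted JSON string
  done
  printf '{"name": "gpu", "addresses": [%s]}\n' "$addrs"
}

# Only query the GPUs when run directly, so the helper can also be sourced.
# Assumes nvidia-smi is installed on the worker.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  nvidia-smi --query-gpu=index --format=csv,noheader | gpu_indices_to_json
fi
```

Running the script by hand on a worker is a quick sanity check: an empty addresses list means that worker has no GPU to offer.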
Once the cluster is up, check the Spark master UI to make sure your workers have a GPU available to hand out.
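The other possibility, though you don't usually see that error message in this case, is that the GPUs are available but the executors are crashing. If you see GPUs available on the workers in the UI, go check your executor logs to see whether the executors are crashing on startup. If they are, there should be an error message there indicating what is wrong.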
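If the workers do show GPUs but jobs still get no resources, executors crashing on startup is worth ruling out. A quick way to look on a standalone worker is to scan the executors' stderr files. This sketch assumes the default standalone layout, where executor logs live under the worker's work directory ($SPARK_HOME/work unless SPARK_WORKER_DIR overrides it) as <app-id>/<executor-id>/stderr; the scan_executor_logs name and the /opt/spark fallback are hypothetical, and the grep pattern is only a rough filter.

```shell
#!/usr/bin/env bash
# Sketch: list error-looking lines from standalone executor stderr logs.
# Assumed layout: <work-dir>/<app-id>/<executor-id>/stderr

# Hypothetical helper: print "<stderr path>:<line>" for up to 5
# ERROR/Exception lines per executor log under the given work dir.
scan_executor_logs() {
  local work_dir="$1" log
  for log in "$work_dir"/*/*/stderr; do
    [ -f "$log" ] || continue   # glob matched nothing
    grep -H -m 5 -E 'ERROR|Exception' "$log"
  done
}

# When run directly, scan SPARK_WORKER_DIR, else $SPARK_HOME/work
# (/opt/spark is a hypothetical fallback for SPARK_HOME).
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  scan_executor_logs "${SPARK_WORKER_DIR:-${SPARK_HOME:-/opt/spark}/work}"
fi
```

The same stdout/stderr files are also linked from the worker's web UI, which may be easier when you only need to check one executor.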