How to print out GpuOverrides messages #9112

jackie71111 · 2023-08-25T00:17:18Z

Using the same settings, the demo prints the GpuOverrides messages shown in the picture below.
In our project, however, those messages are not printed.

jlowe (Maintainer) · 2023-08-30T22:18:52Z

Hi @jackie71111, sorry for the late reply.

There could be a number of reasons why you are not seeing the explain warning messages. Steps to reproduce the issue would be very helpful. In the meantime, here are some guesses as to what may be happening:

1. Are you sure the RAPIDS Accelerator is configured in your project? Do you see the WARN message emitted in the driver log at startup saying that the RAPIDS Accelerator is enabled? e.g.: WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set spark.rapids.sql.enabled to false.
2. Is the logger for your project configured to allow WARN messages to be emitted? Sometimes projects are configured with logger settings that only log errors.
3. Is spark.rapids.sql.explain somehow being set to NONE or some other setting? The default is NOT_ON_GPU, which only logs when something cannot be placed on the GPU. If you want to see everything, including the parts that are placed on the GPU, set spark.rapids.sql.explain=ALL.
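For reference, the plugin settings and the `spark.rapids.sql.explain` setting mentioned above can be collected in one place. The helper below is a hypothetical sketch (`rapids_explain_conf` is not part of any library); it only builds a dict of config keys and validates the level against the three values documented for `spark.rapids.sql.explain` (ALL, NOT_ON_GPU, NONE).

```python
# Documented values for spark.rapids.sql.explain (default: NOT_ON_GPU)
EXPLAIN_LEVELS = {"ALL", "NOT_ON_GPU", "NONE"}


def rapids_explain_conf(level="ALL"):
    """Return configs that enable the RAPIDS plugin and its explain output.

    Hypothetical helper for illustration: it does not touch Spark, it only
    returns a dict that can be applied to a SparkConf or session builder.
    """
    level = level.upper()
    if level not in EXPLAIN_LEVELS:
        raise ValueError(f"unsupported explain level: {level!r}")
    return {
        # Load the RAPIDS Accelerator plugin and enable SQL acceleration
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # ALL logs every operator; NOT_ON_GPU logs only CPU fallbacks
        "spark.rapids.sql.explain": level,
    }
```

Each item can be applied with `SparkSession.builder.config(key, value)` before the session is created. Since the explain output is logged at WARN level, the driver log level must still allow WARN, e.g. `spark.sparkContext.setLogLevel("WARN")`.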


jackie71111 (Author) · 2023-08-31T02:47:02Z

1. All configs are enabled, and I can see the WARN message saying that the RAPIDS Accelerator is enabled.
2. In my project I can print WARN messages.
3. spark.rapids.sql.explain=ALL

Everything runs in one notebook, so the environment settings are the same.
The demo (churn.etl) prints those messages, but our project code does not.


jackie71111 (Author) · 2023-08-31T03:28:08Z

Below is my Spark configuration:

import os

from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

# You need to update these with your real hardware resources
SPARK_MASTER_URL = os.getenv("SPARK_MASTER_URL", "local[16]")
RAPIDS_JAR = os.getenv("RAPIDS_JAR", "/home/zshuang/Downloads/rapids-4-spark_2.12-23.06.0.jar")
driverMem = os.getenv("DRIVER_MEM", "50g")
executorMem = os.getenv("EXECUTOR_MEM", "32g")
maxPartitionBytes = os.getenv("MAX_PARTITION_BYTES", "1g")
pinnedPoolSize = os.getenv("PINNED_POOL_SIZE", "8g")
concurrentGpuTasks = os.getenv("CONCURRENT_GPU_TASKS", "2")
executorCores = int(os.getenv("EXECUTOR_CORES", "16"))
# Fraction of one GPU per task, so that executorCores tasks share one GPU
gpuPerTask = 1 / executorCores

# Common spark settings
conf = SparkConf()
conf.setMaster(SPARK_MASTER_URL)
conf.setAppName("silicon-sight on GPU with rapids")
conf.set("spark.driver.memory", driverMem)
# The tasks will run on GPU memory, so there is no need to set a high host memory
conf.set("spark.executor.memory", executorMem)
# The tasks will run on GPU cores, so there is no need to use many cpu cores
#conf.set("spark.executor.cores", 2)
#conf.set("spark.locality.wait", "0")
#conf.set("spark.sql.files.maxPartitionBytes", maxPartitionBytes)
conf.set("spark.dynamicAllocation.enabled", "false")
conf.set("spark.sql.adaptive.enabled", "true")

# Plugin settings
#conf.set("spark.executor.resource.gpu.amount", "1")
# 2 tasks will run concurrently per GPU
conf.set("spark.rapids.sql.concurrentGpuTasks", concurrentGpuTasks)
# Pin 8g of host memory to transfer data between GPU and host memory
conf.set("spark.rapids.memory.pinnedPool.size", pinnedPoolSize)
# 16 tasks will run concurrently per executor, as we set spark.executor.cores=16
#conf.set("spark.task.resource.gpu.amount", gpuPerTask)
conf.set("spark.rapids.sql.enabled", "true")
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
conf.set("spark.rapids.sql.variableFloatAgg.enabled", "true")
conf.set("spark.driver.extraClassPath", RAPIDS_JAR)
conf.set("spark.executor.extraClassPath", RAPIDS_JAR)
conf.set("spark.jars", RAPIDS_JAR)
conf.set("spark.rapids.sql.explain", "ALL")

# Disabled settings (kept inside a string literal so they are not applied)
"""
conf.set("spark.memory.fraction", 0.8)
conf.set("spark.default.parallelism", 24)
conf.set("spark.memory.storageFraction", 0.5)
conf.set("spark.executor.instances", 2)
conf.set("spark.cores.max", 16)
conf.set("spark.sql.shuffle.partitions", 24)
conf.set("spark.driver.maxResultSize", "32G")
"""

conf.set("spark.eventLog.enabled", "true")
conf.set("spark.eventLog.dir", "/icd/dlsh_t2b/zshuang/JedAI/spark_event_log/wr")
#conf.set("spark.rapids.sql.udfCompiler.enabled", "true")
#conf.set("spark.rapids.sql.rowBasedUDF.enabled", "true")

# Create spark session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
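A quick way to confirm that these settings actually reached the running application is to compare them with the live configuration. This matters in a notebook in particular: if a session already exists, `getOrCreate()` reuses it, and settings such as `spark.plugins` cannot change once the SparkContext has started. The helper name below is hypothetical; the comparison itself is plain Python.

```python
def config_mismatches(expected, actual_pairs):
    """Compare expected Spark settings with the running application's settings.

    `actual_pairs` is an iterable of (key, value) tuples, such as the list
    returned by spark.sparkContext.getConf().getAll().
    Returns {key: (expected_value, actual_value_or_None)} for every setting
    that is missing or has a different value.
    """
    actual = dict(actual_pairs)
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }
```

For example, `config_mismatches({"spark.plugins": "com.nvidia.spark.SQLPlugin", "spark.rapids.sql.explain": "ALL"}, spark.sparkContext.getConf().getAll())` should return an empty dict if both settings took effect when the context started.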


jlowe (Maintainer) · 2023-08-31T14:22:21Z

If the churn demo using the exact same notebook setup (and env settings) works but your project code does not, then that indicates the problem lies in the project code. If it's possible to create a small repro project that can be shared or if you can share the eventlog from the problematic project then that would be awesome. Here's another thing to check:

Is the project using the DataFrame or SQL APIs in Spark or just RDDs? The RAPIDS Accelerator only accelerates applications that are using the Spark DataFrame/SQL APIs (directly or indirectly). If you're not sure, check the Spark UI from the eventlog for the application. You should see one or more jobs on the Jobs tab (if the application is doing anything distributed with Spark). Assuming you see some jobs, there should be a "SQL / DataFrame" tab that will show at least one query. If there is no "SQL / DataFrame" tab or that tab contains no queries then the application is not using Spark's SQL / DataFrame API and that would explain why there's no explain output when running the project.

If you do see queries in the Spark UI for your project but still get no explain output, while the churn demo in the same Spark environment does emit output, that would be quite mysterious. It would be interesting to know whether you see any GPU operators replaced in the queries visible in the Spark SQL UI, and whether all the expected config settings appear under the Environment tab of the Spark UI.
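If the Spark UI is hard to get at, roughly the same information can be pulled straight out of the eventlog. The sketch below assumes an uncompressed event log (one JSON event per line, as Spark writes it) and reads the `sparkPlanInfo` tree recorded with each SQL execution; the function names are hypothetical. An empty result means no SQL/DataFrame queries were recorded at all. Operator names in the second list are candidates for CPU fallback, but wrappers such as `AdaptiveSparkPlan` or AQE query stages also land there, so treat it as a starting point rather than a verdict.

```python
import json

# Event names as they appear in Spark 3.x event logs
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"
SQL_AQE_UPDATE = "org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate"


def walk_plan(node):
    """Yield every nodeName in a sparkPlanInfo tree, depth first."""
    yield node["nodeName"]
    for child in node.get("children", []):
        yield from walk_plan(child)


def gpu_operator_summary(lines):
    """Map executionId -> (gpu_nodes, other_nodes) for an uncompressed event log.

    A later AQE plan update overwrites the earlier plan for the same query.
    Treating a "Gpu" name prefix as "runs on the GPU" is a heuristic.
    """
    summary = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") not in (SQL_START, SQL_AQE_UPDATE):
            continue
        names = list(walk_plan(event["sparkPlanInfo"]))
        gpu = [n for n in names if n.startswith("Gpu")]
        other = [n for n in names if not n.startswith("Gpu")]
        summary[event["executionId"]] = (gpu, other)
    return summary
```

Usage: open the eventlog file and pass the file object directly, since iterating over it yields one line per event.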

Some extra notes from what I can see in the notebook setup:

Note that Spark does not support GPU scheduling in local mode. I assume that's why the *.resource.gpu.amount config settings are commented out. You may want to update the notebook to be dynamic based on whether the notebook is falling back to local mode or using cluster mode to only set those configs when running in a cluster mode.
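A minimal sketch of that idea, assuming the notebook's `SPARK_MASTER_URL` and `executorCores` variables: the hypothetical helper returns the GPU scheduling settings only when the master is not a plain `local` or `local[N]` URL. Depending on the cluster manager, a GPU discovery script (`spark.executor.resource.gpu.discoveryScript`) may also be required; that is not covered here.

```python
def gpu_scheduling_conf(master, executor_cores):
    """Return GPU resource-scheduling configs, or {} in local mode.

    Spark does not support GPU scheduling in local mode, so the
    *.resource.gpu.amount settings are only produced for cluster masters.
    Hypothetical helper; assumes one GPU per executor.
    """
    if master == "local" or master.startswith("local["):
        return {}
    return {
        # One GPU per executor
        "spark.executor.resource.gpu.amount": "1",
        # Let executor_cores concurrent tasks share that GPU
        "spark.task.resource.gpu.amount": str(1 / executor_cores),
    }
```

In the notebook this could be applied as `for key, value in gpu_scheduling_conf(SPARK_MASTER_URL, executorCores).items(): conf.set(key, value)`, leaving the local-mode run unchanged.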
The notebook mentions "The tasks will run on GPU cores, so there is no need to use many cpu cores" but this is not true in many cases. Spark still schedules tasks based on cores available in the executors, so running very few cores in an executor will cause very few tasks to run in parallel relative to a normal CPU cluster. There are some applications that can still run quite well with a significantly reduced executor or core count, but I would start out with the same core count first and then start pruning executors and/or cores from there.
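The arithmetic behind this point can be written down as a rough sketch (hypothetical helper, assuming one GPU per executor and the default `spark.task.cpus=1`). Spark hands out task slots by CPU cores, so the commented-out `spark.executor.cores` value of 2 in the notebook would allow only 2 tasks at a time per executor, whereas 16 cores allow 16. As I understand the plugin, `spark.rapids.sql.concurrentGpuTasks` then limits how many of those running tasks use the GPU simultaneously; the others wait for the GPU, though some CPU-side work such as reading input can happen before that.

```python
def parallelism_summary(num_executors, executor_cores, concurrent_gpu_tasks, task_cpus=1):
    """Rough view of task parallelism with one GPU per executor.

    - Spark schedules tasks on CPU cores: each executor runs
      executor_cores // task_cpus tasks at once, even when their
      operators execute on the GPU.
    - spark.rapids.sql.concurrentGpuTasks caps how many of those tasks
      may use the GPU at the same time within one executor.
    """
    slots_per_executor = executor_cores // task_cpus
    return {
        "total_task_slots": num_executors * slots_per_executor,
        "gpu_tasks_per_executor": min(slots_per_executor, concurrent_gpu_tasks),
    }
```

With `parallelism_summary(1, 2, 2)` only 2 tasks run at once, while `parallelism_summary(1, 16, 2)` gives 16 task slots with at most 2 of them on the GPU at a time.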

1 reply

jackie71111 (Author) · Sep 1, 2023

I want to share the Spark event log with you so you can help check this problem, but the file is too big.

We have just one GPU. Can you give us an example config?

jackie71111 (Author) · 2023-09-01T08:21:34Z

Here is a profiling result for the SQL queries. I found that the CPU time is too high.


jlowe (Maintainer) · 2023-09-01T15:12:53Z

i want to share spark_event_log to you, so you can help to check this problem, but the file is so big.

You could try to create a smaller version of the project that only executes a couple of SQL queries and still replicates the problem. That will generate a smaller eventlog (and you can also compress it). This effort may also help create a small repro case, which would really accelerate our ability to diagnose the logging issue. If we can reproduce the logging issue on our end, it won't take long to figure out what's happening. It may be interesting to start with the working churn demo and slowly add/replace pieces with the new project to see at what point the logging stops working. For example, if the project is added after doing the churn processing in the same application, does the logging still work up until the point the project processing occurs? What if the churn demo is placed after the project processing? If the logging works for the churn demo in the same Spark env as the project, the project must be doing something that suppresses the logging.

there is a profiling of SQL result. i found the cpu time is too high.

This is probably because a significant portion of these queries were not translated to GPU operations. Note that for each one it lists UDF as a potential problem, indicating there was a user defined function that could not be translated to the GPU. Depending upon how expensive the UDF is to calculate, how many rows were sent through the UDF, and what other operations fell back along with the UDF, a significant amount of time could be spent doing CPU processing. Seeing the eventlog for one of these queries would help, along with any RAPIDS Accelerator explain output as to why some operations were not placed on the GPU. The eventlog would still be useful without the explain output. Seeing the SQL Plan Metrics for Application for one of the sqlIDs in question could also help, as we will see most of the main operations in the plan and how expensive some of them were.

Note that even if all of the SQL operations of a query are translated to GPU accelerated operations, the CPU time will likely not be zero. There are some phases of a typical query where the CPU is still significantly involved, such as the processing of distributed filesystem read/write (especially if there is TLS encryption to handle) and shuffle data compress/decompress and read/write. So if a query is heavy on data transfers, either via read/write or shuffle, it will have a higher CPU time than one that does not.

1 reply

jackie71111 (Author) · Sep 5, 2023

Hi @jlowe,
1. The logger is set to INFO both before and after calling our own code, but after our code runs, the GPU WARN messages are no longer printed.

2. I created a small Spark event log zip file:
spark_event_log.zip
Please help check two points: first, why the WARN messages are not printed; second, whether the RAPIDS settings are correct.

jlowe (Maintainer) · 2023-09-05T19:06:16Z

Thanks for sharing the eventlog! I can see from the eventlog that the RAPIDS Accelerator is enabled and is operating on the queries. Almost everything gets translated to GPU operations. So the good news is the RAPIDS Accelerator appears to be operating as expected. Unfortunately, I cannot explain how the log messages are not being emitted in the driver log. The config settings look correct to me.

Have you had any luck whittling down the project code to a minimal repro? If the logging is working with the churn demo, then I don't think the logging problem is an issue with the RAPIDS Accelerator. It's just using slf4j APIs for the logging, just like Spark, and in most cases is using the same Logging trait that Spark uses to perform logging. So if you're seeing log messages from Spark classes, I don't know why you wouldn't also be seeing log messages from the RAPIDS Accelerator unless the project code is explicitly configuring the logging to exclude it somehow.

My best guess at this point is something in the project is adjusting the logging setup and that's why it works for churn but not in the project. If I could reproduce it locally, I would next hook up a debugger and put a breakpoint at https://github.com/NVIDIA/spark-rapids/blob/branch-23.10/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala#L4507 to both verify that the RAPIDS Accelerator is trying to log and then step into the logWarning call to understand what is keeping the logger from logging (e.g.: is it really disabled). Alternatively, you could build your own version of the RAPIDS Accelerator jar with print messages around this area to verify it's being called and examine log.isWarnEnabled to verify it's allowed to log at warning level.
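The `log.isWarnEnabled` check can also be approximated from the notebook without rebuilding the jar. The sketch below goes through PySpark's private py4j gateway (`spark.sparkContext._jvm`), which is not a supported API, and assumes the explain messages are logged under the logger name `com.nvidia.spark.rapids.GpuOverrides` (the fully qualified class name, which is what Spark's Logging trait uses by default). The function name is hypothetical.

```python
def rapids_warn_enabled(spark, logger_name="com.nvidia.spark.rapids.GpuOverrides"):
    """Ask the driver JVM whether an SLF4J logger would emit WARN messages.

    Relies on the private py4j gateway spark.sparkContext._jvm; the default
    logger name is an assumption based on Spark's Logging trait naming.
    """
    jvm = spark.sparkContext._jvm
    logger = jvm.org.slf4j.LoggerFactory.getLogger(logger_name)
    return bool(logger.isWarnEnabled())
```

Calling `rapids_warn_enabled(spark)` once before and once after the project code runs would show whether the project changes the logging setup: `True` before and `False` after would point at the project's logging configuration rather than at the RAPIDS Accelerator.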

2 replies

jackie71111 (Author) · Sep 6, 2023

Hi @jlowe,
Thank you very much for your reply.
I want these WARN messages because the Qualification Tool told us we would get a 1.5x speedup, but we are not getting it.
So I want to check the execution messages to see whether each operator runs on the GPU or not.
1. If I download the source code from GitHub, can I build a local RAPIDS jar so I can add more debug messages to the source?
2. Do you have suggestions for the Spark environment settings? I am still trying to get the best result in our current situation.

jackie71111 (Author) · Sep 6, 2023

I have one more question: the Qualification Tool reported an Estimated GPU Speedup of 3.17, but the real speedup is 1.7 (1415.36 --> 824.86).
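For comparison with the estimate, the observed speedup is the CPU run time divided by the GPU run time. The sketch below uses the two timings from the post (their units are not stated, but they only need to match); `speedup` is a hypothetical helper name.

```python
def speedup(cpu_time, gpu_time):
    """Observed speedup = CPU run time / GPU run time (same units)."""
    if gpu_time <= 0:
        raise ValueError("gpu_time must be positive")
    return cpu_time / gpu_time


observed = speedup(1415.36, 824.86)  # timings reported in the post
estimated = 3.17                     # Qualification Tool estimate
# Fraction of the estimated speedup that was actually achieved
achieved_fraction = observed / estimated
```

That works out to roughly 1.72x, a bit over half of the estimated 3.17x.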

jlowe (Maintainer) · 2023-09-06T14:21:10Z

Regarding suggestions for config tuning, I recommend checking out the tuning guide. You might get a performance boost by tuning spark.sql.files.maxPartitionBytes, for example. I recommend looking at the time spent in each node in the query plan(s) that are slower than expected to see where the time is being spent. Another way to tackle this is to examine the stages view and see which stages are taking a long time and which operations are being performed in those stages (from the stage DAG view). That can help focus tuning efforts on the operations that are taking the most time. I'm assuming the eventlog that was posted is just a toy example, as almost all queries are executing in under a second. Note that for such quick queries, the overhead of Spark starts to be a significant factor.
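To reason about `spark.sql.files.maxPartitionBytes`, it helps to know how Spark turns it into a split size. The sketch below mirrors, as far as I understand it, the calculation in Spark's `FilePartition.maxSplitBytes`: every file is charged `spark.sql.files.openCostInBytes` (4 MiB by default), the total is divided by the minimum partition count (normally the default parallelism, e.g. 16 under `local[16]`), and the result is clamped between the open cost and `maxPartitionBytes` (128 MiB by default). The function name is hypothetical, and it only estimates the split size, not how splits are then packed into partitions.

```python
MiB = 1024 * 1024


def max_split_bytes(file_sizes, max_partition_bytes=128 * MiB,
                    open_cost_in_bytes=4 * MiB, min_partition_num=1):
    """Estimate the file split size Spark uses when scanning input files.

    file_sizes: sizes in bytes of the files being read.
    max_partition_bytes: spark.sql.files.maxPartitionBytes.
    open_cost_in_bytes: spark.sql.files.openCostInBytes.
    min_partition_num: normally the default parallelism.
    """
    # Each file is charged its length plus a fixed "open cost"
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // min_partition_num
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))
```

With a single 10 GiB input and 16-way parallelism, the default 128 MiB cap is what limits the split size, so raising `maxPartitionBytes` toward the notebook's `1g` would give larger (and fewer) splits until the per-core share of about 640 MiB takes over.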

Regarding the qualification tool, it is only an estimate and not a guarantee of performance. Some queries will be mispredicted to some extent, since not all details are available in the eventlog. However, we're always working to improve the accuracy of the qualification tool, and if you have a repro case we can analyze, that would be great. Ideally we would be able to see the CPU eventlog fed to the qualification tool and the eventlog of the GPU run for the same query showing the lower-than-predicted performance. cc: @mattahrens

Replies: 8 comments 4 replies
