This looks like it has been broken since release 0.2. Do we not have a test for this? Do we have an idea of what exactly it causes? If it is a performance issue, how much does it affect performance?
Not clear about the XGBoost case, but for the PCA case: with 4.6 GB of parquet data, where the column is of ArrayType(DoubleType) and the array size is 2048, performance drops from 6 seconds to 8 minutes.
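For context, a sketch that generates data of the shape described above (the column name, row count, and output path are made up for illustration, not taken from the report):

```scala
import org.apache.spark.sql.functions._

// Synthetic stand-in for the reported dataset: one ArrayType(DoubleType)
// column holding 2048-element vectors, written out as parquet.
val df = spark.range(0, 1000000)
  .select(array((0 until 2048).map(_ => rand()): _*).as("feature"))
df.write.mode("overwrite").parquet("/tmp/pca_input.parquet")
```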
Describe the bug
When using ColumnarRdd on Spark 3.1.2 (and all versions after), the Shim layer always uses the default value (false) for exportColumnRdd, e.g. https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/sql-plugin/src/main/311until320-nondb/scala/com/nvidia/spark/rapids/shims/v2/Spark31XShims.scala#L457-L459

Steps/Code to reproduce bug
Calling ColumnarRdd on any Spark version after 3.1.1 triggers this problem (XGBoost and PCA training).
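A minimal repro sketch, assuming a session launched with the RAPIDS Accelerator plugin (the input path is illustrative; ColumnarRdd and the exportColumnarRdd config are the plugin's documented entry points):

```scala
import com.nvidia.spark.rapids.ColumnarRdd

// Assumes the session was started with the plugin enabled, e.g.:
//   --conf spark.plugins=com.nvidia.spark.SQLPlugin
//   --conf spark.rapids.sql.exportColumnarRdd=true
val df = spark.read.parquet("/tmp/pca_input.parquet")

// Returns an RDD[ai.rapids.cudf.Table]. On Spark 3.1.1+ the shim ignores
// the export flag, so the plan still contains a plain columnar-to-row
// transition and the GPU batches are rebuilt from rows.
val tables = ColumnarRdd(df)
```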
Expected behavior
There should be no "columnar to row" conversion.
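One rough way to observe the regression is to time the extraction itself. This is a sketch reusing the df from above (Tables must be closed to release GPU memory):

```scala
val rdd = ColumnarRdd(df)          // RDD[ai.rapids.cudf.Table]
val t0 = System.nanoTime()
val rows = rdd.map { table =>
  val n = table.getRowCount        // touch the batch so work is done
  table.close()                    // Tables hold GPU memory; always close them
  n
}.reduce(_ + _)
println(s"Extracted $rows rows in ${(System.nanoTime() - t0) / 1e9} s")
```

On an affected build, a pass that takes seconds on Spark 3.0.x stretches to minutes because of the hidden row round-trip.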
Environment details (please complete the following information)
Spark 3.1.2 Standalone
Additional context
Versions before Spark 3.1.1 are fine.