
[BUG] GpuColumnarToRowExec will always be tagged False for exportColumnarRdd after Spark311 #4334

Closed
wjxiz1992 opened this issue Dec 9, 2021 · 2 comments · Fixed by #4335
Labels: bug (Something isn't working)

@wjxiz1992
Collaborator

wjxiz1992 commented Dec 9, 2021

Describe the bug
When using ColumnarRdd in Spark 3.1.2 (and all later versions), the shim layer always uses the default value (false) for exportColumnRdd, e.g. https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/sql-plugin/src/main/311until320-nondb/scala/com/nvidia/spark/rapids/shims/v2/Spark31XShims.scala#L457-L459
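The failure mode described above (a shim that accepts the flag but does not pass it on, so the node's default of false always wins) can be modeled with a minimal Python sketch. The class and function names below are hypothetical stand-ins, not the actual spark-rapids Scala code, and the "fixed" variant is only an illustration of forwarding the flag, not necessarily how the linked fix implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnarToRowNode:
    """Hypothetical stand-in for GpuColumnarToRowExec (not the real class)."""
    child: str
    export_columnar_rdd: bool = False  # default used whenever the caller omits the flag


def get_transition_buggy(plan: str, export_column_rdd: bool) -> ColumnarToRowNode:
    # Hypothetical shim: the flag is accepted but never forwarded,
    # so the node always falls back to the default (False).
    return ColumnarToRowNode(plan)


def get_transition_fixed(plan: str, export_column_rdd: bool) -> ColumnarToRowNode:
    # Hypothetical shim: forward the caller's flag explicitly.
    return ColumnarToRowNode(plan, export_columnar_rdd=export_column_rdd)


if __name__ == "__main__":
    print(get_transition_buggy("scan", True).export_columnar_rdd)  # False
    print(get_transition_fixed("scan", True).export_columnar_rdd)  # True
```

Because the buggy path never fails loudly, the only visible symptom is the extra columnar-to-row conversion in the plan, which is why the problem can go unnoticed without a test that inspects the plan.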

Steps/Code to reproduce bug
Calling ColumnarRdd in any Spark version from 3.1.1 onward causes this problem (e.g. XGBoost and PCA training).

Expected behavior
There should not be a "columnar to row" conversion.

Environment details
Spark 3.1.2 Standalone

Additional context
Versions before 3.1.1 are fine.

wjxiz1992 added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Dec 9, 2021
@tgravescs
Collaborator

This looks like it's been broken since release 0.2. Do we not have a test for this? Do we know exactly what it causes? If it's a perf issue, how much does it affect performance?

@wjxiz1992
Collaborator Author

I'm not clear about the XGBoost case, but for the PCA case (4.6 GB of Parquet data, where the column is of ArrayType(DoubleType) with an array size of 2048), performance drops from 6 seconds to 8 minutes, roughly an 80x slowdown.

Without the fix, training on Spark 3.1.2 takes 8 minutes.

On Spark 3.0.1, it takes 6 seconds.

sameerz removed the "? - Needs Triage" (Need team to review and classify) label on Dec 20, 2021