This looks like it has been broken since release 0.2. Do we not have a test for this? Do we have an idea of what exactly it causes? If it is a performance issue, how much does it affect performance?
Not clear about the XGBoost case, but for the PCA case: with 4.6 GB of parquet data, where the column is of ArrayType(DoubleType) and the array size is 2048, performance drops from 6 seconds to 8 minutes.
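For context, a sketch that generates data of the shape described above (the column name, row count, and output path are made up for illustration, not taken from the report):

```scala
import org.apache.spark.sql.functions._

// Synthetic stand-in for the reported dataset: one ArrayType(DoubleType)
// column holding 2048-element vectors, written out as parquet.
val df = spark.range(0, 1000000)
  .select(array((0 until 2048).map(_ => rand()): _*).as("feature"))
df.write.mode("overwrite").parquet("/tmp/pca_input.parquet")
```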
Describe the bug
When using ColumnarRdd on Spark 3.1.2 (and all versions after), the Shim layer always uses the default value (false) for exportColumnRdd, e.g. https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/sql-plugin/src/main/311until320-nondb/scala/com/nvidia/spark/rapids/shims/v2/Spark31XShims.scala#L457-L459

Steps/Code to reproduce bug
Calling ColumnarRdd on any Spark version after 3.1.1 triggers this problem (XGBoost and PCA training).
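A minimal repro sketch, assuming a session launched with the RAPIDS Accelerator plugin (the input path is illustrative; ColumnarRdd and the exportColumnarRdd config are the plugin's documented entry points):

```scala
import com.nvidia.spark.rapids.ColumnarRdd

// Assumes the session was started with the plugin enabled, e.g.:
//   --conf spark.plugins=com.nvidia.spark.SQLPlugin
//   --conf spark.rapids.sql.exportColumnarRdd=true
val df = spark.read.parquet("/tmp/pca_input.parquet")

// Returns an RDD[ai.rapids.cudf.Table]. On Spark 3.1.1+ the shim ignores
// the export flag, so the plan still contains a plain columnar-to-row
// transition and the GPU batches are rebuilt from rows.
val tables = ColumnarRdd(df)
```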
Expected behavior
There should be no "columnar to row" conversion.
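One rough way to observe the regression is to time the extraction itself. This is a sketch reusing the df from above (Tables must be closed to release GPU memory):

```scala
val rdd = ColumnarRdd(df)          // RDD[ai.rapids.cudf.Table]
val t0 = System.nanoTime()
val rows = rdd.map { table =>
  val n = table.getRowCount        // touch the batch so work is done
  table.close()                    // Tables hold GPU memory; always close them
  n
}.reduce(_ + _)
println(s"Extracted $rows rows in ${(System.nanoTime() - t0) / 1e9} s")
```

On an affected build, a pass that takes seconds on Spark 3.0.x stretches to minutes because of the hidden row round-trip.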
Environment details (please complete the following information)
Spark 3.1.2 Standalone
Additional context
Versions before Spark 3.1.1 are fine.