Describe the bug
I ran the build and test commands for gazelle_plugin 1.2 and got some errors.
Code versions as below:
arrow-4.0.0-oap-1.2.0-release.zip
gazelle_plugin-1.2.0-release.zip
Errors as below:
*** RUN ABORTED ***
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(Lscala/collection/immutable/Map;Lscala/collection/Seq;Lorg/apache/spark/sql/SparkSession;)Lscala/Option;
at org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.inferSchema(ParquetUtils.scala:107)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:170)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$11(DataSource.scala:208)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
As you can see, the error is in /arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala.
I found that the mergeSchemasInParallel call there passes 3 arguments, but the vanilla Spark 3.1 version takes 4.
I modified ParquetFileFormat.scala as below, and it then ran without this error.
diff ParquetFileFormat.scala ParquetFileFormat.scala.old
439d438
< parameters: Map[String, String],
455c454
< SchemaMergeUtils.mergeSchemasInParallel(sparkSession, parameters, filesToTouch, reader)
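For context, a minimal sketch of what the patched method ends up looking like, assuming the Spark 3.1 signatures of ParquetUtils and SchemaMergeUtils; the object name is made up for illustration, and the footer-reading closure is stubbed out (the plugin's real code reads Parquet footers there, which the diff does not touch):

package org.apache.spark.sql.execution.datasources.parquet

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileStatus
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.SchemaMergeUtils
import org.apache.spark.sql.types.StructType

object PatchedMergeSketch {
  // The signature now matches the call site in Spark 3.1's ParquetUtils.inferSchema:
  // mergeSchemasInParallel(Map, Seq, SparkSession): Option[StructType]
  def mergeSchemasInParallel(
      parameters: Map[String, String], // the parameter added by the patch
      filesToTouch: Seq[FileStatus],
      sparkSession: SparkSession): Option[StructType] = {
    // Stub reader: the plugin's real closure reads Parquet footers and
    // converts them to StructTypes; that logic is unchanged by the patch.
    val reader =
      (files: Seq[FileStatus], conf: Configuration, ignoreCorruptFiles: Boolean) =>
        Seq.empty[StructType]
    // Spark 3.1's SchemaMergeUtils variant takes 4 arguments, including parameters.
    SchemaMergeUtils.mergeSchemasInParallel(sparkSession, parameters, filesToTouch, reader)
  }
}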
However, I then got some other errors, shown below. Please check whether these errors are expected; I can send a detailed log later.
Fall back to use row-based operators, error is last(value#22596)() is not supported in ColumnarAggregation, original sparkplan is class org.apache.spark.sql.execution.aggregate.HashAggregateExec(List(class org.apache.spark.sql.execution.streaming.StateStoreSaveExec))
Fall back to use row-based operators, error is variance(cast(a#51859 as double)) is not supported in ColumnarAggregation, original sparkplan is class org.apache.spark.sql.execution.aggregate.HashAggregateExec(List(class org.apache.spark.sql.execution.exchange.ShuffleExchangeExec))
20:59:34.377 WARN org.apache.spark.sql.execution.datasources.v2.arrow.SparkMemoryUtils: Detected leaked memory pool, size: 127976...
20:59:34.442 WARN org.apache.spark.sql.execution.datasources.v2.arrow.SparkMemoryUtils: Detected leaked memory pool, size: 127976...
20:57:03.655 ERROR org.apache.spark.sql.execution.streaming.MicroBatchExecution: Query [id = a188d962-aa09-487f-a6e6-1b2beacf0583, runId = 31cd986e-0306-451b-95db-115ca3800057] terminated with error
org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:336)
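For reference, the two fallback warnings above name aggregate functions (last, variance) that ColumnarAggregation does not implement, so those plans drop back to row-based operators. A tiny query I would expect to trigger the variance fallback looks like the following — a hypothetical repro, not taken from the test suite, with the two config values copied from the gazelle_plugin setup docs:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.variance

object VarianceFallbackRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      // Plugin settings as documented in the gazelle_plugin README.
      .config("spark.sql.extensions", "com.intel.oap.ColumnarPlugin")
      .config("spark.shuffle.manager",
        "org.apache.spark.shuffle.sort.ColumnarShuffleManager")
      .getOrCreate()
    import spark.implicits._

    // variance(cast(a as double)) mirrors the unsupported expression in the
    // log, so this aggregation should fall back to row-based HashAggregateExec.
    Seq(1, 2, 3, 4).toDF("a")
      .agg(variance($"a".cast("double")))
      .show()

    spark.stop()
  }
}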
To Reproduce
Build and run the tests for gazelle_plugin 1.2:
build cmd: mvn clean package -DskipTests -Dcpp_tests=OFF -Dbuild_arrow=OFF -Darrow_root=/opt/build/arrow_install -Dcheckstyle.skip
test cmd: mvn test -Dbuild_arrow=OFF -Darrow_root=/opt/build/arrow_install -Dcheckstyle.skip
Expected behavior
The tests run successfully with no errors.