This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

run testcase for Native-sql-engine 1.2, Got many errors #510

Open
songjx010 opened this issue Sep 15, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@songjx010

Describe the bug
I ran the build and test commands for gazelle_plugin 1.2 and got several errors.
Code versions:

  1. arrow-4.0.0-oap-1.2.0-release.zip
  2. gazelle_plugin-1.2.0-release.zip

Errors:
*** RUN ABORTED ***
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(Lscala/collection/immutable/Map;Lscala/collection/Seq;Lorg/apache/spark/sql/SparkSession;)Lscala/Option;
at org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.inferSchema(ParquetUtils.scala:107)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:170)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$11(DataSource.scala:208)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)

As the stack trace shows, the error points to the file /arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala.
I found that the mergeSchemasInParallel method there takes 3 arguments, whereas it takes 4 arguments in vanilla Spark 3.1.
After I modified ParquetFileFormat.scala as shown below, the tests ran without this error.

diff ParquetFileFormat.scala ParquetFileFormat.scala.old

439d438
< parameters: Map[String, String],
455c454
< SchemaMergeUtils.mergeSchemasInParallel(sparkSession, parameters, filesToTouch, reader)
---
> SchemaMergeUtils.mergeSchemasInParallel(sparkSession, null, filesToTouch, reader)
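For readers hitting a similar NoSuchMethodError: the JVM resolves a call by the exact method name, parameter types, and return type recorded at compile time, so code compiled against one signature fails at run time if the class on the classpath only offers a different one. A minimal sketch of how such a mismatch could be checked with reflection follows. The names `SignatureCheck`, `Demo`, `merge`, `overloads`, and `hasMethod` are hypothetical and are not part of Spark or gazelle_plugin; `Demo` only stands in for a companion object whose signature differs between versions.

```scala
object SignatureCheck {
  // Hypothetical stand-in for a companion object whose method gained a
  // leading `params` argument in a newer library version.
  object Demo {
    def merge(files: Seq[String], session: String): Option[String] =
      files.headOption
    def merge(params: Map[String, String], files: Seq[String], session: String): Option[String] =
      files.headOption
  }

  /** Lists every public overload of `name` on `cls` as "name(paramTypes): returnType". */
  def overloads(cls: Class[_], name: String): Seq[String] =
    cls.getMethods.toSeq
      .filter(_.getName == name)
      .sortBy(_.getParameterCount)
      .map { m =>
        val params = m.getParameterTypes.map(_.getName).mkString(", ")
        s"${m.getName}($params): ${m.getReturnType.getName}"
      }

  /** True if `cls` exposes a public method `name` with exactly these erased parameter types. */
  def hasMethod(cls: Class[_], name: String, paramTypes: Class[_]*): Boolean =
    try { cls.getMethod(name, paramTypes: _*); true }
    catch { case _: NoSuchMethodException => false }

  def main(args: Array[String]): Unit = {
    // Print the signatures that are actually available at run time.
    overloads(Demo.getClass, "merge").foreach(println)
    // Check for the three-argument form a caller might have been compiled against.
    println(hasMethod(Demo.getClass, "merge",
      classOf[Map[_, _]], classOf[Seq[_]], classOf[String]))
  }
}
```

Pointing `overloads` at the class named in the stack trace (loaded from the test classpath) would show which signatures are really present, which can help tell whether the plugin's copy of the class or Spark's own copy is being picked up.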

However, I then got some other errors, shown below. Please check whether these errors are expected. I can send a detailed log later.

Fall back to use row-based operators, error is last(value#22596)() is not supported in ColumnarAggregation, original sparkplan is class org.apache.spark.sql.execution.aggregate.HashAggregateExec(List(class org.apache.spark.sql.execution.streaming.StateStoreSaveExec))
Fall back to use row-based operators, error is variance(cast(a#51859 as double)) is not supported in ColumnarAggregation, original sparkplan is class org.apache.spark.sql.execution.aggregate.HashAggregateExec(List(class org.apache.spark.sql.execution.exchange.ShuffleExchangeExec))
20:59:34.377 WARN org.apache.spark.sql.execution.datasources.v2.arrow.SparkMemoryUtils: Detected leaked memory pool, size: 127976...
20:59:34.442 WARN org.apache.spark.sql.execution.datasources.v2.arrow.SparkMemoryUtils: Detected leaked memory pool, size: 127976...
20:57:03.655 ERROR org.apache.spark.sql.execution.streaming.MicroBatchExecution: Query [id = a188d962-aa09-487f-a6e6-1b2beacf0583, runId = 31cd986e-0306-451b-95db-115ca3800057] terminated with error
org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:336)

To Reproduce
Build and run the tests for gazelle_plugin 1.2.
Build cmd: mvn clean package -DskipTests -Dcpp_tests=OFF -Dbuild_arrow=OFF -Darrow_root=/opt/build/arrow_install -Dcheckstyle.skip
Test cmd: mvn test -Dbuild_arrow=OFF -Darrow_root=/opt/build/arrow_install -Dcheckstyle.skip

Expected behavior
The tests run successfully, with no errors.

songjx010 added the bug label Sep 15, 2021
songjx010 commented Sep 26, 2021

Another question: are the test cases correctly executed for PRs?
