Describe the bug
I ran the build and test commands for gazelle_plugin 1.2 and got some errors.
Code versions as below:
arrow-4.0.0-oap-1.2.0-release.zip
gazelle_plugin-1.2.0-release.zip
Errors as below:
*** RUN ABORTED ***
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(Lscala/collection/immutable/Map;Lscala/collection/Seq;Lorg/apache/spark/sql/SparkSession;)Lscala/Option;
at org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.inferSchema(ParquetUtils.scala:107)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:170)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$11(DataSource.scala:208)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
As you can see, the error is in /arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala.
I found that the mergeSchemasInParallel call there passes 3 arguments, but the vanilla Spark 3.1 version takes 4.
I modified ParquetFileFormat.scala as below, and it then ran without this error.
diff ParquetFileFormat.scala ParquetFileFormat.scala.old
439d438
< parameters: Map[String, String],
455c454
< SchemaMergeUtils.mergeSchemasInParallel(sparkSession, parameters, filesToTouch, reader)
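For context, a minimal sketch of what the patched method ends up looking like, assuming the Spark 3.1 signatures of ParquetUtils and SchemaMergeUtils; the object name is made up for illustration, and the footer-reading closure is stubbed out (the plugin's real code reads Parquet footers there, which the diff does not touch):

package org.apache.spark.sql.execution.datasources.parquet

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileStatus
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.SchemaMergeUtils
import org.apache.spark.sql.types.StructType

object PatchedMergeSketch {
  // The signature now matches the call site in Spark 3.1's ParquetUtils.inferSchema:
  // mergeSchemasInParallel(Map, Seq, SparkSession): Option[StructType]
  def mergeSchemasInParallel(
      parameters: Map[String, String], // the parameter added by the patch
      filesToTouch: Seq[FileStatus],
      sparkSession: SparkSession): Option[StructType] = {
    // Stub reader: the plugin's real closure reads Parquet footers and
    // converts them to StructTypes; that logic is unchanged by the patch.
    val reader =
      (files: Seq[FileStatus], conf: Configuration, ignoreCorruptFiles: Boolean) =>
        Seq.empty[StructType]
    // Spark 3.1's SchemaMergeUtils variant takes 4 arguments, including parameters.
    SchemaMergeUtils.mergeSchemasInParallel(sparkSession, parameters, filesToTouch, reader)
  }
}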
However, I then got some other errors, shown below. Please check whether these errors are expected; I can send a detailed log later.
Fall back to use row-based operators, error is last(value#22596)() is not supported in ColumnarAggregation, original sparkplan is class org.apache.spark.sql.execution.aggregate.HashAggregateExec(List(class org.apache.spark.sql.execution.streaming.StateStoreSaveExec))
Fall back to use row-based operators, error is variance(cast(a#51859 as double)) is not supported in ColumnarAggregation, original sparkplan is class org.apache.spark.sql.execution.aggregate.HashAggregateExec(List(class org.apache.spark.sql.execution.exchange.ShuffleExchangeExec))
20:59:34.377 WARN org.apache.spark.sql.execution.datasources.v2.arrow.SparkMemoryUtils: Detected leaked memory pool, size: 127976...
20:59:34.442 WARN org.apache.spark.sql.execution.datasources.v2.arrow.SparkMemoryUtils: Detected leaked memory pool, size: 127976...
20:57:03.655 ERROR org.apache.spark.sql.execution.streaming.MicroBatchExecution: Query [id = a188d962-aa09-487f-a6e6-1b2beacf0583, runId = 31cd986e-0306-451b-95db-115ca3800057] terminated with error
org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:336)
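For reference, the two fallback warnings above name aggregate functions (last, variance) that ColumnarAggregation does not implement, so those plans drop back to row-based operators. A tiny query I would expect to trigger the variance fallback looks like the following — a hypothetical repro, not taken from the test suite, with the two config values copied from the gazelle_plugin setup docs:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.variance

object VarianceFallbackRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      // Plugin settings as documented in the gazelle_plugin README.
      .config("spark.sql.extensions", "com.intel.oap.ColumnarPlugin")
      .config("spark.shuffle.manager",
        "org.apache.spark.shuffle.sort.ColumnarShuffleManager")
      .getOrCreate()
    import spark.implicits._

    // variance(cast(a as double)) mirrors the unsupported expression in the
    // log, so this aggregation should fall back to row-based HashAggregateExec.
    Seq(1, 2, 3, 4).toDF("a")
      .agg(variance($"a".cast("double")))
      .show()

    spark.stop()
  }
}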
To Reproduce
Build and run the tests for gazelle_plugin 1.2:
build cmd: mvn clean package -DskipTests -Dcpp_tests=OFF -Dbuild_arrow=OFF -Darrow_root=/opt/build/arrow_install -Dcheckstyle.skip
test cmd: mvn test -Dbuild_arrow=OFF -Darrow_root=/opt/build/arrow_install -Dcheckstyle.skip
Expected behavior
The tests run successfully with no errors.