[SPARK-46752][SQL][TESTS] Use default ORC compression in data source benchmarks #44777

dongjoon-hyun · 2024-01-18T02:38:26Z

What changes were proposed in this pull request?

This PR aims to use the default ORC compression in data source benchmarks.

Why are the changes needed?

Apache ORC 2.0 and Apache Spark 4.0 will use ZStandard as the default ORC compression codec.

OrcReadBenchmark was already switched to use ZStandard for comparison.

[SPARK-46737][SQL][TESTS] Use the default ORC compression in OrcReadBenchmark #44761

This PR changes the remaining three data source benchmarks, which still set SNAPPY explicitly.

$ git grep OrcCompressionCodec | grep Benchmark
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BuiltInDataSourceWriteBenchmark.scala:import org.apache.spark.sql.execution.datasources.orc.OrcCompressionCodec
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BuiltInDataSourceWriteBenchmark.scala:      OrcCompressionCodec.SNAPPY.lowerCaseName())
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala:import org.apache.spark.sql.execution.datasources.orc.OrcCompressionCodec
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala:      OrcCompressionCodec.SNAPPY.lowerCaseName()).orc(dir)
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala:import org.apache.spark.sql.execution.datasources.orc.OrcCompressionCodec
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala:      .setIfMissing("orc.compression", OrcCompressionCodec.SNAPPY.lowerCaseName())
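The difference between pinning a codec and relying on the default can be illustrated with a minimal sketch. This is not the diff from this PR: the object name `OrcDefaultCompressionSketch`, the scratch directory, and the row count are hypothetical, and it assumes Spark SQL is on the classpath. The first write pins SNAPPY through the `compression` write option, as the benchmarks listed above do; the second omits the option, so Spark should fall back to the session-level `spark.sql.orc.compression.codec` setting.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical illustration only; the benchmark sources differ in detail.
object OrcDefaultCompressionSketch {
  // Writes the same small dataset twice and returns the row counts read back.
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-default-compression-sketch")
      .getOrCreate()
    // Scratch directory for the sketch (an assumption, not a benchmark path).
    val base = Files.createTempDirectory("orc-sketch").toString
    try {
      val df = spark.range(1000).selectExpr("id", "cast(id as string) as s")

      // Codec pinned explicitly, independent of the Spark/ORC default.
      df.write.mode("overwrite").option("compression", "snappy").orc(s"$base/pinned")

      // No compression option: the session default codec applies.
      df.write.mode("overwrite").orc(s"$base/default")

      // Print whatever default the running Spark version uses.
      println("default ORC codec: " + spark.conf.get("spark.sql.orc.compression.codec"))

      (spark.read.orc(s"$base/pinned").count(), spark.read.orc(s"$base/default").count())
    } finally {
      spark.stop()
    }
  }
}
```

With the option removed, the benchmarks measure whatever codec the running Spark version uses by default, so a later change of the default needs no further edits to the codec setting itself.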

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun

Could you review this benchmark update PR, @yaooqinn ?

When we upgrade to Apache ORC 2.0, we will update these files once more.

dongjoon-hyun · 2024-01-18T02:52:31Z

Oh, I missed your approval, @HyukjinKwon ! Thank you!
Merged to master.

yaooqinn · 2024-01-18T03:01:34Z

Late +1

dongjoon-hyun · 2024-01-18T04:12:27Z

Thank you, @yaooqinn .

…benchmarks

This PR aims to use the default ORC compression in data source benchmarks.

Apache ORC 2.0 and Apache Spark 4.0 will use ZStandard as the default ORC compression codec.
- apache/orc#1733
- apache#44654

`OrcReadBenchmark` was switched to use ZStandard for comparison.
- apache#44761

And, this PR aims to change the remaining three data source benchmarks.
```
$ git grep OrcCompressionCodec | grep Benchmark
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BuiltInDataSourceWriteBenchmark.scala:import org.apache.spark.sql.execution.datasources.orc.OrcCompressionCodec
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BuiltInDataSourceWriteBenchmark.scala:      OrcCompressionCodec.SNAPPY.lowerCaseName())
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala:import org.apache.spark.sql.execution.datasources.orc.OrcCompressionCodec
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala:      OrcCompressionCodec.SNAPPY.lowerCaseName()).orc(dir)
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala:import org.apache.spark.sql.execution.datasources.orc.OrcCompressionCodec
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala:      .setIfMissing("orc.compression", OrcCompressionCodec.SNAPPY.lowerCaseName())
```

No.

Manual review.

No.

Closes apache#44777 from dongjoon-hyun/SPARK-46752.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

[SPARK-46752][SQL][TESTS] Use default ORC compression in data source …

42e1a3f

…benchmarks

github-actions bot added the SQL label Jan 18, 2024

HyukjinKwon approved these changes Jan 18, 2024


dongjoon-hyun closed this in 55bc4cb Jan 18, 2024

dongjoon-hyun deleted the SPARK-46752 branch January 18, 2024 04:12
