[SPARK-52969][SQL] Support DSv2 OrcScan Dynamic Partition Pruning#52009
[SPARK-52969][SQL] Support DSv2 OrcScan Dynamic Partition Pruning#52009flaming-archer wants to merge 2 commits intoapache:masterfrom
Conversation
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you for making a contribution, @flaming-archer .
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScan.scala
Outdated
Show resolved
Hide resolved
...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileBasedDataSourceTest.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
In addition to the above comments, I have one question. I guess you did compare this with the corresponding ParquetScan.scala logic. Could you confirm if there is any difference, @flaming-archer ?
In fact, I also rewrote ParquetScan with the same code can work. But I haven't figured out how to change these two classes together yet. Or rewrite a class so that both inherit it. I used to write more Java and less Scala before。。 |
29356cc to
1151c13
Compare
|
@dongjoon-hyun Pls take a look at it again. I have used ./dev/scalafmt format scala files already. |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Sorry for being late, @flaming-archer . I was on vacation since 8/15 until this Monday.
While revisiting your contribution, I'd like to recommend you to make a Parquet PR first to get more community attentions and to enlarge the impact of your contributions.
In fact, I also rewrote ParquetScan with the same code can work. But I haven't figured out how to change these two classes together yet. Or rewrite a class so that both inherit it.
As you know, Parquet is the Apache Spark's default file-based source format. Most test coverages are based on Parquet features. So, your contribution will be tested fully if you make a Parquet PR first. After than, we can come back to the other data sources like ORC (this PR).
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Lines 1838 to 1842 in 0c9af99
No problem, I will submit a Parquet version later in the next day or two. |
|
Thank you so much, @flaming-archer . |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Support ORCScan Dynamic Partition Prune to greatly improves performance.
Why are the changes needed?
Improve dsv2 and enhance the query performance of commonly used ORC files
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT.
Was this patch authored or co-authored using generative AI tooling?
NO.