This repository was archived by the owner on Feb 13, 2025. It is now read-only.
forked from apache/spark
Add metrics and cost tests for partition pruning effectiveness #5
Merged: mallman merged 25 commits into VideoAmp:spark-16980-lazy_partition_fetching from ericl:more-testing on Oct 14, 2016.

Commits (25)
c8e3a1e [SPARK-16980][SQL] Load only catalog table partition metadata required
ac89aef Add a new catalyst optimizer rule to SQL core for pruning unneeded
f657256 Include the type of file catalog in the FileSourceScanExec metadata
65298f0 TODO: Consider renaming FileCatalog to better differentiate it from
4d257e1 Refactor the FileSourceScanExec.metadata val to make it prettier
d3b9f3c try out parquet case insensitive fallback (ericl)
b1847ad fix and add test for input files (ericl)
84f3741 rename test (ericl)
026951c Refactor `TableFileCatalog.listFiles` to call `listDataLeafFiles` once
fb664d6 fix it (ericl)
25e880f more test cases (ericl)
869d090 also fix a bug with zero partitions selected (ericl)
225d0fe feature flag (ericl)
8aa1ed1 add comments (ericl)
5f3061b extend and fix flakiness in test (ericl)
bf6f46f Enhance `ParquetMetastoreSuite` with mixed-case partition columns
d48ff10 Tidy up a little by removing some unused imports, an unused method and
3a072bd Put partition count in `FileSourceScanExec.metadata` for partitioned
dc9e613 Fix some errors in my revision of `ParquetSourceSuite`
989f3b3 Thu Oct 13 17:18:14 PDT 2016 (ericl)
49112e6 Merge commit '765f93c' into more-testing (ericl)
6a46fea more generic (ericl)
3f192cd Thu Oct 13 18:09:42 PDT 2016 (ericl)
a7c0d35 Thu Oct 13 18:09:55 PDT 2016 (ericl)
39513b7 Thu Oct 13 18:22:31 PDT 2016 (ericl)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -175,7 +175,7 @@ class ParquetMetastoreSuite extends ParquetPartitioningTest {` |
| | | `(1 to 10).map(i => Tuple1(Seq(new Integer(i), null))).toDF("a")` |
| | | `.createOrReplaceTempView("jt_array")` |
| | | |
| | | `- setConf(HiveUtils.CONVERT_METASTORE_PARQUET, true)` |
| | | `+ assert(spark.sqlContext.getConf(HiveUtils.CONVERT_METASTORE_PARQUET.key) == "true")` |
| | | `}` |
| | | |
| | | `override def afterAll(): Unit = {` |
| | | `@@ -187,7 +187,6 @@ class ParquetMetastoreSuite extends ParquetPartitioningTest {` |
| | | `"jt",` |
| | | `"jt_array",` |
| | | `"test_parquet")` |
| | | `- setConf(HiveUtils.CONVERT_METASTORE_PARQUET, false)` |
| | | `}` |
| | | |
| | | `test(s"conversion is working") {` |

Review comment on the added `assert(...)` line: Can you explain why you made this change?

Author: This should no longer be needed since the flag value is true by default. I changed it to an assert to validate this. This lets us get rid of the `setConf(..., false)` in `afterAll()`, which was causing the conf value to be leaked to other suites.
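The leak described in the author's reply can be illustrated with a small standalone sketch. It does not use Spark: `SharedConf`, `LeakySuite`, `AssertingSuite`, and the key name `convertMetastoreParquet` are hypothetical stand-ins for a session-wide configuration and two test suites, chosen only to show why writing a non-default value in teardown affects whichever suite runs next, while asserting the default writes nothing.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session-wide configuration shared by all suites.
object SharedConf {
  private val defaults = Map("convertMetastoreParquet" -> "true")
  private val overrides = mutable.Map.empty[String, String]

  def get(key: String): String = overrides.getOrElse(key, defaults(key))
  def set(key: String, value: String): Unit = overrides(key) = value
  def unset(key: String): Unit = { overrides.remove(key); () }
}

// Old pattern: force the flag on in setup and force it off in teardown.
// The teardown writes a non-default value that later suites inherit.
object LeakySuite {
  def beforeAll(): Unit = SharedConf.set("convertMetastoreParquet", "true")
  def afterAll(): Unit = SharedConf.set("convertMetastoreParquet", "false")
}

// New pattern: only verify the default. Nothing is written, so nothing leaks.
object AssertingSuite {
  def beforeAll(): Unit = assert(SharedConf.get("convertMetastoreParquet") == "true")
  def afterAll(): Unit = ()
}

object ConfLeakDemo {
  def main(args: Array[String]): Unit = {
    LeakySuite.beforeAll(); LeakySuite.afterAll()
    // A suite that runs next now sees "false" instead of the default "true".
    println(s"after LeakySuite: ${SharedConf.get("convertMetastoreParquet")}")

    SharedConf.unset("convertMetastoreParquet") // reset for the second half of the demo
    AssertingSuite.beforeAll(); AssertingSuite.afterAll()
    println(s"after AssertingSuite: ${SharedConf.get("convertMetastoreParquet")}")
  }
}
```

A suite that really needs a non-default value would typically save the previous value and restore it in teardown; the assert-only approach works here because, per the author, the flag is already true by default.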
I would expect this to be 5 because this table has 5 partitions. Why does the test expect 10?
The first 5 are from resolving the table, and the latter 5 are from `ListingFileCatalog`. It is possible to optimize this to only have 5, but it didn't seem worth the cost since this is (1) legacy mode and (2) not a regression.
Hm, maybe I can break it up into analysis and execution to make it more clear.
Not easy, so just added a comment here.
Thanks for the clarification. I think that adding the comment is good enough.
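
The count of 10 discussed in this thread can be reproduced with a toy model. This is only a sketch under stated assumptions: `FetchCounter`, `Table`, `analyze`, and `execute` are hypothetical names, not Spark APIs. It assumes, as the reply says, that in legacy mode all partitions are fetched once while resolving the table and once more when the file catalog lists them.

```scala
// Hypothetical counter standing in for a "partitions fetched" metric.
final class FetchCounter {
  private var count = 0L
  def inc(n: Long): Unit = count += n
  def get: Long = count
}

// Hypothetical table: only the partition names matter here.
final case class Table(partitions: Seq[String])

object LegacyModeDemo {
  // Assumption: resolving the table eagerly loads every partition.
  def analyze(table: Table, metric: FetchCounter): Seq[String] = {
    metric.inc(table.partitions.size)
    table.partitions
  }

  // Assumption: the file catalog lists the same partitions again at scan time.
  def execute(table: Table, metric: FetchCounter): Seq[String] = {
    metric.inc(table.partitions.size)
    table.partitions
  }

  def main(args: Array[String]): Unit = {
    val table = Table((1 to 5).map(i => s"part=$i"))
    val metric = new FetchCounter
    analyze(table, metric)
    val afterAnalysis = metric.get
    execute(table, metric)
    // prints "analysis: 5, total: 10"
    println(s"analysis: $afterAnalysis, total: ${metric.get}")
  }
}
```

Reading the counter between the two phases is the "break it up into analysis and execution" idea from the thread; in the real test that split was reported as not easy, so a comment was added instead.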