
Conversation

@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Sep 5, 2017

What changes were proposed in this pull request?

Since SPARK-15639, `spark.sql.parquet.cacheMetadata` and `PARQUET_CACHE_METADATA` are not used. This PR removes them from SQLConf and the docs.

How was this patch tested?

Pass the existing Jenkins.

@maropu
Member

maropu commented Sep 5, 2017

I roughly checked the other Parquet-related options, and it seems `parquetOutputCommitterClass` in SQLConf is also unused now? If so, we don't seem to have a JIRA entry for that option.

@SparkQA

SparkQA commented Sep 5, 2017

Test build #81401 has finished for PR 19129 at commit 3b305d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Thank you, @maropu !
`spark.sql.parquet.output.committer.class` seems to be used in `ParquetIOSuite.scala`.
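
A check like this can be reproduced with a recursive source search that skips test suites, so a conf that is only referenced by tests still shows up as dead. A minimal Python sketch; the function name, paths, and suffix filter are illustrative assumptions, not part of this PR:

```python
import os

def find_references(root, needle, exclude_suffixes=("Suite.scala",)):
    """Walk a source tree and return files that mention `needle`,
    skipping test suites so test-only uses do not mask a dead option."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exclude_suffixes):
                continue  # ignore test suites
            if not name.endswith((".scala", ".java")):
                continue  # only scan source files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                if needle in f.read():
                    hits.append(path)
    return hits

# e.g. find_references("sql/core/src/main", "PARQUET_CACHE_METADATA")
```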

@maropu
Member

maropu commented Sep 5, 2017

oh, yea. I got you. Thanks!

@dongjoon-hyun
Member Author

Thank you for your review and approval, @HyukjinKwon !

@gatorsmile
Member

gatorsmile commented Sep 5, 2017

Could you check the change history and find when we removed the usage of this SQLConf? It sounds like we did not have test coverage for this in the past, so we did not realize it when removing the usage. We also need to update the migration notes.

@dongjoon-hyun
Member Author

Sure, I will.

@HyukjinKwon
Member

The last usage looks like it was removed in 678b96e, and this option looks like it was introduced in 9eb74c7.
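
Commit archaeology like this can be done with git's pickaxe search, which lists commits whose diffs add or remove a given string. A hedged sketch wrapping it from Python; the wrapper name is hypothetical:

```python
import subprocess

def commits_touching(needle, repo="."):
    """Return one-line summaries of commits whose diffs add or remove
    `needle`, found with git's pickaxe (-S) search."""
    out = subprocess.run(
        ["git", "log", f"-S{needle}", "--oneline"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

# e.g. commits_touching("PARQUET_CACHE_METADATA", repo="path/to/spark")
```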

@dongjoon-hyun
Member Author

Wow! Thank you, @HyukjinKwon !

@gatorsmile
Member

Please document it in the migration guides. Thanks!

@dongjoon-hyun
Member Author

Sure, @gatorsmile .
BTW, I searched more and updated the PR description.

It's SPARK-15639

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 7, 2017

It's marked as 2.0.1 and 2.1.0 with the following commit logs.

branch-2.0$ git log --oneline | grep SPARK-15639
977fbbfcae [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level for parquet reader
91dffcabde Revert "[SPARK-15639][SQL] Try to push down filter at RowGroups level for parquet reader"
7d6bd11964 [SPARK-15639][SQL] Try to push down filter at RowGroups level for parquet reader

Which section is proper?

  • Upgrading From Spark SQL 1.6 to 2.0
  • Upgrading From Spark SQL 2.0 to 2.1

I think it's Upgrading From Spark SQL 1.6 to 2.0, effectively.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 7, 2017

Or, should I make Upgrading From Spark SQL 2.2 to 2.3?

@gatorsmile
Member

SQL 1.6 to 2.0 sounds good to me.

@dongjoon-hyun
Member Author

Thank you!

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 7, 2017

The PR resolved two issues under the title [SPARK-15639][SPARK-16321][SQL] Push down filter at RowGroups level for parquet reader. I'll add the following. Is it enough?

 - From Spark 2.0.1, `spark.sql.parquet.cacheMetadata` is no longer used. See
   [SPARK-16321](https://issues.apache.org/jira/browse/SPARK-16321) and
   [SPARK-15639](https://issues.apache.org/jira/browse/SPARK-15639) for details.


Member

These two jiras are wrong.

Member Author

#13701 is [SPARK-15639][SPARK-16321][SQL] Push down filter at RowGroups level for parquet reader.

It's removed here.

Member

There is no caller of `initializeLocalJobFunc`; thus, `initializeLocalJobFunc` is dead code.

Member Author

Oh, then, it's another transitive search.

Member

It sounds like https://issues.apache.org/jira/browse/SPARK-13664 is the one that removes the usage of this conf.

Member Author

@dongjoon-hyun dongjoon-hyun Sep 7, 2017

I will update like this.

 - `spark.sql.parquet.cacheMetadata` is no longer used.
   See [SPARK-13664](https://issues.apache.org/jira/browse/SPARK-13664) for details.

Member Author

Thank you!


Hi, I'm new to Spark. I wonder how to disable metadata caching now that this conf has been deleted. I created an external table, and the Parquet files in the specified location are updated daily, so I want to disable metadata caching rather than executing `REFRESH TABLE xxx`.

Member Author

Hi, @zzl1787. This change is for Apache Spark 2.3. In Apache Spark 2.3, the metadata cache is not controlled by this parameter.


@dongjoon-hyun OK, got it, thank you. I finally found the parameter that controls this: `spark.sql.filesourceTableRelationCacheSize = 0` disables the metadata cache.
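
For reference, a setting like the one above can also be applied at deployment time. A sketch of a `spark-defaults.conf` entry; the file location and the zero-disables-the-cache semantics are assumptions based on the comment above, not something verified in this PR:

```
# spark-defaults.conf (hypothetical deployment file)
# A cache size of 0 disables the file-source table relation cache.
spark.sql.filesourceTableRelationCacheSize  0
```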

@SparkQA

SparkQA commented Sep 7, 2017

Test build #81523 has finished for PR 19129 at commit 40ed9ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 7, 2017

Test build #81525 has finished for PR 19129 at commit 8e3d8fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merged to master.

@asfgit asfgit closed this in e00f1a1 Sep 7, 2017
@dongjoon-hyun
Member Author

Thank you for the review, @gatorsmile, @HyukjinKwon, @maropu.
In this issue, I've learned how to track down unused stuff correctly. Thank you again.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-13656 branch September 7, 2017 23:55
