
Conversation

@gatorsmile
Member

What changes were proposed in this pull request?

Currently, ANALYZE TABLE only works for Hive-serde tables, and we should issue an exception in all other cases. When the table is a data source table, we already issue an exception; however, when the table is an in-memory cataloged table, we do not issue any exception.

This PR issues an exception when the table is in-memory cataloged. For example,

CREATE TABLE tbl(a INT, b INT) USING parquet

tbl is a SimpleCatalogRelation when Hive support is not enabled.
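
To make the intended behavior concrete, the guard can be sketched as a pattern match on the resolved relation inside the command that handles ANALYZE TABLE. This is only an illustrative Scala sketch, not the actual patch (checkAnalyzable is a hypothetical helper name), though SimpleCatalogRelation and AnalysisException are real Spark 2.0 classes:

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.catalog.SimpleCatalogRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Illustrative sketch only -- not the code in this PR.
// Reject any relation that is backed by the in-memory catalog.
def checkAnalyzable(tableName: String, relation: LogicalPlan): Unit =
  relation match {
    case _: SimpleCatalogRelation =>
      throw new AnalysisException(
        s"ANALYZE TABLE is not supported on in-memory cataloged table $tableName.")
    case _ => // Hive-serde tables fall through and have their statistics computed.
  }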

How was this patch tested?

Added two test cases. One of them simply improves test coverage for the case where the analyzed table is a data source table.

@gatorsmile
Member Author

cc @hvanhovell @cloud-fan

@SparkQA

SparkQA commented Aug 20, 2016

Test build #64134 has finished for PR 14729 at commit bb3fd8f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Aug 20, 2016

Test build #64143 has finished for PR 14729 at commit bb3fd8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

@gatorsmile shouldn't we do this the other way around and enable Analyze Table for any table? That is the only way CBO can be used anywhere.

Temporary tables might be a bit more difficult, but I feel like we should support them at some point.

@gatorsmile
Member Author

@hvanhovell In the current master branch, if we want to support in-memory cataloged tables, we first need to support data source tables: a SimpleCatalogRelation is converted to a LogicalRelation, and we are unable to read/write a SimpleCatalogRelation that is not a data source table.

If we plan to support ANALYZE TABLE on data source tables in Spark 2.1, this PR is just a bug fix for Spark 2.0. Should we fix this issue in Spark 2.0.1?

@viirya
Member

viirya commented Aug 22, 2016

@hvanhovell Not related to this PR, but I would like to ask: do we need to support temporary tables in ANALYZE TABLE? Temporary tables (and views) are actually represented by logical plans, and LogicalPlan already has a mechanism to calculate statistics (currently just the size in bytes).
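
For reference, the mechanism mentioned above can be seen directly on any query plan; here is a minimal illustration against the Spark 2.0 API, where LogicalPlan exposes a statistics method that by default carries only sizeInBytes (renamed to stats in later Spark versions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.range(0, 1000).toDF("id")

// Every LogicalPlan reports a Statistics object; by default it only
// estimates the plan's output size in bytes.
val sizeInBytes = df.queryExecution.optimizedPlan.statistics.sizeInBytes
println(s"estimated size: $sizeInBytes bytes")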

@hvanhovell
Contributor

@gatorsmile yeah, we should fix this issue for 2.0.1.

@viirya we do not need to support all kinds of temporary tables. However, you are allowed to create a temporary read-only table (confusingly named a temporary view), which connects to some source using the data sources API. I want to make sure we support this case.

@viirya
Member

viirya commented Aug 22, 2016

@hvanhovell as far as I know, a temporary table is resolved to an arbitrary logical plan, rather than the LeafNode that the statistics of a query plan are based on. I think that will cause a problem, won't it?

@hvanhovell
Contributor

@viirya Yeah, a normal temporary table would be resolved as a LogicalPlan. Analyze Table does not give us any benefit there.

However, you are also allowed to do this:

CREATE TEMPORARY VIEW tmp1
USING parquet
OPTIONS(path 'some/location')

For these, I would like to be able to collect statistics.
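
A session exercising that wish would look like the following (hypothetical at the time of this discussion, since ANALYZE TABLE did not yet work on such views; the path is a placeholder):

// Hypothetical usage: register a data-source-backed temporary view,
// then ask Spark to collect statistics on it.
spark.sql(
  """CREATE TEMPORARY VIEW tmp1
    |USING parquet
    |OPTIONS (path 'some/location')""".stripMargin)

// This is the statement that did not yet work for temporary views.
spark.sql("ANALYZE TABLE tmp1 COMPUTE STATISTICS")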

@gatorsmile
Member Author

gatorsmile commented Aug 22, 2016

@hvanhovell Will submit a PR for Spark 2.0 tomorrow. Thanks!

@hvanhovell
Contributor

Thanks!

@gatorsmile
Member Author

PR #14781 has been opened, so this one will be closed. Thanks!
