
Conversation

@gatorsmile
Member

What changes were proposed in this pull request?

Currently, ANALYZE TABLE only works for Hive-serde tables, and we should issue an exception in all other cases. When the table is a data source table, we already issue an exception; however, when the table is an in-memory cataloged table, we do not issue any exception.

This PR issues an exception when the table is in-memory cataloged. For example,

CREATE TABLE tbl(a INT, b INT) USING parquet

tbl is a SimpleCatalogRelation when Hive support is not enabled.
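
To make the intended behavior concrete, the guard can be sketched as a pattern match on the resolved relation inside the command that handles ANALYZE TABLE. This is only an illustrative Scala sketch, not the actual patch (checkAnalyzable is a hypothetical helper name), though SimpleCatalogRelation and AnalysisException are real Spark 2.0 classes:

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.catalog.SimpleCatalogRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Illustrative sketch only -- not the code in this PR.
// Reject any relation that is backed by the in-memory catalog.
def checkAnalyzable(tableName: String, relation: LogicalPlan): Unit =
  relation match {
    case _: SimpleCatalogRelation =>
      throw new AnalysisException(
        s"ANALYZE TABLE is not supported on in-memory cataloged table $tableName.")
    case _ => // Hive-serde tables fall through and have their statistics computed.
  }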

How was this patch tested?

Added two test cases. One of them simply improves test coverage for the case where the analyzed table is a data source table.

@gatorsmile
Member Author

cc @hvanhovell @cloud-fan

@SparkQA

SparkQA commented Aug 20, 2016

Test build #64134 has finished for PR 14729 at commit bb3fd8f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Aug 20, 2016

Test build #64143 has finished for PR 14729 at commit bb3fd8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

@gatorsmile shouldn't we do this the other way around and enable Analyze Table for any table? That is the only way CBO can be used anywhere.

Temporary tables might be a bit more difficult, but I feel like we should support them at some point.

@gatorsmile
Member Author

@hvanhovell In the current master branch, if we want to support in-memory cataloged tables, we first need to support data source tables: a SimpleCatalogRelation is converted to a LogicalRelation, and we are unable to read/write a SimpleCatalogRelation that is not a data source table.

If we plan to support ANALYZE TABLE on data source tables in Spark 2.1, this PR is just a bug fix for Spark 2.0. Should we fix this issue in Spark 2.0.1?

@viirya
Member

viirya commented Aug 22, 2016

@hvanhovell Not related to this PR, but I would like to ask: do we need to support temporary tables in ANALYZE TABLE? Temporary tables (and views) are actually represented by logical plans, and LogicalPlan already has a mechanism to calculate statistics (currently just the size in bytes).
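
For reference, the mechanism mentioned above can be seen directly on any query plan; here is a minimal illustration against the Spark 2.0 API, where LogicalPlan exposes a statistics method that by default carries only sizeInBytes (renamed to stats in later Spark versions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.range(0, 1000).toDF("id")

// Every LogicalPlan reports a Statistics object; by default it only
// estimates the plan's output size in bytes.
val sizeInBytes = df.queryExecution.optimizedPlan.statistics.sizeInBytes
println(s"estimated size: $sizeInBytes bytes")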

@hvanhovell
Contributor

@gatorsmile yeah, we should fix this issue for 2.0.1.

@viirya we do not need to support all kinds of temporary tables. However, you are allowed to create a temporary read-only table (confusingly named a temporary view), which connects to some source using the data sources API. I want to make sure we support this case.

@viirya
Member

viirya commented Aug 22, 2016

@hvanhovell as far as I know, a temporary table is resolved to an arbitrary logical plan, rather than the LeafNode that the statistics of a query plan are based on. I think that will cause a problem, won't it?

@hvanhovell
Contributor

@viirya Yeah, a normal temporary table would be resolved as a LogicalPlan. Analyze Table does not give us any benefit there.

However, you are also allowed to do this:

CREATE TEMPORARY VIEW tmp1
USING parquet
OPTIONS(path 'some/location')

For these, I would like to be able to collect statistics.
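
A session exercising that wish would look like the following (hypothetical at the time of this discussion, since ANALYZE TABLE did not yet work on such views; the path is a placeholder):

// Hypothetical usage: register a data-source-backed temporary view,
// then ask Spark to collect statistics on it.
spark.sql(
  """CREATE TEMPORARY VIEW tmp1
    |USING parquet
    |OPTIONS (path 'some/location')""".stripMargin)

// This is the statement that did not yet work for temporary views.
spark.sql("ANALYZE TABLE tmp1 COMPUTE STATISTICS")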

@gatorsmile
Member Author

gatorsmile commented Aug 22, 2016

@hvanhovell Will submit a PR for Spark 2.0 tomorrow. Thanks!

@hvanhovell
Contributor

Thanks!

@gatorsmile
Member Author

PR #14781 has been opened, so this one will be closed. Thanks!
