Skip to content

Conversation

@clockfly
Copy link
Contributor

@clockfly clockfly commented May 3, 2016

What changes were proposed in this pull request?

This PR support new SQL syntax CREATE TEMPORARY VIEW.
Like:

CREATE TEMPORARY VIEW viewName AS SELECT * from xx
CREATE OR REPLACE TEMPORARY VIEW viewName AS SELECT * from xx
CREATE TEMPORARY VIEW viewName (c1 COMMENT 'blabla', c2 COMMENT 'blabla') AS SELECT * FROM xx

How was this patch tested?

Unit tests.

@clockfly
Copy link
Contributor Author

clockfly commented May 3, 2016

work in progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed now since we no longer support non native view. If there still a config option for native view, can you remove that as well?

@rxin
Copy link
Contributor

rxin commented May 3, 2016

@clockfly for work in progress pr, put WIP in the title.

…fier AS query

This PR support new SQL syntax CREATE TEMPORARY VIEW.

Unit tests.

Author: Sean Zhong <seanzhong@apache.org>
@clockfly clockfly changed the title [SPARK-6339][SQL] Supports create CREATE TEMPORARY VIEW tableIdentifier AS query [SPARK-6339][SQL][WIP] Supports create CREATE TEMPORARY VIEW tableIdentifier AS query May 3, 2016
// Temporary view names should NOT contain database prefix like "database.table"
if (isTemporary && tableDesc.identifier.database.isDefined) {
val database = tableDesc.identifier.database.get
throw new AnalysisException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this semantic rule come from? Hive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to be consistent with DataSet API. When registering a temp table, it will remove the database prefix, here is the code:

sessionState.sqlParser.parseTableIdentifier(tableName).table,

@liancheng
Copy link
Contributor

test this please

@liancheng
Copy link
Contributor

add to whitelist

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57693 has finished for PR 12872 at commit 8af667e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

liancheng commented May 4, 2016

One thing that we've decided offline is that we should deprecate Dataset.registerTempTable() by Dataset.createTempView() in a follow-up PR. It's becoming confusing that we call the same thing "table" in Dataset API but "view" in SQL DDL.

I think the essential difference between a view and a table is that a view is basically a lineage, while a table is always materialized to disk. For example, data of temporary tables in Hive are written to scratch folder of the current session.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Indentation is off

@clockfly clockfly changed the title [SPARK-6339][SQL][WIP] Supports create CREATE TEMPORARY VIEW tableIdentifier AS query [SPARK-6339][SQL] Supports create CREATE TEMPORARY VIEW tableIdentifier AS query May 4, 2016
@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57707 has finished for PR 12872 at commit a497cf3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57710 has finished for PR 12872 at commit 1e20bb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

checkAnswer(sql("select count(*) FROM jtv2"), Row(2))

// Checks temporary views
sql("CREATE TEMPORARY VIEW temp_jtv1 AS SELECT * FROM jt WHERE id > 3").collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: For DDL commands like this one, you don't need to call .collect(). Unlike SELECT queries, commands are executed eagerly.

sql("DROP VIEW testView")

val df = (1 until 10).map(i => i -> i).toDF("i", "j")
df.write.format("json").saveAsTable("jt2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as above, it's a waste to write a new persistent table here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sorted persistent table is used by the checking logic below.

checkAnswer(sql("SELECT * FROM testView ORDER BY i"), (1 to 9).map(i => Row(i, i)))

@liancheng
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57756 has finished for PR 12872 at commit 85d121c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

Would like to ask @yhuai to have a look at this.

@clockfly clockfly changed the title [SPARK-6339][SQL] Supports create CREATE TEMPORARY VIEW tableIdentifier AS query [SPARK-6339][SQL] Supports CREATE TEMPORARY VIEW tableIdentifier AS query May 4, 2016
if (tableDesc.schema.isEmpty) {
analyzedPlan
} else {
val projectList = analyzedPlan.output.zip(tableDesc.schema).map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you want to check if analyzedPlan.output and tableDesc.schema have the same number of columns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already check the lenght

@yhuai
Copy link
Contributor

yhuai commented May 5, 2016

LGTM

@asfgit asfgit closed this in 8fb1463 May 5, 2016
@yhuai
Copy link
Contributor

yhuai commented May 5, 2016

Merged to master and branch 2.0.

asfgit pushed a commit that referenced this pull request May 5, 2016
…uery

## What changes were proposed in this pull request?

This PR support new SQL syntax CREATE TEMPORARY VIEW.
Like:
```
CREATE TEMPORARY VIEW viewName AS SELECT * from xx
CREATE OR REPLACE TEMPORARY VIEW viewName AS SELECT * from xx
CREATE TEMPORARY VIEW viewName (c1 COMMENT 'blabla', c2 COMMENT 'blabla') AS SELECT * FROM xx
```

## How was this patch tested?

Unit tests.

Author: Sean Zhong <clockfly@gmail.com>

Closes #12872 from clockfly/spark-6399.

(cherry picked from commit 8fb1463)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@gatorsmile
Copy link
Member

gatorsmile commented Aug 7, 2016

Currently, the existing DDL behaviors for views when users do not specify the database name are described below:

  • CREATE OR REPLACE TEMPORARY VIEW view_name creates/alters the TEMPORARY view.
  • CREATE OR REPLACE VIEW view_name creates/alters the PERSISTENT view.
  • DROP VIEW view_name OR SELECT... FROM view_name is always first applied to a TEMPORARY view, if existing. If the temporary view does not exist, we try to drop/fetch the PERSISTENT view, if existing.
  • ALTER VIEW view_name is only applicable to the PERSISTENT view, even if the temporary view with the same name exists.

@clockfly @rxin @yhuai @liancheng @hvanhovell @cloud-fan We are using different name resolution rules for different DDL statements, should we make them consistent?

@yhuai
Copy link
Contributor

yhuai commented Aug 7, 2016

Can you be more specific on the inconsistency? Seems ALTER VIEW view_name is the only inconsistent command?

@gatorsmile
Copy link
Member

gatorsmile commented Aug 7, 2016

Yeah, ALTER VIEW should be consistent with DROP VIEW, if we use the same naming rule. Should I submit a PR for it?

Another potential issue to users is the behaviors of CRAETE VIEW and CREATE TEMPORARY VIEW when users do not specify the database names. See the following example:

sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE ID < 3")
sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName") 

When we processing the second statement, we simply add CURRENT_DATABASE to make it a fully qualified view name. However, if users do not specify the fully qualified name in the subsequent SELECT/DROP, the persistent view is shadowed by the temporary view with the same name. The returned results might be a surprise to the Spark users, because they might not realize there exist such a temporary view in the existing session.

Of course, the existing behavior is right, but I think the better way is to force users to specify the database name when creating a persistent view if there exists a temporary view with the same name. That means, we can issue an error message here in this specific case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants