[SPARK-6339][SQL] Supports CREATE TEMPORARY VIEW tableIdentifier AS query #12872
|
work in progress. |
This line can be removed now since we no longer support non-native views. If there is still a config option for native views, can you remove that as well?
|
@clockfly For a work-in-progress PR, please put WIP in the title. |
…fier AS query This PR support new SQL syntax CREATE TEMPORARY VIEW. Unit tests. Author: Sean Zhong <seanzhong@apache.org>
| // Temporary view names should NOT contain database prefix like "database.table" | ||
| if (isTemporary && tableDesc.identifier.database.isDefined) { | ||
| val database = tableDesc.identifier.database.get | ||
| throw new AnalysisException( |
Where does this semantic rule come from? Hive?
This is to be consistent with the Dataset API. When registering a temp table, it removes the database prefix. Here is the code:
| sessionState.sqlParser.parseTableIdentifier(tableName).table, |
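For readers following the thread, the rule under discussion (a temporary view name must not carry a database prefix) can be sketched in plain Scala. This is only an illustration under stated assumptions: `TableIdentifier` here is a hypothetical stand-in rather than Spark's own class, `TempViewNameCheck.validate` is an invented helper, and the error text is made up, since the actual `AnalysisException` message is not shown in the diff above.

```scala
// Hypothetical stand-in for a table identifier; NOT Spark's real class.
final case class TableIdentifier(table: String, database: Option[String] = None)

object TempViewNameCheck {
  // Returns Left(message) when a temporary view name carries a database
  // prefix, otherwise Right(the name the view would be registered under).
  // The message wording is an assumption made for this sketch.
  def validate(name: TableIdentifier, isTemporary: Boolean): Either[String, String] =
    name.database match {
      case Some(db) if isTemporary =>
        Left(s"Temporary view name must not include a database prefix: '$db.${name.table}'")
      case Some(db) =>
        // Persistent views may be qualified with a database.
        Right(s"$db.${name.table}")
      case None =>
        Right(name.table)
    }
}
```

With this sketch, `validate(TableIdentifier("v", Some("db")), isTemporary = true)` is rejected, while the same identifier with `isTemporary = false` is accepted. Rejecting the prefix outright, instead of silently dropping it, is the stricter of the two behaviors mentioned in this thread.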
|
test this please |
|
add to whitelist |
|
Test build #57693 has finished for PR 12872 at commit
|
|
One thing that we've decided offline is that we should deprecate I think the essential difference between a view and a table is that a view is basically a lineage, while a table is always materialized to disk. For example, data of temporary tables in Hive are written to scratch folder of the current session. |
Nit: Indentation is off
…t with "CREATE TEMPORARY TABLE" syntax
|
Test build #57707 has finished for PR 12872 at commit
|
|
Test build #57710 has finished for PR 12872 at commit
|
| checkAnswer(sql("select count(*) FROM jtv2"), Row(2)) | ||
|
|
||
| // Checks temporary views | ||
| sql("CREATE TEMPORARY VIEW temp_jtv1 AS SELECT * FROM jt WHERE id > 3").collect() |
Nit: For DDL commands like this one, you don't need to call .collect(). Unlike SELECT queries, commands are executed eagerly.
| sql("DROP VIEW testView") | ||
|
|
||
| val df = (1 until 10).map(i => i -> i).toDF("i", "j") | ||
| df.write.format("json").saveAsTable("jt2") |
Same as above: it's a waste to write a new persistent table here.
The sorted persistent table is used by the checking logic below.
checkAnswer(sql("SELECT * FROM testView ORDER BY i"), (1 to 9).map(i => Row(i, i)))
|
LGTM |
|
Test build #57756 has finished for PR 12872 at commit
|
|
Would like to ask @yhuai to have a look at this. |
| if (tableDesc.schema.isEmpty) { | ||
| analyzedPlan | ||
| } else { | ||
| val projectList = analyzedPlan.output.zip(tableDesc.schema).map { |
Seems you want to check if analyzedPlan.output and tableDesc.schema have the same number of columns?
nvm
We already check the length.
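The reason the length check matters for the `zip` in the diff can be shown with a small plain-Scala sketch. It models columns as strings rather than Catalyst attributes; `ViewColumnAliases.aliasColumns` and its error text are hypothetical names invented for illustration, not code from this PR.

```scala
object ViewColumnAliases {
  // Pairs each query output column with the user-specified view column name.
  // An empty viewColumns list means the view keeps the query's own names.
  // Returns Left when the column counts differ, because zip would otherwise
  // silently truncate to the shorter sequence.
  def aliasColumns(
      queryOutput: Seq[String],
      viewColumns: Seq[String]): Either[String, Seq[(String, String)]] =
    if (viewColumns.isEmpty) {
      Right(queryOutput.map(c => c -> c))
    } else if (viewColumns.length != queryOutput.length) {
      Left(s"View declares ${viewColumns.length} columns but the query produces ${queryOutput.length}")
    } else {
      Right(queryOutput.zip(viewColumns))
    }
}
```

Without the explicit comparison, `Seq("i", "j").zip(Seq("c1"))` would yield a single pair and drop column `j` without any error, which is why the check has to happen before the projection list is built.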
|
LGTM |
|
Merged to master and branch 2.0. |
…uery

## What changes were proposed in this pull request?

This PR support new SQL syntax CREATE TEMPORARY VIEW.
Like:
```
CREATE TEMPORARY VIEW viewName AS SELECT * from xx
CREATE OR REPLACE TEMPORARY VIEW viewName AS SELECT * from xx
CREATE TEMPORARY VIEW viewName (c1 COMMENT 'blabla', c2 COMMENT 'blabla') AS SELECT * FROM xx
```

## How was this patch tested?

Unit tests.

Author: Sean Zhong <clockfly@gmail.com>

Closes #12872 from clockfly/spark-6399.

(cherry picked from commit 8fb1463)
Signed-off-by: Yin Huai <yhuai@databricks.com>
|
Currently, the existing DDL behaviors for views when users do not specify the database name are described below:
@clockfly @rxin @yhuai @liancheng @hvanhovell @cloud-fan We are using different name resolution rules for different DDL statements. Should we make them consistent? |
|
Can you be more specific on the inconsistency? Seems |
|
Yeah. Another potential issue for users is the behavior of When we process the second statement, we simply add Of course, the existing behavior is right, but I think the better way is to force users to specify the database name when creating a persistent view if a temporary view with the same name exists. That is, we can issue an error message in this specific case. |
What changes were proposed in this pull request?
This PR supports the new SQL syntax CREATE TEMPORARY VIEW.
Like:
How was this patch tested?
Unit tests.