[SPARK-15674][SQL] Deprecates "CREATE TEMPORARY TABLE USING...", uses "CREATE TEMPORARY VIEW USING..." instead #13414
So I am not sure I understand this one. Why should we deprecate this in favour of creating a view? Could you elaborate on why we need this?

Test build #59662 has finished for PR 13414 at commit

Test build #59663 has finished for PR 13414 at commit

We still allow temporary views. Temporary views and temporary tables are both intermediate layers between the user and the actual table. Currently, our implementation does not support temporary tables; it only supports temporary views. The difference is that:

I think the name of the I am pretty sure that no query is executed in this case. It will just scan the data. For example the following REPL code:

    import java.nio.file.Files
    val location = Files.createTempDirectory("data").resolve("src")
    spark.range(0, 100000).
      select($"id".as("key"), rand().as("value")).
      write.parquet(location.toString)
    spark.sql(s"create temporary table my_src using parquet options(path '$location')")
    spark.table("my_src").explain(true)

Yields the following plan:

Am I missing something?
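
For comparison, the same data source can be registered with the replacement syntax this PR introduces. This is only a sketch for spark-shell on Spark 2.0 or later, where `spark` is predefined; the view name `my_src_view` is a made-up name for illustration, and the two imports are stated explicitly in case the shell has not already brought them into scope.

```scala
// Run inside spark-shell (Spark 2.0+), where `spark` is the predefined SparkSession.
import java.nio.file.Files
import org.apache.spark.sql.functions.rand
import spark.implicits._

// Write some Parquet data to a fresh temporary location.
val location = Files.createTempDirectory("data").resolve("src")
spark.range(0, 100000).
  select($"id".as("key"), rand().as("value")).
  write.parquet(location.toString)

// Register the data source under the non-deprecated TEMPORARY VIEW syntax.
// `my_src_view` is a hypothetical name chosen for this example.
spark.sql(s"CREATE TEMPORARY VIEW my_src_view USING parquet OPTIONS (path '$location')")

// Print the parsed, analyzed, optimized, and physical plans for the view.
spark.table("my_src_view").explain(true)
```

Comparing this output with the plan for `my_src` should show whether the two statements resolve to the same scan over the Parquet files.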

I updated the description, please check whether it makes more sense now.

@clockfly the description is getting there. IIUC the problem we are solving is the following:
Using
I do have a couple of issues with this:
What do you think?

@hvanhovell Probably we can talk more face to face next week.

        identifierCommentList? (COMMENT STRING)?
        (PARTITIONED ON identifierList)?
        (TBLPROPERTIES tablePropertyList)? AS query #createView
    | CREATE (OR REPLACE)? TEMPORARY VIEW tableIdentifier ('(' colTypeList ')')? tableProvider
NIT: Could you break this line up so we keep all #... hooks on the same column...

@clockfly this looks pretty good. I have left some (minor) comments.

@hvanhovell Thanks for the review. Updated.

LGTM pending Jenkins

Test build #60079 has finished for PR 13414 at commit

Test build #60081 has finished for PR 13414 at commit

Thanks! Merging to master/2.0

… "CREATE TEMPORARY VIEW USING..." instead

## What changes were proposed in this pull request?

The current implementation of "CREATE TEMPORARY TABLE USING datasource..." does NOT create any intermediate temporary data directory (such as a temporary HDFS folder); it only stores a SQL string in memory. Probably we should use "TEMPORARY VIEW" instead.

This PR assumes a temporary table has to be linked with some temporary intermediate data. It follows this definition of a temporary table (from the [hortonworks doc](https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/temp-tables.html)):

> A temporary table is a convenient way for an application to automatically manage intermediate data generated during a complex query

**Example**:

```
scala> spark.sql("CREATE temporary view my_tab7 (c1: String, c2: String) USING org.apache.spark.sql.execution.datasources.csv.CSVFileFormat OPTIONS (PATH '/Users/seanzhong/csv/cars.csv')")
scala> spark.sql("select c1, c2 from my_tab7").show()
+----+-----+
|  c1|   c2|
+----+-----+
|year| make|
|2012|Tesla|
...
```

It NOW prints a **deprecation warning** if "CREATE TEMPORARY TABLE USING..." is used.

```
scala> spark.sql("CREATE temporary table my_tab7 (c1: String, c2: String) USING org.apache.spark.sql.execution.datasources.csv.CSVFileFormat OPTIONS (PATH '/Users/seanzhong/csv/cars.csv')")
16/05/31 10:39:27 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE tableName USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
```

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes #13414 from clockfly/create_temp_view_using.

(cherry picked from commit 890baac)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
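
The deprecation behaviour described in the commit message can be pictured with a small, self-contained model. This is a hypothetical sketch in plain Scala, not Spark's actual parser or planner code: the `Statement` types and the `normalize` function are invented for illustration, and the assumption that the deprecated statement is handled exactly like the view statement is mine, not something the PR text states. Only the warning text is taken from the log line quoted above.

```scala
object DeprecationSketch {
  // Hypothetical statement model; Spark's real logical plans differ.
  sealed trait Statement
  final case class CreateTempTableUsing(name: String, provider: String) extends Statement
  final case class CreateTempViewUsing(name: String, provider: String) extends Statement

  // Warning text as quoted in the commit message.
  val DeprecationWarning: String =
    "CREATE TEMPORARY TABLE tableName USING... is deprecated, " +
      "please use CREATE TEMPORARY VIEW viewName USING... instead"

  // Returns the statement to run plus an optional warning.
  // Assumption: the deprecated form is treated as the equivalent view statement.
  def normalize(stmt: Statement): (Statement, Option[String]) = stmt match {
    case CreateTempTableUsing(name, provider) =>
      (CreateTempViewUsing(name, provider), Some(DeprecationWarning))
    case other =>
      (other, None)
  }
}
```

With this model, `DeprecationSketch.normalize(DeprecationSketch.CreateTempTableUsing("my_tab7", "csv"))` yields the view statement together with the warning, while a `CreateTempViewUsing` input passes through unchanged with no warning.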