-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-14124] [SQL] [FOLLOWUP] Implement Database-related DDL Commands #12081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| def getDefaultDBPath(db: String): String = { | ||
| /** Get the path for creating a non-default database. */ | ||
| def createDatabasePath(db: String, path: Option[String]): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this doesn't seem to belong to the session catalog. It's only used in 1 place so I would just move it to CreateDatabase itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try it, but it involves a lot of code changes.
The major issue is the type mismatch between the path in case class CreateDatabase and the locationUri in CatalogDatabase. The first one has to be Option[String] since users might not provide location in the create table command. The second one has to be String since we always have the location.
If we change the type of locationUri in CatalogDatabase to Option[String], the related code changes look not clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's just call this defaultDatabasePath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do the change.
|
@gatorsmile I took a quick look and I think the logic can be simpler. There are a lot of random checks that don't seem necessary to me. Actually I thought the original goal was to create the dir, but this patch doesn't seem to do that. I'm not sure what the non-test changes here achieve. |
|
@andrewor14 You know, this PR has many changes since its initial version. Originally, the focus of this PR is to create a correct warehouse directory. Later, a few related PRs were merged. Thus, I removed the contents. : ) Yeah, sure. I can clean it and try to create the dir. |
| val fs = new Path(path).getFileSystem(hiveContext.sessionState.newHadoopConf()) | ||
| withTable(tabName) { | ||
| assert(tmpDir.listFiles.isEmpty) | ||
| sql(s"CREATE DATABASE $dbName Location '$tmpDir'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we create a database with a given location, then we drop this database, will we still keep the dir like external table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The directory will be dropped by the Hive metastore API. The test case in this PR verified this behavior.
https://github.com/gatorsmile/spark/blob/mkdir/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L394
|
@andrewor14 I also think about creating directory in implementation of create database. Metastore APIs created the directory for us no matter whether the provided location/directory exists or not. The following test cases verified it. Should we still create the database directory? If yes, could we do it in a separate PR? This PR has a long change history. It has been rebased 8 times. |
|
Test build #57717 has finished for PR 12081 at commit
|
|
Test build #57722 has finished for PR 12081 at commit
|
|
: ) This one needs to be rebased again. 9 times |
…wNewNewNewNew # Conflicts: # sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
|
Test build #57772 has finished for PR 12081 at commit
|
| databaseName, | ||
| comment.getOrElse(""), | ||
| path.getOrElse(catalog.getDefaultDBPath(databaseName)), | ||
| catalog.createDatabasePath(databaseName, path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at it again. Seems we do not really need to pass in path, right? Also, how about we keep the name getDefulatDBPath because createDatabasePath implied file system operations (sorry....). But, it will be good to add comment to this method to explain its purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, we have defaultTablePath in SessionCatalog. Let's use defaultDatabasePath as the better. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, actually how is new Path(p).toString different from p? Looks like that is the only functional difference here? If there's no particular reason why we need to do that then let's just revert all these changes here and only commit the test changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it sounds like this issue has been fixed by the other PRs. Will revert it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, getDefaultDBPath is just used for the following use case:
Get the path for creating a non-default database when database location is not provided by users.
|
@gatorsmile Thank you for updating this. Can you address my comments? Then, let's get it in! |
|
Test build #57830 has finished for PR 12081 at commit
|
|
Test build #57831 has finished for PR 12081 at commit
|
|
Alright, this is mainly just test changes now. LGTM merging into master 2.0. |
#### What changes were proposed in this pull request? First, a few test cases failed in mac OS X because the property value of `java.io.tmpdir` does not include a trailing slash on some platform. Hive always removes the last trailing slash. For example, what I got in the web: ``` Win NT --> C:\TEMP\ Win XP --> C:\TEMP Solaris --> /var/tmp/ Linux --> /var/tmp ``` Second, a couple of test cases are added to verify if the commands work properly. #### How was this patch tested? Added a test case for it and correct the previous test cases. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #12081 from gatorsmile/mkdir. (cherry picked from commit 8cba57a) Signed-off-by: Andrew Or <andrew@databricks.com>
What changes were proposed in this pull request?
First, a few test cases failed in mac OS X because the property value of
java.io.tmpdirdoes not include a trailing slash on some platform. Hive always removes the last trailing slash. For example, what I got in the web:Second, a couple of test cases are added to verify if the commands work properly.
How was this patch tested?
Added a test case for it and correct the previous test cases.