Skip to content

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Mar 31, 2016

What changes were proposed in this pull request?

First, a few test cases failed in mac OS X because the property value of java.io.tmpdir does not include a trailing slash on some platform. Hive always removes the last trailing slash. For example, what I got in the web:

Win NT  --> C:\TEMP\
Win XP  --> C:\TEMP
Solaris --> /var/tmp/
Linux   --> /var/tmp

Second, a couple of test cases are added to verify if the commands work properly.

How was this patch tested?

Added a test case for it and correct the previous test cases.

gatorsmile and others added 30 commits November 13, 2015 14:50

def getDefaultDBPath(db: String): String = {
/** Get the path for creating a non-default database. */
def createDatabasePath(db: String, path: Option[String]): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this doesn't seem to belong to the session catalog. It's only used in 1 place so I would just move it to CreateDatabase itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try it, but it involves a lot of code changes.

The major issue is the type mismatch between the path in case class CreateDatabase and the locationUri in CatalogDatabase. The first one has to be Option[String] since users might not provide location in the create table command. The second one has to be String since we always have the location.

https://github.com/gatorsmile/spark/blob/mkdir/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L63

If we change the type of locationUri in CatalogDatabase to Option[String], the related code changes look not clean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just call this defaultDatabasePath

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do the change.

@andrewor14
Copy link
Contributor

@gatorsmile I took a quick look and I think the logic can be simpler. There are a lot of random checks that don't seem necessary to me. Actually I thought the original goal was to create the dir, but this patch doesn't seem to do that. I'm not sure what the non-test changes here achieve.

@gatorsmile
Copy link
Member Author

@andrewor14 You know, this PR has many changes since its initial version. Originally, the focus of this PR is to create a correct warehouse directory. Later, a few related PRs were merged. Thus, I removed the contents. : )

Yeah, sure. I can clean it and try to create the dir.

val fs = new Path(path).getFileSystem(hiveContext.sessionState.newHadoopConf())
withTable(tabName) {
assert(tmpDir.listFiles.isEmpty)
sql(s"CREATE DATABASE $dbName Location '$tmpDir'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we create a database with a given location, then we drop this database, will we still keep the dir like external table?

Copy link
Member Author

@gatorsmile gatorsmile May 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory will be dropped by the Hive metastore API. The test case in this PR verified this behavior.
https://github.com/gatorsmile/spark/blob/mkdir/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L394

@gatorsmile
Copy link
Member Author

@andrewor14 I also think about creating directory in implementation of create database.

Metastore APIs created the directory for us no matter whether the provided location/directory exists or not. The following test cases verified it.
https://github.com/gatorsmile/spark/blob/mkdir/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L401-L411

Should we still create the database directory?

If yes, could we do it in a separate PR? This PR has a long change history. It has been rebased 8 times.
It is hard for the future readers to follow what we did in this PR.

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57717 has finished for PR 12081 at commit 59a9805.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57722 has finished for PR 12081 at commit b885f7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

: ) This one needs to be rebased again. 9 times

gatorsmile added 2 commits May 4, 2016 10:20
…wNewNewNewNew

# Conflicts:
#	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57772 has finished for PR 12081 at commit 838eaeb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

databaseName,
comment.getOrElse(""),
path.getOrElse(catalog.getDefaultDBPath(databaseName)),
catalog.createDatabasePath(databaseName, path),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at it again. Seems we do not really need to pass in path, right? Also, how about we keep the name getDefulatDBPath because createDatabasePath implied file system operations (sorry....). But, it will be good to add comment to this method to explain its purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, we have defaultTablePath in SessionCatalog. Let's use defaultDatabasePath as the better. What do you think?

Copy link
Contributor

@andrewor14 andrewor14 May 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, actually how is new Path(p).toString different from p? Looks like that is the only functional difference here? If there's no particular reason why we need to do that then let's just revert all these changes here and only commit the test changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it sounds like this issue has been fixed by the other PRs. Will revert it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, getDefaultDBPath is just used for the following use case:

Get the path for creating a non-default database when database location is not provided by users.

@yhuai
Copy link
Contributor

yhuai commented May 4, 2016

@gatorsmile Thank you for updating this. Can you address my comments? Then, let's get it in!

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57830 has finished for PR 12081 at commit 55cf518.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57831 has finished for PR 12081 at commit b190b86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

Alright, this is mainly just test changes now. LGTM merging into master 2.0.

asfgit pushed a commit that referenced this pull request May 5, 2016
#### What changes were proposed in this pull request?

First, a few test cases failed in mac OS X  because the property value of `java.io.tmpdir` does not include a trailing slash on some platform. Hive always removes the last trailing slash. For example, what I got in the web:
```
Win NT  --> C:\TEMP\
Win XP  --> C:\TEMP
Solaris --> /var/tmp/
Linux   --> /var/tmp
```
Second, a couple of test cases are added to verify if the commands work properly.

#### How was this patch tested?
Added a test case for it and correct the previous test cases.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #12081 from gatorsmile/mkdir.

(cherry picked from commit 8cba57a)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit asfgit closed this in 8cba57a May 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants