-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16452][SQL] Support basic INFORMATION_SCHEMA #14116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, @rxin . |
|
Test build #62023 has finished for PR 14116 at commit
|
|
Wow, there was only one error. I fixed it a few minute ago.
|
|
cc @hvanhovell , too. |
|
Test build #62029 has finished for PR 14116 at commit
|
|
The failure seems to be irrelevant to this PR. Locally, it passes like the following. |
|
Retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhm the indentation is really weird - i don't think we need to indent each line with one more level ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
|
It doesn't look like we are getting any benefits from column pruning - perhaps we should just do predicate pushdown? The code would be simpler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and setupTable or registerTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used registerTable.
|
This looks pretty good. Can you add more comments explaining what each class/method does, and how the whole thing works? |
|
Sure. I'll update the PR and proceed in this way. |
|
Test build #62030 has finished for PR 14116 at commit
|
|
Test build #62031 has finished for PR 14116 at commit
|
|
Test build #62038 has finished for PR 14116 at commit
|
|
Test build #62040 has finished for PR 14116 at commit
|
|
Test build #62041 has finished for PR 14116 at commit
|
|
Test build #62043 has finished for PR 14116 at commit
|
|
|
|
Test build #62064 has finished for PR 14116 at commit
|
|
Test build #62065 has finished for PR 14116 at commit
|
|
Now, all tests are passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thank you for review! Right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain which cases will enter this processing? Is that possible we could hit backtick-quoted INFORMATION_SCHEMA_DATABASE here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backtick-quoted one will not reach here.
scala> sql("create table `aaa.bbb`(a int)")
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: [aaa.bbb]: is not a valid table name;
val catalog = spark.sessionState.catalog
catalog.setCurrentDatabase(SessionCatalog.INFORMATION_SCHEMA_DATABASE)
sql("CREATE TABLE my_tab (age INT, name STRING)")We can set the current database to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am lost. A database INFORMATION_SCHEMA_DATABASE is created through a SQL command, but the function databaseExists always returns true for the database for INFORMATION_SCHEMA_DATABASE.
|
Finished my first pass. The major concern is handling of |
|
Thank you so much, @gatorsmile . And, sorry for late response. Definitely, I have many things to do. Now, it's my turn. Let's see how much I can handle them. :) |
|
Test build #62721 has finished for PR 14116 at commit
|
|
Test build #62897 has finished for PR 14116 at commit
|
|
Test build #63023 has finished for PR 14116 at commit
|
|
Rebased to the master because |
|
Test build #63351 has finished for PR 14116 at commit
|
|
Test build #63641 has finished for PR 14116 at commit
|
|
Test build #63757 has finished for PR 14116 at commit
|
|
Test build #64041 has finished for PR 14116 at commit
|
|
Test build #64358 has finished for PR 14116 at commit
|
|
Test build #64530 has finished for PR 14116 at commit
|
|
Test build #64577 has finished for PR 14116 at commit
|
|
Test build #64876 has finished for PR 14116 at commit
|
|
Test build #65019 has finished for PR 14116 at commit
|
|
Test build #65141 has finished for PR 14116 at commit
|
|
Test build #65250 has finished for PR 14116 at commit
|
|
Test build #65431 has finished for PR 14116 at commit
|
|
Rebased to resolve conflicts. |
|
Test build #65664 has finished for PR 14116 at commit
|
|
The issue might be tackled later after Catalog Federation. Now, I close this PR since it's too stale. Thank you all for spending time at this PR. |
What changes were proposed in this pull request?
This PR introduces INFORMATION_SCHEMA, a database consisting views which provide information about all of the tables, views, columns in a database.
How was this patch tested?
Pass the Jenkins tests including a new testsuite.