[SPARK-28628][SQL] Implement SupportsNamespaces in V2SessionCatalog #25363
|
Test build #108670 has finished for PR 25363 at commit
|
|
For reviewers: the R failure is unrelated to this PR. (I've hit the same failure in my own PR before.)
|
Retest this please. |
|
Test build #108692 has finished for PR 25363 at commit
|
|
Retest this please. |
|
Test build #108846 has finished for PR 25363 at commit
|
|
The implementation is solid. Looking forward to the tests |
b4f3a93 to
c436985
|
@brkyvz, added tests. |
|
Test build #109649 has finished for PR 25363 at commit
|
brkyvz
left a comment
A couple extra tests would be great, otherwise LGTM.
| spark.sql("""CREATE DATABASE IF NOT EXISTS ns""") | ||
| spark.sql("""CREATE DATABASE IF NOT EXISTS ns2""") | ||
| val catalog = newCatalog() | ||
| catalog.createNamespace(Array("db"), emptyProps) |
can we add a TODO to go through the public APIs once these are implemented.
Done.
| class V2SessionCatalogSuite | ||
| extends SparkFunSuite with SharedSparkSession with BeforeAndAfter { | ||
| import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._ | ||
| class V2SessionCatalogSuite extends SparkFunSuite with SharedSparkSession with BeforeAndAfter { |
nit: this is no longer a Suite. Can we rename to V2SessionCatalogTestUtils or something?
Updated to V2SessionCatalogBaseSuite because it is the superclass of a suite that provides setup. So more of a suite than just util, but it doesn't contain tests.
| // validate that this catalog's reserved properties are not removed | ||
| changes.foreach { | ||
| case remove: RemoveProperty | ||
| if remove.property == "location" || remove.property == "comment" => |
paranoia: does case sensitivity matter here?
I don't think so. There is no guarantee of case insensitivity for table properties implementations.
| defaultLocation: Option[URI] = None): CatalogDatabase = { | ||
| CatalogDatabase( | ||
| name = db, | ||
| description = metadata.getOrDefault("comment", ""), |
nit: Can we make "comment" and "location" static variables and use them throughout instead?
Sure.
...ore/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalogSuite.scala
| catalog.dropNamespace(testNs) | ||
| } | ||
| test("alterNamespace: basic behavior") { |
can you also check failing to change database location?
Actually, I'm not sure whether that's supported; I think not. But you should be able to update the comment (description).
I added tests that fail to remove reserved properties, location and comment, because the catalog requires values. I also added tests that validate they can be updated by altering the namespace to set those properties.
This isn't actually exposed yet because there is no SQL or public API for altering a namespace. But this is how table properties work, so I think it makes sense to match the behavior.
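The reserved-property behavior described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the PR's code: `Change`, `SetProperty`, `RemoveProperty`, and `ReservedProps` are hypothetical stand-ins for Spark's `NamespaceChange` classes and the catalog's validation logic. Property names are compared exactly, with no case folding.

```scala
// Hypothetical stand-ins for Spark's NamespaceChange classes; not the PR's actual code.
sealed trait Change
final case class SetProperty(property: String, value: String) extends Change
final case class RemoveProperty(property: String) extends Change

object ReservedProps {
  // Properties the session catalog always needs a value for (assumed from the discussion).
  val Reserved: Set[String] = Set("location", "comment")

  def applyChanges(props: Map[String, String], changes: Seq[Change]): Map[String, String] = {
    // Reject removal of reserved properties before applying any change.
    changes.foreach {
      case RemoveProperty(p) if Reserved.contains(p) =>
        throw new UnsupportedOperationException(s"Cannot remove reserved property: $p")
      case _ => // setting any property, or removing a non-reserved one, is allowed
    }
    // Apply the changes in order on top of the existing properties.
    changes.foldLeft(props) {
      case (acc, SetProperty(k, v)) => acc + (k -> v)
      case (acc, RemoveProperty(k)) => acc - k
    }
  }
}
```

Under these assumptions, setting `comment` or `location` updates the stored value, while a change list that removes either one is rejected before anything is applied.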
c436985 to
9659f52
|
@brkyvz, I updated this to address your comments. When you have a chance, please take another look. Thanks! |
|
Test build #109980 has finished for PR 25363 at commit
|
| assert(initialPath === spark.catalog.getDatabase(testNs(0)).locationUri.toString) | ||
| catalog.alterNamespace(testNs, NamespaceChange.setProperty("location", newPath)) |
cc @cloud-fan is this hive catalog behavior? I thought you can't change the location of a database.
Hive supports it: https://issues.apache.org/jira/browse/HIVE-8472
However, AFAIK Spark has a problem supporting it: #25294
@rdblue can we check if the newly created tables under this namespace can reflect the location change?
Tables are correctly created using the database's location. We have been using this feature for a long time to put all tables for a database in a separate bucket.
| true | ||
| case Array(_) => | ||
| // exists returned false |
not exists
Correct. This is the case where the database does not exist. We know that because the above existence check returned false. This comment clarifies the Array case because it appears that an Array of one item always matches. So we need to note the context from the previous case.
ah i see
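The point about the second `Array(_)` case can be illustrated with a small standalone sketch. The names `NamespaceMatch`, `matchesExisting`, and `databaseExists` are hypothetical and this is not the PR's method; it only shows how a guarded case followed by an unguarded one with the same shape encodes "one-part name, but the existence check failed".

```scala
object NamespaceMatch {
  // `databaseExists` is a hypothetical stand-in for the catalog's existence lookup.
  def matchesExisting(namespace: Array[String], databaseExists: String => Boolean): Boolean =
    namespace match {
      case Array(db) if databaseExists(db) =>
        // One-part namespace and the database exists.
        true
      case Array(_) =>
        // Reached only when the guard above was false: one-part name, database not found.
        false
      case _ =>
        // Empty or multi-part namespaces never name a database in this sketch.
        false
    }
}
```

Without the comment, the second case looks like it matches every single-element array; it is the ordering after the guarded case that restricts it to the "does not exist" situation.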
| class V2SessionCatalogSuite | ||
| extends SparkFunSuite with SharedSparkSession with BeforeAndAfter { | ||
| import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._ | ||
| class V2SessionCatalogBaseSuite extends SparkFunSuite with SharedSparkSession with BeforeAndAfter { |
The session catalog has 2 implementations: in-memory and hive. Shall we follow ExternalCatalogSuite and run the tests in both sql/core and sql/hive?
I'm not sure what you mean. These tests are for the Hive implementation. Applying these to a test v2 session catalog implementation sounds like a different PR to me. Is that what you're suggesting?
yeah seems like this can be done in a follow up
|
This LGTM. Merging to master! |
|
Thanks @brkyvz and @cloud-fan for reviewing! |
What changes were proposed in this pull request?
This adds namespace support to V2SessionCatalog.
How was this patch tested?
WIP: will add tests for v2 session catalog namespace methods.