Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Aug 5, 2019

What changes were proposed in this pull request?

This adds namespace support to V2SessionCatalog.

How was this patch tested?

WIP: will add tests for v2 session catalog namespace methods.

@SparkQA
Copy link

SparkQA commented Aug 5, 2019

Test build #108670 has finished for PR 25363 at commit b4f3a93.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class V2SessionCatalog(sessionState: SessionState) extends TableCatalog with SupportsNamespaces

@dongjoon-hyun
Copy link
Member

For reviewers, the R failure is irrelevant to this PR. (I've hit the same failure at my PR before)

* checking CRAN incoming feasibility ...Error in readRDS(con) : 
  ReadItem: unknown type 0, perhaps written by later version of R
Execution halted
During startup - Warning message:

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Aug 6, 2019

Test build #108692 has finished for PR 25363 at commit b4f3a93.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class V2SessionCatalog(sessionState: SessionState) extends TableCatalog with SupportsNamespaces

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28628][SQL] Implement SupportsNamespaces in V2SessionCatalog [SPARK-28628][SQL][WIP] Implement SupportsNamespaces in V2SessionCatalog Aug 8, 2019
@SparkQA
Copy link

SparkQA commented Aug 9, 2019

Test build #108846 has finished for PR 25363 at commit b4f3a93.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class V2SessionCatalog(sessionState: SessionState) extends TableCatalog with SupportsNamespaces

@brkyvz
Copy link
Contributor

brkyvz commented Aug 22, 2019

The implementation is solid. Looking forward to the tests

@rdblue rdblue force-pushed the SPARK-28628-support-namespaces-in-v2-session-catalog branch from b4f3a93 to c436985 Compare August 23, 2019 17:05
@rdblue rdblue changed the title [SPARK-28628][SQL][WIP] Implement SupportsNamespaces in V2SessionCatalog [SPARK-28628][SQL] Implement SupportsNamespaces in V2SessionCatalog Aug 23, 2019
@rdblue
Copy link
Contributor Author

rdblue commented Aug 23, 2019

@brkyvz, added tests.

@SparkQA
Copy link

SparkQA commented Aug 23, 2019

Test build #109649 has finished for PR 25363 at commit c436985.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class V2SessionCatalog(sessionState: SessionState) extends TableCatalog with SupportsNamespaces

Copy link
Contributor

@brkyvz brkyvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple extra tests would be great, otherwise LGTM.

spark.sql("""CREATE DATABASE IF NOT EXISTS ns""")
spark.sql("""CREATE DATABASE IF NOT EXISTS ns2""")
val catalog = newCatalog()
catalog.createNamespace(Array("db"), emptyProps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a TODO to go through the public APIs once these are implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

class V2SessionCatalogSuite
extends SparkFunSuite with SharedSparkSession with BeforeAndAfter {
import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
class V2SessionCatalogSuite extends SparkFunSuite with SharedSparkSession with BeforeAndAfter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is no longer a Suite. Can we rename to V2SessionCatalogTestUtils or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to V2SessionCatalogBaseSuite because it is the superclass of a suite that provides setup. So more of a suite than just util, but it doesn't contain tests.

// validate that this catalog's reserved properties are not removed
changes.foreach {
case remove: RemoveProperty
if remove.property == "location" || remove.property == "comment" =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paranoia: does case sensitivity matter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. There is no guarantee of case insensitivity for table properties implementations.

defaultLocation: Option[URI] = None): CatalogDatabase = {
CatalogDatabase(
name = db,
description = metadata.getOrDefault("comment", ""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we make "comment" and "location" static variables and use those static variables through instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

catalog.dropNamespace(testNs)
}

test("alterNamespace: basic behavior") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check failing to change database location?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or not sure if that's supported or not. I think not. But you should be able to update the comment (description)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests that fail to remove reserved properties, location and comment, because the catalog requires values. I also added tests that validate they can be updated by altering the namespace to set those properties.

This isn't actually exposed yet because there is no SQL or public API for altering a namespace. But this is how table properties work, so I think it makes sense to match the behavior.

@rdblue rdblue force-pushed the SPARK-28628-support-namespaces-in-v2-session-catalog branch from c436985 to 9659f52 Compare August 31, 2019 00:20
@rdblue
Copy link
Contributor Author

rdblue commented Aug 31, 2019

@brkyvz, I updated this to address your comments. When you have a chance, please take another look. Thanks!

@SparkQA
Copy link

SparkQA commented Aug 31, 2019

Test build #109980 has finished for PR 25363 at commit 9659f52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


assert(initialPath === spark.catalog.getDatabase(testNs(0)).locationUri.toString)

catalog.alterNamespace(testNs, NamespaceChange.setProperty("location", newPath))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @cloud-fan is this hive catalog behavior? I thought you can't change the location of a database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive supports it: https://issues.apache.org/jira/browse/HIVE-8472

However, AFAIK Spark has a problem to support it: #25294

@rdblue can we check if the newly created tables under this namespace can reflect the location change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tables are correctly created using the database's location. We have been using this feature for a long time to put all tables for a database in a separate bucket.

true

case Array(_) =>
// exists returned false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exists

Copy link
Contributor Author

@rdblue rdblue Sep 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. This is the case where the database does not exist. We know that because the above existence check returned false. This comment clarifies the Array case because it appears that an Array of one item always matches. So we need to note the context from the previous case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see

class V2SessionCatalogSuite
extends SparkFunSuite with SharedSparkSession with BeforeAndAfter {
import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
class V2SessionCatalogBaseSuite extends SparkFunSuite with SharedSparkSession with BeforeAndAfter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The session catalog has 2 implementations: in-memory and hive. Shall we follow ExternalCatalogSuite and run the tests in both sql/core and sql/hive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. These tests are for the Hive implementation. Applying these to a test v2 session catalog implementation sounds like a different PR to me. Is that what you're suggesting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah seems like this can be done in a follow up

@brkyvz
Copy link
Contributor

brkyvz commented Sep 3, 2019

This LGTM. Merging to master!

@asfgit asfgit closed this in 5ea134c Sep 3, 2019
@rdblue
Copy link
Contributor Author

rdblue commented Sep 3, 2019

Thanks @brkyvz and @cloud-fan for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants