[SPARK-35242][SQL] Support changing session catalog's default database #37679
Conversation
@cloud-fan, @hddong, @dongjoon-hyun could you please review my changes? Thanks!

Can one of the admins verify this patch?

Could you please take a look when you have some time? This is a follow-up of PR #32364. Thanks!
Review thread (outdated, resolved): sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
we should also update

Thanks @cloud-fan, I will check these!
@cloud-fan, did you mean this change? |
Yes |
@cloud-fan Ok, I have fixed this. |
@cloud-fan Good question. The reason we cannot delete the user-specified default database is the following if statement in the actual code, which is also its latest state on master. As you can see, I am just reusing the same logic. If you think we should only deny dropping "default" and allow dropping the database set via spark.sql.catalog.spark_catalog.defaultDatabase, that is fine with me; the change is very simple, and here is the validation. Is this solution acceptable to you?
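For illustration, a minimal, self-contained sketch of the guard being discussed (simplified and hypothetical; the helper name checkDropDatabase and the exception type are assumptions, not the verbatim Spark source):

import java.util.Locale

// Sketch of the agreed behavior: only the literal "default" database is
// protected from DROP DATABASE, while a database configured via
// spark.sql.catalog.spark_catalog.defaultDatabase may still be dropped.
def checkDropDatabase(db: String): Unit = {
  val dbName = db.toLowerCase(Locale.ROOT)
  if (dbName == "default") {
    // Spark raises an AnalysisException at this point.
    throw new IllegalStateException("Can not drop default database")
  }
}

checkDropDatabase("other_db") // allowed
// checkDropDatabase("default") would throw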
SGTM!
Thanks @cloud-fan, I have implemented this and all tests pass. As far as I can see, we have resolved all of your feedback.
Review threads (all outdated, resolved):
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (four threads)
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala (two threads)
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (two threads)
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogManagerSuite.scala
Hi @cloud-fan, all build issues have been fixed and all of your feedback has been addressed. Latest state:
Review threads (outdated, resolved): sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala; sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
cloud-fan left a comment:
LGTM except for 2 minor comments
Thanks @cloud-fan! I have addressed these two comments.

thanks, merging to master!

Thank you very much for your help!
What changes were proposed in this pull request?
This PR is a follow-up of #32364, which was closed by github-actions because it had not been updated in a while. The previous PR added a new configuration parameter (spark.sql.catalog.$SESSION_CATALOG_NAME.defaultDatabase) for changing the session catalog's default database, which is needed in use cases where the user does not have access to the default database.
Therefore I have created a new PR based on it and added the following changes on top:
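As a rough sketch of the mechanism (illustrative only; the helper name resolveDefaultDatabase is hypothetical, not code from this PR):

import org.apache.spark.sql.internal.SQLConf

// Resolve the session catalog's default database from the new key, falling
// back to "default" when it is unset so existing behavior is preserved.
def resolveDefaultDatabase(conf: SQLConf): String = {
  conf.getConfString("spark.sql.catalog.spark_catalog.defaultDatabase", "default")
}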
Why are the changes needed?
If a user does not have any permissions on the Hive default database in Ranger, Spark fails with the following error:
The idea is to introduce a new configuration parameter that lets us use a different database as the default one. Our user has sufficient permissions for that database in Ranger.
For example:
spark-shell --conf spark.sql.catalog.spark_catalog.defaultDatabase=other_db

Does this PR introduce any user-facing change?
There is a new configuration parameter, as mentioned above, but its default value is "default", matching the previous behavior.
How was this patch tested?
https://github.com/roczei/spark/actions/runs/2934863118
Scenario a) hrt_10 does not have access to the default database in Hive:
This is the expected behavior, because the session still uses the "default" database name.
Scenario b) Use the "other" database where the hrt_10 user has proper permissions
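A session along these lines (illustrative only; the database name "other" and the query are assumptions, not the actual test log) demonstrates scenario b:

spark-shell --conf spark.sql.catalog.spark_catalog.defaultDatabase=other

scala> spark.sql("SELECT current_database()").show()
// Expected to report "other" instead of "default", so the hrt_10 user's
// queries no longer touch the inaccessible default database.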