-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DROP SCHEMA statements #6096
Conversation
(false, None, false) => Err(DataFusionError::Execution(format!( | ||
"Schema '{name}' doesn't exist." | ||
))), | ||
// no schema found but dont return error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scenario would occur for a statement like DROP SCHEMA does_not_exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is like DROP SCHEMA foo IF EXISTS
This PR I think is consistent with what postgres does in this case
postgres=# drop schema if exists foo;
NOTICE: schema "foo" does not exist, skipping
DROP SCHEMA
); | ||
ctx.sql("DROP SCHEMA test_schema CASCADE").await?; | ||
|
||
assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jaylmiller and @viirya and @comphead for the reviews. I left some structure comments that I think are worth considering but I don't think they are required to merge this PR as:
- This PR is well tested
- Some of the comments are to clean up the code structure rather than anything fundamental
@@ -450,3 +450,39 @@ mod tests { | |||
); | |||
} | |||
} | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat strange / non standard to have code below a mod test
-- maybe it would be cleaner in its own module (e.g. datafusion/common/src/schema_reference.rs
) perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that sounds good to me
/// | ||
/// By default returns a "Not Implemented" error | ||
fn deregister_schema(&self, name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> { | ||
// use variables to avoid unused variable warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative pattern is to name the variable with a prefix. Like_
fn deregister_schema(&self, _name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
(false, None, false) => Err(DataFusionError::Execution(format!( | ||
"Schema '{name}' doesn't exist." | ||
))), | ||
// no schema found but dont return error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is like DROP SCHEMA foo IF EXISTS
This PR I think is consistent with what postgres does in this case
postgres=# drop schema if exists foo;
NOTICE: schema "foo" does not exist, skipping
DROP SCHEMA
@@ -375,7 +375,7 @@ SHOW CREATE TABLE test.xyz | |||
---- | |||
datafusion test xyz CREATE VIEW test.xyz AS SELECT * FROM abc | |||
|
|||
statement error DataFusion error: This feature is not implemented: Only `DROP TABLE/VIEW | |||
statement error DataFusion error: Execution error: Cannot drop schema test because other tables depend on it: xyz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
/// Attempts to find a schema and deregister it. Returns a tuple of the schema and a | ||
/// flag indicating whether dereg was performed (e.g if schema is found but has tables | ||
/// then `cascade` must be set) | ||
fn find_and_deregister_schema<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if different catalogs might want to implement the "is the schema empty" check differently 🤔
If so, it might make sense to put logic for "is the schema empty so can we drop it" in the CatalogProvider
Perhaps if CatalogProvider
had a signature like:
fn deregister_schema(&self, name: &str, cascade: bool) -> Result<Option<Arc<dyn SchemaProvider>>> {
I think this code would get significantly simpler as well as the debug_assert
s to reassert what had just been validated would be unnecessary.
That being said, we could also do such a change as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that is a good point. I was trying to keep the signature as similar as possible to the other Catalog's deregister
methods. But you are right that it could make sense to do it like this. The code within the context provider is a bit complex at the moment
@@ -2617,4 +2690,96 @@ mod tests { | |||
.unwrap() | |||
} | |||
} | |||
|
|||
/// helper for the following drop schema tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use sqllogic tests https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/ to write this more concisely
For example, perhaps we could extend:
@@ -73,3 +73,40 @@ async fn create_external_table_with_ddl() -> Result<()> { | |||
|
|||
Ok(()) | |||
} | |||
|
|||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend putting these tests in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/ddl.slt rather than a rs test
@alamb Thanks for the thorough review! I will modify this PR accordingly in the next day or so. |
FYI I think #6144 will likely also cause conflicts with this PR |
@alamb No worries! I've actually been following along with your work there--thanks for CCing me. I was planning on updating this PR according to your review in the next few days anyways--so conflicts not a big deal. Thanks for the heads up. |
fa552de
to
be19d7b
Compare
@alamb Thanks for the suggestions: managed to clean this PR up ALOT following them. I think this should be ready to review/merge, but let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks really nice -- thank you @jaylmiller
@@ -124,6 +124,26 @@ pub trait CatalogProvider: Sync + Send { | |||
"Registering new schemas is not supported".to_string(), | |||
)) | |||
} | |||
|
|||
/// Removes a schema from this catalog. Implementations of this method should return | |||
/// errors if the schema exists but cannot be dropped. For example, in DataFusion's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ --nice documentation
pub struct DropCatalogSchema { | ||
/// The schema name | ||
pub name: OwnedSchemaReference, | ||
/// If the schema exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If the schema exists | |
/// If true, do not error if the schema does not exist |
Which issue does this PR close?
Closes #6027
Rationale for this change
Being able to drop schemas is generally just a good feature to have: exists in most databases.
What changes are included in this PR?
deregister_schema
toCatalogProvider
trait (and an impl for it inMemoryCatalogProvider
)DropCatalogSchema
LogicalPlan::DropCatalogSchema
Are these changes tested?
Unit tests and sql integration tests
Are there any user-facing changes?
this is a user-facing feature but everything is backwards compatible.