Support DROP SCHEMA statements #6096

Merged
merged 1 commit into from
Apr 30, 2023

Contributor

@jaylmiller jaylmiller commented Apr 22, 2023

Which issue does this PR close?

Closes #6027

Rationale for this change

Being able to drop schemas is a generally useful feature, and one that most databases support.

What changes are included in this PR?

  • Add optional method deregister_schema to CatalogProvider trait (and an impl for it in MemoryCatalogProvider)
  • Add a new LogicalPlan node DropCatalogSchema
  • Modify parser to be able to parse LogicalPlan::DropCatalogSchema
  • Implement dropping schema in the datafusion context
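As a rough illustration of the parsing step listed above, the sketch below turns a DROP SCHEMA statement into a simplified plan node. It is not the PR's implementation: DataFusion relies on its SQL parser crate, and the real `DropCatalogSchema` stores an `OwnedSchemaReference`. The `parse_drop_schema` function, the `String` name field, and the `cascade` field are assumptions made for this example.

```rust
/// Simplified stand-in for the plan node described in this PR.
/// Assumption: the name is a plain `String` here, not an `OwnedSchemaReference`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DropCatalogSchema {
    pub name: String,
    /// If true, do not error if the schema does not exist.
    pub if_exists: bool,
    /// If true, drop the schema even if it still contains tables.
    pub cascade: bool,
}

/// Hypothetical hand-rolled parser for
/// `DROP SCHEMA [IF EXISTS] <name> [CASCADE | RESTRICT]`.
pub fn parse_drop_schema(sql: &str) -> Result<DropCatalogSchema, String> {
    let tokens: Vec<&str> = sql
        .trim()
        .trim_end_matches(';')
        .split_whitespace()
        .collect();
    // Keywords are matched case-insensitively; the name keeps its original case.
    let upper: Vec<String> = tokens.iter().map(|t| t.to_ascii_uppercase()).collect();

    if upper.len() < 3 || upper[0] != "DROP" || upper[1] != "SCHEMA" {
        return Err(format!("not a DROP SCHEMA statement: {sql}"));
    }

    let mut pos = 2;
    let if_exists =
        upper.len() >= pos + 2 && upper[pos] == "IF" && upper[pos + 1] == "EXISTS";
    if if_exists {
        pos += 2;
    }

    let name = tokens
        .get(pos)
        .ok_or_else(|| "missing schema name".to_string())?
        .to_string();
    pos += 1;

    // RESTRICT is treated as the default when neither keyword is given.
    let cascade = match upper.get(pos).map(String::as_str) {
        None => false,
        Some("RESTRICT") => {
            pos += 1;
            false
        }
        Some("CASCADE") => {
            pos += 1;
            true
        }
        Some(other) => return Err(format!("unexpected token: {other}")),
    };
    if pos != tokens.len() {
        return Err("unexpected trailing tokens".to_string());
    }

    Ok(DropCatalogSchema { name, if_exists, cascade })
}
```

Calling `parse_drop_schema("DROP SCHEMA IF EXISTS foo CASCADE")` yields a node with both flags set, while a statement that is not DROP SCHEMA is rejected with an error.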

Are these changes tested?

Yes: unit tests and SQL integration tests.

Are there any user-facing changes?

This is a user-facing feature, but everything is backwards compatible.

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner labels Apr 22, 2023
@jaylmiller jaylmiller changed the title Add schema dropping Support DROP SCHEMA statements Apr 22, 2023
@jaylmiller jaylmiller marked this pull request as ready for review April 22, 2023 19:35
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 22, 2023
(false, None, false) => Err(DataFusionError::Execution(format!(
"Schema '{name}' doesn't exist."
))),
// no schema found but dont return error
Contributor

Is this a possible scenario?

Contributor Author

@jaylmiller jaylmiller Apr 24, 2023

This scenario would occur for a statement like DROP SCHEMA does_not_exist

Contributor

This is like DROP SCHEMA IF EXISTS foo.

I think this PR is consistent with what Postgres does in this case:

postgres=# drop schema if exists foo;
NOTICE:  schema "foo" does not exist, skipping
DROP SCHEMA
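The IF EXISTS behaviour discussed in this thread can be sketched as a small decision function. This is an illustration under assumptions, not the PR's code: `DropOutcome` and `resolve_drop` are hypothetical names, and the only text taken from the diff is the error message.

```rust
/// Outcome of a DROP SCHEMA once the catalog has been consulted.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DropOutcome {
    /// The schema existed and was removed.
    Dropped,
    /// The schema was missing, but IF EXISTS was given, so nothing happens.
    SkippedMissing,
}

/// `was_removed` says whether the catalog actually removed a schema;
/// `if_exists` is the flag parsed from the statement.
pub fn resolve_drop(
    name: &str,
    was_removed: bool,
    if_exists: bool,
) -> Result<DropOutcome, String> {
    match (was_removed, if_exists) {
        (true, _) => Ok(DropOutcome::Dropped),
        // No schema found, but do not return an error (Postgres prints a NOTICE here).
        (false, true) => Ok(DropOutcome::SkippedMissing),
        (false, false) => Err(format!("Schema '{name}' doesn't exist.")),
    }
}
```

Only the last arm produces an error, which matches the Postgres session quoted above: with IF EXISTS, a missing schema is skipped rather than rejected.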

);
ctx.sql("DROP SCHEMA test_schema CASCADE").await?;

assert!(
Contributor

+1

Contributor

@alamb alamb left a comment

Thank you @jaylmiller, and thank you @viirya and @comphead for the reviews. I left some structural comments that I think are worth considering, but I don't think they are required to merge this PR because:

  1. This PR is well tested
  2. Some of the comments are to clean up the code structure rather than anything fundamental

@@ -450,3 +450,39 @@ mod tests {
);
}
}

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
Contributor

It is somewhat strange / non-standard to have code below a mod tests block. Maybe it would be cleaner in its own module (e.g. datafusion/common/src/schema_reference.rs).

Contributor Author

Sure that sounds good to me

///
/// By default returns a "Not Implemented" error
fn deregister_schema(&self, name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
// use variables to avoid unused variable warnings
Contributor

An alternative pattern is to name the variable with a _ prefix, like:

    fn deregister_schema(&self, _name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
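A self-contained sketch of that pattern follows. The two traits here are simplified stand-ins for the DataFusion traits of the same name, the error type is a plain `String` instead of `DataFusionError`, and `ReadOnlyCatalog` and the error text are hypothetical.

```rust
use std::sync::Arc;

/// Simplified stand-in for DataFusion's `SchemaProvider` trait.
pub trait SchemaProvider: Send + Sync {}

/// Simplified stand-in for DataFusion's `CatalogProvider` trait.
pub trait CatalogProvider: Send + Sync {
    /// Optional method: by default it returns a "not supported" error.
    /// The leading underscore tells the compiler the parameter is
    /// intentionally unused, so no dummy `let _ = name;` is needed.
    fn deregister_schema(
        &self,
        _name: &str,
    ) -> Result<Option<Arc<dyn SchemaProvider>>, String> {
        Err("Deregistering schemas is not supported".to_string())
    }
}

/// Hypothetical catalog that relies on the default implementation.
pub struct ReadOnlyCatalog;

impl CatalogProvider for ReadOnlyCatalog {}
```

An implementor that overrides the method is free to drop the underscore and use `name` as usual; the prefix only matters in the default body, where the argument is ignored.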

@@ -375,7 +375,7 @@ SHOW CREATE TABLE test.xyz
----
datafusion test xyz CREATE VIEW test.xyz AS SELECT * FROM abc

-statement error DataFusion error: This feature is not implemented: Only `DROP TABLE/VIEW
+statement error DataFusion error: Execution error: Cannot drop schema test because other tables depend on it: xyz
Contributor

👍

/// Attempts to find a schema and deregister it. Returns a tuple of the schema and a
/// flag indicating whether dereg was performed (e.g if schema is found but has tables
/// then `cascade` must be set)
fn find_and_deregister_schema<'a>(
Contributor

I wonder if different catalogs might want to implement the "is the schema empty" check differently 🤔

If so, it might make sense to put logic for "is the schema empty so can we drop it" in the CatalogProvider

Perhaps if CatalogProvider had a signature like:

    fn deregister_schema(&self, name: &str, cascade: bool) -> Result<Option<Arc<dyn SchemaProvider>>> {

I think this code would get significantly simpler, and the debug_asserts that reassert what had just been validated would become unnecessary.

That being said, we could also do such a change as a follow on PR
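To make the suggestion concrete, here is a minimal sketch of an in-memory catalog that owns the "is the schema empty" check itself. `MemorySchema` and `MemoryCatalog` are hypothetical simplifications (not DataFusion's `MemoryCatalogProvider`), the error type is a plain `String`, and the error wording is borrowed from the sqllogictest expectation in this PR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical minimal schema: just a list of table names.
pub struct MemorySchema {
    tables: Vec<String>,
}

/// Hypothetical in-memory catalog keyed by schema name.
pub struct MemoryCatalog {
    schemas: RwLock<HashMap<String, Arc<MemorySchema>>>,
}

impl MemoryCatalog {
    pub fn new() -> Self {
        Self { schemas: RwLock::new(HashMap::new()) }
    }

    pub fn register_schema(&self, name: &str, tables: &[&str]) {
        let schema = MemorySchema {
            tables: tables.iter().map(|t| t.to_string()).collect(),
        };
        self.schemas
            .write()
            .unwrap()
            .insert(name.to_string(), Arc::new(schema));
    }

    /// Returns `Ok(None)` if the schema does not exist, an error if it
    /// still has tables and `cascade` is false, and the removed schema otherwise.
    pub fn deregister_schema(
        &self,
        name: &str,
        cascade: bool,
    ) -> Result<Option<Arc<MemorySchema>>, String> {
        let mut schemas = self.schemas.write().unwrap();
        match schemas.get(name) {
            None => return Ok(None),
            Some(schema) if !cascade && !schema.tables.is_empty() => {
                return Err(format!(
                    "Cannot drop schema {name} because other tables depend on it: {}",
                    schema.tables.join(", ")
                ));
            }
            Some(_) => {}
        }
        // Check and removal happen under the same write lock.
        Ok(schemas.remove(name))
    }
}
```

With the emptiness check inside the catalog, the caller only needs to map `Ok(None)` to either an error or a no-op depending on IF EXISTS, and a different catalog could define "empty" in its own way.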

Contributor Author

@jaylmiller jaylmiller Apr 25, 2023

Hmm, that is a good point. I was trying to keep the signature as similar as possible to the other catalog deregister methods, but you are right that it could make sense to do it like this. The code within the context provider is a bit complex at the moment.

@@ -2617,4 +2690,96 @@ mod tests {
.unwrap()
}
}

/// helper for the following drop schema tests

@@ -73,3 +73,40 @@ async fn create_external_table_with_ddl() -> Result<()> {

Ok(())
}

#[tokio::test]

@jaylmiller
Contributor Author

@alamb Thanks for the thorough review! I will modify this PR accordingly in the next day or so.

@alamb
Contributor

alamb commented Apr 27, 2023

FYI I think #6144 will likely also cause conflicts with this PR

@jaylmiller
Contributor Author

FYI I think #6144 will likely also cause conflicts with this PR

@alamb No worries! I've actually been following along with your work there--thanks for CCing me.

I was planning on updating this PR according to your review in the next few days anyway, so conflicts are not a big deal. Thanks for the heads up.

@github-actions github-actions bot removed the optimizer Optimizer rules label Apr 29, 2023
@jaylmiller jaylmiller force-pushed the drop-schema branch 2 times, most recently from fa552de to be19d7b Compare April 29, 2023 14:51
@jaylmiller
Contributor Author

@alamb Thanks for the suggestions: I managed to clean this PR up a lot by following them.

I think this should be ready to review/merge, but let me know what you think!

Contributor

@alamb alamb left a comment

I think this looks really nice -- thank you @jaylmiller

@@ -124,6 +124,26 @@ pub trait CatalogProvider: Sync + Send {
"Registering new schemas is not supported".to_string(),
))
}

/// Removes a schema from this catalog. Implementations of this method should return
/// errors if the schema exists but cannot be dropped. For example, in DataFusion's
Contributor

❤️ --nice documentation

pub struct DropCatalogSchema {
/// The schema name
pub name: OwnedSchemaReference,
/// If the schema exists
Contributor

Suggested change
-/// If the schema exists
+/// If true, do not error if the schema does not exist

@alamb alamb merged commit a015798 into apache:main Apr 30, 2023
@andygrove andygrove added the enhancement New feature or request label May 6, 2023
Labels
core Core DataFusion crate enhancement New feature or request logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)