10 changes: 9 additions & 1 deletion .palantir/revapi.yml
@@ -406,6 +406,14 @@ acceptedBreaks:
      justification: "Removing deprecations for 1.2.0"
  "1.2.0":
    org.apache.iceberg:iceberg-api:
    - code: "java.method.addedToInterface"
      new: "method boolean org.apache.iceberg.catalog.SessionCatalog::dropNamespace(org.apache.iceberg.catalog.SessionCatalog.SessionContext,\
        \ org.apache.iceberg.catalog.Namespace, boolean)"
      justification: "extending api to add new method, not removing any public api"
    - code: "java.method.addedToInterface"
      new: "method boolean org.apache.iceberg.catalog.SupportsNamespaces::dropNamespace(org.apache.iceberg.catalog.Namespace,\
        \ boolean) throws org.apache.iceberg.exceptions.NamespaceNotEmptyException"
      justification: "extending api to add new method, not removing any public api"
    - code: "java.field.constantValueChanged"
      old: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
      new: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
@@ -715,7 +723,7 @@ acceptedBreaks:
    - code: "java.method.addedToInterface"
      new: "method java.lang.String org.apache.iceberg.expressions.Reference<T>::name()"
      justification: "All subclasses implement name"
    - code: "java.method.addedToInterface"
      new: "method java.util.List<org.apache.iceberg.StatisticsFile> org.apache.iceberg.Table::statisticsFiles()"
      justification: "new API method"
    - code: "java.method.removed"
17 changes: 17 additions & 0 deletions api/src/main/java/org/apache/iceberg/catalog/SessionCatalog.java
@@ -309,6 +309,23 @@ default List<Namespace> listNamespaces(SessionContext context) {
   */
  boolean dropNamespace(SessionContext context, Namespace namespace);

  /**
   * Drop a namespace. If the namespace exists and was dropped, this will return true.
   *
   * @param context session context
   * @param namespace a {@link Namespace namespace}
   * @param cascade when true, deletes all objects under the namespace
Member

This needs a stricter definition IMO. In some instances below here you delete folders, sometimes you delete files, sometimes it deletes entries. We need a strong definition here.

Member

One big missing thing here is how this works with "purge"

Contributor Author

I think we have two options. drop_namespace with cascade will either:

  1. drop the tables but not the data (no purge)
  2. drop the tables and delete the data (purge)

We can standardize this in Iceberg and go with the preferred option. #1 sounds better, as #2 can take a long time depending on the number of tables and the data in them. This behavior is not consistent across catalogs: for example, Hive metastore cascade deletes all the data in the tables (purge), whereas others don't.

Should I create a vote thread to decide on the preference, or can we decide that in this PR? I can then update the implementation.
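
To make the difference between the two options concrete, here is a minimal, self-contained sketch. It is not Iceberg code: `PurgeOptionsSketch`, `createTable`, and `dropAllTables` are hypothetical names, and the "storage" is just an in-memory set standing in for the files a table references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; none of these names come from Iceberg.
class PurgeOptionsSketch {
  // Table name -> data files the table references.
  private final Map<String, List<String>> tables = new HashMap<>();
  // Stand-in for object storage: the set of files that still exist.
  private final Set<String> storage = new HashSet<>();

  void createTable(String name, List<String> dataFiles) {
    tables.put(name, new ArrayList<>(dataFiles));
    storage.addAll(dataFiles);
  }

  // Removes every table entry. Data files are deleted only when purge is true.
  void dropAllTables(boolean purge) {
    if (purge) {
      for (List<String> files : tables.values()) {
        storage.removeAll(files);
      }
    }
    tables.clear();
  }

  int tableCount() {
    return tables.size();
  }

  int fileCount() {
    return storage.size();
  }
}
```

With `purge` set to false (option 1) the table entries disappear but the files stay in storage; with `purge` set to true (option 2) both are removed, and the cost grows with the number of files.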

Contributor Author

@RussellSpitzer I documented the current purge behavior in the PR description and docs.

Only the Hive catalog purges the tables, as that's the default behavior of Hive when we pass the cascade=true parameter.

I can make it consistent across all catalogs and modify the current Hive catalog behavior to just drop tables without purge, similar to other catalogs. Let me know your thoughts.

Member

Yeah, sorry I didn't get back to this. I do believe we probably need a discuss thread. We can also get feedback here, but I think we need to hear from @jackye1995 and others who maintain other Catalog implementations.

Contributor Author

Removed the implementation for other catalogs. I will initiate a discussion once the API change in this PR is merged.

Contributor

Glue by default does a cascade drop if you just call glue.deleteDatabase, but it does not drop the data in the tables. So technically you can support that directly. But I am okay with publishing another PR to support that, up to you.

As for purging or not, I think the behavior of cascade is not clearly specified in Spark. It could also be argued that it is catalog-specific: the behaviors of Hive and Glue differ, and it will be difficult for either one to match the other's behavior.

This is related to the issue about different behaviors of list namespace that came up recently. Maybe we should make a page documenting all catalogs and their different behaviors.

+1 to having a devlist discussion thread first.

Contributor Author

I will start a devlist discussion and create a one pager with current state. Thanks @jackye1995

Member

@jackye1995 To confirm, are you OK with adding this API to Iceberg? Or do you want to keep it within engine implementations?

Contributor

Yes, I am okay with adding this to the Iceberg API; otherwise there is no way to leverage catalog service level integration features. I would imagine people using the REST catalog would like to just handle this behind the service.

   * @return true if the namespace was dropped, false otherwise.
   * @throws NamespaceNotEmptyException if the namespace is not empty and cascade is false
   */
  default boolean dropNamespace(SessionContext context, Namespace namespace, boolean cascade) {
    if (cascade) {
      throw new UnsupportedOperationException(
          "dropNamespace with cascade not supported with this catalog");
    }
    return dropNamespace(context, namespace);
  }

  /**
   * Set a collection of properties on a namespace in the catalog.
   *
@@ -103,6 +103,23 @@ default List<Namespace> listNamespaces() {
   */
  boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyException;

  /**
   * Drop a namespace. If the namespace exists and was dropped, this will return true.
   *
   * @param namespace a {@link Namespace namespace}
   * @param cascade when true, deletes all objects under the namespace
   * @return true if the namespace was dropped, false otherwise.
   * @throws NamespaceNotEmptyException if the namespace is not empty and cascade is false
   */
  default boolean dropNamespace(Namespace namespace, boolean cascade)
      throws NamespaceNotEmptyException {
    if (cascade) {
      throw new UnsupportedOperationException(
          "dropNamespace with cascade not supported with this catalog");
    }
    return dropNamespace(namespace);
  }

  /**
   * Set a collection of properties on a namespace in the catalog.
   *
@@ -143,6 +143,16 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
      return BaseSessionCatalog.this.dropNamespace(context, namespace);
    }

    @Override
    public boolean dropNamespace(Namespace namespace, boolean cascade)
        throws NamespaceNotEmptyException {
      if (cascade) {
        return BaseSessionCatalog.this.dropNamespace(context, namespace, true);
      } else {
        return dropNamespace(namespace);
      }
    }

    @Override
    public boolean setProperties(Namespace namespace, Map<String, String> updates) {
      return BaseSessionCatalog.this.updateNamespaceMetadata(
11 changes: 11 additions & 0 deletions core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
@@ -349,6 +349,17 @@ public boolean dropNamespace(Namespace namespace) {
    }
  }

  @Override
  public boolean dropNamespace(Namespace namespace, boolean cascade)
      throws NamespaceNotEmptyException {
    if (cascade) {
      // recursively delete all nested namespaces
      listNamespaces(namespace).forEach(n -> dropNamespace(n, true));
Member

None of the other catalogs drop nested namespaces. Do we need to cover this in other cases?

Contributor Author

From my testing and the existing tests, not all catalogs support nested namespaces; only the ones I covered here do. For example, Hive doesn't support nested namespaces.

If I missed any, I can add those as well.
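
For readers following the nested-namespace discussion, the children-first recursion used by the Hadoop and JDBC changes can be sketched on its own. This is a hypothetical in-memory model, not the Iceberg implementation: `NestedCascadeSketch` and its methods are illustrative names, namespaces are plain dot-separated strings, and `IllegalStateException` stands in for `NamespaceNotEmptyException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory catalog; names use dots as level separators, e.g. "db.ns1".
class NestedCascadeSketch {
  private final Set<String> namespaces = new TreeSet<>();
  private final Set<String> tables = new TreeSet<>();

  // Parent namespaces are not created implicitly; callers register each level.
  void createNamespace(String namespace) {
    namespaces.add(namespace);
  }

  void createTable(String namespace, String name) {
    namespaces.add(namespace);
    tables.add(namespace + "." + name);
  }

  boolean namespaceExists(String namespace) {
    return namespaces.contains(namespace);
  }

  // Returns a copy containing only the names exactly one level below parent.
  private static List<String> directChildren(Set<String> names, String parent) {
    List<String> result = new ArrayList<>();
    String prefix = parent + ".";
    for (String name : names) {
      if (name.startsWith(prefix) && name.indexOf('.', prefix.length()) < 0) {
        result.add(name);
      }
    }
    return result;
  }

  // Non-cascading drop: refuses to drop a namespace that still has contents.
  boolean dropNamespace(String namespace) {
    if (!namespaces.contains(namespace)) {
      return false;
    }
    if (!directChildren(namespaces, namespace).isEmpty()
        || !directChildren(tables, namespace).isEmpty()) {
      throw new IllegalStateException("Namespace " + namespace + " is not empty");
    }
    return namespaces.remove(namespace);
  }

  // Cascading drop: nested namespaces first, then tables, then the namespace itself.
  boolean dropNamespace(String namespace, boolean cascade) {
    if (cascade) {
      for (String child : directChildren(namespaces, namespace)) {
        dropNamespace(child, true);
      }
      tables.removeAll(directChildren(tables, namespace));
    }
    return dropNamespace(namespace);
  }
}
```

The ordering matters: because the final step reuses the non-cascading drop, the emptiness check still runs, so anything the recursion failed to remove surfaces as an error rather than being silently orphaned.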

      listTables(namespace).forEach(this::dropTable);
    }
    return dropNamespace(namespace);
  }

  @Override
  public boolean setProperties(Namespace namespace, Map<String, String> properties) {
    throw new UnsupportedOperationException(
@@ -203,6 +203,15 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
    return namespaces.remove(namespace) != null;
  }

  @Override
  public boolean dropNamespace(Namespace namespace, boolean cascade)
      throws NamespaceNotEmptyException {
    if (cascade) {
      listTables(namespace).forEach(this::dropTable);
    }
    return dropNamespace(namespace);
  }

  @Override
  public boolean setProperties(Namespace namespace, Map<String, String> properties)
      throws NoSuchNamespaceException {
11 changes: 11 additions & 0 deletions core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java
@@ -419,6 +419,17 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
    return deletedRows > 0;
  }

  @Override
  public boolean dropNamespace(Namespace namespace, boolean cascade)
      throws NamespaceNotEmptyException {
    if (cascade) {
      // recursively delete all nested namespaces
      listNamespaces(namespace).forEach(n -> dropNamespace(n, true));
Member

JDBC supports nested namespaces?

Contributor Author

I believe so based on

      listTables(namespace).forEach(this::dropTable);
    }
    return dropNamespace(namespace);
  }

  @Override
  public boolean setProperties(Namespace namespace, Map<String, String> properties)
      throws NoSuchNamespaceException {
@@ -129,6 +129,14 @@ public static void dropNamespace(SupportsNamespaces catalog, Namespace namespace
    }
  }

  public static void dropNamespace(
      SupportsNamespaces catalog, Namespace namespace, boolean cascade) {
    boolean dropped = catalog.dropNamespace(namespace, cascade);
    if (!dropped) {
      throw new NoSuchNamespaceException("Namespace does not exist: %s", namespace);
    }
  }

  public static UpdateNamespacePropertiesResponse updateNamespaceProperties(
      SupportsNamespaces catalog, Namespace namespace, UpdateNamespacePropertiesRequest request) {
    request.validate();
6 changes: 6 additions & 0 deletions core/src/main/java/org/apache/iceberg/rest/RESTCatalog.java
@@ -228,6 +228,12 @@ public boolean dropNamespace(Namespace ns) throws NamespaceNotEmptyException {
    return nsDelegate.dropNamespace(ns);
  }

  @Override
  public boolean dropNamespace(Namespace namespace, boolean cascade)
      throws NamespaceNotEmptyException {
    return nsDelegate.dropNamespace(namespace, cascade);
  }

  @Override
  public boolean setProperties(Namespace ns, Map<String, String> props)
      throws NoSuchNamespaceException {
@@ -444,17 +444,30 @@ public Map<String, String> loadNamespaceMetadata(SessionContext context, Namespa

  @Override
  public boolean dropNamespace(SessionContext context, Namespace ns) {
    return dropNamespaceInternal(context, ns, false);
  }

  private boolean dropNamespaceInternal(SessionContext context, Namespace ns, boolean cascade) {
    checkNamespaceIsValid(ns);

    try {
      client.delete(
          paths.namespace(ns),
          ImmutableMap.of("cascade", Boolean.toString(cascade)),
          null,
          headers(context),
          ErrorHandlers.namespaceErrorHandler());
      return true;
    } catch (NoSuchNamespaceException e) {
      return false;
    }
  }

  @Override
  public boolean dropNamespace(SessionContext context, Namespace namespace, boolean cascade) {
    return dropNamespaceInternal(context, namespace, cascade);
  }

  @Override
  public boolean updateNamespaceMetadata(
      SessionContext context, Namespace ns, Map<String, String> updates, Set<String> removals) {
@@ -468,6 +468,53 @@ public void testDropNamespace() throws IOException {
    Assert.assertFalse(fs.isDirectory(new Path(metaLocation)));
  }

  @Test
  public void testDropNamespaceCascade() throws IOException {
    String warehouseLocation = temp.newFolder().getAbsolutePath();
    HadoopCatalog catalog = new HadoopCatalog();
    catalog.setConf(new Configuration());
    catalog.initialize(
        "hadoop", ImmutableMap.of(CatalogProperties.WAREHOUSE_LOCATION, warehouseLocation));
    Namespace namespace1 = Namespace.of("db");
    Namespace namespace2 = Namespace.of("db", "ns1");

    TableIdentifier tbl1 = TableIdentifier.of(namespace1, "tbl1");
    TableIdentifier tbl2 = TableIdentifier.of(namespace2, "tbl1");

    Lists.newArrayList(tbl1, tbl2)
        .forEach(t -> catalog.createTable(t, SCHEMA, PartitionSpec.unpartitioned()));

    catalog.dropNamespace(namespace1, true);
    String metaLocation = warehouseLocation + "/" + "db";
    FileSystem fs = Util.getFs(new Path(metaLocation), catalog.getConf());
    Assert.assertFalse(fs.isDirectory(new Path(metaLocation)));
  }

  @Test
  public void testDropNamespaceCascadeFalse() throws IOException {
    String warehouseLocation = temp.newFolder().getAbsolutePath();
    HadoopCatalog catalog = new HadoopCatalog();
    catalog.setConf(new Configuration());
    catalog.initialize(
        "hadoop", ImmutableMap.of(CatalogProperties.WAREHOUSE_LOCATION, warehouseLocation));
    Namespace namespace1 = Namespace.of("db");
    Namespace namespace2 = Namespace.of("db", "ns1");

    TableIdentifier tbl1 = TableIdentifier.of(namespace1, "tbl1");
    TableIdentifier tbl2 = TableIdentifier.of(namespace2, "tbl1");

    Lists.newArrayList(tbl1, tbl2)
        .forEach(t -> catalog.createTable(t, SCHEMA, PartitionSpec.unpartitioned()));

    AssertHelpers.assertThrows(
        "Should fail to drop a namespace that is not empty: " + namespace1,
        NamespaceNotEmptyException.class,
        "Namespace " + namespace1 + " is not empty.",
        () -> {
          catalog.dropNamespace(Namespace.of("db"), false);
        });
  }

  @Test
  public void testVersionHintFileErrorWithFile() throws Exception {
    addVersionsToTable(table);
37 changes: 37 additions & 0 deletions core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java
@@ -625,6 +625,43 @@ public void testDropNamespace() {
        () -> catalog.dropNamespace(tbl4.namespace()));
  }

  @Test
  public void testDropNamespaceCascade() {
    TableIdentifier tbl0 = TableIdentifier.of("db", "ns1", "ns2", "tbl2");
    TableIdentifier tbl1 = TableIdentifier.of("db", "ns1", "ns2", "tbl1");
    TableIdentifier tbl2 = TableIdentifier.of("db", "ns1", "tbl2");
    TableIdentifier tbl3 = TableIdentifier.of("db", "ns3", "tbl4");
    TableIdentifier tbl4 = TableIdentifier.of("db", "tbl");

    Lists.newArrayList(tbl0, tbl1, tbl2, tbl3, tbl4)
        .forEach(t -> catalog.createTable(t, SCHEMA, PartitionSpec.unpartitioned()));

    catalog.dropNamespace(tbl4.namespace(), true);
    Assert.assertFalse(catalog.namespaceExists(tbl1.namespace()));
  }

  @Test
  public void testDropNamespaceCascadeFalse() {
    Assert.assertFalse(
        "Should return false if drop does not modify state",
        catalog.dropNamespace(Namespace.of("db", "ns1_not_exists")));

    TableIdentifier tbl0 = TableIdentifier.of("db", "ns1", "ns2", "tbl2");
    TableIdentifier tbl1 = TableIdentifier.of("db", "ns1", "ns2", "tbl1");
    TableIdentifier tbl2 = TableIdentifier.of("db", "ns1", "tbl2");
    TableIdentifier tbl3 = TableIdentifier.of("db", "ns3", "tbl4");
    TableIdentifier tbl4 = TableIdentifier.of("db", "tbl");

    Lists.newArrayList(tbl0, tbl1, tbl2, tbl3, tbl4)
        .forEach(t -> catalog.createTable(t, SCHEMA, PartitionSpec.unpartitioned()));

    AssertHelpers.assertThrows(
        "Should fail to drop a namespace that has tables",
        NamespaceNotEmptyException.class,
        "is not empty. 1 tables exist.",
        () -> catalog.dropNamespace(tbl4.namespace(), false));
  }

  @Test
  public void testCreateNamespace() {
    Namespace testNamespace = Namespace.of("testDb", "ns1", "ns2");
@@ -284,7 +284,7 @@ public <T extends RESTResponse> T handleRequest(

      case DROP_NAMESPACE:
        if (asNamespaceCatalog != null) {
          dropNamespace(vars);
          return null;
        }
        break;
@@ -363,6 +363,12 @@ public <T extends RESTResponse> T handleRequest(
    return null;
  }

  private void dropNamespace(Map<String, String> vars) {
    Namespace namespace = namespaceFromPathVars(vars);
    boolean cascade = PropertyUtil.propertyAsBoolean(vars, "cascade", false);
    CatalogHandlers.dropNamespace(asNamespaceCatalog, namespace, cascade);
  }

  public <T extends RESTResponse> T execute(
      HTTPMethod method,
      String path,
41 changes: 41 additions & 0 deletions docs/spark-ddl.md
@@ -140,6 +140,47 @@ AS SELECT ...
The schema and partition spec will be replaced if changed. To avoid modifying the table's schema and partitioning, use `INSERT OVERWRITE` instead of `REPLACE TABLE`.
The new table properties in the `REPLACE TABLE` command will be merged with any existing table properties. Existing table properties are updated if changed; otherwise they are preserved.

## `DROP NAMESPACE`

### `DROP EMPTY NAMESPACE`

To drop an _empty_ namespace, run:

```sql
DROP DATABASE prod.db.sample
```
If the namespace is not empty, this will fail with _NamespaceNotEmptyException_.


### `DROP NON_EMPTY NAMESPACE`

To drop a namespace and all its contents including tables, run:

```sql
DROP TABLE prod.db.sample CASCADE
Member

Should this be table? or database?

```
**WARNING**:
The table purge behaviour of `CASCADE` depends on the type of catalog managing the namespace.
See the mapping of purge behaviour for different catalogs below.
- If the database is managed by **HiveCatalog**, this will _purge_ all the tables in the database.

#### `DROP NAMESPACE CASCADE` table purge behaviour by catalog

When a namespace is dropped with _CASCADE_, all of its tables are dropped. Whether their contents are purged depends on the type of catalog.

| Catalog   | Table Purge   | Nested Namespaces |
|-----------|---------------|-------------------|
| Hive      | Yes           | No                |
| Hadoop    | No            | Yes               |
| JDBC      | No            | Yes               |
| ECS       | No            | Yes               |
| Nessie    | Not supported | Not supported     |
| DynamoDB  | Not supported | Not supported     |
| Glue      | Not supported | Not supported     |
| Snowflake | Not supported | Not supported     |
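
For catalogs listed as not supported, a cascade request is expected to fail with `UnsupportedOperationException` under the default method added in this PR. The sketch below shows one way a caller might react, assuming a simplified, hypothetical interface: `NamespaceOpsSketch`, `InMemoryOpsSketch`, and `CascadeFallbackSketch.dropWithFallback` are illustrative names and not part of Iceberg. The fallback drops tables one by one and never purges data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a namespace-aware catalog; not the Iceberg interface.
interface NamespaceOpsSketch {
  boolean dropNamespace(String namespace);

  List<String> listTables(String namespace);

  boolean dropTable(String namespace, String table);

  // Mirrors the default in this PR: cascade is rejected unless a catalog overrides it.
  default boolean dropNamespace(String namespace, boolean cascade) {
    if (cascade) {
      throw new UnsupportedOperationException(
          "dropNamespace with cascade not supported with this catalog");
    }
    return dropNamespace(namespace);
  }
}

// In-memory catalog that does not override the cascade variant.
class InMemoryOpsSketch implements NamespaceOpsSketch {
  private final Map<String, List<String>> tablesByNamespace = new HashMap<>();

  void createTable(String namespace, String table) {
    tablesByNamespace.computeIfAbsent(namespace, k -> new ArrayList<>()).add(table);
  }

  @Override
  public boolean dropNamespace(String namespace) {
    List<String> tables = tablesByNamespace.get(namespace);
    if (tables == null) {
      return false;
    }
    if (!tables.isEmpty()) {
      throw new IllegalStateException("Namespace " + namespace + " is not empty");
    }
    tablesByNamespace.remove(namespace);
    return true;
  }

  @Override
  public List<String> listTables(String namespace) {
    // Return a copy so callers can drop tables while iterating.
    return new ArrayList<>(tablesByNamespace.getOrDefault(namespace, new ArrayList<>()));
  }

  @Override
  public boolean dropTable(String namespace, String table) {
    List<String> tables = tablesByNamespace.get(namespace);
    return tables != null && tables.remove(table);
  }
}

class CascadeFallbackSketch {
  // Tries the catalog's own cascade first, then falls back to a manual drop.
  static boolean dropWithFallback(NamespaceOpsSketch catalog, String namespace) {
    try {
      return catalog.dropNamespace(namespace, true);
    } catch (UnsupportedOperationException e) {
      for (String table : catalog.listTables(namespace)) {
        catalog.dropTable(namespace, table);
      }
      return catalog.dropNamespace(namespace);
    }
  }
}
```

Note that the manual fallback is not atomic: a table created between the listing and the final drop would make the last step fail, which is one reason a catalog-side cascade can be preferable.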


## `DROP TABLE`

The drop table behavior changed in 0.14.