Core: Implement Catalogs.createTable and Catalogs.dropTable #1481
rdblue merged 7 commits into apache:master from
Conversation
CC: @rdblue, @massdosage - another piece of Hive integration
core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
 * @param location a path URI (e.g. hdfs:///warehouse/my_table)
 * @return true if the table was dropped
 */
public boolean dropTable(String location) {
I think this wasn't implemented before because it is not part of the Tables API, but now that this is the only implementation, maybe we should consider just deprecating the Tables API and making HadoopTables a stand-alone class.
Maybe this would merit another discussion, and another PR.
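For background on how the location-based method above would get exercised, here is a minimal sketch of the delegation inside Catalogs.dropTable, assuming the dropTable(String location) shown above lives on HadoopTables. The loadCatalog helper and the "name"/"location" property keys are illustrative, not the PR's exact code:

```java
import java.util.Optional;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopTables;

public class DropTableSketch {

  // Drop through the configured Catalog if there is one, otherwise fall back to
  // the path-based HadoopTables implementation.
  public static boolean dropTable(Configuration conf, Properties props) {
    Optional<Catalog> catalog = loadCatalog(conf);  // illustrative helper, mirroring the PR's naming

    if (catalog.isPresent()) {
      // Catalog-managed tables are addressed by identifier, not by path
      TableIdentifier identifier = TableIdentifier.parse(props.getProperty("name"));  // assumed key
      return catalog.get().dropTable(identifier);
    }

    // No catalog configured: use the location-based HadoopTables.dropTable
    String location = props.getProperty("location");  // assumed key
    return new HadoopTables(conf).dropTable(location);
  }

  private static Optional<Catalog> loadCatalog(Configuration conf) {
    // Placeholder: the real logic would resolve the catalog from the Configuration
    return Optional.empty();
  }
}
```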
public static Table createTable(Configuration conf, Properties props) {
  String schemaString = props.getProperty(InputFormatConfig.TABLE_SCHEMA);
  Preconditions.checkNotNull(schemaString, "Table schema not set");
  Schema schema = SchemaParser.fromJson(props.getProperty(InputFormatConfig.TABLE_SCHEMA));
It looks like this is the reason why the examples specify a table property. Can we instead use Hive schema DDL and convert it to Iceberg? Similarly, can we get the identity partition fields that way to create a spec?
I think we should keep the serialized schema for the Catalogs interface. Other systems like Impala, Presto, etc. might want to use it as well.
I would like to tackle the Hive schema DDL in another PR. The data is available in HiveIcebergSerDe.initialize in a somewhat convoluted way. I would like to extract it there and convert it to the Iceberg schema string. From there I would only push the Iceberg-related stuff down further.
What do you think?
I think it's fine to do this in a separate PR. I just really don't want to require setting properties with JSON schema or spec representations as the way to use Iceberg. It's okay for a way to customize if there isn't syntax, but normal cases should just use DDL.
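To make the property-based flow concrete, here is a rough caller-side sketch, assuming only the InputFormatConfig.TABLE_SCHEMA key that appears in the snippet above; the other required properties (table identifier, partition spec, location) are intentionally left out:

```java
import java.util.Properties;

import org.apache.iceberg.Schema;
import org.apache.iceberg.SchemaParser;
import org.apache.iceberg.mr.InputFormatConfig;
import org.apache.iceberg.types.Types;

public class SerializedSchemaExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    Properties props = new Properties();
    // The schema travels as Iceberg's JSON representation, so any engine that can
    // produce this string (Hive, Impala, Presto, ...) can reuse the same entry point.
    props.setProperty(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(schema));

    // With the remaining properties set, the table would be created via:
    // Catalogs.createTable(new Configuration(), props);
  }
}
```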
Optional<Catalog> catalog = loadCatalog(conf);
Somewhat out of scope: We might want to build a Catalog for this logic so that this class can avoid loading and checking the catalog in every method. The catalog would get created with the configuration and handle this delegation internally.
HiveCatalog has a cache, but this might be useful for other Catalogs as well.
This seems like a good idea to pursue, but I do not promise anything here, as I have too much on my plate currently.
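A sketch of that idea, not something this PR implements: resolve the Catalog once, keep it, and delegate, so each Catalogs method no longer has to load and check the catalog itself. How the delegate is resolved from the Configuration is deliberately left open:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class CachedCatalog {
  private final Catalog delegate;

  public CachedCatalog(Catalog delegate) {
    // Resolved exactly once (e.g. from the Configuration) instead of on every call
    this.delegate = delegate;
  }

  public Table createTable(TableIdentifier identifier, Schema schema, PartitionSpec spec) {
    return delegate.createTable(identifier, schema, spec);
  }

  public boolean dropTable(TableIdentifier identifier) {
    return delegate.dropTable(identifier);
  }

  public Table loadTable(TableIdentifier identifier) {
    return delegate.loadTable(identifier);
  }
}
```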
Thanks, @pvary! Mostly looks good, but I'd like to fix the
…ion. Javadoc is revisited
I should have caught this yesterday, but shouldn't this return false instead of throwing the exception? That's what all the other drop methods do. If the table doesn't exist, it isn't an exceptional case. It just returns false to signal that nothing needed to be done.
Good point! Finding it now is better than finding it after pushing the PR 😄
Fixed.
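For reference, the convention being applied, as a minimal sketch rather than the code in this PR: a missing table makes drop report false instead of propagating NoSuchTableException:

```java
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.NoSuchTableException;

public class DropSemantics {
  static boolean dropIfExists(Catalog catalog, TableIdentifier identifier) {
    try {
      return catalog.dropTable(identifier);
    } catch (NoSuchTableException e) {
      // A missing table is not exceptional for drop: signal that nothing was dropped
      return false;
    }
  }
}
```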
Thanks for the fix! I merged this.
If tools want to implement drop and create table operations using the Catalogs, they need a common interface for accessing the functionality.
For example, these methods will be used by HiveMetaHook to allow Iceberg table creation through Hive SQL commands.
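As an illustration of that consumer (not the eventual Hive integration code), a HiveMetaHook could delegate its create and drop callbacks to the new methods. The mapping from the HMS table parameters to the Properties that Catalogs expects is simplified here, and Catalogs.dropTable is assumed to take the same (Configuration, Properties) arguments as createTable:

```java
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.iceberg.mr.Catalogs;

public class IcebergMetaHookSketch implements HiveMetaHook {
  private final Configuration conf;

  public IcebergMetaHookSketch(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public void commitCreateTable(Table hmsTable) {
    // Copy the HMS table parameters (schema JSON, name, ...) into Properties
    Properties props = new Properties();
    hmsTable.getParameters().forEach(props::setProperty);
    Catalogs.createTable(conf, props);
  }

  @Override
  public void commitDropTable(Table hmsTable, boolean deleteData) {
    Properties props = new Properties();
    hmsTable.getParameters().forEach(props::setProperty);
    Catalogs.dropTable(conf, props);  // signature assumed to mirror createTable
  }

  // The remaining callbacks are left as no-ops in this sketch
  @Override public void preCreateTable(Table table) { }
  @Override public void rollbackCreateTable(Table table) { }
  @Override public void preDropTable(Table table) { }
  @Override public void rollbackDropTable(Table table) { }
}
```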
The patch includes 2 commits: