Core: Implement Catalogs.createTable and Catalogs.dropTable #1481
rdblue merged 7 commits into apache:master from
Conversation
CC: @rdblue, @massdosage - another piece of Hive integration
core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
 * @param location a path URI (e.g. hdfs:///warehouse/my_table)
 * @return true if the table was dropped
 */
public boolean dropTable(String location) {
I think this wasn't implemented before because it is not part of the Tables API, but now that this is the only implementation, maybe we should consider just deprecating the Tables API and making HadoopTables a stand-alone class.
Maybe this would merit another discussion, and another PR.
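For background on how the location-based method above would get exercised, here is a minimal sketch of the delegation inside Catalogs.dropTable, assuming the dropTable(String location) shown above lives on HadoopTables. The loadCatalog helper and the "name"/"location" property keys are illustrative, not the PR's exact code:

```java
import java.util.Optional;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopTables;

public class DropTableSketch {

  // Drop through the configured Catalog if there is one, otherwise fall back to
  // the path-based HadoopTables implementation.
  public static boolean dropTable(Configuration conf, Properties props) {
    Optional<Catalog> catalog = loadCatalog(conf);  // illustrative helper, mirroring the PR's naming

    if (catalog.isPresent()) {
      // Catalog-managed tables are addressed by identifier, not by path
      TableIdentifier identifier = TableIdentifier.parse(props.getProperty("name"));  // assumed key
      return catalog.get().dropTable(identifier);
    }

    // No catalog configured: use the location-based HadoopTables.dropTable
    String location = props.getProperty("location");  // assumed key
    return new HadoopTables(conf).dropTable(location);
  }

  private static Optional<Catalog> loadCatalog(Configuration conf) {
    // Placeholder: the real logic would resolve the catalog from the Configuration
    return Optional.empty();
  }
}
```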
public static Table createTable(Configuration conf, Properties props) {
  String schemaString = props.getProperty(InputFormatConfig.TABLE_SCHEMA);
  Preconditions.checkNotNull(schemaString, "Table schema not set");
  Schema schema = SchemaParser.fromJson(props.getProperty(InputFormatConfig.TABLE_SCHEMA));
It looks like this is the reason why the examples specify a table property. Can we instead use Hive schema DDL and convert it to Iceberg? Similarly, can we get the identity partition fields that way to create a spec?
I think we should keep the serialized schema for the Catalogs interface. Other systems like Impala, Presto, etc. might want to use it as well.
I would like to tackle the Hive schema DDL in another PR. The data is available in HiveIcebergSerDe.initialize in a somewhat convoluted way. I would like to extract it there and convert it to the Iceberg schema string. From there I would only push the Iceberg-related stuff down further.
What do you think?
I think it's fine to do this in a separate PR. I just really don't want to require setting properties with JSON schema or spec representations as the way to use Iceberg. It's okay for a way to customize if there isn't syntax, but normal cases should just use DDL.
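To make the property-based flow concrete, here is a rough caller-side sketch, assuming only the InputFormatConfig.TABLE_SCHEMA key that appears in the snippet above; the other required properties (table identifier, partition spec, location) are intentionally left out:

```java
import java.util.Properties;

import org.apache.iceberg.Schema;
import org.apache.iceberg.SchemaParser;
import org.apache.iceberg.mr.InputFormatConfig;
import org.apache.iceberg.types.Types;

public class SerializedSchemaExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    Properties props = new Properties();
    // The schema travels as Iceberg's JSON representation, so any engine that can
    // produce this string (Hive, Impala, Presto, ...) can reuse the same entry point.
    props.setProperty(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(schema));

    // With the remaining properties set, the table would be created via:
    // Catalogs.createTable(new Configuration(), props);
  }
}
```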
Optional<Catalog> catalog = loadCatalog(conf);
Somewhat out of scope: We might want to build a Catalog for this logic so that this class can avoid loading and checking the catalog in every method. The catalog would get created with the configuration and handle this delegation internally.
HiveCatalog has a cache, but this might be useful for other Catalogs as well.
This seems like a good idea to pursue, but I do not promise anything here, as I have too much on my plate currently.
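A sketch of that idea, not something this PR implements: resolve the Catalog once, keep it, and delegate, so each Catalogs method no longer has to load and check the catalog itself. How the delegate is resolved from the Configuration is deliberately left open:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class CachedCatalog {
  private final Catalog delegate;

  public CachedCatalog(Catalog delegate) {
    // Resolved exactly once (e.g. from the Configuration) instead of on every call
    this.delegate = delegate;
  }

  public Table createTable(TableIdentifier identifier, Schema schema, PartitionSpec spec) {
    return delegate.createTable(identifier, schema, spec);
  }

  public boolean dropTable(TableIdentifier identifier) {
    return delegate.dropTable(identifier);
  }

  public Table loadTable(TableIdentifier identifier) {
    return delegate.loadTable(identifier);
  }
}
```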
Thanks, @pvary! Mostly looks good, but I'd like to fix the
…ion. Javadoc is revisited
I should have caught this yesterday, but shouldn't this return false instead of throwing the exception? That's what all the other drop methods do. If the table doesn't exist, it isn't an exceptional case. It just returns false to signal that nothing needed to be done.
Good point! Finding it now is better than finding it after pushing the PR 😄
Fixed.
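For reference, the convention being applied, as a minimal sketch rather than the code in this PR: a missing table makes drop report false instead of propagating NoSuchTableException:

```java
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.NoSuchTableException;

public class DropSemantics {
  static boolean dropIfExists(Catalog catalog, TableIdentifier identifier) {
    try {
      return catalog.dropTable(identifier);
    } catch (NoSuchTableException e) {
      // A missing table is not exceptional for drop: signal that nothing was dropped
      return false;
    }
  }
}
```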
Thanks for the fix! I merged this.
If tools want to implement drop and create table operations using the Catalogs, they need a common interface for accessing the functionality.
For example, these methods will be used by HiveMetaHook to allow Iceberg table creation through Hive SQL commands.
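As an illustration of that consumer (not the eventual Hive integration code), a HiveMetaHook could delegate its create and drop callbacks to the new methods. The mapping from the HMS table parameters to the Properties that Catalogs expects is simplified here, and Catalogs.dropTable is assumed to take the same (Configuration, Properties) arguments as createTable:

```java
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.iceberg.mr.Catalogs;

public class IcebergMetaHookSketch implements HiveMetaHook {
  private final Configuration conf;

  public IcebergMetaHookSketch(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public void commitCreateTable(Table hmsTable) {
    // Copy the HMS table parameters (schema JSON, name, ...) into Properties
    Properties props = new Properties();
    hmsTable.getParameters().forEach(props::setProperty);
    Catalogs.createTable(conf, props);
  }

  @Override
  public void commitDropTable(Table hmsTable, boolean deleteData) {
    Properties props = new Properties();
    hmsTable.getParameters().forEach(props::setProperty);
    Catalogs.dropTable(conf, props);  // signature assumed to mirror createTable
  }

  // The remaining callbacks are left as no-ops in this sketch
  @Override public void preCreateTable(Table table) { }
  @Override public void rollbackCreateTable(Table table) { }
  @Override public void preDropTable(Table table) { }
  @Override public void rollbackDropTable(Table table) { }
}
```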
The patch includes 2 commits: