Like Spark 3, Flink also has a Catalog interface, so we can integrate the Iceberg catalog into the Flink catalog. With Iceberg acting as a Flink catalog, users can use Flink DDL to manipulate Iceberg metadata and query Iceberg tables directly.
However, the Flink catalog is more like Spark's catalog v1.
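To make the goal concrete, here is a minimal sketch of registering Iceberg as a Flink catalog from the Table API, assuming a catalog factory is registered under `'type' = 'iceberg'`; the property keys, metastore URI, and warehouse path are illustrative assumptions, not a fixed proposal:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IcebergCatalogSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inBatchMode().build());

    // Register an Iceberg catalog as a Flink catalog. The property keys below
    // ('catalog-type', 'uri', 'warehouse') are placeholders for illustration.
    tEnv.executeSql(
        "CREATE CATALOG iceberg_catalog WITH ("
            + " 'type' = 'iceberg',"
            + " 'catalog-type' = 'hive',"
            + " 'uri' = 'thrift://metastore:9083',"
            + " 'warehouse' = 'hdfs://namenode:8020/warehouse'"
            + ")");

    // Flink DDL then manipulates Iceberg metadata directly ...
    tEnv.executeSql("USE CATALOG iceberg_catalog");
    tEnv.executeSql("CREATE DATABASE IF NOT EXISTS db");
    tEnv.executeSql("CREATE TABLE db.t (id BIGINT, data STRING)");

    // ... and queries read Iceberg tables directly.
    tEnv.executeSql("SELECT * FROM db.t").print();
  }
}
```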
Database and namespace
The biggest incompatibility is between Flink's databases and Iceberg's namespaces.
- In Flink: like Hive, the identifier of a table is `catalogName.databaseName.tableName`. The database name must exist and must be a single non-empty string (not null or whitespace only).
- In Iceberg: the identifier of a table within a catalog is `namespace_level1.namespace_level2....tableName`. The namespace can have zero, one, or more levels (compared in the sketch after this list).
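The difference shows up directly in the two APIs. A small comparison using Flink's `ObjectPath` and Iceberg's `TableIdentifier`; the database, namespace, and table names are made up:

```java
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

public class IdentifierComparison {
  public static void main(String[] args) {
    // Flink: a table always lives in exactly one database of a catalog.
    ObjectPath flinkId = new ObjectPath("my_database", "my_table");

    // Iceberg: the namespace may have zero, one, or many levels.
    TableIdentifier noNamespace = TableIdentifier.of("my_table");
    TableIdentifier oneLevel = TableIdentifier.of(Namespace.of("db"), "my_table");
    TableIdentifier twoLevels =
        TableIdentifier.of(Namespace.of("level1", "level2"), "my_table");

    System.out.println(flinkId.getFullName()); // my_database.my_table
    System.out.println(noNamespace);           // my_table
    System.out.println(oneLevel);              // db.my_table
    System.out.println(twoLevels);             // level1.level2.my_table
  }
}
```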
A simple choice is to support only a single-level namespace in the Flink catalog, but that is a little limited.
Another choice could be:
- For an empty namespace: we can provide a config option `empty.namespace.name`, whose default value could be `__DEFAULT_EMPTY_NAMESPACE__`.
- For a multi-level namespace: use `Namespace.toString()` as the database name. Flink SQL supports quoted identifiers, so the table can be referenced as ``iceberg_catalog.`namespace_level1.namespace_level2`.table_name`` (a sketch of this mapping follows the list).
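A minimal sketch of the proposed mapping, assuming the default placeholder `__DEFAULT_EMPTY_NAMESPACE__` suggested above; the helper class and method names are hypothetical:

```java
import org.apache.iceberg.catalog.Namespace;

// Hypothetical helpers for the namespace <-> database name mapping.
public class NamespaceMapping {
  // In practice this would come from the proposed 'empty.namespace.name' option.
  private static final String EMPTY_NAMESPACE_NAME = "__DEFAULT_EMPTY_NAMESPACE__";

  // Iceberg namespace -> Flink database name.
  static String toDatabaseName(Namespace namespace) {
    return namespace.isEmpty() ? EMPTY_NAMESPACE_NAME : namespace.toString();
  }

  // Flink database name -> Iceberg namespace.
  static Namespace toNamespace(String databaseName) {
    return EMPTY_NAMESPACE_NAME.equals(databaseName)
        ? Namespace.empty()
        : Namespace.of(databaseName.split("\\."));
  }

  public static void main(String[] args) {
    System.out.println(toDatabaseName(Namespace.empty()));                // __DEFAULT_EMPTY_NAMESPACE__
    System.out.println(toDatabaseName(Namespace.of("level1", "level2"))); // level1.level2
    System.out.println(toNamespace("level1.level2").levels().length);     // 2
  }
}
```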
Partitions
There are two ways to map partitions:
- Mapping Iceberg partitions to Flink partitions: only identity-transform partitions can be supported (see the sketch after this list).
- Not mapping Iceberg partitions to Flink partitions: to Flink, the partition columns of an Iceberg table are just normal columns.
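A small sketch of the first option, assuming identity fields are detected via the transform's string form; the schema and partition spec are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

public class PartitionMappingSketch {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "category", Types.StringType.get()),
        Types.NestedField.required(3, "ts", Types.TimestampType.withZone()));

    // One identity partition plus one hidden (non-identity) partition.
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .identity("category")
        .day("ts")
        .build();

    // Option 1: expose only identity-transform fields as Flink partition keys.
    List<String> flinkPartitionKeys = new ArrayList<>();
    for (PartitionField field : spec.fields()) {
      if ("identity".equals(field.transform().toString())) {
        flinkPartitionKeys.add(schema.findColumnName(field.sourceId()));
      }
    }
    System.out.println(flinkPartitionKeys); // [category]

    // Option 2 would report no partition keys at all, so 'category' and 'ts'
    // simply appear to Flink as ordinary columns.
  }
}
```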