Integrate Iceberg catalog to Flink catalog #1170

@JingsongLi

Description

Like Spark 3, Flink also has a Catalog interface, so we can integrate the Iceberg catalog into the Flink catalog. With Iceberg exposed as a Flink catalog, users can use Flink DDL to manipulate Iceberg metadata and query Iceberg tables directly.
However, the Flink catalog resembles Spark's catalog v1.

Database and namespace

The biggest incompatibility is between databases and namespaces.

  • In Flink: like Hive, the identifier of a table is catalogName.databaseName.tableName. The database name must exist and must be a single non-empty string (not null or whitespace only).
  • In Iceberg: the identifier of a table in a catalog is namespace_level1.namespace_level2....tableName. The namespace can have zero, one, or more levels.

A simple choice is to support only a single-level namespace in the Flink catalog, but this is somewhat limited.
Another choice could be:

  • For an empty namespace: we can provide a config option empty.namespace.name, whose default value could be __DEFAULT_EMPTY_NAMESPACE__.
  • For a multi-level namespace: use Namespace.toString as the database name. Flink SQL supports quoted identifiers, for example iceberg_catalog.`namespace_level1.namespace_level2`.table_name.
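The two rules above can be sketched as a small round-trip mapping. This is an illustrative sketch, not the actual Iceberg or Flink API: the function names and the `EMPTY_NAMESPACE_NAME` constant are assumptions standing in for the proposed `empty.namespace.name` option.

```python
# Sketch of the proposed Iceberg namespace <-> Flink database name mapping.
# EMPTY_NAMESPACE_NAME stands in for the value of the proposed
# empty.namespace.name config option (assumed, not a real API).

EMPTY_NAMESPACE_NAME = "__DEFAULT_EMPTY_NAMESPACE__"

def namespace_to_database(levels):
    """Map an Iceberg namespace (a tuple of levels) to a Flink database name."""
    if not levels:
        # Zero-level namespace: substitute the configured placeholder name,
        # since Flink requires a non-empty database name.
        return EMPTY_NAMESPACE_NAME
    # Multi-level namespaces collapse into one dot-joined database name,
    # which Flink SQL can then reference as a quoted identifier.
    return ".".join(levels)

def database_to_namespace(database):
    """Inverse mapping: Flink database name back to Iceberg namespace levels."""
    if database == EMPTY_NAMESPACE_NAME:
        return ()
    return tuple(database.split("."))

print(namespace_to_database(()))              # __DEFAULT_EMPTY_NAMESPACE__
print(namespace_to_database(("ns1", "ns2")))  # ns1.ns2
print(database_to_namespace("ns1.ns2"))       # ('ns1', 'ns2')
```

One caveat of this scheme: it is only a clean round trip as long as individual namespace levels never contain dots themselves.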

Partitions

There are two ways to map partitions:

  • Map Iceberg partitions to Flink partitions: only identity-transform partitions are supported.
  • Do not map Iceberg partitions to Flink partitions: to Flink, the partition columns of an Iceberg table are just normal columns.
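The first option can be sketched as a filter over the partition spec. This is a toy model, assuming a simplified `PartitionField` stand-in for Iceberg's partition-spec metadata rather than the real class:

```python
# Sketch: expose only identity-transform partition fields to Flink.
# PartitionField is a simplified stand-in for Iceberg's partition-spec
# metadata (assumed for illustration, not the real API).

from dataclasses import dataclass

@dataclass
class PartitionField:
    source_column: str
    transform: str  # e.g. "identity", "bucket[16]", "day"

def flink_partition_keys(spec):
    """Return the Flink partition keys for an Iceberg partition spec.

    Only identity transforms map cleanly onto Flink's Hive-style
    partition columns; other transforms (bucket, truncate, day, ...)
    have no Flink equivalent and are hidden from the partition view.
    """
    return [f.source_column for f in spec if f.transform == "identity"]

spec = [
    PartitionField("dt", "identity"),
    PartitionField("user_id", "bucket[16]"),
]
print(flink_partition_keys(spec))  # ['dt']
```

Under this option, a table partitioned only by hidden transforms would look unpartitioned to Flink, which is why the second option (treating partition columns as normal columns) is also on the table.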
