Spark 3: Consider providing better support for path-based tables #1306

@aokolnychyi

Description

In Spark 3, support for path-based tables is limited. In particular, I don't see a way to create a Hadoop table at a given location through Spark. Users have to fall back to the Iceberg API for that, since Spark uses HadoopTables only for reading.
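For reference, the API-based workaround looks roughly like this (a minimal Scala sketch; the schema, partition spec, and HDFS path here are made up for illustration):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.{PartitionSpec, Schema}
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.types.Types

// Hypothetical schema and location, for illustration only.
val schema = new Schema(
  Types.NestedField.required(1, "id", Types.LongType.get()),
  Types.NestedField.optional(2, "data", Types.StringType.get()))
val spec = PartitionSpec.builderFor(schema).identity("data").build()

// HadoopTables writes table metadata directly under the given path,
// without registering anything in a metastore.
val tables = new HadoopTables(new Configuration())
val table = tables.create(schema, spec, "hdfs://nn:8020/path/to/table")
```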

I see a lot of use cases with no metastore where tables are persisted at a location, usually on HDFS. While we can leverage HadoopCatalog for such cases, it has its own drawbacks: it needs list operations to find a table and, more importantly, it requires a special directory layout. The latter point matters because we cannot use HadoopCatalog for path-based tables that were migrated to Iceberg. I want Iceberg to support migration of path-based as well as metastore-based tables through SQL extensions.

I'd consider adding support to our Spark catalogs for creating/loading a table using a table path as an identifier. Under the hood, this would use HadoopTables.

For example: CREATE TABLE `path/to/table` USING iceberg, or SELECT * FROM `path/to/table` WHERE pred.
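For comparison, reads against a path already go through HadoopTables via the DataFrame API, so the proposal would mainly bring SQL up to parity (a minimal sketch, assuming a running SparkSession and a hypothetical table path):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("iceberg-path-read").getOrCreate()

// Reading a path-based table already works today: as noted above,
// the Iceberg source uses HadoopTables when given a path instead of
// a catalog identifier.
val df = spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table")
df.filter("id > 100").show()
```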
