Flink: Decouple the iceberg integration work from hadoop libraries #3117

@openinx

Thanks for working on this. I wanted to draw attention to the issue mentioned above (Update FlinkCatalogFactory to implement the new Factory interface). I closed it as basically a duplicate of this one though.

The reason I opened that issue is that several users have reported difficulties in using Iceberg with Flink, specifically in configuring Hadoop.

In some environments, users may not have the access needed to set up the classpath so that it includes Hadoop. The new Context interface, which provides access to the ClassLoader, might help with that.

The ultimate goal is to remove the requirement that Hadoop be present in the environment when it's not actually needed (for example, when using S3FileIO). The thought was that upgrading to Flink 1.13 (and moving FlinkCatalogFactory to the new Factory interface) would make it easier to access the ClassLoader.

In environments where the Hadoop Configuration isn't used, such as Ververica Platform or AWS Kinesis Data Analytics with S3FileIO, the hope is to remove the need for the Hadoop Configuration entirely.
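As a rough illustration of the idea (not Iceberg's or Flink's actual code), the sketch below probes a given ClassLoader for Hadoop's `org.apache.hadoop.conf.Configuration` and only instantiates it reflectively when it is present. The class `OptionalHadoopProbe` and its methods are hypothetical names made up for this example; the only assumption about Hadoop is the fully qualified class name and its public no-arg constructor.

```java
import java.util.Optional;

// Hypothetical helper, not part of Iceberg or Flink: shows how a ClassLoader
// can be used to treat Hadoop as an optional dependency.
public final class OptionalHadoopProbe {
  // Fully qualified name of Hadoop's Configuration class.
  public static final String HADOOP_CONF_CLASS = "org.apache.hadoop.conf.Configuration";

  private OptionalHadoopProbe() {}

  // Returns true if the loader can locate the class; the class is not initialized.
  public static boolean isClassAvailable(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Creates a Hadoop Configuration reflectively when Hadoop is on the classpath,
  // and returns an empty Optional otherwise, so callers never link against Hadoop.
  public static Optional<Object> loadHadoopConfIfPresent(ClassLoader loader) {
    if (!isClassAvailable(HADOOP_CONF_CLASS, loader)) {
      return Optional.empty();
    }
    try {
      Class<?> confClass = Class.forName(HADOOP_CONF_CLASS, true, loader);
      return Optional.of(confClass.getDeclaredConstructor().newInstance());
    } catch (ReflectiveOperationException | LinkageError e) {
      return Optional.empty();
    }
  }
}
```

A catalog factory could call `loadHadoopConfIfPresent` with whatever ClassLoader it is handed and fall back to a Hadoop-free path (for example, one that only uses S3FileIO) when the result is empty.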

For Ververica, users have reported a workaround, but in Kinesis Data Analytics it's a bit more difficult (as far as has been reported).

More information on the Hadoop Configurable concern can be found in this issue: #3044

Thanks again @zhangjun0x01!

Originally posted by @kbendick in #2558 (comment)