Flink: Make Hadoop an optional dependency #7332

@Fokko

Description

Feature Request / Improvement

While playing around with PyFlink, I noticed that the Hadoop dependency is required even when using the REST catalog:

~ python3.9                             
Python 3.9.16 (main, Dec  7 2022, 10:06:04) 
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> 
>>> from pyflink.datastream import StreamExecutionEnvironment
>>> 
>>> env = StreamExecutionEnvironment.get_execution_environment()
>>> iceberg_flink_runtime_jar = "/Users/fokkodriesprong/Desktop/iceberg/flink/v1.17/flink-runtime/build/libs/iceberg-flink-runtime-1.17-1.3.0-SNAPSHOT.jar"
>>> 
>>> env.add_jars("file://{}".format(iceberg_flink_runtime_jar))
>>> 
>>> from pyflink.table import StreamTableEnvironment
>>> 
>>> table_env = StreamTableEnvironment.create(env)
>>> 
>>> table_env.execute_sql("""
... CREATE CATALOG tabular WITH (
...     'type'='iceberg', 
...     'catalog-type'='rest',
...     'uri'='https://api.tabular.io/ws',
...     'credential'='t-tcEe4Ihp4eM:pyTlx_4ayKV7N54gXuBmMotVFLU'
... )
... """)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.9/site-packages/pyflink/table/table_environment.py", line 837, in execute_sql
    return TableResult(self._j_tenv.executeSql(stmt))
  File "/opt/homebrew/lib/python3.9/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/homebrew/lib/python3.9/site-packages/pyflink/util/exceptions.py", line 146, in deco
    return f(*a, **kw)
  File "/opt/homebrew/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o23.executeSql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
	at org.apache.iceberg.flink.FlinkCatalogFactory.clusterHadoopConf(FlinkCatalogFactory.java:211)
	at org.apache.iceberg.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:139)
	at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:414)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1466)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1212)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:765)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
	at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
	at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	... 17 more

When using a Hadoop or Hive catalog this makes perfect sense, but it would be nice to make the Hadoop dependency optional when using the REST catalog.
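
For context, one way the hard dependency could be relaxed is to resolve org.apache.hadoop.conf.Configuration reflectively and tolerate its absence, so that catalogs that never touch Hadoop (such as the REST catalog) can still be created. The sketch below only illustrates that idea; it is not the actual Iceberg or Flink code, and the class HadoopOptionalSketch and method hadoopConfOrNull are hypothetical names of mine:

// HadoopOptionalSketch.java -- a hedged sketch, not the actual Iceberg implementation.
// Idea: never take a link-time dependency on org.apache.hadoop.conf.Configuration;
// load it reflectively and return null when Hadoop is not on the classpath.
public class HadoopOptionalSketch {

  /** Returns a new org.apache.hadoop.conf.Configuration, or null if Hadoop is absent. */
  static Object hadoopConfOrNull(ClassLoader loader) {
    try {
      Class<?> confClass = Class.forName("org.apache.hadoop.conf.Configuration", true, loader);
      return confClass.getDeclaredConstructor().newInstance();
    } catch (ClassNotFoundException | NoClassDefFoundError e) {
      // Hadoop is not available; callers that don't need it (e.g. the REST catalog) proceed with null.
      return null;
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("Failed to instantiate Hadoop Configuration", e);
    }
  }

  public static void main(String[] args) {
    Object conf = hadoopConfOrNull(HadoopOptionalSketch.class.getClassLoader());
    System.out.println(conf == null
        ? "Hadoop not found: continuing without a Hadoop Configuration"
        : "Hadoop found: " + conf.getClass().getName());
  }
}

Under that approach, a caller like FlinkCatalogFactory.clusterHadoopConf would have to accept a null configuration and only fail once a catalog actually requires Hadoop, rather than at catalog creation time.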

Query engine

Flink
