Closed as not planned
Description
Feature Request / Improvement
While playing around with PyFlink, I noticed that the Hadoop dependency is required even when using the REST catalog:
➜ ~ python3.9
Python 3.9.16 (main, Dec 7 2022, 10:06:04)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> from pyflink.datastream import StreamExecutionEnvironment
>>>
>>> env = StreamExecutionEnvironment.get_execution_environment()
>>> iceberg_flink_runtime_jar = "/Users/fokkodriesprong/Desktop/iceberg/flink/v1.17/flink-runtime/build/libs/iceberg-flink-runtime-1.17-1.3.0-SNAPSHOT.jar"
>>>
>>> env.add_jars("file://{}".format(iceberg_flink_runtime_jar))
>>>
>>> from pyflink.table import StreamTableEnvironment
>>>
>>> table_env = StreamTableEnvironment.create(env)
>>>
>>> table_env.execute_sql("""
... CREATE CATALOG tabular WITH (
... 'type'='iceberg',
... 'catalog-type'='rest',
... 'uri'='https://api.tabular.io/ws',
... 'credential'='t-tcEe4Ihp4eM:pyTlx_4ayKV7N54gXuBmMotVFLU'
... )
... """)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/lib/python3.9/site-packages/pyflink/table/table_environment.py", line 837, in execute_sql
return TableResult(self._j_tenv.executeSql(stmt))
File "/opt/homebrew/lib/python3.9/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/opt/homebrew/lib/python3.9/site-packages/pyflink/util/exceptions.py", line 146, in deco
return f(*a, **kw)
File "/opt/homebrew/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o23.executeSql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at org.apache.iceberg.flink.FlinkCatalogFactory.clusterHadoopConf(FlinkCatalogFactory.java:211)
at org.apache.iceberg.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:139)
at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:414)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1466)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1212)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:765)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 17 more
When using a Hadoop or Hive catalog this makes perfect sense, but it would be nice to make the Hadoop dependency optional when using the REST catalog.
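In the meantime, a possible workaround is to put the Hadoop classes on the client classpath yourself before creating the catalog, the same way the Iceberg runtime jar is loaded above. A minimal sketch, assuming locally available jars (both paths below are placeholders, and a single hadoop-common jar may not be enough; pointing HADOOP_CLASSPATH at a full Hadoop distribution before starting Python is the more reliable route):

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Load the Iceberg Flink runtime plus a jar that provides
# org.apache.hadoop.conf.Configuration, so FlinkCatalogFactory can
# resolve it. Paths and the Hadoop version are hypothetical.
env.add_jars(
    "file:///path/to/iceberg-flink-runtime-1.17-1.3.0-SNAPSHOT.jar",
    "file:///path/to/hadoop-common-3.3.6.jar",  # hypothetical path/version
)

table_env = StreamTableEnvironment.create(env)
table_env.execute_sql("""
CREATE CATALOG tabular WITH (
  'type'='iceberg',
  'catalog-type'='rest',
  'uri'='https://api.tabular.io/ws',
  'credential'='<credential>'
)
""")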
Query engine
Flink