-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Describe the problem you faced
When using Flink to do an incremental query read from a table, using the 0.12.2 and Flink 1.15, I occasionally get a ClassNotFoundException for org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat. This usually happens when running the test from inside Eclipse, occasionally from the command line.
To Reproduce
Steps to reproduce the behavior:
git clone https://github.com/kkrugler/flink-hudi-query-testcd flink-hudi-query-test- Edit
pom.xmlto remove explicit dependency onhive-exec. mvn clean package
Expected behavior
The tests should all pass.
Environment Description
-
Hudi version : 0.12.2
-
Flink version : 1.15.1
Additional context
I believe the problem is that the hudi-hadoop-mr dependency on hive-exec (with classifier core) is marked as provided, but when running a Flink workflow in a typical Flink cluster you don't have Hive jars installed. I think maybe it's OK for hudi-hadoop-mr to say this is provided, but hudi-flink should then have an explicit dependency on this artifact, something like:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<classifier>core</classifier>
<version>${hive.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
Note the exclusion of all transitive dependencies. All that Hudi needs from hive-exec is the one missing class, as Hudi uses HoodieParquetInputFormatBase as the base class, as per:
/**
* !!! PLEASE READ CAREFULLY !!!
*
* NOTE: Hive bears optimizations which are based upon validating whether {@link FileInputFormat}
* implementation inherits from {@link MapredParquetInputFormat}.
*
* To make sure that Hudi implementations are leveraging these optimizations to the fullest, this class
* serves as a base-class for every {@link FileInputFormat} implementations working with Parquet file-format.
*
* However, this class serves as a simple delegate to the actual implementation hierarchy: it expects
* either {@link HoodieCopyOnWriteTableInputFormat} or {@link HoodieMergeOnReadTableInputFormat} to be supplied
* to which it delegates all of its necessary methods.
*/
public abstract class HoodieParquetInputFormatBase extends MapredParquetInputFormat implements Configurable {And if you don't do this exclusion, you wind up pulling in lots of additional code that's not needed (AFAICT).
Stacktrace
23/03/09 10:08:22 INFO executiongraph.ExecutionGraph:1423 - Source: split_monitor(table=[example-table], fields=[event_time, data, enrichment, key, partition]) (1/1) (16f707e9f9462ca1ac57f69e5bc9ae4e) switched from RUNNING to FAILED on 9cbbe102-0f19-48d6-849c-4755cab4fa2d @ localhost (dataPort=-1).
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(ClassLoader.java:1016) ~[?:?]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(ClassLoader.java:1016) ~[?:?]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(ClassLoader.java:1016) ~[?:?]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
at org.apache.hudi.sink.partitioner.profile.WriteProfiles.getCommitMetadata(WriteProfiles.java:236) ~[hudi-flink-0.12.2.jar:0.12.2]
at org.apache.hudi.source.IncrementalInputSplits.lambda$inputSplits$2(IncrementalInputSplits.java:285) ~[hudi-flink-0.12.2.jar:0.12.2]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
at org.apache.hudi.source.IncrementalInputSplits.inputSplits(IncrementalInputSplits.java:285) ~[hudi-flink-0.12.2.jar:0.12.2]
at org.apache.hudi.source.StreamReadMonitoringFunction.monitorDirAndForwardSplits(StreamReadMonitoringFunction.java:199) ~[hudi-flink-0.12.2.jar:0.12.2]
at org.apache.hudi.source.StreamReadMonitoringFunction.run(StreamReadMonitoringFunction.java:172) ~[hudi-flink-0.12.2.jar:0.12.2]
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-streaming-java-1.15.1.jar:1.15.1]
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) ~[flink-streaming-java-1.15.1.jar:1.15.1]
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332) ~[flink-streaming-java-1.15.1.jar:1.15.1]
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
... 42 more
Metadata
Metadata
Assignees
Labels
Type
Projects
Status