Conversation


@viirya viirya commented Jul 8, 2021

What changes were proposed in this pull request?

Relocate dependencies that could conflict with Spark.

Why are the changes needed?

When we want to use the shaded version of hive-exec (i.e., without a classifier), more dependencies conflict with Spark. We need to relocate these dependencies too.

Does this PR introduce any user-facing change?

If downstream projects previously relied on dependencies included in the shaded release, they might need to explicitly add these dependencies after the relocation here.

How was this patch tested?

CI
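For readers unfamiliar with the mechanics: relocations like the ones proposed here are declared in hive-exec's maven-shade-plugin configuration. A minimal, hypothetical sketch follows; the exact pattern list in this PR may differ, and commons-logging/commons-codec are taken from the stack traces later in this thread, not from the PR diff itself:

```xml
<!-- Hypothetical relocation block for illustration only; not the PR's actual pom change -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.apache.commons.logging</pattern>
        <shadedPattern>org.apache.hive.org.apache.commons.logging</shadedPattern>
      </relocation>
      <relocation>
        <pattern>org.apache.commons.codec</pattern>
        <shadedPattern>org.apache.hive.org.apache.commons.codec</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

The shade plugin rewrites both the bundled class files and the bytecode references to them, which is why a relocation that is declared without the corresponding classes being bundled produces the `org.apache.hive.org.apache...` lookup failures seen below.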


viirya commented Jul 13, 2021

Thanks @sunchao. This seems to be caused by relocating avro and protobuf. I reverted the two relocations to see what's going on.


sunchao commented Jul 13, 2021

@viirya there are still more errors:

java.lang.NoClassDefFoundError: org/apache/hive/org/apache/commons/logging/LogFactory
	at org.apache.hadoop.hive.conf.valcoersion.JavaIOTmpdirVariableCoercion.<clinit>(JavaIOTmpdirVariableCoercion.java:35)
	at org.apache.hadoop.hive.conf.SystemVariables.<clinit>(SystemVariables.java:37)
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<init>(HiveConf.java:3445)
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<init>(HiveConf.java:3409)
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:436)
	at org.apache.hadoop.hive.druid.serde.DruidSerDe.initialize(DruidSerDe.java:113)
	at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:54)
	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
	at org.apache.hadoop.hive.druid.TestDruidSerDe.testDruidDeserializer(TestDruidSerDe.java:510)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/org/apache/commons/codec/language/Soundex
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSoundex.<init>(GenericUDFSoundex.java:49)
	... 29 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/org/apache/commons/codec/binary/Base64
	at org.apache.hadoop.hive.serde2.lazy.LazyBinary.decodeIfNeeded(LazyBinary.java:58)
	at org.apache.hadoop.hive.serde2.lazy.LazyBinary.init(LazyBinary.java:50)
	at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:226)
	at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:202)
	at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:128)
	at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:157)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:95)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:438)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:430)

This reverts commit 4644b3b.

viirya commented Jul 16, 2021

I'm not sure why this happens, or whether it is caused by this change. HiveConf is not from hive-exec (ql), and this PR doesn't relocate any Hive internal class (we shouldn't do that either). cc @sunchao

java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
	at org.apache.hadoop.hive.druid.TestDruidStorageHandler.testCommitCreateTablePlusCommitDropTableWithoutPurge(TestDruidStorageHandler.java:135)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)


sunchao commented Jul 16, 2021

I'll take a look @viirya - where did you find this error? Do you have a link?

I'm seeing these errors in the last run:

Caused by: java.lang.ClassNotFoundException: org.apache.hive.org.apache.commons.codec.language.Soundex
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 32 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.org.apache.commons.logging.LogFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 35 more
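These ClassNotFoundExceptions indicate that bytecode references were rewritten to the relocated names, but the relocated classes are not visible on the test classpath (either they were not bundled, or the jar under test is not the shaded one). A quick, hedged way to probe this from a test JVM; `RelocationProbe` is a hypothetical helper, and the class name is taken from the trace above:

```java
// Probe whether a (relocated) class name is resolvable on the current classpath.
public class RelocationProbe {
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a classpath without the shaded hive-exec jar, the relocated
        // name from the stack trace above cannot be found, so this prints false.
        System.out.println(isPresent("org.apache.hive.org.apache.commons.logging.LogFactory"));
    }
}
```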


viirya commented Jul 16, 2021

The quoted test failure is from "Testing / split-20 / Archive / testCommitCreateTablePlusCommitDropTableWithoutPurge – org.apache.hadoop.hive.druid.TestDruidStorageHandler" in http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2459/11/tests.


viirya commented Jul 16, 2021

Yea, I'm trying to deal with org.apache.hive.org.apache.commons.logging.LogFactory.


sunchao commented Jul 16, 2021

I see. Yea, this is very weird. I suspect something else caused it to fail, for instance when initializing some of the enum items.
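The earlier "Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars" error is consistent with that suspicion: when a static initializer fails once (e.g. an enum constant's constructor touches a missing relocated class), the first use throws ExceptionInInitializerError and every later use of the class reports "NoClassDefFoundError: Could not initialize class". A minimal illustration of the JVM behavior; this is not Hive code, and the failing initializer is simulated:

```java
// Demonstrates why later uses of a class with a failed static initializer
// surface as NoClassDefFoundError rather than the original exception.
public class StaticInitDemo {
    static class Broken {
        // Simulates a static initializer that fails, e.g. because a
        // relocated dependency class is missing at runtime.
        static {
            if (true) throw new RuntimeException("missing relocated dep");
        }
    }

    static String tryUse() {
        try {
            new Broken();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUse()); // first use: ExceptionInInitializerError
        System.out.println(tryUse()); // later uses: NoClassDefFoundError
    }
}
```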

<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${project.version}</version>
<classifier>core</classifier>
Member

don't use the core artifact - that's just bad!

what are you trying to achieve here with this?

Member

branch-2.3 ?
please note that changes should land on master first

Member Author

As more dependencies are relocated here, modules that depend on the non-core artifact will hit class-not-found errors...

The motivation is that we want to use the shaded version of hive-exec (i.e., without a classifier) in Spark, to make sure it doesn't conflict with the Guava version there. But more dependencies conflict with Spark, so we need to relocate these dependencies too.

Member Author

@sunchao do we need to have similar change on master first?

Member

Yea I think so since this PR is trying to shade things from the hive-exec-core I believe?

Member

we should fix the issues with using the normal hive-exec artifact, if there are any - loading the core jar could cause trouble...

Member

note: on branch-2, Guava is most likely not properly shaded away (HIVE-22126)

Member

@kgyrtkirk Guava is shaded in branch-2.3 via https://issues.apache.org/jira/browse/HIVE-23980. The issue is that, in order for Spark to use shaded hive-exec, Hive will need to relocate more classes while at the same time making sure it won't break other modules (for instance, if a shaded class appears in certain APIs and another module imports the unshaded version of the class by itself).

Currently we've abandoned this approach and decided to shade the hive-exec-core dependencies within Spark itself, following a similar approach in Trino (see https://github.com/trinodb/trino-hive-apache).

Member

we could relocate/shade away those deps to make it possible for other projects to use the normal artifact - seems like there is a very good list in the Trino project.


viirya commented Sep 15, 2021

Close this as we are taking a different direction to shade dependencies of Hive at Spark.
