[SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader #27025
Changes from all commits: 62a39de, 02188e0, b801ac5, 418e027, 988061b, 39a2171
`HiveSessionCatalog.scala`:

```diff
@@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
```
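The helper used in the added lines follows the standard save/set/restore pattern for the thread context classloader. A minimal sketch of that pattern, assuming only the `java.lang.Thread` API (the `ClassLoaderUtil` object below is illustrative, not Spark's actual `Utils` code):

```scala
// Minimal sketch of a context-classloader switching helper, in the style of
// Spark's Utils.withContextClassLoader. ClassLoaderUtil is a hypothetical name.
object ClassLoaderUtil {
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val original = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try {
      body
    } finally {
      // Always restore the previous loader, even if body throws.
      thread.setContextClassLoader(original)
    }
  }
}
```

The `finally` block is the important part: the loader must be restored on every exit path, since analysis code may throw (as the surrounding `try`/`catch` in this very method does).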
> **Contributor:** is it guaranteed that …

> **Contributor (Author):** If the class comes from the classpath (not loaded via `addJar`), it would be the Spark classloader instead of `jarClassLoader`, though `jarClassLoader` may still be able to load it since it contains the Spark classloader. So just switching to `jarClassLoader` might work in most cases, but this approach also works for classloaders that dynamically load classes, because we use the classloader that actually "loaded" the class we want to instantiate.
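The delegation point in the reply above can be shown with a plain `java.net.URLClassLoader`: a child loader (analogous to a `jarClassLoader` with the Spark classloader as parent) can resolve every class its parent can through parent-first delegation, while the parent never sees jars registered only with the child. A small sketch under that assumption (`DelegationDemo` is a hypothetical name):

```scala
import java.net.{URL, URLClassLoader}

// Illustration only: a child classloader with no jars of its own still
// resolves classes through its parent via standard parent-first delegation.
object DelegationDemo {
  def resolveViaChild(className: String): Class[_] = {
    val parent = getClass.getClassLoader
    val child = new URLClassLoader(Array.empty[URL], parent) // no URLs of its own
    Class.forName(className, false, child) // delegates up to parent
  }
}
```

Because of delegation, the `Class` object resolved through the child is the very same instance the parent would return, which is why switching to the loader that loaded `clazz` is safe even when `clazz` came from the ordinary classpath.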
```diff
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
```
> **Contributor:** Found a potential problem: here we call … However, if the expression gets transformed later, which copies …, I think we should materialize the loaded class in … @HeartSaVioR can you take a look?

> **Contributor (Author):** Thanks for pinging me. Could you please confirm my understanding? Actually, my knowledge of how to resolve this issue came from debugging (reverse-engineering of a sort), so I'm not sure I get it 100%. If my understanding is correct, this seems to be the simple reproducer — could you please confirm I understand correctly?

> **Contributor:** yup, …

> **Contributor (Author):** I figured out the above code doesn't give an error — `HiveFunctionWrapper` stores … That said, the below code gives an error: … Interestingly, if we do `makeCopy` with the classloader which loads `clazz`, it also doesn't give any error: we force-call … Could you please check whether my observation is correct, or let me know if I'm missing something?
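The `makeCopy` observation above comes down to how lazy initialization interacts with copying: a lazily initialized field is not carried over to a copied instance, so it is re-evaluated on the copy under whichever context classloader happens to be installed at that later moment. A standalone sketch of that mechanism, assuming nothing Spark-specific (`FuncWrapper` is illustrative, not `HiveFunctionWrapper`):

```scala
import java.net.{URL, URLClassLoader}

// A lazy val captures the context classloader only when first accessed.
case class FuncWrapper(className: String) {
  lazy val loaderAtInit: ClassLoader = Thread.currentThread().getContextClassLoader
}

object LazyCopyDemo {
  def run(): (ClassLoader, ClassLoader) = {
    val thread = Thread.currentThread()
    val original = thread.getContextClassLoader

    val w = FuncWrapper("my.hypothetical.Udf") // placeholder class name
    val first = w.loaderAtInit                 // initialized under `original`

    // Copy the wrapper, then touch the lazy val under a different loader:
    // the copy re-runs the initializer instead of reusing `first`.
    val copied = w.copy()
    val other = new URLClassLoader(Array.empty[URL], original)
    thread.setContextClassLoader(other)
    val second = try copied.loaderAtInit finally thread.setContextClassLoader(original)

    (first, second)
  }
}
```

If the original field had been carried over, both values would be the same loader; because the copy re-initializes, they differ, which mirrors the reported failure when an analyzed plan is transformed later.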
> **Contributor (Author):** The experimental UT code I used is below (added to `SQLQuerySuite.scala`): …

> **Contributor:** OK, let me put my findings: if you look at …, I don't know the history, but I assume "we always have to create a new instance for Simple UDF" is correct. I think what we can do is to cache the loaded …
> **Contributor (Author):** Oh OK. I missed the case where we don't cache the function. Thanks for the pointer!
```diff
+          } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
+          } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
+          } else if (classOf[UDAF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveUDAFFunction(
+              name,
+              new HiveFunctionWrapper(clazz.getName),
+              input,
+              isUDAFBridgeRequired = true))
+            udfExpr.get.dataType // Force it to check input data types.
+          } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+          }
-      } catch {
-        case NonFatal(e) =>
-          val noHandlerMsg = s"No handler for UDF/UDAF/UDTF '${clazz.getCanonicalName}': $e"
-          val errorMsg =
-            if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-              s"$noHandlerMsg\nPlease make sure your function overrides " +
-                "`public StructObjectInspector initialize(ObjectInspector[] args)`."
-            } else {
-              noHandlerMsg
-            }
-          val analysisException = new AnalysisException(errorMsg)
-          analysisException.setStackTrace(e.getStackTrace)
-          throw analysisException
-      }
-      udfExpr.getOrElse {
-        throw new AnalysisException(s"No handler for UDF/UDAF/UDTF '${clazz.getCanonicalName}'")
-      }
+        } catch {
+          case NonFatal(e) =>
+            val noHandlerMsg = s"No handler for UDF/UDAF/UDTF '${clazz.getCanonicalName}': $e"
+            val errorMsg =
+              if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
+                s"$noHandlerMsg\nPlease make sure your function overrides " +
+                  "`public StructObjectInspector initialize(ObjectInspector[] args)`."
+              } else {
+                noHandlerMsg
+              }
+            val analysisException = new AnalysisException(errorMsg)
+            analysisException.setStackTrace(e.getStackTrace)
+            throw analysisException
+        }
+        udfExpr.getOrElse {
+          throw new AnalysisException(s"No handler for UDF/UDAF/UDTF '${clazz.getCanonicalName}'")
+        }
+      }
     }
   }
```
New test-resource placeholder file:

```diff
@@ -0,0 +1 @@
+Place files which are being used as resources of tests but shouldn't be added to classpath.
```