-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE-844: Custom UDFs implemented as inner classes can't be registered/uploaded #845
Conversation
…ed/uploaded. Add unit tests for Custom UDF upload.
Note: in a couple files, IntelliJ simplified the java include lists automatically. I reviewed the changes and left them in. If include wildcards are contrary to Streamline coding standards, please let me know and I'll revert them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. It looks good overall.
Placing new test module to inside of resource directory of another module (already feels complicated) makes me feel hacky, so I would like to avoid it, and couple new module to another (having parent) so that new module can be managed along with others.
There're many places using wildcard import (with or without static). I didn't point out exhaustively but just some of them. We're avoiding it so please extract them to separate imports.
@@ -81,4 +89,8 @@ | |||
} | |||
} | |||
|
|||
public static boolean isConcrete(Class clazz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
pom.xml
Outdated
@@ -19,6 +19,11 @@ | |||
<!--examples--> | |||
<module>examples/processors</module> | |||
<module>docker</module> | |||
<!-- test jar builders --> | |||
<module>streams/service/src/test/resources/custom-udf-microtest</module> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see such pattern like putting module to resources of other module, and feels a bit odd. What's advantage and disadvantage of this? If it doesn't provide outstanding advantage, I would like to place it like other modules, maybe having another module named 'streams/testsupport' as parent.
@harshach I guess you may want to put opinion on this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HeartSaVioR , thanks for the review.
Regarding the custom-udf-microtest module: I agree the location of the jar-creation module seems hackey. I was trying to achieve:
- Keep the source code for the test jar closely associated with the test jar itself (unlike the pre-built objects typically used for such purposes).
- But have the built jar end up under streams/service/src/test/resources/ without needing a copy phase.
The approach I took demonstrably works, and leaves the jar-creating source code closely tied to the jar itself, which I think is a big plus. But it looks weird.
I'll re-organize it to a top-level module and see if you like it better. We'll then be able to move it wherever you think it fits best.
import com.google.common.collect.ImmutableSet; | ||
import com.google.common.collect.Lists; | ||
import com.google.common.collect.Sets; | ||
import com.google.common.collect.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid wildcard: just a style guide for this project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert the import wildcards, since they are contrary to Streamline team coding standards. Thanks for informing me.
import com.hortonworks.streamline.streams.catalog.TopologyTestRunHistory; | ||
import com.hortonworks.streamline.streams.catalog.TopologyVersion; | ||
import com.hortonworks.streamline.streams.catalog.TopologyWindow; | ||
import com.hortonworks.streamline.streams.catalog.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
import com.hortonworks.streamline.streams.rule.UDF5; | ||
import com.hortonworks.streamline.streams.rule.UDF6; | ||
import com.hortonworks.streamline.streams.rule.UDF7; | ||
import com.hortonworks.streamline.streams.rule.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@@ -2411,7 +2361,7 @@ public UDF addUDF(UDF udf) { | |||
return udf; | |||
} | |||
|
|||
public Map<String, Class<?>> loadUdfsFromJar(java.io.File jarFile) throws IOException { | |||
public static Map<String, Class<?>> loadUdfsFromJar(java.io.File jarFile) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this doesn't require any fields in StreamCatalogService, but if we would want to make it static, maybe better to have utility class instead. Just a 2 cents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. It fits very well with the other methods in StreamCatalogService, it would seem artificial to remove it to a utility class. Perhaps it is merely fortunate that it is possible to change it to static, and it is the only method I needed to access from the mocked class. But in the interests of making only minimal changes to enable unit testing, I think it is best to do it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Make sense.
@@ -318,7 +293,8 @@ public Response downloadUdf(@PathParam("udfId") Long udfId, @Context SecurityCon | |||
throw EntityNotFoundException.byId(udfId.toString()); | |||
} | |||
|
|||
private void processUdf(InputStream inputStream, | |||
/* package-private (for unit-test) */ | |||
void processUdf(InputStream inputStream, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add annotation @VisibleForTesting instead if Guava is available for this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, thanks.
|
||
<groupId>com.hortonworks.streamline.test</groupId> | ||
<artifactId>custom-udf-microtest</artifactId> | ||
<version>0.1.0</version> <!--fixed version - update manually --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have specific reason to avoid coupling version to the project version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that would be one thing it could usefully inherit from a parent :-)
I'll see if the shading command prevents it from pulling in all the parent's dependencies, and if so go back to letting it have streamline pom as parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out the <minimizeJar> configuration successfully prevents the test jar from inheriting the parent's dependencies, so I put test-support/custom-udf-microtest as submodules under streamline, and let it inherit the product version.
If folks want to similarly move source code for other test jars, like those in streams/service/src/test/resources/customprocessorupload, into test-support/, it's a good place that avoids dependency ordering problems.
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<!-- Does not need a <parent> declaration --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering there's a reason to avoid coupling with other modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely yes. If you give it a parent relationship, it tends to pull all the parent's dependencies into the jar (altho of course this can be worked around). Also, as a test jar, it is intended to be as stand-alone as possible.
Finally, there's no benefit in declaring a parent, because it doesn't need to inherit anything from streamline pom. It's fine being just a sub-module for dependency-tree purposes.
<dependency> | ||
<groupId>com.hortonworks.streamline</groupId> | ||
<artifactId>streamline-sdk</artifactId> | ||
<version>0.1.0-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar issue here. I think this is easy to be broken.
FYI: I pulled the change and ran the unit tests via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through and left a few comments. It will be good to test if the inner classes works end-end.
return true; | ||
} catch (Throwable ex) { | ||
LOG.warn("class {} is subtype of {}, but it can't be initialized.", s, superTypeClass); | ||
LOG.warn("class {} is concrete subtype of {}, but can't be initialized due to exception:", s, superTypeClass, ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ex will not be logged due to missing {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the slf4j apis (since version 1.6.0) treat the last argument specially if it is a Throwable. Please see https://www.slf4j.org/faq.html#paramException
Indeed, I first used a '{}' with the ex, and it didn't print the stack trace. This does.
@@ -42,44 +42,19 @@ | |||
import org.slf4j.Logger; | |||
import org.slf4j.LoggerFactory; | |||
|
|||
import javax.ws.rs.Consumes; | |||
import javax.ws.rs.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid wildcard import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arunmahadevan , thanks for the review.
As noted with @HeartSaVioR 's comments, I will remove all wildcard imports.
import javax.ws.rs.core.SecurityContext; | ||
import javax.ws.rs.core.StreamingOutput; | ||
import javax.ws.rs.core.UriInfo; | ||
import javax.ws.rs.core.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid wildcard import
import java.util.Optional; | ||
import java.util.Set; | ||
import java.util.UUID; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid wildcard import
import static com.hortonworks.streamline.streams.security.Permission.EXECUTE; | ||
import static com.hortonworks.streamline.streams.security.Permission.READ; | ||
import static com.hortonworks.streamline.streams.security.Permission.WRITE; | ||
import static com.hortonworks.streamline.streams.security.Permission.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid wildcard import
pom.xml
Outdated
@@ -19,6 +19,11 @@ | |||
<!--examples--> | |||
<module>examples/processors</module> | |||
<module>docker</module> | |||
<!-- test jar builders --> | |||
<module>streams/service/src/test/resources/custom-udf-microtest</module> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to build a jar with some test UDFs and use it for testing the UDFCatalogResource? Maybe you can add the classes you want as a part of streamline-functions
test sources and create a test jar for that module and add it as a dependency for your unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that seems a good place to put them, I'll try that. If I can't prevent it from including all the super-module dependencies, I'll have to put it at the top level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Streamline-functions builds its own target jar, so it can't have a subsidiary module as a jar. So I put the test jar sources under streamline/test-support/custom-udf-microtest.
If folks want to similarly move source code for other test jars, like those in streams/service/src/test/resources/customprocessorupload, into test-support/, it's a good place that avoids dependency ordering problems.
public void setClassName(String className) { | ||
this.className = className; | ||
this.className = className.replace('$', '.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this would work. In storm-sql these functions are registered by loading classes with class.forName
which expects the name retuned by getName
and not the canonical name. You may want to try it out end-end, or add some test case here to check - https://github.com/hortonworks/streamline/blob/master/streams/runners/storm/runtime/src/test/java/com/hortonworks/streamline/streams/runtime/storm/bolt/rules/FunctionsTest.java#L50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arunmahadevan . I will add those tests, and instead of forcing to canonical form in the setClassName call, I'll force it to fqdn name during upload processing.
revert all wildcard imports use fqdn (not canonical) class names throughout UDF logic including forcing the UDF.ClassName to fqdn in validateUDF, if the user gave us canonical form adjust test cases for inner classes use @VisibleForTesting annotation as suggested
- move custom-udf-microtest.jar to streamline/test-support/custom-udf-microtest Normalized parent and module relationships. Confirmed that shader <minimizeJar> prevents pulling in inherited dependencies - UDFCatalogResourceTest dependency and execution still work. Used dependency:copy to copy built jar into streamline-service target/generated-test-resources/
public void setClassName(String className) { | ||
this.className = className; | ||
} | ||
public void setClassName(String className) { this.className = className; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we don't use one liner method definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, fooled again by IntelliJ. Fixed. Thanks for pointing it out.
+1 from me given that my review comments are all addressed. |
@mattf-horton , I am not sure why you are not able to build a test jar and add it as a dependency. Just trying to see if we can avoid adding new top level modules for testing. For e.g.
The |
Hi @arunmahadevan , I'm open to doing it that way if you think it is best; however, I've worked on this and your suggestion to add testing to streams/runtime/storm/bolt/rules/FunctionsTest.java, and have the following observations:
The way I've done it provides a single top-level home for modules for all these jars to go, as well as the source code for the CustomUDF test jar.
I will do a manual end-to-end test, and fix or report back here any problems I find. Hopefully I've responded adequately to your other suggestions. Thanks. |
@mattf-horton , understand your concerns. I dont see other test classes being part of the uploaded Jar a major issue since the UDF's are loaded by class name. The concern here is adding a top level module for test-support, haven't seen such top level modules in other projects. If it was for something like integration tests it would make sense. Anyways I am ok if @HeartSaVioR, @harshach and others think its reasonable. |
Streamline has a bit complicated module structure making contributors not easy to understand, but a one of implicit rule is that the modules which only bind to streamline are better to be placed under streams module. Hence streams/test-support/custom-udf-microtest feels better to me. |
@HeartSaVioR , happy to make that change. @arunmahadevan , is that satisfactory? Regarding auto-test status, I merged latest master to pick up the addition of streamline-sql module. That module has a compile error in PreparedStatementBuilder.java. I'm not proposing the fix as part of my PR since I don't want to step on others' work. But until that is fixed, master fails compilation (with or without my PR). Unit tests pass on all other modules (by-passing storage-core), except for a dependency on storage in webservice. |
By substituting jars into a real-world working HDF deployment, I've established that without this fix UDFs implemented as inner classes are not loaded, and with the fix they are successfully loaded. @arunmahadevan , okay with you if I commit this? Thanks. |
@mattf-horton , looks good to merge. Please squash the commits before merging. |
@mattf-horton, you want to squash and merge the changes to master ? let me know if you would like me to merge. Thanks. |
Thanks, @arunmahadevan . |
This PR makes the following changes: