
ISSUE-844: Custom UDFs implemented as inner classes can't be registered/uploaded #845

Merged: 8 commits, Aug 31, 2017


mattf-apache (Contributor)

This PR makes the following changes:

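  1. Fix the bug in ProxyUtil.java#L75 so that it uses the FQDN name (Class#getName(), which separates nested classes with '$') instead of the canonical name (Class#getCanonicalName()).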
  2. Add unit test capability for Custom UDF upload:
    • new file UDFCatalogResourceTest.java and new module custom-udf-microtest (provides the jar)
    • modified root pom.xml and streams/service/pom.xml to add test dependency (and fixed junit and jmockit dependencies)
    • UDFCatalogResource.java#processUdf made package-private for unit test access
    • StreamCatalogService.java#loadUdfsFromJar made static for unit test access when StreamCatalogService is mocked.
  3. UDF.java: auto-convert to Canonical if the user inputs Fqdn form of classname (with '$').
  4. Early check for concreteness: a lot of time is wasted, and a lot of log noise is generated, processing interfaces from the SDK (included in the CustomUDF uber-jar) that pass the UDFsuperClasses.isAssignableFrom() test but aren't of interest. We shut these out by only fetching concrete (instantiable) classes. Renamed JarReader#findSubtypeOfClasses to findConcreteSubtypesOfClass.
  5. A minor nuisance: streamline-dist/VERSION is a generated file not intended to be committed, but it is not currently in .gitignore. Added it.
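The concreteness check in item 4 could look roughly like the following. This is a minimal sketch, not the PR's actual JarReader code; the class name ConcreteCheck and the exact exclusions (interfaces, arrays, primitives, abstract classes) are assumptions made for illustration.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "concrete subtypes only" filter;
// it is not the JarReader code from this PR.
final class ConcreteCheck {

    private ConcreteCheck() {
    }

    // Treat a class as concrete if it is an ordinary class that is not abstract.
    // Interfaces always carry the ABSTRACT modifier; for arrays and primitives the
    // Class#getModifiers() spec leaves it unspecified, hence the explicit checks.
    static boolean isConcrete(Class<?> clazz) {
        return !clazz.isInterface()
                && !clazz.isArray()
                && !clazz.isPrimitive()
                && !Modifier.isAbstract(clazz.getModifiers());
    }

    // Keep only the candidates that are concrete subtypes of superType.
    static List<Class<?>> concreteSubtypes(List<Class<?>> candidates, Class<?> superType) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (superType.isAssignableFrom(c) && isConcrete(c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = new ArrayList<>();
        candidates.add(List.class);                   // interface: skipped
        candidates.add(java.util.AbstractList.class); // abstract class: skipped
        candidates.add(ArrayList.class);              // concrete subtype: kept
        candidates.add(String.class);                 // not a List: skipped
        System.out.println(concreteSubtypes(candidates, List.class)); // [class java.util.ArrayList]
    }
}
```

Filtering before any further processing is what keeps SDK interfaces bundled in the uber-jar from reaching the instantiation step and its warning log.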

…ed/uploaded. Add unit tests for Custom UDF upload.
mattf-apache (Contributor Author) commented Jul 25, 2017

Note: in a couple of files, IntelliJ automatically collapsed the Java import lists into wildcard imports. I reviewed the changes and left them in. If wildcard imports are contrary to Streamline coding standards, please let me know and I'll revert them.

HeartSaVioR (Contributor) left a comment

Thanks for the contribution. It looks good overall.

Placing the new test module inside the resources directory of another module feels hacky (the module structure already feels complicated), so I would like to avoid it. Instead, please couple the new module to another one (with a parent) so that it can be managed along with the other modules.

There are many places using wildcard imports (with or without static). I didn't point them all out, just some of them. We avoid wildcard imports, so please expand them into separate imports.

@@ -81,4 +89,8 @@
}
}

public static boolean isConcrete(Class clazz) {
Contributor

Nice catch!

pom.xml Outdated
@@ -19,6 +19,11 @@
<!--examples-->
<module>examples/processors</module>
<module>docker</module>
<!-- test jar builders -->
<module>streams/service/src/test/resources/custom-udf-microtest</module>
Contributor

I haven't seen a pattern like putting a module inside another module's resources, and it feels a bit odd. What are the advantages and disadvantages of this? If it doesn't provide a significant advantage, I would like to place it like the other modules, perhaps with another module named 'streams/testsupport' as its parent.

@harshach, you may want to weigh in on this change.

Contributor Author

@HeartSaVioR , thanks for the review.

Regarding the custom-udf-microtest module: I agree the location of the jar-creation module seems hacky. I was trying to achieve:

  • Keep the source code for the test jar closely associated with the test jar itself (unlike the pre-built objects typically used for such purposes).
  • But have the built jar end up under streams/service/src/test/resources/ without needing a copy phase.

The approach I took demonstrably works, and leaves the jar-creating source code closely tied to the jar itself, which I think is a big plus. But it looks weird.

I'll re-organize it to a top-level module and see if you like it better. We'll then be able to move it wherever you think it fits best.

import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Lists;
import com.google.common.collect.Sets;
import com.google.common.collect.*;
Contributor

Let's avoid wildcard imports; it's part of this project's style guide.

Contributor Author

I will revert the import wildcards, since they are contrary to Streamline team coding standards. Thanks for informing me.

import com.hortonworks.streamline.streams.catalog.TopologyTestRunHistory;
import com.hortonworks.streamline.streams.catalog.TopologyVersion;
import com.hortonworks.streamline.streams.catalog.TopologyWindow;
import com.hortonworks.streamline.streams.catalog.*;
Contributor

Same here.

import com.hortonworks.streamline.streams.rule.UDF5;
import com.hortonworks.streamline.streams.rule.UDF6;
import com.hortonworks.streamline.streams.rule.UDF7;
import com.hortonworks.streamline.streams.rule.*;
Contributor

Same here.

@@ -2411,7 +2361,7 @@ public UDF addUDF(UDF udf) {
return udf;
}

public Map<String, Class<?>> loadUdfsFromJar(java.io.File jarFile) throws IOException {
public static Map<String, Class<?>> loadUdfsFromJar(java.io.File jarFile) throws IOException {
Contributor

I know this doesn't require any fields in StreamCatalogService, but if we want to make it static, it may be better to move it to a utility class instead. Just my 2 cents.

Contributor Author

Hm. It fits very well with the other methods in StreamCatalogService; it would seem artificial to move it to a utility class. Perhaps it is merely fortunate that it can be made static, and it is the only method I needed to access from the mocked class. But in the interest of making only minimal changes to enable unit testing, I think it is best to do it this way.

Contributor

OK. Makes sense.
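Because loadUdfsFromJar touches no instance fields, the jar-scanning logic can be a static method, whether it stays on StreamCatalogService (as the PR does) or moves to a utility class. The sketch below uses a hypothetical JarScanner class and is not the Streamline code. It shows the general technique: walk the jar's entries, turn each .class path into a binary name (keeping the '$' of nested classes, so Class.forName can load them), and load each class through a URLClassLoader.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical static utility (not Streamline code): scans a jar for classes
// assignable to a given supertype, without needing any instance state.
final class JarScanner {

    private JarScanner() {
        // static utility, no instances
    }

    // "com/example/Outer$Inner.class" -> "com.example.Outer$Inner".
    // The '$' is preserved, so the result is the binary name Class.forName expects.
    static String entryToBinaryName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Loads every class in the jar that is assignable to superType.
    // The URLClassLoader is deliberately left open so that the returned classes
    // can still resolve their own dependencies later.
    static List<Class<?>> loadSubtypes(File jar, Class<?> superType) throws IOException {
        List<Class<?>> found = new ArrayList<>();
        // Using superType's loader as parent means parent-first delegation resolves
        // SDK classes bundled in an uber-jar to the same Class objects, so
        // isAssignableFrom still works.
        URLClassLoader loader = new URLClassLoader(
                new URL[] {jar.toURI().toURL()}, superType.getClassLoader());
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class") || name.endsWith("module-info.class")) {
                    continue;
                }
                try {
                    // initialize=false: do not run static initializers while scanning.
                    Class<?> clazz = Class.forName(entryToBinaryName(name), false, loader);
                    if (superType.isAssignableFrom(clazz)) {
                        found.add(clazz);
                    }
                } catch (ClassNotFoundException | LinkageError e) {
                    // Skip classes that cannot be loaded, e.g. due to missing dependencies.
                }
            }
        }
        return found;
    }
}
```

A call such as JarScanner.loadSubtypes(new File("my-udfs.jar"), SomeUdfInterface.class) (both names hypothetical) would return candidate classes; combining it with a concreteness filter gives the general shape of the findConcreteSubtypesOfClass idea, though the real JarReader may differ.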

@@ -318,7 +293,8 @@ public Response downloadUdf(@PathParam("udfId") Long udfId, @Context SecurityCon
throw EntityNotFoundException.byId(udfId.toString());
}

private void processUdf(InputStream inputStream,
/* package-private (for unit-test) */
void processUdf(InputStream inputStream,
Contributor

Let's add the @VisibleForTesting annotation instead, if Guava is available for this module.

Contributor Author

Good, thanks.


<groupId>com.hortonworks.streamline.test</groupId>
<artifactId>custom-udf-microtest</artifactId>
<version>0.1.0</version> <!--fixed version - update manually -->
Contributor

Do we have a specific reason to avoid coupling this version to the project version?

Contributor Author

Okay, that would be one thing it could usefully inherit from a parent :-)
I'll see if the shading command prevents it from pulling in all the parent's dependencies, and if so go back to letting it have streamline pom as parent.

Contributor Author

mattf-apache, Aug 1, 2017

It turns out the <minimizeJar> configuration successfully prevents the test jar from inheriting the parent's dependencies, so I put test-support/custom-udf-microtest as submodules under streamline, and let it inherit the product version.
If folks want to similarly move source code for other test jars, like those in streams/service/src/test/resources/customprocessorupload, into test-support/, it's a good place that avoids dependency ordering problems.

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<!-- Does not need a <parent> declaration -->
Contributor

I'm wondering whether there's a reason to avoid coupling with other modules.

Contributor Author

Definitely yes. If you give it a parent relationship, it tends to pull all the parent's dependencies into the jar (although of course this can be worked around). Also, as a test jar, it is intended to be as stand-alone as possible.

Finally, there's no benefit in declaring a parent, because it doesn't need to inherit anything from streamline pom. It's fine being just a sub-module for dependency-tree purposes.

<dependency>
<groupId>com.hortonworks.streamline</groupId>
<artifactId>streamline-sdk</artifactId>
<version>0.1.0-SNAPSHOT</version>
Contributor

Similar issue here. I think this is easily broken.

HeartSaVioR (Contributor)

FYI: I pulled the change and ran the unit tests via mvn clean install; they succeeded.

arunmahadevan (Contributor) left a comment

Went through and left a few comments. It would be good to test whether inner classes work end-to-end.

return true;
} catch (Throwable ex) {
LOG.warn("class {} is subtype of {}, but it can't be initialized.", s, superTypeClass);
LOG.warn("class {} is concrete subtype of {}, but can't be initialized due to exception:", s, superTypeClass, ex);
Contributor

nit: ex will not be logged due to missing {}

Contributor Author

No, the slf4j apis (since version 1.6.0) treat the last argument specially if it is a Throwable. Please see https://www.slf4j.org/faq.html#paramException
Indeed, I first used a '{}' with the ex, and it didn't print the stack trace. This does.
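The rule described above can be modelled in a few lines. The sketch below is a simplified, stdlib-only model written for illustration, not SLF4J's MessageFormatter: it assumes the behaviour described in the SLF4J FAQ, where a trailing Throwable left over after all {} placeholders are filled is attached as the exception (with stack trace), while a Throwable consumed by a placeholder is only rendered as text, which matches what the author observed. ThrowableRule and attachedThrowable are hypothetical names.

```java
// Simplified model of how a parameterized log call decides whether the last
// argument is an exception to attach (stack trace) or just a value to format.
// Illustration only; real SLF4J also handles escaped placeholders and arrays.
final class ThrowableRule {

    private ThrowableRule() {
    }

    static int countPlaceholders(String pattern) {
        int count = 0;
        int idx = pattern.indexOf("{}");
        while (idx >= 0) {
            count++;
            idx = pattern.indexOf("{}", idx + 2);
        }
        return count;
    }

    // Returns the Throwable that would be logged with its stack trace, or null.
    static Throwable attachedThrowable(String pattern, Object... args) {
        if (args.length == 0) {
            return null;
        }
        Object last = args[args.length - 1];
        // Only a trailing Throwable with no placeholder of its own is attached.
        if (last instanceof Throwable && countPlaceholders(pattern) < args.length) {
            return (Throwable) last;
        }
        return null;
    }

    public static void main(String[] args) {
        Exception ex = new IllegalStateException("boom");
        // Two placeholders, three arguments: ex is attached.
        System.out.println(attachedThrowable(
                "class {} is concrete subtype of {}, but can't be initialized due to exception:",
                "MyUdf", "UDF", ex) != null); // true
        // Three placeholders, three arguments: ex is only formatted as text.
        System.out.println(attachedThrowable(
                "class {} is subtype of {}, but failed: {}", "MyUdf", "UDF", ex) != null); // false
    }
}
```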

@@ -42,44 +42,19 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.ws.rs.Consumes;
import javax.ws.rs.*;
Contributor

avoid wildcard import

Contributor Author

Hi @arunmahadevan, thanks for the review.

As noted in @HeartSaVioR's comments, I will remove all wildcard imports.

import javax.ws.rs.core.SecurityContext;
import javax.ws.rs.core.StreamingOutput;
import javax.ws.rs.core.UriInfo;
import javax.ws.rs.core.*;
Contributor

avoid wildcard import

import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.*;
Contributor

avoid wildcard import

import static com.hortonworks.streamline.streams.security.Permission.EXECUTE;
import static com.hortonworks.streamline.streams.security.Permission.READ;
import static com.hortonworks.streamline.streams.security.Permission.WRITE;
import static com.hortonworks.streamline.streams.security.Permission.*;
Contributor

avoid wildcard import

pom.xml Outdated
@@ -19,6 +19,11 @@
<!--examples-->
<module>examples/processors</module>
<module>docker</module>
<!-- test jar builders -->
<module>streams/service/src/test/resources/custom-udf-microtest</module>
Contributor

I think you want to build a jar with some test UDFs and use it for testing the UDFCatalogResource? Maybe you can add the classes you want as a part of streamline-functions test sources and create a test jar for that module and add it as a dependency for your unit tests.

Contributor Author

Okay, that seems a good place to put them, I'll try that. If I can't prevent it from including all the super-module dependencies, I'll have to put it at the top level.

Contributor Author

mattf-apache, Aug 1, 2017

Streamline-functions builds its own target jar, so it can't have a subsidiary module as a jar. So I put the test jar sources under streamline/test-support/custom-udf-microtest.

If folks want to similarly move source code for other test jars, like those in streams/service/src/test/resources/customprocessorupload, into test-support/, it's a good place that avoids dependency ordering problems.

public void setClassName(String className) {
this.className = className;
this.className = className.replace('$', '.');
Contributor

I am not sure if this would work. In storm-sql these functions are registered by loading classes with Class.forName, which expects the name returned by getName and not the canonical name. You may want to try it out end-to-end, or add a test case here to check: https://github.com/hortonworks/streamline/blob/master/streams/runners/storm/runtime/src/test/java/com/hortonworks/streamline/streams/runtime/storm/bolt/rules/FunctionsTest.java#L50

Contributor Author

mattf-apache, Jul 29, 2017

Thanks @arunmahadevan. I will add those tests, and instead of forcing the canonical form in the setClassName call, I'll force the FQDN name during upload processing.
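The point raised in this thread can be checked with plain Java reflection. For a nested class, Class#getName() returns the binary name with '$' (the form called FQDN in this PR), while Class#getCanonicalName() joins the names with '.', and Class.forName only resolves the former. NameDemo and InnerUdf are made-up names for this illustration.

```java
// Demonstrates why Class.forName needs the binary name (Class#getName())
// of a nested class rather than its canonical name.
class NameDemo {

    // A static nested class, standing in for a custom UDF implemented as an inner class.
    static class InnerUdf {
    }

    // Returns true if Class.forName can resolve the given name.
    static boolean loadable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In the default package these are "NameDemo$InnerUdf" and "NameDemo.InnerUdf".
        String binary = InnerUdf.class.getName();
        String canonical = InnerUdf.class.getCanonicalName();
        System.out.println(binary + " loadable: " + loadable(binary));       // true
        System.out.println(canonical + " loadable: " + loadable(canonical)); // false
    }
}
```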

revert all wildcard imports
use fqdn (not canonical) class names throughout UDF logic
    including forcing the UDF.ClassName to fqdn in validateUDF, if the user gave us canonical form
    adjust test cases for inner classes
use @VisibleForTesting annotation as suggested
- move custom-udf-microtest.jar to streamline/test-support/custom-udf-microtest
  Normalized parent and module relationships.
  Confirmed that shader <minimizeJar> prevents pulling in inherited dependencies
- UDFCatalogResourceTest dependency and execution still work.
  Used dependency:copy to copy built jar into streamline-service target/generated-test-resources/
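The commit above forces UDF.className to the FQDN (binary) form in validateUDF when the user supplies the canonical form. That conversion is not a plain character replacement, because only the dots separating nested classes become '$'. One possible approach, sketched here under the assumption that a class loader able to see the UDF jar is available, is to turn dots into '$' from the right until Class.forName succeeds. BinaryNameResolver and OuterUdfs are hypothetical names; this is not the code merged in the PR.

```java
import java.util.Optional;

// Hypothetical helper (not the merged Streamline code): converts a class name that
// may be in canonical form ("com.example.Outer.Inner") into the binary/FQDN form
// ("com.example.Outer$Inner") that Class.forName accepts.
final class BinaryNameResolver {

    private BinaryNameResolver() {
    }

    // Assumes the first loadable candidate is the intended class.
    static Optional<String> toBinaryName(String name, ClassLoader loader) {
        String candidate = name;
        while (true) {
            if (isLoadable(candidate, loader)) {
                return Optional.of(candidate);
            }
            // Turn the right-most remaining '.' into '$' and try again.
            int lastDot = candidate.lastIndexOf('.');
            if (lastDot < 0) {
                return Optional.empty();
            }
            candidate = candidate.substring(0, lastDot) + '$' + candidate.substring(lastDot + 1);
        }
    }

    private static boolean isLoadable(String name, ClassLoader loader) {
        try {
            // initialize=false: only check that the class exists; don't run static initializers.
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}

// Stand-in for a custom UDF implemented as a nested class.
class OuterUdfs {
    static class Inner {
    }
}
```

A name that is already in binary form, or that belongs to a top-level class, resolves on the first attempt, so the helper is safe to apply to every uploaded class name.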
public void setClassName(String className) {
this.className = className;
}
public void setClassName(String className) { this.className = className; }
Contributor

nit: we don't use one-liner method definitions.

Contributor Author

Ah yes, fooled again by IntelliJ. Fixed. Thanks for pointing it out.

HeartSaVioR (Contributor) commented Aug 2, 2017

+1 from me given that my review comments are all addressed.

arunmahadevan (Contributor)

@mattf-horton, I am not sure why you are not able to build a test jar and add it as a dependency. I'm just trying to see if we can avoid adding new top-level modules for testing.

For example:

  1. In streams/functions/pom.xml
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.0.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>test-jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
  2. Add a new UDF/UDAF that you intend to use in other modules under
    streams/functions/src/test/java/com/hortonworks/streamline/streams/udaf/test/Foo.java

  3. Now add the test jar as a dependency elsewhere if you want to use those classes, e.g. in streams/service/pom.xml

        <dependency>
            <groupId>com.hortonworks.streamline</groupId>
            <artifactId>streamline-functions</artifactId>
            <type>test-jar</type>
            <version>0.1.0-SNAPSHOT</version>
            <scope>test</scope>
        </dependency>
  4. If you plan to just refer to the jar, it will be available under streams/functions/target/streamline-functions-0.1.0-SNAPSHOT-tests.jar

The streamline-functions module should be built before the module where you plan to use it.

mattf-apache (Contributor Author)

Hi @arunmahadevan , I'm open to doing it that way if you think it is best; however, I've worked on this and your suggestion to add testing to streams/runtime/storm/bolt/rules/FunctionsTest.java, and have the following observations:

  1. In the spirit of unit testing, I wanted the CustomUDF upload test jar to ONLY have the contents of a few custom UDF implementations and the needed streamline-sdk classes. This means it should be in its own module. (It is probably possible to manipulate the Maven assembly plugin to make it happen inside another module, but online sources strongly discourage that.) But if I add to the streamline-functions test sources and pom in the way you suggest, the jar will contain all streamline-functions test classes, which are unrelated.

    a. The difficulty I was referring to wasn't the mechanics of building a test jar, but the need to keep the CustomUDF jar sources in their own module while having them "under" another module.

  2. Conversely, I'm uncomfortable putting CustomUDF implementations in the streamline-functions test sources, when they don't have anything to do with the other content in streamline-functions (which are all about the built-ins), and can't be used for unit testing in the streamline-functions module because even mocked uploading of a CustomUDF requires data structures from much farther downstream in the dependency chain.

  3. Understand that you want to minimize the number of top-level modules, but Streamline really needs a place for the source code of test jars and other resources to go. For example, all of these jars are checked into git as part of Streamline, but I couldn't find any associated sources:

./streams/runtime/src/test/resources/custom-split-join-lib.jar
./streams/service/src/test/resources/customprocessorupload/iotas-core.jar
./webservice/src/main/resources/node_modules/JSV/jsdoc-toolkit/java/classes/js.jar
./webservice/src/main/resources/node_modules/JSV/jsdoc-toolkit/jsdebug.jar
./webservice/src/main/resources/node_modules/JSV/jsdoc-toolkit/jsrun.jar
./webservice/src/test/resources/parser.jar
./webservice/src/test/resources/testnotifier.jar

The way I've done it provides a single top-level home for modules for all these jars to go, as well as the source code for the CustomUDF test jar.

  4. It didn't work out to test CustomUDFs from streams/runtime/storm/bolt/rules/FunctionsTest.java. While this is a very cool example of how to bring up a runnable RulesBolt with mocked dependencies, it appears it can only run with function classes already known in the JVM. To upload a CustomUDF and make it referenceable, requires a StreamCatalogService, and I couldn't figure out how to make the RulesBolt reference one. Harsha agreed it would be difficult to combine them.

  5. Using my CustomUDF test jar, I do have a thorough unit test of the UDFCatalogResource#processUdf method, with mocked dependencies, to upload CustomUDFs.

I will do a manual end-to-end test, and fix or report back here any problems I find.

Hopefully I've responded adequately to your other suggestions. Thanks.

arunmahadevan (Contributor)

@mattf-horton, I understand your concerns. I don't see other test classes being part of the uploaded jar as a major issue, since the UDFs are loaded by class name.

The concern here is adding a top-level module for test-support; I haven't seen such top-level modules in other projects. If it were for something like integration tests, it would make sense. Anyway, I am OK with it if @HeartSaVioR, @harshach and others think it's reasonable.

HeartSaVioR (Contributor)

Streamline has a somewhat complicated module structure that is not easy for contributors to understand, but one implicit rule is that modules which only bind to Streamline are better placed under the streams module. Hence streams/test-support/custom-udf-microtest feels better to me.

mattf-apache (Contributor Author)

@HeartSaVioR, happy to make that change. @arunmahadevan, is that satisfactory?

Regarding auto-test status, I merged latest master to pick up the addition of streamline-sql module. That module has a compile error in PreparedStatementBuilder.java. I'm not proposing the fix as part of my PR since I don't want to step on others' work. But until that is fixed, master fails compilation (with or without my PR). Unit tests pass on all other modules (by-passing storage-core), except for a dependency on storage in webservice.

mattf-apache (Contributor Author)

By substituting jars into a real-world working HDF deployment, I've established that without this fix UDFs implemented as inner classes are not loaded, and with the fix they are successfully loaded.

@arunmahadevan, okay with you if I commit this? Thanks.

arunmahadevan (Contributor)

@mattf-horton, looks good to merge. Please squash the commits before merging.

arunmahadevan (Contributor)

@mattf-horton, do you want to squash and merge the changes to master? Let me know if you would like me to merge. Thanks.

@mattf-apache mattf-apache merged commit a2d985b into hortonworks:master Aug 31, 2017
mattf-apache (Contributor Author)

Thanks, @arunmahadevan.
