@srowen
Member

@srowen srowen commented Jul 10, 2017

What changes were proposed in this pull request?

Shade JPMML classes (org.jpmml.**) and related PMML model classes (org.dmg.pmml.**). This insulates downstream users from the version of JPMML in Spark, allows us to upgrade more freely, and allows downstream users to use a different version. JPMML minor releases are not generally forwards/backwards compatible.

How was this patch tested?

Existing tests

@srowen srowen changed the title Shade JPMML [SPARK-15526][MLLIB] Shade JPMML Jul 10, 2017
@srowen
Member Author

srowen commented Jul 10, 2017

CC @vanzin for a look at the shading, as a sense check
maybe @jkbradley or @MLnick is interested in this change too.

@vruusmann later I think we could update to JPMML 1.3 too.

@vruusmann
Contributor

Good to know that there will be some relief coming in Apache Spark 2.3.X.

I don't think that the shading will break any Spark application that depends on the PMMLExportable trait, because this trait doesn't expose any org.dmg.pmml.* classes: its "user interface" is all about IO operations (i.e. "send PMML content to file/stream location", not "return live PMML object instance for further manipulation").

I did rewrite the "Installation" section of the JPMML-SparkML project last week to bring more clarity to application classpath conflict resolution: https://github.com/jpmml/jpmml-sparkml#library

@srowen
Member Author

srowen commented Jul 10, 2017

Yes that's an important point I didn't mention. None of the shaded classes appear in any API, except private[mllib] classes.

@SparkQA

SparkQA commented Jul 10, 2017

Test build #79459 has finished for PR 18584 at commit 14f7bb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Jul 10, 2017

> I don't think that the shading will break any Spark application that depends on the PMMLExportable trait

Hmm, I'd be more comfortable if one of you guys actually tested this to be true:

  • extend the old trait in a user class
  • try to run that class with the new Spark with the shaded dependency

Because Scala traits are weird (bytecode is copied from the trait into the class that extends it), there could be dangling references to the non-shaded classes, even if those classes are not explicitly exposed in the trait.
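The check described above could be sketched roughly as follows. This is only an illustration, assuming spark-mllib is on the compile classpath (the older release) and the newer, shaded Spark build is used at run time. The names `UserModel` and `TraitLinkageCheck` are hypothetical, and the expectation that export of a non-MLlib model fails with an `IllegalArgumentException` is an assumption about Spark's internals rather than something stated in this thread.

```scala
import org.apache.spark.mllib.pmml.PMMLExportable

// Hypothetical user-side class: it mixes in the trait but is not an MLlib model.
class UserModel extends PMMLExportable

object TraitLinkageCheck {
  def main(args: Array[String]): Unit = {
    try {
      // Calls through the trait code that the Scala compiler wired into UserModel.
      new UserModel().toPMML()
      println("toPMML returned without error")
    } catch {
      case e: LinkageError =>
        // NoClassDefFoundError, NoSuchMethodError, etc. would point to a
        // dangling reference to a non-shaded (or missing shaded) class.
        println(s"Linkage problem: $e")
      case e: IllegalArgumentException =>
        // Assumed outcome for a class that is not a supported MLlib model:
        // the export is rejected, but every class involved was resolved.
        println(s"Linked fine; export rejected the model: ${e.getMessage}")
    }
  }
}
```

Compiling this against the old Spark and running it with spark-submit on the new build separates the two cases: a linkage error means the user class still refers to something that shading moved, while any other outcome means the trait's generated code resolved cleanly.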

@srowen
Member Author

srowen commented Jul 11, 2017

@vanzin I took this example, which should exercise this code most directly: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala I broke it out into a standalone project and compiled it against Spark 2.1.1.

I then built Spark from this PR's branch and ran the example locally with spark-submit. It worked, but only if I set SPARK_PREPEND_CLASSES=1. Is that to be expected, or is the failure without it a problem? Without it, the failure was:

Exception in thread "main" java.lang.NoClassDefFoundError: org/spark_project/dmg/pmml/Measure
	at org.apache.spark.mllib.pmml.export.PMMLModelExportFactory$.createPMMLModelExport(PMMLModelExportFactory.scala:38)
	at org.apache.spark.mllib.pmml.PMMLExportable$class.toPMML(PMMLExportable.scala:43)
	at org.apache.spark.mllib.pmml.PMMLExportable$class.toPMML(PMMLExportable.scala:78)
	at org.apache.spark.mllib.clustering.KMeansModel.toPMML(KMeansModel.scala:39)
	at com.cloudera.datascience.jpmml.Test$.main(Test.scala:31)
...

That's not a mismatch between the app and the Spark API; it says that the Spark code isn't finding the shaded JPMML classes that it expects to link against.

@vanzin
Contributor

vanzin commented Jul 11, 2017

Yeah, that points at some error with shading / packaging. Is that shaded class in any of the jars in assembly/target/scala-2.11/jars?
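One way to answer that question is to scan the jars for the class file named in the stack trace. The following is a minimal sketch using only the JDK's `java.util.jar` API; the object name `ShadedClassFinder` and its method are hypothetical helpers, and the default directory is simply the path mentioned above.

```scala
import java.io.File
import java.util.jar.JarFile

object ShadedClassFinder {
  /** Names of the jars directly under `dir` that contain an entry called `entryName`. */
  def jarsContaining(dir: File, entryName: String): Seq[String] = {
    // listFiles() returns null if `dir` does not exist or is not a directory.
    val jars = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
    jars.filter { f =>
      val jar = new JarFile(f)
      try jar.getEntry(entryName) != null finally jar.close()
    }.map(_.getName).sorted.toSeq
  }

  def main(args: Array[String]): Unit = {
    val dir = new File(if (args.nonEmpty) args(0) else "assembly/target/scala-2.11/jars")
    // Class from the NoClassDefFoundError, written as a jar entry path.
    val entry = "org/spark_project/dmg/pmml/Measure.class"
    val hits = jarsContaining(dir, entry)
    if (hits.isEmpty) println(s"$entry not found in any jar under $dir")
    else hits.foreach(j => println(s"$entry found in $j"))
  }
}
```

An empty result for the shaded entry would be consistent with the relocated classes never having been packaged into the assembly.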

@vanzin
Contributor

vanzin commented Jul 11, 2017

Also, from the stack trace, the class that is extending PMMLExportable seems to be KMeansModel, which is a Spark class, right? That won't trigger the situation I described before. For that to trigger, a class in the user app needs to extend PMMLExportable. Not sure if that's how that interface is supposed to be used since I'm not familiar with mllib, but it's a public interface...

@srowen
Member Author

srowen commented Jul 12, 2017

Yeah, the jars in assembly/target/scala-2.11/jars don't contain any org.spark_project.jpmml.** classes, so that's a problem. I don't quite see what I missed, though, because it looks like I added the same shade config that's used for Guava. Any clues off the top of your head, @vanzin? I'll keep poking to figure out what's different.

Yes good point about PMMLExportable. It's a developer API, but internally calls a method that will fail on anything but MLlib classes. Still, I added a no-op implementation of it to my example and called it and it was fine. So I think that much is OK after shading.

@vanzin
Contributor

vanzin commented Jul 12, 2017

I think the classes are missing because you haven't added the artifact to artifactSet/includes in that plugin's configuration, you just defined the relocation.

@srowen
Member Author

srowen commented Jul 12, 2017

Scratch that. I got it working. I forgot to <include> the artifacts.
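For readers unfamiliar with the maven-shade-plugin, the distinction discussed here is roughly the following. This is an illustrative fragment, not the exact change in this PR: the `org.jpmml:*` artifact pattern is an assumption about the JPMML Maven coordinates, and the shaded package names are taken from the class names quoted earlier in this thread.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <!-- Without an include, the artifact's classes are not bundled,
             so there is nothing for the relocation to rewrite. -->
        <include>org.jpmml:*</include> <!-- assumed coordinates -->
      </includes>
    </artifactSet>
    <relocations>
      <relocation>
        <pattern>org.jpmml</pattern>
        <shadedPattern>org.spark_project.jpmml</shadedPattern>
      </relocation>
      <relocation>
        <pattern>org.dmg.pmml</pattern>
        <shadedPattern>org.spark_project.dmg.pmml</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

With only the `<relocation>` entries, references in Spark's own bytecode are rewritten to the shaded names while the shaded classes themselves are never packaged, which matches the NoClassDefFoundError reported above.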

@SparkQA

SparkQA commented Jul 13, 2017

Test build #79571 has finished for PR 18584 at commit d14536b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM, too.

@vanzin
Contributor

vanzin commented Jul 13, 2017

Merging to master.

@asfgit asfgit closed this in 5c8edfc Jul 13, 2017
@srowen srowen deleted the SPARK-15526 branch July 14, 2017 09:08