@mengxr
Contributor

mengxr commented Mar 2, 2015

Similar to MatrixFactorizationModel, we only need wrappers to support save/load for tree models in Python.

@jkbradley

@SparkQA

SparkQA commented Mar 2, 2015

Test build #28183 has started for PR 4854 at commit 201b3b9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 2, 2015

Test build #28183 has finished for PR 4854 at commit 201b3b9.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
    • class TreeEnsembleModel(JavaModelWrapper, JavaSaveable):
    • class DecisionTreeModel(JavaModelWrapper, JavaSaveable, JavaLoader):
    • class RandomForestModel(TreeEnsembleModel, JavaLoader):
    • class GradientBoostedTreesModel(TreeEnsembleModel, JavaLoader):
    • class Saveable(object):
    • class JavaSaveable(Saveable):
    • class Loader(object):
    • class JavaLoader(Loader):
    • java_class = cls._java_loader_class()
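
The class list above hints at how the wrappers are composed: `TreeEnsembleModel` mixes in `JavaSaveable`, and each concrete model (`RandomForestModel`, `GradientBoostedTreesModel`) adds `JavaLoader`. The sketch below illustrates that mixin layout only; it is not the PySpark implementation. `FileSaveable`, `FileLoader`, `round_trip`, and the `Toy*` classes are hypothetical stand-ins that write JSON to a local file instead of delegating to the JVM.

```python
import json
import os
import tempfile


class Saveable(object):
    """Mixin for model instances that can be written to a path."""

    def save(self, path):
        raise NotImplementedError


class Loader(object):
    """Mixin for model classes that can be read back from a path."""

    @classmethod
    def load(cls, path):
        raise NotImplementedError


class FileSaveable(Saveable):
    """Hypothetical stand-in for JavaSaveable: writes JSON, no JVM involved."""

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"class": type(self).__name__, "params": self.params}, f)


class FileLoader(Loader):
    """Hypothetical stand-in for JavaLoader: reads what FileSaveable wrote."""

    @classmethod
    def load(cls, path):
        with open(path) as f:
            payload = json.load(f)
        # Refuse to load a file that was saved by a different model class.
        if payload["class"] != cls.__name__:
            raise ValueError(
                "expected %s, found %s" % (cls.__name__, payload["class"]))
        return cls(payload["params"])


class ToyEnsembleModel(FileSaveable):
    """Shared base: saving lives here, loading is added per concrete class."""

    def __init__(self, params):
        self.params = params


class ToyForestModel(ToyEnsembleModel, FileLoader):
    pass


class ToyBoostedModel(ToyEnsembleModel, FileLoader):
    pass


def round_trip(model):
    """Save a model to a temporary file and load it back via its own class."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        model.save(path)
        return type(model).load(path)
    finally:
        os.remove(path)
```

Keeping `save` on the shared base and `load` as a classmethod on each concrete class means every loader returns an instance of exactly the class it was called on, which matches the inheritance shown in the list.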

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28183/
Test FAILed.

Member

Missing quotation mark (here and in other examples)

@SparkQA

SparkQA commented Mar 2, 2015

Test build #28185 has started for PR 4854 at commit 8ebcac2.

  • This patch merges cleanly.

Member

typo

@jkbradley
Member

LGTM other than that typo. I'm trying to compile & test now.

@SparkQA

SparkQA commented Mar 2, 2015

Test build #28185 has finished for PR 4854 at commit 8ebcac2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
    • class TreeEnsembleModel(JavaModelWrapper, JavaSaveable):
    • class DecisionTreeModel(JavaModelWrapper, JavaSaveable, JavaLoader):
    • class RandomForestModel(TreeEnsembleModel, JavaLoader):
    • class GradientBoostedTreesModel(TreeEnsembleModel, JavaLoader):
    • class Saveable(object):
    • class JavaSaveable(Saveable):
    • class Loader(object):
    • class JavaLoader(Loader):
    • java_class = cls._java_loader_class()

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28185/
Test FAILed.

@SparkQA

SparkQA commented Mar 2, 2015

Test build #28196 has started for PR 4854 at commit 4586a4d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 3, 2015

Test build #28196 timed out for PR 4854 at commit 4586a4d after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28196/
Test FAILed.

@mengxr
Contributor Author

mengxr commented Mar 3, 2015

test this please

@SparkQA

SparkQA commented Mar 3, 2015

Test build #28208 has started for PR 4854 at commit 4586a4d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 3, 2015

Test build #28208 has finished for PR 4854 at commit 4586a4d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
    • class TreeEnsembleModel(JavaModelWrapper, JavaSaveable):
    • class DecisionTreeModel(JavaModelWrapper, JavaSaveable, JavaLoader):
    • class RandomForestModel(TreeEnsembleModel, JavaLoader):
    • class GradientBoostedTreesModel(TreeEnsembleModel, JavaLoader):
    • class JavaSaveable(Saveable):
    • java_class = cls._java_loader_class()

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28208/
Test FAILed.

@mengxr
Contributor Author

mengxr commented Mar 3, 2015

test this please

@SparkQA

SparkQA commented Mar 3, 2015

Test build #28215 has started for PR 4854 at commit 4586a4d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 3, 2015

Test build #28215 has finished for PR 4854 at commit 4586a4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28215/
Test PASSed.

asfgit pushed a commit that referenced this pull request Mar 3, 2015
Similar to `MatrixFactorizationModel`, we only need wrappers to support save/load for tree models in Python.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #4854 from mengxr/SPARK-6097 and squashes the following commits:

4586a4d [Xiangrui Meng] fix more typos
8ebcac2 [Xiangrui Meng] fix python style
91172d8 [Xiangrui Meng] fix typos
201b3b9 [Xiangrui Meng] update user guide
b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib

(cherry picked from commit 7e53a79)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
asfgit closed this in 7e53a79 Mar 3, 2015
@mengxr
Contributor Author

mengxr commented Mar 3, 2015

Merged into master and branch-1.3.

@catmonkeylee

When I run the sample code in cluster mode, there is an error.

Traceback (most recent call last):
  File "/data1/s/apps/spark-app/app/sample_rf.py", line 25, in <module>
    sameModel = RandomForestModel.load(sc, _model_path)
  File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 254, in load
    java_model = cls._load_java(sc, path)
  File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 250, in _load_java
    return java_obj.load(sc._jsc.sc(), path)
  File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.tree.model.RandomForestModel.load.
: java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
    at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:125)
    at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:65)
    at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

@jkbradley
Member

Can you please open up a JIRA to report the bug? Also, what is the app you were running, and could you provide the code which created and saved the model originally?

@catmonkeylee

I don't know how to create a new issue, so I'm sorry to send this by mail. Please help me.

I ran the code on a Spark cluster; the Spark version is 1.3.0.

The test code:

from pyspark import SparkContext, SparkConf
from pyspark.mllib.tree import RandomForest, RandomForestModel
from pyspark.mllib.util import MLUtils

conf = SparkConf().setAppName('LocalTest')
sc = SparkContext(conf=conf)
data = MLUtils.loadLibSVMFile(sc, 'data/mllib/sample_libsvm_data.txt')
print(data.count())
(trainingData, testData) = data.randomSplit([0.7, 0.3])
model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={},
                                     numTrees=3, featureSubsetStrategy="auto",
                                     impurity='gini', maxDepth=4, maxBins=32)

# Evaluate model on test instances and compute test error
predictions = model.predict(testData.map(lambda x: x.features))
labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
testErr = labelsAndPredictions.filter(lambda vp: vp[0] != vp[1]).count() / float(testData.count())
print('Test Error = ' + str(testErr))
print('Learned classification forest model:')
print(model.toDebugString())

# Save and load model
_model_path = "/home/s/apps/spark-app/data/myModelPath"
model.save(sc, _model_path)
sameModel = RandomForestModel.load(sc, _model_path)
sc.stop()

run command:
spark-submit --master spark://t0.q.net:7077 --executor-memory 1G sample_rf.py

Then I get this error:

Traceback (most recent call last):
  File "/data1/s/apps/spark-app/app/sample_rf.py", line 25, in <module>
    sameModel = RandomForestModel.load(sc, _model_path)
  File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 254, in load
    java_model = cls._load_java(sc, path)
  File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 250, in _load_java
    return java_obj.load(sc._jsc.sc(), path)
  File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.tree.model.RandomForestModel.load.
: java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
    at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:125)
    at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:65)
    at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:724)
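
The stack trace shows `empty collection` being raised from `RDD.first` inside `Loader.loadMetadata`, which suggests the metadata read back from the model path contained no records. One possible cause, which is an assumption and not confirmed anywhere in this thread, is that the model was saved to a local filesystem path while running against a cluster, so the saved files may not be present on the machine that later reads them. The helper below is a hypothetical diagnostic, not part of PySpark; it also assumes the saved model uses a `<model_path>/metadata/part-*` layout.

```python
import os


def local_metadata_status(model_path):
    """Describe the metadata directory of a model saved to a local path.

    Assumes the layout <model_path>/metadata/part-*; that layout is an
    assumption about how the model is written, not something stated here.
    Returns "missing", "empty", or "ok".
    """
    metadata_dir = os.path.join(model_path, "metadata")
    if not os.path.isdir(metadata_dir):
        return "missing"
    # Only part files carry records; marker files such as _SUCCESS do not.
    part_files = [
        name for name in os.listdir(metadata_dir)
        if name.startswith("part-")
    ]
    non_empty = [
        name for name in part_files
        if os.path.getsize(os.path.join(metadata_dir, name)) > 0
    ]
    if not non_empty:
        return "empty"
    return "ok"
```

Running `local_metadata_status("/home/s/apps/spark-app/data/myModelPath")` on the driver machine after `model.save(...)` would show whether the driver can actually see any metadata records at that path. If it reports `"missing"` or `"empty"`, saving to a location that every node can reach, such as a shared or distributed filesystem, may be worth trying.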

Sent from my iPhone

On March 21, 2015, at 06:45, jkbradley notifications@github.com wrote:

Can you please open up a JIRA to report the bug? Also, what is the app you were running, and could you provide the code which created and saved the model originally?

@jkbradley
Member

@catmonkeylee I created a JIRA here: [https://issues.apache.org/jira/browse/SPARK-6457]. If you can, please comment on the JIRA; I'll add suggestions there. Thanks!
