How can I use RandomForestClassifier with the sparkit-learn library? #72

Timoux · 2016-10-04T12:52:43Z

from splearn.ensemble import SparkRandomForestClassifier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named ensemble

taynaud · 2016-10-04T12:58:33Z

Hello,

It is not released yet; you need to install the latest master.

pip install git+https://github.com/lensacom/sparkit-learn.git

Timoux · 2016-10-04T13:06:58Z

Hello,
I just tried that (the install succeeded), but I get the same error:

from splearn.ensemble import SparkRandomForestClassifier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named ensemble

Timoux · 2016-10-04T13:13:34Z

It works much better with: pip install --upgrade git+https://github.com/lensacom/sparkit-learn.git

Do you think I can use the same parameters?

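# Search for the best parameters
from sklearn.ensemble import RandomForestClassifier
# sklearn >= 0.18; older releases expose GridSearchCV in sklearn.grid_search
from sklearn.model_selection import GridSearchCV

forest = RandomForestClassifier(
    n_estimators=250,
    criterion='gini',
    max_depth=46,
    min_samples_split=26,
    min_samples_leaf=2,
    max_features=2, max_leaf_nodes=None,
    bootstrap=True, oob_score=True, verbose=0
)

# Candidate values for the exhaustive grid search
param = {"n_estimators": list(range(20, 300, 40)),
         "max_depth": list(range(1, 75, 5)),
         "min_samples_split": list(range(2, 32, 4)),
         "min_samples_leaf": list(range(2, 18, 4))}

# X_train and Y_train are the training features and labels (defined elsewhere)
digit_rf = GridSearchCV(forest, param, cv=5, n_jobs=-1)
Aforest = digit_rf.fit(X_train, Y_train)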
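
For a sense of scale, the size of the grid above can be counted before running it. This is a minimal sketch in plain Python; `count_fits` is a hypothetical helper written for illustration and is not part of scikit-learn or sparkit-learn.

```python
# Same candidate values as the grid in the post above.
param = {
    "n_estimators": list(range(20, 300, 40)),
    "max_depth": list(range(1, 75, 5)),
    "min_samples_split": list(range(2, 32, 4)),
    "min_samples_leaf": list(range(2, 18, 4)),
}


def count_fits(param_grid, cv):
    """Return (number of candidate settings, number of model fits).

    An exhaustive grid search tries every combination of values,
    and each combination is fitted once per cross-validation fold.
    """
    n_candidates = 1
    for values in param_grid.values():
        n_candidates *= len(values)
    return n_candidates, n_candidates * cv


n_candidates, n_fits = count_fits(param, cv=5)
print(n_candidates, n_fits)  # 3360 16800
```

With cv=5 this grid asks for 16800 fits (plus a final refit if enabled), so trimming the value lists is worth considering before moving the search to a cluster.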

taynaud · 2016-10-04T13:16:23Z

It depends on your data, but be careful: n_estimators is misleading if you are coming from scikit-learn.

It will learn n_estimators × (number of partitions) trees in total.

This is because the implementation actually trains a RandomForestClassifier on each partition and then merges them.

Thus you may need to reduce n_estimators depending on your dataset.
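
The arithmetic described above can be sketched as follows. This is an illustration only, assuming the merged forest simply contains every tree trained on every partition; both function names are hypothetical and neither is part of sparkit-learn.

```python
import math


def merged_forest_size(n_estimators, num_partitions):
    """Total trees after merging one forest per partition."""
    return n_estimators * num_partitions


def per_partition_estimators(target_total_trees, num_partitions):
    """n_estimators to pass so the merged forest has at least
    target_total_trees trees (rounded up, never below 1)."""
    if target_total_trees < 1 or num_partitions < 1:
        raise ValueError("both arguments must be >= 1")
    return max(1, math.ceil(target_total_trees / num_partitions))


# Example with an assumed 8 partitions:
print(merged_forest_size(250, 8))        # 2000 trees if n_estimators=250 is kept
print(per_partition_estimators(250, 8))  # 32 per partition, i.e. 256 trees merged
```

The partition count of 8 is an arbitrary value chosen for the example; use the actual number of partitions of your RDD.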

Timoux · 2016-10-04T13:43:09Z

It works, but I have a new issue with SparkGridSearchCV:

digit_rf.best_estimator_
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SparkGridSearchCV' object has no attribute 'best_estimator_'

Timoux · 2016-10-05T09:09:26Z

Does anyone know whether SparkGridSearchCV offers the same parameters?

Timoux · 2016-10-05T15:54:13Z

Another issue with SparkGridSearchCV in yarn-client mode:

16/10/05 17:52:04 ERROR akka.ErrorMonitor: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1852)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:708)
at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:134)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
at com.esotericsoftware.kryo.io.Output.close(Output.java:165)
