Closed
17 commits
6c18058
fixed minor typos in docs/README.md and docs/api.md
Rosstin Jun 26, 2015
21ac1e5
Merge branch 'master' of github.com:apache/spark into SPARK-8639
Rosstin Jun 29, 2015
2cd2985
Merge branch 'master' of github.com:apache/spark into SPARK-8639
Rosstin Jun 29, 2015
242aedd
SPARK-8660, changed comment style from JavaDoc style to normal multil…
Rosstin Jun 29, 2015
bb9a4b1
Merge branch 'master' of github.com:apache/spark into SPARK-8660
Rosstin Jun 29, 2015
5a05dee
SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style com…
Rosstin Jun 29, 2015
39ddd50
Merge branch 'master' of github.com:apache/spark into SPARK-8661
Rosstin Jul 1, 2015
fe6b112
SPARK-8660 > symbols removed from LogisticRegressionSuite.scala for e…
Rosstin Jul 1, 2015
f4b9bc8
SPARK-8660 restored character limit on multiline comments in Logistic…
Rosstin Jul 1, 2015
84356cd
Merge branch 'master' of github.com:apache/spark into SPARK-8660-2
Rosstin Jul 2, 2015
7f70b2d
Merge branch 'master' of github.com:apache/spark into SPARK-8660-2
Rosstin Jul 6, 2015
23d61d6
Merge branch 'master' of github.com:apache/spark into SPARK-8660-2
Rosstin Jul 7, 2015
cc9ee94
Merge branch 'master' of github.com:apache/spark into SPARK-8660-2
Rosstin Aug 3, 2015
1c58304
Merge branch 'master' of github.com:apache/spark into SPARK-8660-2
Rosstin Aug 7, 2015
d2e71fa
Merge branch 'master' of github.com:apache/spark into SPARK-8660-2
Rosstin Aug 10, 2015
379c592
SPARK-8965 add ml-guide python example estimator, transform, and param
Rosstin Aug 10, 2015
f5cbea6
SPARK-8965 fixed indentation issues to conform with PEP8 style for Py…
Rosstin Aug 11, 2015
68 changes: 68 additions & 0 deletions docs/ml-guide.md
@@ -355,6 +355,74 @@ jsc.stop();
{% endhighlight %}
</div>

<div data-lang="python">
{% highlight python %}
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.param import Param, Params
from pyspark.sql import Row, SQLContext

sc = SparkContext(appName="SimpleParamsExample")
sqlContext = SQLContext(sc)

# Prepare training data.
# We use LabeledPoint.
# Spark SQL can convert RDDs of LabeledPoints into DataFrames.
training = sc.parallelize([LabeledPoint(1.0, [0.0, 1.1, 0.1]),
LabeledPoint(0.0, [2.0, 1.0, -1.0]),
LabeledPoint(0.0, [2.0, 1.3, 1.0]),
LabeledPoint(1.0, [0.0, 1.2, -0.5])])

# Create a LogisticRegression instance. This instance is an Estimator.
lr = LogisticRegression(maxIter=10, regParam=0.01)
# Print out the parameters, documentation, and any default values.
print "LogisticRegression parameters:\n" + lr.explainParams() + "\n"

# Learn a LogisticRegression model. This uses the parameters stored in lr.
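# Note: toDF() converts the RDD of LabeledPoints into a DataFrame; it is
# available because a SQLContext was created above.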
model1 = lr.fit(training.toDF())

# Since model1 is a Model (i.e., a transformer produced by an Estimator),
# we can view the parameters it used during fit().
# This prints the parameter (name: value) pairs, where names are unique IDs for this
# LogisticRegression instance.
print "Model 1 was fit using parameters: "
print model1.extractParamMap()

# We may alternatively specify parameters using a Python dictionary as a paramMap.
paramMap = {lr.maxIter: 20}
paramMap[lr.maxIter] = 30 # Specify 1 Param, overwriting the original maxIter.
paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55}) # Specify multiple Params.

# You can combine paramMaps, which are Python dictionaries.
paramMap2 = {lr.probabilityCol: "myProbability"} # Change output column name
paramMapCombined = paramMap.copy()
paramMapCombined.update(paramMap2)

# Now learn a new model using the paramMapCombined parameters.
# paramMapCombined overrides all parameters set earlier via lr.set* methods.
model2 = lr.fit(training.toDF(), paramMapCombined)
print "Model 2 was fit using parameters: "
print model2.extractParamMap()

# Prepare test data.
test = sc.parallelize([LabeledPoint(1.0, [-1.0, 1.5, 1.3]),
LabeledPoint(0.0, [ 3.0, 2.0, -0.1]),
LabeledPoint(1.0, [ 0.0, 2.2, -1.5])])

# Make predictions on test data using the Transformer.transform() method.
# LogisticRegression.transform will only use the 'features' column.
# Note that model2.transform() outputs a "myProbability" column instead of the usual
# 'probability' column since we renamed the lr.probabilityCol parameter previously.
prediction = model2.transform(test.toDF())
selected = prediction.select("features", "label", "myProbability", "prediction")
for row in selected.collect():
print row

sc.stop()
{% endhighlight %}
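
As a quick sanity check (a sketch that is not part of the original example, assuming the session above is still active), the renamed probability column can be confirmed by inspecting the columns of the prediction DataFrame:

{% highlight python %}
# Minimal sketch: confirm that model2 wrote its probabilities to the
# renamed 'myProbability' column rather than the default 'probability'.
print prediction.columns  # expected to include 'myProbability'
{% endhighlight %}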
</div>

</div>

## Example: Pipeline