From 80872b333fca2542e46b064551eadc5a9a3741b2 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Mon, 27 Jun 2016 15:19:42 +0200
Subject: [PATCH 1/8] Add breaking changes to ML migration guide

---
 docs/mllib-guide.md | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index c28d13732eed..6610bbe3aa0c 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -104,10 +104,12 @@ and the migration guide below will explain all changes between releases.
 
 ## From 1.6 to 2.0
 
+### Breaking changes
 The deprecations and changes of behavior in the `spark.mllib` or `spark.ml` packages include:
 
-Deprecations:
+There were several breaking changes in Spark 2.0, outlined below.
 
+**Linear algebra classes for DataFrame-based APIs**
 * [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):
  In `spark.ml.regression.LinearRegressionSummary`, the `model` field has been deprecated.
 * [SPARK-13784](https://issues.apache.org/jira/browse/SPARK-13784):
@@ -125,8 +127,31 @@ Deprecations:
  In `spark.ml.util.MLReader` and `spark.ml.util.MLWriter`, the `context` method has been deprecated in favor of `session`.
 * In `spark.ml.feature.ChiSqSelectorModel`, the `setLabelCol` method has been deprecated since it was not used by `ChiSqSelectorModel`.
 
-Changes of behavior:
+Spark's linear algebra dependencies were moved to a new project, `spark-mllib-local` (see [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944)). 
+As part of this change, the linear algebra classes were moved to a new package, `spark.ml.linalg`. 
+The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` classes, leading to a few breaking changes, predominantly in various model classes 
+(see [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) for a full list).
 
+**Note:** the RDD-based APIs in `spark.mllib` continue to depend on the previous package `spark.mllib.linalg`.
+
+**Deprecated methods removed**
+
+Several deprecated methods were removed.
+
+In `spark.ml`:
+
+* `setScoreCol` in `ml.evaluation.BinaryClassificationEvaluator`
+* `weights` in `LinearRegression` and `LogisticRegression`
+
+In `spark.mllib`:
+
+* `setMaxNumIterations` in `mllib.optimization.LBFGS` (marked as `DeveloperApi`)
+* `treeReduce` and `treeAggregate` in `mllib.rdd.RDDFunctions` (these functions are available on `RDD`s directly, and were marked as `DeveloperApi`)
+* `defaultStategy` in `mllib.tree.configuration.Strategy`
+* `build` in `mllib.tree.Node`
+* libsvm loaders for multiclass and load/save labeledData methods in `mllib.util.MLUtils`
+
+A full list of breaking changes can be found at [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810).
 * [SPARK-7780](https://issues.apache.org/jira/browse/SPARK-7780):
  `spark.mllib.classification.LogisticRegressionWithLBFGS` directly calls `spark.ml.classification.LogisticRegresson` for binary classification now.
  This will introduce the following behavior changes for `spark.mllib.classification.LogisticRegressionWithLBFGS`:

From 177a9e005a77aa851b6d8b78e98de67a95524257 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Tue, 28 Jun 2016 21:09:32 +0200
Subject: [PATCH 2/8] initial work

---
 docs/mllib-guide.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 6610bbe3aa0c..77daca9d1b42 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -134,6 +134,10 @@ The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` class
 
 **Note:** the RDD-based APIs in `spark.mllib` continue to depend on the previous package `spark.mllib.linalg`.
 
+_Converting vectors and matrices_
+
+ghdhgf
+
 **Deprecated methods removed**
 
 Several deprecated methods were removed.

From 6ef09a31c4e8808277357359ac6d048d866ce9f0 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Tue, 28 Jun 2016 22:29:16 +0200
Subject: [PATCH 3/8] Update guide for vector/matrix conversion and clean up

---
 docs/mllib-guide.md | 105 ++++++++++++++++++++++++++++++++------------
 1 file changed, 78 insertions(+), 27 deletions(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 77daca9d1b42..89ebfe03fb93 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -105,50 +105,73 @@ and the migration guide below will explain all changes between releases.
 ## From 1.6 to 2.0
 
 ### Breaking changes
-The deprecations and changes of behavior in the `spark.mllib` or `spark.ml` packages include:
 
-There were several breaking changes in Spark 2.0, outlined below.
+There were several breaking changes in Spark 2.0, which are outlined below.
 
 **Linear algebra classes for DataFrame-based APIs**
-* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):
- In `spark.ml.regression.LinearRegressionSummary`, the `model` field has been deprecated.
-* [SPARK-13784](https://issues.apache.org/jira/browse/SPARK-13784):
- In `spark.ml.regression.RandomForestRegressionModel` and `spark.ml.classification.RandomForestClassificationModel`,
- the `numTrees` parameter has been deprecated in favor of `getNumTrees` method.
-* [SPARK-13761](https://issues.apache.org/jira/browse/SPARK-13761):
- In `spark.ml.param.Params`, the `validateParams` method has been deprecated.
- We move all functionality in overridden methods to the corresponding `transformSchema`.
-* [SPARK-14829](https://issues.apache.org/jira/browse/SPARK-14829):
- In `spark.mllib` package, `LinearRegressionWithSGD`, `LassoWithSGD`, `RidgeRegressionWithSGD` and `LogisticRegressionWithSGD` have been deprecated.
- We encourage users to use `spark.ml.regression.LinearRegresson` and `spark.ml.classification.LogisticRegresson`.
-* [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900):
- In `spark.mllib.evaluation.MulticlassMetrics`, the parameters `precision`, `recall` and `fMeasure` have been deprecated in favor of `accuracy`.
-* [SPARK-15644](https://issues.apache.org/jira/browse/SPARK-15644):
- In `spark.ml.util.MLReader` and `spark.ml.util.MLWriter`, the `context` method has been deprecated in favor of `session`.
-* In `spark.ml.feature.ChiSqSelectorModel`, the `setLabelCol` method has been deprecated since it was not used by `ChiSqSelectorModel`.
 
-Spark's linear algebra dependencies were moved to a new project, `spark-mllib-local` (see [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944)). 
+Spark's linear algebra dependencies were moved to a new project, `spark-mllib-local` 
+(see [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944)). 
 As part of this change, the linear algebra classes were moved to a new package, `spark.ml.linalg`. 
-The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` classes, leading to a few breaking changes, predominantly in various model classes 
+The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` classes, 
+leading to a few breaking changes, predominantly in various model classes 
 (see [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) for a full list).
 
 **Note:** the RDD-based APIs in `spark.mllib` continue to depend on the previous package `spark.mllib.linalg`.
 
 _Converting vectors and matrices_
 
-ghdhgf
+While most pipeline components support backward compatibility for loading, 
+some existing `DataFrames` and pipelines in Spark versions prior to 2.0, that contain vector or matrix 
+columns, may need to be migrated to the new `spark.ml` vector and matrix types. 
+Utilities for converting `DataFrame` columns from `spark.mllib.linalg` to `spark.ml.linalg` types
+(and vice versa) can be found in `spark.mllib.util.MLUtils`.
 
-**Deprecated methods removed**
+<div class="codetabs">
+<div data-lang="scala"  markdown="1">
 
-Several deprecated methods were removed.
+{% highlight scala %}
+import org.apache.spark.mllib.util.MLUtils
 
-In `spark.ml`:
+val convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
+val convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
+{% endhighlight %}
 
-* `setScoreCol` in `ml.evaluation.BinaryClassificationEvaluator`
-* `weights` in `LinearRegression` and `LogisticRegression`
+Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for further detail.
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.sql.Dataset;
+
+Dataset<Row> convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
+Dataset<Row> convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
+{% endhighlight %}
+
+Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for further detail.
+</div>
+
+<div data-lang="python"  markdown="1">
+
+{% highlight python %}
+from pyspark.mllib.util import MLUtils
+
+convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
+convertedMatrxDF = MLUtils.convertMatrixColumnsToML(matrixDF)
+{% endhighlight %}
+
+Refer to the [`MLUtils` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) for further detail.
+</div>
+</div>
 
-In `spark.mllib`:
+**Deprecated methods removed**
+
+Several deprecated methods were removed in the `spark.mllib` and `spark.ml` packages:
 
+* `setScoreCol` in `ml.evaluation.BinaryClassificationEvaluator`
+* `weights` in `LinearRegression` and `LogisticRegression` in `spark.ml`
 * `setMaxNumIterations` in `mllib.optimization.LBFGS` (marked as `DeveloperApi`)
 * `treeReduce` and `treeAggregate` in `mllib.rdd.RDDFunctions` (these functions are available on `RDD`s directly, and were marked as `DeveloperApi`)
 * `defaultStategy` in `mllib.tree.configuration.Strategy`
@@ -156,6 +179,34 @@ In `spark.mllib`:
 * libsvm loaders for multiclass and load/save labeledData methods in `mllib.util.MLUtils`
 
 A full list of breaking changes can be found at [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810).
+
+### Deprecations and changes of behavior
+
+**Deprecations**
+
+Deprecations in the `spark.mllib` and `spark.ml` packages include:
+
+* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):
+ In `spark.ml.regression.LinearRegressionSummary`, the `model` field has been deprecated.
+* [SPARK-13784](https://issues.apache.org/jira/browse/SPARK-13784):
+ In `spark.ml.regression.RandomForestRegressionModel` and `spark.ml.classification.RandomForestClassificationModel`,
+ the `numTrees` parameter has been deprecated in favor of `getNumTrees` method.
+* [SPARK-13761](https://issues.apache.org/jira/browse/SPARK-13761):
+ In `spark.ml.param.Params`, the `validateParams` method has been deprecated.
+ We move all functionality in overridden methods to the corresponding `transformSchema`.
+* [SPARK-14829](https://issues.apache.org/jira/browse/SPARK-14829):
+ In `spark.mllib` package, `LinearRegressionWithSGD`, `LassoWithSGD`, `RidgeRegressionWithSGD` and `LogisticRegressionWithSGD` have been deprecated.
+ We encourage users to use `spark.ml.regression.LinearRegresson` and `spark.ml.classification.LogisticRegresson`.
+* [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900):
+ In `spark.mllib.evaluation.MulticlassMetrics`, the parameters `precision`, `recall` and `fMeasure` have been deprecated in favor of `accuracy`.
+* [SPARK-15644](https://issues.apache.org/jira/browse/SPARK-15644):
+ In `spark.ml.util.MLReader` and `spark.ml.util.MLWriter`, the `context` method has been deprecated in favor of `session`.
+* In `spark.ml.feature.ChiSqSelectorModel`, the `setLabelCol` method has been deprecated since it was not used by `ChiSqSelectorModel`.
+
+**Changes of behavior**
+
+Changes of behavior in the `spark.mllib` and `spark.ml` packages include:
+
 * [SPARK-7780](https://issues.apache.org/jira/browse/SPARK-7780):
  `spark.mllib.classification.LogisticRegressionWithLBFGS` directly calls `spark.ml.classification.LogisticRegresson` for binary classification now.
  This will introduce the following behavior changes for `spark.mllib.classification.LogisticRegressionWithLBFGS`:

From ac49f31cd83aa2755e0f1948f981beee66a16527 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Wed, 29 Jun 2016 12:52:17 +0200
Subject: [PATCH 4/8] review comment and add single instance conversion details

---
 docs/mllib-guide.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 89ebfe03fb93..2832a43b6459 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -110,9 +110,9 @@ There were several breaking changes in Spark 2.0, which are outlined below.
 
 **Linear algebra classes for DataFrame-based APIs**
 
-Spark's linear algebra dependencies were moved to a new project, `spark-mllib-local` 
+Spark's linear algebra dependencies were moved to a new project, `mllib-local` 
 (see [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944)). 
-As part of this change, the linear algebra classes were moved to a new package, `spark.ml.linalg`. 
+As part of this change, the linear algebra classes were copied to a new package, `spark.ml.linalg`. 
 The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` classes, 
 leading to a few breaking changes, predominantly in various model classes 
 (see [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) for a full list).
@@ -127,14 +127,24 @@ columns, may need to be migrated to the new `spark.ml` vector and matrix types.
 Utilities for converting `DataFrame` columns from `spark.mllib.linalg` to `spark.ml.linalg` types
 (and vice versa) can be found in `spark.mllib.util.MLUtils`.
 
+There are also utility methods available for converting single instances of 
+vectors and matrices. Use the `asML` method on a `mllib.linalg.Vector` / `mllib.linalg.Matrix`
+for converting to `ml.linalg` types, and 
+`mllib.linalg.Vectors.fromML` / `mllib.linalg.Matrices.fromML` 
+for converting to `mllib.linalg` types.
+
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
 
 {% highlight scala %}
 import org.apache.spark.mllib.util.MLUtils
 
-val convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
-val convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
+// convert DataFrame columns
+val convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
+val convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF)
+// convert a single vector or matrix
+val mlVec: org.apache.spark.ml.linalg.Vector = mllibVec.asML
+val mlMat: org.apache.spark.ml.linalg.Matrix = mllibMat.asML
 {% endhighlight %}
 
 Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for further detail.
@@ -146,8 +156,12 @@ Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.
 import org.apache.spark.mllib.util.MLUtils;
 import org.apache.spark.sql.Dataset;
 
+// convert DataFrame columns
 Dataset<Row> convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
 Dataset<Row> convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
+// convert a single vector or matrix
+org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML
+org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML
 {% endhighlight %}
 
 Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for further detail.
@@ -158,6 +172,7 @@ Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.
 {% highlight python %}
 from pyspark.mllib.util import MLUtils
 
+# convert DataFrame columns
 convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
 convertedMatrxDF = MLUtils.convertMatrixColumnsToML(matrixDF)
 {% endhighlight %}

From c48b89418d6374407b0dd4cb33a716bbf0189bf5 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Thu, 30 Jun 2016 15:59:34 +0200
Subject: [PATCH 5/8] Add Python asML example code

---
 docs/mllib-guide.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 2832a43b6459..3b99ae00330a 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -174,7 +174,10 @@ from pyspark.mllib.util import MLUtils
 
 # convert DataFrame columns
 convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
-convertedMatrxDF = MLUtils.convertMatrixColumnsToML(matrixDF)
+convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF)
+// convert a single vector or matrix
+mlVec = mllibVec.asML()
+mlMat = mllibMat.asML()
 {% endhighlight %}
 
 Refer to the [`MLUtils` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) for further detail.

From 91fd24cca2d1a5bb6be79b68d0d7091f64ca4e97 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Thu, 30 Jun 2016 16:00:03 +0200
Subject: [PATCH 6/8] fix comment

---
 docs/mllib-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 3b99ae00330a..dcbfe11438fa 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -175,7 +175,7 @@ from pyspark.mllib.util import MLUtils
 # convert DataFrame columns
 convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
 convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF)
-// convert a single vector or matrix
+# convert a single vector or matrix
 mlVec = mllibVec.asML()
 mlMat = mllibMat.asML()
 {% endhighlight %}

From c2ce7cd9659484cec325ed137271c9b6ed52923d Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Thu, 30 Jun 2016 16:00:33 +0200
Subject: [PATCH 7/8] Fix java ;

---
 docs/mllib-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index dcbfe11438fa..bf46fd4f03e6 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -160,8 +160,8 @@ import org.apache.spark.sql.Dataset;
 Dataset<Row> convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
 Dataset<Row> convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
 // convert a single vector or matrix
-org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML
-org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML
+org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML;
+org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML;
 {% endhighlight %}
 
 Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for further detail.

From 919bfe9c73fca485ff33a528feffa5b59b0b7e86 Mon Sep 17 00:00:00 2001
From: Nick Pentreath <nickp@za.ibm.com>
Date: Thu, 30 Jun 2016 16:02:46 +0200
Subject: [PATCH 8/8] Fix Java method invoc

---
 docs/mllib-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index bf46fd4f03e6..17fd3e1edf4b 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -160,8 +160,8 @@ import org.apache.spark.sql.Dataset;
 Dataset<Row> convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
 Dataset<Row> convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
 // convert a single vector or matrix
-org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML;
-org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML;
+org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML();
+org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML();
 {% endhighlight %}
 
 Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for further detail.