
Commit 1aa191e

BryanCutler authored and mengxr committed
[SPARK-16231][PYSPARK][ML][EXAMPLES] dataframe_example.py fails to convert ML style vectors
## What changes were proposed in this pull request?

Need to convert ML Vectors to the old MLlib style before doing Statistics.colStats operations on the DataFrame

## How was this patch tested?

Ran example, local tests

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #13928 from BryanCutler/pyspark-ml-example-vector-conv-SPARK-16231.
1 parent c17b1ab commit 1aa191e


1 file changed: +3 -1 lines changed


examples/src/main/python/ml/dataframe_example.py

Lines changed: 3 additions & 1 deletion
@@ -28,6 +28,7 @@

 from pyspark.sql import SparkSession
 from pyspark.mllib.stat import Statistics
+from pyspark.mllib.util import MLUtils

 if __name__ == "__main__":
     if len(sys.argv) > 2:
@@ -55,7 +56,8 @@
     labelSummary.show()

     # Convert features column to an RDD of vectors.
-    features = df.select("features").rdd.map(lambda r: r.features)
+    features = MLUtils.convertVectorColumnsFromML(df, "features") \
+        .select("features").rdd.map(lambda r: r.features)
     summary = Statistics.colStats(features)
     print("Selected features column with average values:\n" +
           str(summary.mean()))
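
For readers who want the fix in isolation, the conversion step can be sketched as a small helper. This is only an illustration under stated assumptions, not code from the commit: the function name `feature_column_means` and its `column` parameter are hypothetical, and `df` is assumed to be a Spark DataFrame whose column holds `pyspark.ml.linalg` vectors. It uses the same `MLUtils.convertVectorColumnsFromML` and `Statistics.colStats` calls that appear in the diff.

```python
def feature_column_means(df, column="features"):
    """Return the per-dimension means of an ML-style vector column.

    Hypothetical helper for illustration; `df` is assumed to be a Spark
    DataFrame whose `column` contains pyspark.ml.linalg vectors.
    """
    # Imports are local so the helper can be defined even where PySpark
    # is not installed; they run only when the function is called.
    from pyspark.mllib.stat import Statistics
    from pyspark.mllib.util import MLUtils

    # Convert the new ML-style vectors to the old MLlib-style vectors,
    # which is the type Statistics.colStats operates on.
    converted = MLUtils.convertVectorColumnsFromML(df, column)

    # Pull the single selected column out of each Row to get an RDD of vectors.
    vectors = converted.select(column).rdd.map(lambda row: row[0])

    return Statistics.colStats(vectors).mean()
```

Called as `feature_column_means(df)` on the example's DataFrame, this should yield the same values the script prints via `summary.mean()`, assuming a running Spark session.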
