Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -109,10 +109,10 @@ class BisectingKMeansModel private[clustering] (

@Since("2.0.0")
override def save(sc: SparkContext, path: String): Unit = {
BisectingKMeansModel.SaveLoadV1_0.save(sc, this, path)
BisectingKMeansModel.SaveLoadV2_0.save(sc, this, path)
}

override protected def formatVersion: String = "1.0"
override protected def formatVersion: String = "2.0"
}

@Since("2.0.0")
Expand All @@ -126,7 +126,7 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
val model = SaveLoadV1_0.load(sc, path)
model
case (SaveLoadV2_0.thisClassName, SaveLoadV2_0.thisFormatVersion) =>
val model = SaveLoadV1_0.load(sc, path)
val model = SaveLoadV2_0.load(sc, path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nice catch!

Copy link
Member

@dongjoon-hyun dongjoon-hyun Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a regression, but it looks like a correctness or data loss issue at Spark 2.4.0 new feature.
Do you know why https://issues.apache.org/jira/browse/SPARK-25793 is created as a Minor and Documentation issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no test to verify calling correct load method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can improve the write/load model tests in order to include also a different distance measure from the default one. In this way we should catch this error. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have ever use SaveLoadV2_0 to save model for now? Looks BisectingKMeansModel.save simply calls SaveLoadV1_0.save to save models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya @mgaido91
I will change the BisectingKMeansModel.save to

BisectingKMeansModel.SaveLoadV2_0.save(sc, this, path)

model
case _ => throw new Exception(
s"BisectingKMeansModel.load did not recognize model with (className, format version):" +
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -187,11 +187,12 @@ class BisectingKMeansSuite extends SparkFunSuite with MLlibTestSparkContext {

val points = (1 until 8).map(i => Vectors.dense(i))
val data = sc.parallelize(points, 2)
val model = new BisectingKMeans().run(data)
val model = new BisectingKMeans().setDistanceMeasure(DistanceMeasure.COSINE).run(data)
try {
model.save(sc, path)
val sameModel = BisectingKMeansModel.load(sc, path)
assert(model.k === sameModel.k)
assert(model.distanceMeasure === sameModel.distanceMeasure)
model.clusterCenters.zip(sameModel.clusterCenters).foreach(c => c._1 === c._2)
} finally {
Utils.deleteRecursively(tempDir)
Expand Down