[SPARK-30351][ML][PySpark] BisectingKMeans support instance weighting #27035
Test build #115895 has finished for PR 27035 at commit
Test build #115898 has finished for PR 27035 at commit
nit, remove space before first i
Or just keep existing line?
Do you mean `val d = input.map(_._1.size).first()`?
It seems that `var dataVectorWithNorm` is only used to get the norms, so what about removing it and computing them directly: `val norms = input.map(d => Vectors.norm(d._1, 2.0))`...
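The suggestion can be illustrated outside Spark with a plain-Python analogue. This is only a sketch: `data` is a hypothetical stand-in for `input`, assumed here to be a collection of (vector, weight) pairs because of the `_._1` accessor. The dimension and the L2 norms are read straight from the pairs, with no intermediate vector-with-norm collection.

```python
import math

# Hypothetical stand-in for `input`: (vector, weight) pairs.
data = [((3.0, 4.0), 1.0), ((0.0, 2.0), 0.5)]

# Dimension taken from the first vector, as in input.map(_._1.size).first().
d = len(data[0][0])

# L2 norm of each vector, computed directly; the weight is not needed here.
norms = [math.sqrt(sum(x * x for x in vec)) for vec, _weight in data]
```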
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
I saw the conflict file. I will fix this after #27052 is merged, so I don't have to rebase twice.
Test build #115962 has finished for PR 27035 at commit
ping @huaxingao: #27052 is merged
Test build #116128 has finished for PR 27035 at commit
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
Test build #116255 has finished for PR 27035 at commit
Merged to master
Thanks! @srowen @zhengruifeng
What changes were proposed in this pull request?
Add instance-weight support to BisectingKMeans.
Why are the changes needed?
BisectingKMeans should support instance weighting
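As a rough illustration of what instance weighting means for a k-means-style algorithm, here is a plain-Python sketch. It is not the Spark implementation, and the helper names `weighted_center` and `weighted_cost` are hypothetical. Each point contributes to the cluster center and to the clustering cost in proportion to its weight.

```python
from typing import List, Sequence, Tuple

WeightedPoint = Tuple[Sequence[float], float]  # (vector, weight)


def weighted_center(points: List[WeightedPoint]) -> List[float]:
    """Weighted mean of the vectors: sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(w for _, w in points)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    dim = len(points[0][0])
    sums = [0.0] * dim
    for vec, w in points:
        for i, x in enumerate(vec):
            sums[i] += w * x
    return [s / total_weight for s in sums]


def weighted_cost(points: List[WeightedPoint], center: Sequence[float]) -> float:
    """Weighted sum of squared Euclidean distances to the center."""
    return sum(
        w * sum((x - c) ** 2 for x, c in zip(vec, center))
        for vec, w in points
    )


# Two points; the second carries three times the weight of the first,
# so the center is pulled toward it.
data = [((0.0, 0.0), 1.0), ((4.0, 0.0), 3.0)]
center = weighted_center(data)
cost = weighted_cost(data, center)
```

With all weights equal to 1.0 this reduces to the ordinary unweighted mean and cost, which is why unweighted input can be treated as a special case.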
Does this PR introduce any user-facing change?
Yes. Adds BisectingKMeans.setWeightCol.
How was this patch tested?
Unit test