[SPARK-24557][ML] ClusteringEvaluator support array input #21563
Test build #91828 has finished for PR 21563 at commit
Test build #91830 has finished for PR 21563 at commit
I'm not sure this is the right way. We will probably face the same issue everywhere we use DatasetUtils.columnToVector, so it is probably better to fix the problem there.
@mgaido91 Thanks for reviewing!
I have considered this, but there is a problem:
if we want to attach metadata to the transformed column in DatasetUtils.columnToVector (for example with the method .as(alias: String, metadata: Metadata)), how can we get the name of the transformed column?
The only way I know to do this is:
val metadata = ...
val vectorCol = ..
val vectorName = dataset.select(vectorCol).schema.head.name
vectorCol.as(vectorName, metadata)
We already have the new column we are returning, so we can get its name with .expr.sql.
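The two approaches discussed above can be compared with a small sketch. It assumes a Spark 2.x/3.x build where `Column.expr` is accessible and a local `SparkSession`; the cast is only a stand-in for whatever DatasetUtils.columnToVector actually returns, and the `numFeatures` metadata key is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

object ColumnNameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-name-sketch")
      .getOrCreate()
    import spark.implicits._

    val dataset = Seq(Tuple1(Array(1.0, 2.0))).toDF("features")

    // Stand-in for the transformed column (assumption, not the real columnToVector output).
    val vectorCol = col("features").cast("array<float>")

    // Hypothetical metadata to attach to the transformed column.
    val metadata = new MetadataBuilder().putLong("numFeatures", 2L).build()

    // Approach 1: resolve the name through the schema of a select.
    val nameViaSchema = dataset.select(vectorCol).schema.head.name

    // Approach 2: read the SQL text of the underlying expression.
    val nameViaExpr = vectorCol.expr.sql

    // The two strings are not guaranteed to be identical, so print both.
    println(s"schema name: $nameViaSchema, expr.sql: $nameViaExpr")

    // Attach the metadata under the schema-derived name.
    val withMetadata = vectorCol.as(nameViaSchema, metadata)
    println(dataset.select(withMetadata).schema.head.metadata)

    spark.stop()
  }
}
```

The schema-based lookup needs the dataset at hand, while `.expr.sql` only needs the column itself, which is the trade-off the two comments are weighing.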
@mgaido91 I think it may be nice to first add a name getter for Column.
we can propose that
Test build #91836 has finished for PR 21563 at commit
Force-pushed from eb6022c to 9064e7b
@mgaido91 Sorry for the force push; I made it to update my git username in this PR. I also found that [...], so I still keep the original method. @MLnick @mengxr @jkbradley Would you please help review this? Thanks!
Test build #93586 has finished for PR 21563 at commit
retest this please
Test build #93863 has finished for PR 21563 at commit
@mengxr I noticed that you opened a ticket for supporting integer type labels in ClusteringEvaluator. Would you like to shepherd this PR too?
LGTM. Merged into master. Thanks!
What changes were proposed in this pull request?
Add support for array input to ClusteringEvaluator.
How was this patch tested?
Added tests.
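As a rough illustration of what the description means, the sketch below evaluates a tiny hand-made clustering whose features column is a plain array of doubles rather than an ML Vector. It assumes a Spark version that includes this change and a local `SparkSession`; the data and column names are made up for the example.

```scala
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object ArrayInputSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-input-sketch")
      .getOrCreate()
    import spark.implicits._

    // Features are array<double>, not VectorUDT; predictions are cluster ids.
    val df = Seq(
      (Array(0.0, 0.1), 0),
      (Array(0.1, 0.0), 0),
      (Array(9.0, 9.1), 1),
      (Array(9.1, 9.0), 1)
    ).toDF("features", "prediction")

    val evaluator = new ClusteringEvaluator()
      .setFeaturesCol("features")
      .setPredictionCol("prediction")

    // The default metric is the silhouette score, which lies in [-1, 1].
    val silhouette = evaluator.evaluate(df)
    println(s"silhouette = $silhouette")

    spark.stop()
  }
}
```

On a version without this change, the same call would be expected to fail the schema check on the features column, since only vector columns were accepted.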