[SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel #28960
Test build #124694 has finished for PR 28960 at commit
Test build #124695 has finished for PR 28960 at commit
  private[ml] trait FactorizationMachinesParams extends PredictorParams
    with HasMaxIter with HasStepSize with HasTol with HasSolver with HasSeed
-   with HasFitIntercept with HasRegParam {
+   with HasFitIntercept with HasRegParam with HasWeightCol {
Added with HasWeightCol because ClassificationSummary uses weightCol. However, FM doesn't really support instance weights yet, so all weights default to 1.0.
  }

- val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
+ val stochasticLossHistory = new ArrayBuffer[Double](numIterations + 1)
Make stochasticLossHistory contain the initial state plus the state for each iteration, so it is consistent with the objectiveHistory in LogisticRegression and LinearRegression.
   * and regVal is the regularization value computed in the previous iteration as well.
   */
  stochasticLossHistory += lossSum / miniBatchSize + regVal
+ if (converged || i == (numIterations + 1)) break
Currently, stochasticLossHistory only contains the initial state plus the states from iterations 1 to n-1, so we need to add the state for the last iteration too. After adding the last state, exit the loop.
Test build #124696 has finished for PR 28960 at commit
Test build #124711 has finished for PR 28960 at commit
Looks like it needs a rebase after I merged your other commit
Test build #124830 has finished for PR 28960 at commit
Weird, a Python 2 failure?
This is a Python 2 failure only; Python 3 is OK. I think I can simply change the test data to get around this, but I found one more problem that I haven't had time to fix yet.
srowen left a comment
Looks fine if it doesn't change existing APIs and is just adding more consistent functionality
  // compute and sum up the subgradients on this subset (this is one map-reduce)
  val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
    .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
      seqOp = (c, v) => {
I forget, can you write stuff like case ((foo, bar, baz), v) => here to avoid all the ._1? I keep thinking it's possible but then I find it isn't.
seems not. Just tried, not working.
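For reference, a minimal sketch of the destructuring form asked about above, on a plain Scala collection rather than on Spark's treeAggregate. The object name TupleSeqOp and the (lossSum, count) accumulator are hypothetical stand-ins for the (gradientSum, lossSum, miniBatchSize) tuple. It only shows that a pattern-matching function literal can replace the ._1 accessors in a two-argument fold; it does not establish whether the same form compiles at the treeAggregate call site, where the thread reports that it did not.

```scala
object TupleSeqOp {
  // Hypothetical accumulator: (running loss sum, number of samples seen).
  def summarize(losses: Seq[Double]): (Double, Long) =
    losses.foldLeft((0.0, 0L)) {
      // Destructure the accumulator tuple directly instead of using acc._1 / acc._2.
      case ((lossSum, count), loss) => (lossSum + loss, count + 1L)
    }
}
```

Usage: TupleSeqOp.summarize(Seq(1.0, 2.0, 3.0)) returns the summed loss together with the sample count.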
nit: it seems that breakable is not used in spark (except two suites):
➜ spark git:(master) ag --scala 'breakable' .
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
2941: breakable {
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
142: breakable {
I am not sure whether it is suitable.
Yeah, it's a little unusual unless it significantly simplifies the code. Can !converged be added back to the while condition, and then turn the if (X) break condition below into if (!X) { ... code that follows ... }? It should be the same, as i will increment and end the loop right after anyway.
Fixed. Thanks!
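The restructuring suggested in this thread can be sketched roughly as follows. This is a toy under stated assumptions, not the merged code: the object name LossHistorySketch, the scalar weight w, the squared loss, and the halving update are all hypothetical stand-ins for the real mini-batch gradient step. It only illustrates a while loop guarded by !converged that runs one extra pass, so the history holds the initial loss plus one entry per iteration when no early convergence occurs.

```scala
import scala.collection.mutable.ArrayBuffer

object LossHistorySketch {
  // Toy model (assumption): scalar weight w, loss = w * w, update = w / 2.
  def run(numIterations: Int, w0: Double, tol: Double): Array[Double] = {
    val lossHistory = new ArrayBuffer[Double](numIterations + 1)
    var w = w0
    var converged = false
    var i = 1
    // No breakable/break: the guard ends the loop on convergence or after
    // the extra pass at i == numIterations + 1.
    while (!converged && i <= numIterations + 1) {
      // Record the loss of the current weights (the initial weights when i == 1).
      lossHistory += w * w
      if (i != numIterations + 1) {
        // Only update on real iterations; the extra pass just records the final loss.
        val next = w * 0.5
        converged = math.abs(next - w) < tol
        w = next
      }
      i += 1
    }
    lossHistory.toArray
  }
}
```

With tol = 0.0 the toy never converges early, so run(3, 2.0, 0.0) yields numIterations + 1 = 4 entries. Note that in this sketch an early convergence ends the loop without recording the loss after the last update; that is a property of the sketch and not a statement about the merged implementation.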
Test build #125718 has finished for PR 28960 at commit
Test build #125809 has finished for PR 28960 at commit
Test build #125810 has finished for PR 28960 at commit
I think you can go ahead and merge this
Merged to master. Thanks @srowen @zhengruifeng for reviewing!
What changes were proposed in this pull request?
Add training summary for FMClassificationModel...
Why are the changes needed?
So that users can inspect the training process, such as the loss value at each iteration and the total number of iterations.
Does this PR introduce any user-facing change?
Yes
FMClassificationModel.summary
FMClassificationModel.evaluate
How was this patch tested?
new tests