Skip to content

Conversation

@huaxingao
Copy link
Contributor

What changes were proposed in this pull request?

Add training summary for FMClassificationModel...

Why are the changes needed?

so that user can get the training process status, such as loss value of each iteration and total iteration number.

Does this PR introduce any user-facing change?

Yes
FMClassificationModel.summary
FMClassificationModel.evaluate

How was this patch tested?

new tests

@SparkQA
Copy link

SparkQA commented Jun 30, 2020

Test build #124694 has finished for PR 28960 at commit 0fac436.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 30, 2020

Test build #124695 has finished for PR 28960 at commit 0a43c5f.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private[ml] trait FactorizationMachinesParams extends PredictorParams
with HasMaxIter with HasStepSize with HasTol with HasSolver with HasSeed
with HasFitIntercept with HasRegParam {
with HasFitIntercept with HasRegParam with HasWeightCol {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add with HasWeightCol because ClassificationSummary uses weigthCol. However, FM doesn't really support instance weight yet and all the weight are default to 1.0.

}

val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
val stochasticLossHistory = new ArrayBuffer[Double](numIterations + 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this stochasticLossHistory contain initial state + the state for each iteration, so it is consistent with the objectiveHistory in LogisticRegression and LinearRegression

* and regVal is the regularization value computed in the previous iteration as well.
*/
stochasticLossHistory += lossSum / miniBatchSize + regVal
if (converged || i == (numIterations + 1)) break
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, stochasticLossHistory only contains initial state + state form 1 to n-1 iteration, so need to add state for the last iteration too. After adding the last state, exist the loop.

@SparkQA
Copy link

SparkQA commented Jun 30, 2020

Test build #124696 has finished for PR 28960 at commit 35edf01.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124711 has finished for PR 28960 at commit ba3384d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jul 1, 2020

Looks like it needs a rebase after I merged your other commit

@SparkQA
Copy link

SparkQA commented Jul 2, 2020

Test build #124830 has finished for PR 28960 at commit 5b6ecb9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jul 6, 2020

Weird, a Python 2 failure?

======================================================================
FAIL: test_fm_classification_summary (pyspark.ml.tests.test_training_summary.TrainingSummaryTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@4/python/pyspark/ml/tests/test_training_summary.py", line 345, in test_fm_classification_summary
    self.assertAlmostEqual(s.weightedTruePositiveRate, 0.5, 2)
AssertionError: 1.0 != 0.5 within 2 places

@huaxingao
Copy link
Contributor Author

This is a python 2 failure only, python 3 is OK. I think I can simply change test data to get around this, but I found one more problem that I didn't have time to fix yet.
I will be extremely slow in these couple of weeks. Taking some time off :)

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine if it doesn't change existing APIs and is just adding more consistent functionality

// compute and sum up the subgradients on this subset (this is one map-reduce)
val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
.treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
seqOp = (c, v) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget, can you write stuff like case ((foo, bar, baz), v) => here to avoid all the ._1? I keep thinking it's possible but then I find it isn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems not. Just tried, not working.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it seems that breakable is not used in spark (except two suites):

➜  spark git:(master) ag --scala 'breakable' .   
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
2941:      breakable {

mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
142:      breakable {

I am not sure whether it is suiteable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a little unusual unless it significantly simplifies the code. Can !converged be added back to the while condition, and then turn the if (X) break condition below into if (!X) { ... code that follows ...} ? should be the same as i will increment and end the loop right after anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@SparkQA
Copy link

SparkQA commented Jul 12, 2020

Test build #125718 has finished for PR 28960 at commit 77aefd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2020

Test build #125809 has finished for PR 28960 at commit 0767117.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2020

Test build #125810 has finished for PR 28960 at commit 9a58603.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jul 15, 2020

I think you can go ahead and merge this

@huaxingao huaxingao closed this in b05f309 Jul 15, 2020
@huaxingao
Copy link
Contributor Author

Merged to master. Thanks @srowen @zhengruifeng for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants