@zhengruifeng
Contributor

What changes were proposed in this pull request?

1. Change convertToBaggedRDDSamplingWithReplacement to attach instance weights.
2. Make RF support weights.
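
To illustrate the first item, here is a minimal sketch of sampling with replacement that carries an instance weight on each bagged point. It is not the Spark implementation: `BaggedPoint` is a simplified stand-in, and `bagWithReplacement`, `extractWeight`, and the Knuth-style `poisson` helper are hypothetical names chosen for this example. It assumes one Poisson draw per tree to approximate sampling with replacement.

```scala
import scala.util.Random

// Simplified stand-in for a bagged data point: the datum, how many times it
// was drawn for each tree, and the instance weight carried along with it.
final case class BaggedPoint[A](datum: A, subsampleCounts: Array[Int], sampleWeight: Double)

object WeightedBagging {
  // Draw from Poisson(mean) with Knuth's multiplication method
  // (adequate for the small means used as subsampling rates).
  def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // Sampling with replacement: one Poisson(subsamplingRate) draw per tree.
  // The weight is extracted once per datum and attached to the bagged point,
  // so it does not influence how often the datum is drawn.
  def bagWithReplacement[A](
      data: Seq[A],
      numTrees: Int,
      subsamplingRate: Double,
      extractWeight: A => Double,
      seed: Long): Seq[BaggedPoint[A]] = {
    val rng = new Random(seed)
    data.map { d =>
      val counts = Array.fill(numTrees)(poisson(subsamplingRate, rng))
      BaggedPoint(d, counts, extractWeight(d))
    }
  }
}
```

For example, `WeightedBagging.bagWithReplacement(Seq(("a", 2.0), ("b", 0.5)), 3, 1.0, (p: (String, Double)) => p._2, 42L)` yields two bagged points with weights 2.0 and 0.5, each with three per-tree counts.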

Why are the changes needed?

weightCol is already exposed, but RF does not yet support weights.

Does this PR introduce any user-facing change?

Yes, new setters
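
A rough sketch of how the new setter might be used, assuming a Spark build that includes this change. The column names `label`, `features`, and `weight` are placeholders, and the wrapper object `WeightedRfExample` exists only for this illustration.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier

object WeightedRfExample {
  // Configuration only; no SparkSession is needed until fit() is called.
  val rf: RandomForestClassifier = new RandomForestClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")
    .setWeightCol("weight") // placeholder name for the per-row weight column
    .setNumTrees(20)
}
```

Calling `WeightedRfExample.rf.fit(df)` would then require a DataFrame `df` that actually contains those three columns.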

How was this patch tested?

Added test suites.

fix bagged
}

- val instances: RDD[Instance] = extractLabeledPoints(dataset, numClasses).map(_.toInstance)
+ val instances = extractInstances(dataset)
Contributor

Is this better?

    validateNumClasses(numClasses)
    val instances = extractInstances(dataset, numClasses)

(20, 5, 1.0, 0.96),
(20, 10, 1.0, 0.96),
(20, 10, 0.95, 0.96)
)
Contributor

I guess maybe also add different impurity in testParams?

Contributor

Maybe also test the special case numTrees = 1?

Contributor Author

With numTrees == 1, RF is exactly a DecisionTree, which is already tested in DecisionTreeClassifierSuite/DecisionTreeRegressorSuite.

> I guess maybe also add different impurity in testParams?

I think the current tests may be enough; the test suites for DT/GBT do not test impurity.

Contributor

I suggested testing different impurities because, when calculating the best split, the impurity path (both entropy and gini) is affected by sample weight. However, after looking at the DecisionTree tests, I saw that both entropy and gini are already tested with sample weights there, so this is covered and there is no need to test it here.
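
To make the point about weights and impurity concrete, here is a small self-contained sketch, not Spark's code. It assumes impurity is computed from per-class weight totals rather than raw counts; `WeightedImpurity` and its method names are hypothetical.

```scala
object WeightedImpurity {
  // Sum instance weights per class label (labels are 0 until numClasses).
  def weightedClassTotals(labelsAndWeights: Seq[(Int, Double)], numClasses: Int): Array[Double] = {
    val totals = Array.fill(numClasses)(0.0)
    labelsAndWeights.foreach { case (label, w) => totals(label) += w }
    totals
  }

  // Gini impurity: 1 - sum over classes of p_c^2, with p_c a weighted fraction.
  def gini(totals: Array[Double]): Double = {
    val sum = totals.sum
    if (sum == 0.0) 0.0
    else 1.0 - totals.map(t => (t / sum) * (t / sum)).sum
  }

  // Entropy in bits: -sum over classes of p_c * log2(p_c), skipping empty classes.
  def entropy(totals: Array[Double]): Double = {
    val sum = totals.sum
    if (sum == 0.0) 0.0
    else totals.filter(_ > 0.0).map { t =>
      val p = t / sum
      -p * math.log(p) / math.log(2.0)
    }.sum
  }
}
```

With one instance per class at weight 1.0, `gini` is 0.5 and `entropy` is 1.0. Raising the weight of the class-0 instance to 3.0 lowers `gini` to 0.375, which is why the choice of split can change once weights are taken into account.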

@SparkQA

SparkQA commented Jan 6, 2020

Test build #116125 has finished for PR 27097 at commit 32ec9a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor Author

friendly ping @srowen @imatiach-msft

@SparkQA

SparkQA commented Jan 6, 2020

Test build #116160 has finished for PR 27097 at commit 14a57c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

@huaxingao do you have thoughts? This looks reasonably straightforward, and it is a long-standing feature request.

@huaxingao
Contributor

@srowen The change looks fine to me. Let me take another look later today or tomorrow.

@huaxingao
Contributor

LGTM :)

@srowen
Member

srowen commented Jan 13, 2020

Jenkins, retest this please

@SparkQA

SparkQA commented Jan 13, 2020

Test build #116644 has finished for PR 27097 at commit 14a57c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@imatiach-msft imatiach-msft left a comment

LGTM!

@srowen srowen closed this in 9320011 Jan 14, 2020
@srowen
Member

srowen commented Jan 14, 2020

Merged to master

@zhengruifeng
Contributor Author

Thanks @srowen @imatiach-msft @huaxingao for reviewing!
