[SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups #8752

jkbradley · 2015-09-14T18:53:50Z

Various ML guide cleanups.

- ml-guide.md: Make it easier to access the algorithm-specific guides.
- LDA user guide: EM often begins with useless topics, but running it longer generally improves them dramatically. For example, 10 iterations on a Wikipedia dataset produce useless topics, whereas 50 iterations produce very meaningful topics.
- mllib-feature-extraction.html#elementwiseproduct: the “w” parameter should be “scalingVec”.
- Clean up the Binarizer user guide a little.
- Pipeline: document that users should not put the same instance into a Pipeline in more than one place.
- spark.ml Word2Vec user guide: clean up grammar and writing.
- Chi Sq Feature Selector docs: improve the text.
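The two feature-transformer rules touched by this PR can be illustrated in plain Python. This is a minimal sketch, not Spark's implementation: `binarize` and `elementwise_product` are hypothetical helper names. It assumes Binarizer's documented rule (values strictly greater than the threshold become 1.0, all others 0.0) and that ElementwiseProduct multiplies each input vector component-wise by the `scalingVec` weight vector.

```python
from typing import List, Sequence


def binarize(values: Sequence[float], threshold: float) -> List[float]:
    """Hypothetical helper: 1.0 if a value is strictly greater than threshold, else 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]


def elementwise_product(vector: Sequence[float], scaling_vec: Sequence[float]) -> List[float]:
    """Hypothetical helper: Hadamard (component-wise) product of vector and scaling_vec."""
    if len(vector) != len(scaling_vec):
        raise ValueError("vector and scaling_vec must have the same length")
    return [v * w for v, w in zip(vector, scaling_vec)]


if __name__ == "__main__":
    # A value equal to the threshold (0.5) maps to 0.0.
    print(binarize([0.1, 0.5, 0.8], threshold=0.5))  # [0.0, 0.0, 1.0]
    print(elementwise_product([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))  # [0.0, 2.0, 6.0]
```

The boundary case is the main thing the Binarizer guide needs to state clearly: a value equal to the threshold maps to 0.0.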


SparkQA · 2015-09-14T19:17:54Z

Test build #42436 has finished for PR 8752 at commit 53d757a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- [ChiSqSelector](api/scala/index.html#org.apache.spark.mllib.feature.ChiSqSelector) implements Chi-Squared feature selection. It operates on labeled data with categorical features. `ChiSqSelector` orders features based on a Chi-Squared test of independence from the class, and then filters (selects) the top features which the class label depends on the most. This is akin to yielding the features with the most predictive power.
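The selection idea described above can be sketched in plain Python on a toy dataset. This is an illustration, not Spark's code: `chi_squared_statistic` and `select_top_features` are hypothetical names, features and labels are assumed to be categorical values, and features are assumed to be ranked by the Pearson chi-squared statistic (ranking by p-value would be an alternative criterion).

```python
from collections import Counter
from typing import List, Sequence


def chi_squared_statistic(feature: Sequence[float], labels: Sequence[float]) -> float:
    """Pearson chi-squared statistic for independence of one categorical feature and the label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # observed contingency-table counts
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f_val, f_count in feature_counts.items():
        for l_val, l_count in label_counts.items():
            expected = f_count * l_count / n  # count expected under independence
            observed = joint[(f_val, l_val)]  # Counter returns 0 for unseen pairs
            stat += (observed - expected) ** 2 / expected
    return stat


def select_top_features(rows: Sequence[Sequence[float]], labels: Sequence[float],
                        num_top_features: int) -> List[int]:
    """Hypothetical selector: indices of the features with the largest statistics."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((chi_squared_statistic(column, labels), j))
    # Highest statistic first (the label depends most on these features).
    scores.sort(key=lambda score: -score[0])
    return sorted(j for _, j in scores[:num_top_features])


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    # Feature 0 matches the label exactly, feature 1 is independent of it,
    # and feature 2 is only partly related to it.
    rows = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(select_top_features(rows, labels, num_top_features=2))  # [0, 2]
```

In this toy data the statistics are 4.0, 0.0 and 4/3 for features 0, 1 and 2, so keeping two features drops the one that is independent of the label.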

feynmanliang · 2015-09-14T23:52:57Z

docs/ml-guide.md

Not sure if we should include model summaries in this description; I had a mailing list question about where that feature is documented.

Yeah, there's not a great place. I'll try sticking a note here.


jkbradley · 2015-09-15T20:00:35Z

@feynmanliang Thanks for reviewing. Just updated per your comments.

feynmanliang · 2015-09-15T20:21:24Z

docs/ml-features.md

The class name is backticked in ChiSqSelector but not here or in Binarizer; we should choose one and be consistent. I would vote for backticking everything, since that's what I've been doing.

feynmanliang · 2015-09-15T20:22:41Z

LGTM after changes

SparkQA · 2015-09-15T23:14:38Z

Test build #42500 has finished for PR 8752 at commit 91f4edd.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- and then filters (selects) the top features which the class label depends on the most.

mengxr · 2015-09-16T02:43:40Z

Merged into master. Thanks!

feynmanliang reviewed Sep 14, 2015

updates from review. Updates to long lines were just splitting to 100…

91f4edd


feynmanliang reviewed Sep 15, 2015

asfgit closed this in b921fe4 Sep 16, 2015

jkbradley deleted the mlguide-fixes-1.5 branch September 16, 2015 04:48

4 participants