[WIP][SPARK-1486][MLlib] Multi Model Training with Gradient Descent #2451
Conversation
QA tests have started for PR 2451 at commit
QA tests have finished for PR 2451 at commit
With some guidance, I could help you with the docs.
I think this segment merits a one-line explanation.
These are related to #2294. I can add explanations there; I realize the math is hard to understand.
@anantasty: If you could look through the code and mark the places where you think "What the heck is going on here?", it would be easier for me to write up proper comments. I'm going to add a lot today, and I can incorporate yours as well. Thanks!
Do we really need sparse versions of rand and randn? It should not be much more expensive to use the dense versions and then convert to a sparse matrix (I figure < 2x the cost). I cannot think of use cases for these either, except unit testing.
They're nice functions to have. They will be helpful for people who want to do random projections.
Sorry, I did not see the "density" argument. Sounds OK to me (but is there a use case?)
No use case in MLlib yet. Randomized SVD for big matrices (distributed) may make use of this.
OK, sounds fine then.
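For context, a sparse rand with a density argument could look roughly like the following sketch, which builds the CSC arrays (colPtrs, rowIndices, values) that MLlib's SparseMatrix uses instead of densifying first. The name, signature, and use of scala.util.Random are illustrative only, not this PR's actual code:

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Sketch of a sparse `rand` with a `density` argument; names and signature are hypothetical.
def sprand(numRows: Int, numCols: Int, density: Double, rng: Random)
  : (Array[Int], Array[Int], Array[Double]) = {
  val colPtrs = new Array[Int](numCols + 1)
  val rowIndices = ArrayBuffer.empty[Int]
  val values = ArrayBuffer.empty[Double]
  var j = 0
  while (j < numCols) {
    var i = 0
    while (i < numRows) {
      // Keep each entry with probability `density`.
      if (rng.nextDouble() < density) {
        rowIndices += i
        values += rng.nextDouble()
      }
      i += 1
    }
    colPtrs(j + 1) = values.length
    j += 1
  }
  (colPtrs, rowIndices.toArray, values.toArray)
}
```

Generating entries with probability `density` touches only the surviving non-zeros, rather than materializing a dense matrix and converting it afterward.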
This line is a good reason to implement this in DenseMatrix: you could avoid the expensive index computation (a multiplication per element) and just iterate with a running count.
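A rough sketch of that idea, assuming a column-major DenseMatrix backed by a flat values array (the method name and signature are hypothetical):

```scala
// Hypothetical sketch: update every element of a column-major dense matrix in place,
// using a running offset rather than recomputing `j * numRows + i` for each element.
def mapInPlace(values: Array[Double], numRows: Int, numCols: Int)(f: Double => Double): Unit = {
  var offset = 0
  var j = 0
  while (j < numCols) {
    var i = 0
    while (i < numRows) {
      values(offset) = f(values(offset)) // linear scan, no per-element index multiplication
      offset += 1
      i += 1
    }
    j += 1
  }
}
```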
Add doc: "We will concatenate results (weights) to finalWeights as we iterate." (or something like that)
@brkyvz Let's try to split this PR into small ones. For example, functions like factory methods for sparse matrices should not be included in this PR. We want to keep the vector and matrix classes in MLlib simple and let users use Breeze for linear algebra operations. If Breeze has performance issues, maybe we should contribute the optimization to Breeze to centralize the effort on single-machine linear algebra computation.
miniBatchSize is inexact. We could avoid the initial count() and instead aggregate the minibatch size during the treeAggregate.
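A sketch of that alternative, assuming Breeze vectors and a per-example gradient helper (the names and tuple shape are illustrative, not this PR's code):

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: aggregate (gradient sum, loss sum, mini-batch size) in a single
// treeAggregate over the sampled data, so no separate count() is needed and the
// reported miniBatchSize is exact. `gradient(w, x, y)` is an assumed helper returning
// the per-example (gradient, loss).
def aggregateMiniBatch(
    sampled: RDD[(Double, BDV[Double])],
    weights: BDV[Double],
    gradient: (BDV[Double], BDV[Double], Double) => (BDV[Double], Double))
  : (BDV[Double], Double, Long) = {
  sampled.treeAggregate((BDV.zeros[Double](weights.length), 0.0, 0L))(
    seqOp = { case ((gradSum, lossSum, count), (label, features)) =>
      val (g, l) = gradient(weights, features, label)
      (gradSum + g, lossSum + l, count + 1L)
    },
    combOp = { case ((g1, l1, c1), (g2, l2, c2)) =>
      (g1 + g2, l1 + l2, c1 + c2)
    })
}
```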
@brkyvz I've made a rough pass, and have listed all of my comments. I can make future passes as needed. Lots of work & it will be great to have!
Closing this PR, as a lot of the functionality has changed.
Note: This is still a work in progress.
This is the first of the pull requests to support multi-model training in MLlib. It batches examples and trains multiple models, with different regularization parameters and step sizes, all at once using matrix-matrix multiplication. It uses native BLAS when the data matrix is dense, and uses sparse matrices as much as possible, for both better memory utilization and performance (I will post performance results in the comments).
This is a HUGE pull request, so I'm posting it now, before it is finished, so that reviewers can comment and make suggestions along the way. The docs still need to be updated, and the code can be cleaned up somewhat for ease of understanding.
Most of the PR consists of adding additional Local Matrix operations for the calculation of gradients and losses.
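To illustrate the core idea with a minimal Breeze sketch for a squared-error loss (not this PR's actual API, which also handles sparse data and native BLAS): stacking the k models' weight vectors as columns of a single matrix lets a pair of matrix-matrix multiplications produce the gradients of all models over a batch at once.

```scala
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

// Hypothetical sketch: least-squares gradients for k models in one shot.
// X is an n x numFeatures batch, y holds the n labels, and W is numFeatures x k
// with one weight column per (stepSize, regParam) combination.
def multiModelGradients(X: BDM[Double], y: BDV[Double], W: BDM[Double]): BDM[Double] = {
  val predictions = X * W                      // n x k: one prediction column per model
  val labels = y * BDV.ones[Double](W.cols).t  // n x k: labels repeated for each model
  val residuals = predictions - labels
  (X.t * residuals) * (1.0 / X.rows)           // numFeatures x k gradient matrix
}
```

A single BLAS-backed multiply over the batch then amortizes the per-example overhead across all k models, which is what makes training the whole grid of hyperparameters together attractive.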