
Conversation

@staple (Contributor) commented Mar 6, 2015

An implementation of accelerated gradient descent, a first-order optimization algorithm with faster asymptotic convergence than standard gradient descent.

Design discussion and benchmark results at
https://issues.apache.org/jira/browse/SPARK-1503
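As a rough sketch of the idea (illustrative only, not this PR's code): Nesterov-style acceleration takes a plain gradient step from an extrapolated point rather than from the current iterate. All names below (`AgdSketch`, `grad`, `minimize`) are hypothetical.

```scala
// Hedged sketch of accelerated gradient descent (Nesterov-style momentum)
// on a toy 1-D quadratic f(x) = 0.5 * x * x, with gradient f'(x) = x.
// Names and structure are illustrative, not the PR's actual API.
object AgdSketch {
  def grad(x: Double): Double = x

  def minimize(x0: Double, stepSize: Double, iters: Int): Double = {
    var x = x0
    var xPrev = x0
    for (k <- 1 to iters) {
      // Extrapolate using the momentum term, then take a plain gradient
      // step from the extrapolated point y.
      val y = x + ((k - 1).toDouble / (k + 2)) * (x - xPrev)
      xPrev = x
      x = y - stepSize * grad(y)
    }
    x
  }

  def main(args: Array[String]): Unit = {
    // Iterates approach the minimizer at 0.0.
    println(AgdSketch.minimize(10.0, 0.5, 50))
  }
}
```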

If the implementation seems promising it may make sense to add:

  • documentation about the algorithm
  • usage examples

@SparkQA commented Mar 6, 2015

Test build #28355 has finished for PR 4934 at commit a121bd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr (Contributor) commented:

It is quite hard to choose a proper stepSize in practice, because it depends on the Lipschitz constant, which is usually unknown. It may be better to implement a line search method.
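For illustration, one common line search variant is Armijo backtracking, which shrinks a trial step until a sufficient-decrease condition holds. This is a hedged 1-D sketch, not a proposal for the exact method (TFOCS uses its own backtracking rule); the names `lineSearch`, `t0`, `beta`, and `c` are hypothetical.

```scala
// Hedged sketch of Armijo backtracking line search: one way to pick a
// step size when the Lipschitz constant is unknown.
object Backtracking {
  // f: objective, g: gradient at x, x: current point.
  // t0: initial trial step, beta: shrink factor, c: sufficient-decrease constant.
  def lineSearch(f: Double => Double, g: Double, x: Double,
                 t0: Double = 1.0, beta: Double = 0.5, c: Double = 1e-4): Double = {
    var t = t0
    // Shrink t until the Armijo sufficient-decrease condition holds.
    while (f(x - t * g) > f(x) - c * t * g * g) {
      t *= beta
    }
    t
  }
}
```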

@staple (Contributor, Author) replied:

@mengxr Thanks for taking a look. I was advised by Reza Zadeh to implement a version without line search, at least for the initial implementation.

Please see the discussion here: https://issues.apache.org/jira/browse/SPARK-1503?focusedCommentId=14225295, and in the comments that follow. I also attached some optimization benchmarks to the JIRA, which include the performance of both backtracking line search and non line search implementations. Regarding your point that it's hard to choose a proper stepSize: anecdotally, acceleration does seem somewhat more prone to diverging at a nominal stepSize than the existing gradient descent.

@rezazadeh (Contributor) commented:

Thank you for this PR, @staple!

@mengxr I suggested that @staple first implement without backtracking to keep the PR as simple as possible. According to his plots (see the JIRA), even without backtracking this PR achieves fewer iterations at the same cost per iteration.

Note that backtracking requires several additional map-reduce passes per iteration, which makes it unclear when backtracking is a net win. So I suggested first merging the case that is a clear win (fewer iterations at the same cost per iteration). I think we should merge this without backtracking, and then have another PR to properly evaluate how backtracking affects total cost, with the goal of eventually merging backtracking as well.

It seems @staple has already implemented backtracking (because he has results in the JIRA), but kept them out of this PR to keep it simple, so we can tackle that afterwards.

@mengxr (Contributor) commented Mar 10, 2015

Line search helps if you don't know the Lipschitz constant. With accelerated gradient, it is very easy to blow up if the step size is wrong. I'm okay with not having line search in this version. But we need to consider how the APIs are going to change after we add line search. For example, if we add line search option, what is the semantic of agd.setStepSize(1.0).useLineSearch()?
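One possible resolution of the API question, purely as a sketch: treat stepSize as the initial trial step whenever line search is enabled. The class and method names below mirror the snippet in the comment above but are hypothetical, not a committed Spark API.

```scala
// Hedged sketch: with line search enabled, setStepSize supplies only the
// initial trial step for the search. All names here are hypothetical.
class AcceleratedGradientDescent {
  private var stepSize: Double = 1.0
  private var lineSearch: Boolean = false

  def setStepSize(s: Double): this.type = { stepSize = s; this }
  def useLineSearch(): this.type = { lineSearch = true; this }

  // With line search on, stepSize is interpreted as a starting trial value.
  def initialTrialStep: Double = stepSize
  def usesLineSearch: Boolean = lineSearch
}
```

Under this reading, `agd.setStepSize(1.0).useLineSearch()` is well defined: 1.0 seeds the search rather than fixing the step.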

Btw, I don't think we need to stick to the current GradientDescent API. The accelerated gradient takes a smooth convex function which provides gradient and optionally the Lipschitz constant. The implementation of Nesterov's method doesn't need to know RDDs.

@staple (Contributor, Author) commented Mar 11, 2015

Hi, replying to some of the statements above:

It seems @staple has already implemented backtracking (because he has results in the JIRA), but kept them out of this PR to keep it simple, so we can tackle that afterwards.

I wrote a backtracking implementation (and checked that it performs the same as the TFOCS implementation). Currently it is just a port of the TFOCS version. I'd need a little time to make it Scala/Spark idiomatic, but the turnaround would be pretty fast.

For example, if we add line search option, what is the semantic of agd.setStepSize(1.0).useLineSearch()?

TFOCS supports a suggested initial Lipschitz value (a variable named 'L'), which is just a starting point for line search, so a corresponding behavior would be to treat the step size as only an initial suggestion when line search is enabled. It may be desirable to use a parameter name like 'L' instead of 'stepSize' to make the meaning clearer.

In TFOCS you can disable backtracking line search by setting several parameters (L, Lexact, alpha, and beta) which individually control different aspects of the backtracking implementation.
For Spark it may make sense to provide explicitly configured backtracking modes, for example a fixed Lipschitz bound (no backtracking), backtracking line search based on the TFOCS implementation, or possibly an alternative line search implementation that is more conservative about performing round trip aggregations. A setBacktrackingMode() setter could then configure which mode is used.

Moving forward there may be a need to support acceleration algorithms other than Auslender and Teboulle's. These might be configurable via a setAlgorithm() function.
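The mode-based configuration described above could, for example, be modeled with sealed traits. Every name below is hypothetical, just to make the proposal concrete:

```scala
// Hedged sketch of explicit backtracking/algorithm modes. Names are
// hypothetical, not a proposed final API.
sealed trait BacktrackingMode
case object FixedLipschitzBound extends BacktrackingMode    // no backtracking
case object TfocsBacktracking extends BacktrackingMode      // TFOCS-style line search
case object ConservativeLineSearch extends BacktrackingMode // fewer aggregations per iteration

sealed trait AccelerationAlgorithm
case object AuslenderTeboulle extends AccelerationAlgorithm
```

A `setBacktrackingMode(mode: BacktrackingMode)` setter would then make the choice explicit and exhaustively checkable at compile time.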

Btw, I don't think we need to stick to the current GradientDescent API. The accelerated gradient takes a smooth convex function which provides gradient and optionally the Lipschitz constant. The implementation of Nesterov's method doesn't need to know RDDs.

This is good to know. I had been assuming we would stick with the existing GradientDescent API, including the Gradient and Updater delegates. Currently the applySmooth and applyProjector functions (named after the corresponding TFOCS functions) serve as a bridge between the acceleration implementation (relatively unaware of RDDs) and Spark-specific RDD aggregations.

This seems like a good time to mention that the backtracking implementation in TFOCS uses a system of caching the (expensive to compute) linear operator component of the objective function, which significantly reduces the cost of backtracking. A similar implementation is possible in Spark, though the performance benefit may not be as significant because two round trips would still be required per iteration. (See p. 3 of my design doc linked in the JIRA for more detail.) One reason I suggested not implementing linear operator caching in the design doc is that it's incompatible with the existing Gradient interface. But if we are considering an alternative interface, it may be worth revisiting this issue.
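To illustrate why caching helps: backtracking evaluates trial points of the form y - t * g for varying t, and by linearity A(y - t * g) = A y - t (A g), so once A y and A g are cached, each new trial step needs no further application of A. A toy local sketch with dense arrays (all names hypothetical; the real operator application would be a distributed aggregation):

```scala
// Hedged sketch of linear operator caching during backtracking.
object LinearOpCache {
  type Vec = Array[Double]

  // One (expensive) application of the linear operator A, here a dense
  // matrix-vector product standing in for a distributed computation.
  def applyA(a: Array[Vec], x: Vec): Vec =
    a.map(row => row.zip(x).map { case (r, v) => r * v }.sum)

  // Backtracking tries xTrial = y - t * g for varying t. By linearity,
  // A xTrial = A y - t * (A g), so cached A y and A g make each new
  // trial step free of further applications of A.
  def trialAx(ay: Vec, ag: Vec, t: Double): Vec =
    ay.zip(ag).map { case (u, v) => u - t * v }
}
```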

The objective function “interface” used by TFOCS involves the functions applyLinear (linear operator component of objective), applySmooth (smooth portion of objective), and applyProjector (nonsmooth portion of objective). In addition there are a number of numeric and categorical parameters. Theoretically we could adopt a similar interface (with or without applyLinear, depending) where RDD specific operations are encapsulated within the various apply* functions.
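A minimal sketch of what such an interface might look like, with the representation (e.g. an RDD) hidden behind a type parameter. The names loosely follow the TFOCS function names mentioned above but are otherwise illustrative:

```scala
// Hedged sketch of a TFOCS-like objective interface. Implementations
// would encapsulate any RDD-specific aggregation inside the apply* methods.
trait ObjectiveFunction[X] {
  /** Linear operator component of the objective. */
  def applyLinear(x: X): X
  /** Smooth portion: objective value and gradient at x. */
  def applySmooth(x: X): (Double, X)
  /** Nonsmooth portion: proximal/projection step with the given step size. */
  def applyProjector(x: X, stepSize: Double): X
}

// Toy local instance on plain Doubles: f(x) = 0.5 * x^2, unconstrained.
object Quadratic extends ObjectiveFunction[Double] {
  def applyLinear(x: Double): Double = x
  def applySmooth(x: Double): (Double, Double) = (0.5 * x * x, x)
  def applyProjector(x: Double, stepSize: Double): Double = x // identity prox
}
```

The optimizer itself would then be written against `ObjectiveFunction[X]` and never touch RDDs directly.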

Finally, I wanted to mention that I live in the bay area and am happy to meet in person to discuss this project if that would be helpful.

@staple (Contributor, Author) commented:
Oops, looks like a typo: 'avaialble'.

@srowen (Member) commented Jul 28, 2015

Likewise, is this one stale? I'm not sure this is going to move forward.

@mengxr (Contributor) commented Jul 28, 2015

We refactored the implementation. You can find the latest version at https://github.com/databricks/spark-tfocs. We will send a new PR when the implementation is ready.

@staple Could you close this PR for now?

@asfgit asfgit closed this in 423cdfd Aug 11, 2015