Memory Optimization #386
Conversation
NNLS: iterator pattern cleaned for speed, in-place gemv added, initialState API provided for state reuse; PowerMethod: specialized on DenseMatrix and DenseVector for speed; QuadraticMinimizer: iterator pattern cleaned for speed, memory optimization to bring runtime close to dposv
I will take a closer look at the comparisons with the ml CholeskySolver tomorrow...the first iteration is always slow...I am not quite sure why...
By the way, sorry for turning the code into C-style code, but I had to make sure no memory is allocated in the whole algorithm except through NNLS.initialize and QuadraticMinimizer.initialize, since we are being compared against native BLAS dposv :-) When you review, please let me know if you see any additional memory allocation in the algorithm's inner loop (iterations), since I am using a lot of Breeze overloaded functions and might have missed things... Maybe there are ways to optimize initialize further so that the first iteration also comes close to the mllib numbers...
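As a concrete illustration of what to look for (a minimal sketch, not code from this PR, and the buffer names are made up): Breeze's overloaded operators return fresh vectors, while := and axpy write into buffers that an initialize() step can allocate once.

```scala
import breeze.linalg.{DenseVector, axpy}

object AllocationSketch {
  def main(args: Array[String]): Unit = {
    val n = 5
    val x = DenseVector.fill(n)(1.0)
    val g = DenseVector.fill(n)(2.0)

    // Allocating form: each overloaded operator below builds a brand-new
    // DenseVector, so inside a solver's inner loop it creates garbage
    // on every iteration.
    val allocating = x - (g * 0.5)

    // In-place form: reuse a buffer created once (e.g., in initialize()).
    val y = DenseVector.zeros[Double](n) // allocated once, outside the loop
    y := x                               // copy x into the existing buffer
    axpy(-0.5, g, y)                     // y += (-0.5) * g, no new vector

    println(allocating == y)             // true: same values, different allocation behavior
  }
}
```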
I believe this is not what you want; it changes the operation, since in infix notation an alphanumeric method like dot binds more loosely than +:
```scala
scala> implicit class Foo(x: Int) { def dot(y: Int) = x * y }
defined class Foo

scala> 3 dot 2 + 3
res1: Int = 15

scala> 3.dot(2) + 3
res2: Int = 9
```
My bad...fixed it
Any memory-specific optimization? Most of the time I run QuadraticMinimizer first, so the hot spot is an issue...I will try the same run after warming up the JVM tomorrow and finish up the API changes in the AM.
Remember it's not just that the JVM is warming up: it has to warm up for... -- David
I added an API to provide the upper-triangular Gram matrix, and with that the runtime of the first iteration also dropped...I think QuadraticMinimizer should be able to replace the ML CholeskySolver now...
…x provided as primitive array for supporting normal equations
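For reference, a minimal sketch of what such a Gram-matrix helper could look like (the method name and exact layout here are assumptions, not necessarily what the commit adds): the upper triangle of H = AᵀA is written straight into a column-major Array[Double], the form LAPACK routines such as dposv consume.

```scala
import breeze.linalg.DenseMatrix

object GramSketch {
  // Builds only the upper triangle of H = A^T * A in a column-major
  // Array[Double], so a solver can hand it to LAPACK without re-wrapping
  // or copying the data.
  def upperTriangularGram(a: DenseMatrix[Double]): Array[Double] = {
    val n = a.cols
    val h = new Array[Double](n * n)
    var j = 0
    while (j < n) {
      var i = 0
      while (i <= j) {            // upper triangle: row index i <= column index j
        var s = 0.0
        var k = 0
        while (k < a.rows) {
          s += a(k, i) * a(k, j)  // (A^T A)(i, j) = sum_k A(k, i) * A(k, j)
          k += 1
        }
        h(j * n + i) = s          // column-major offset
        i += 1
      }
      j += 1
    }
    h
  }
}
```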
The first-iteration issue is consistent for both NNLS and QuadraticMinimizer...Out of curiosity, I looked at the code, and both mllib and Breeze back their matrix and vector workspace with Array[Double]...so I am really not clear why there is a 2x difference only in the initial iterations...Is it due to the overhead from the traits that show up in DenseVector/DenseMatrix? During the solve things are clean, so I don't think there are cases where BLAS using native memory is faster than QuadraticMinimizer handing its memory to LAPACK to work on...
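A quick way to see the point about the backing storage (a trivial sketch, not from the PR): both Breeze structures expose their underlying primitive array through .data.

```scala
import breeze.linalg.{DenseMatrix, DenseVector}

// Breeze's DenseVector and DenseMatrix are thin wrappers around a plain
// Array[Double], so the raw storage is the same kind of buffer mllib uses.
object BackingArraySketch {
  def main(args: Array[String]): Unit = {
    val v = DenseVector.zeros[Double](4)
    val m = DenseMatrix.zeros[Double](2, 2)
    val vData: Array[Double] = v.data // underlying primitive array
    val mData: Array[Double] = m.data // column-major primitive array
    println(s"${vData.length} ${mData.length}") // 4 4
  }
}
```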
Using a DenseVector for the first time incurs a lot of overhead: operators... def time(x: => Unit) { val in = System.currentTimeMillis(); ... -- David
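The time helper quoted above is cut off; here is one plausible completion of the idea (an illustration, not the original email's code), showing the cold-versus-warm gap on the first use of a DenseVector operator.

```scala
import breeze.linalg.DenseVector

object WarmupSketch {
  // One plausible completion of the truncated helper above.
  def time(label: String)(body: => Unit): Unit = {
    val in = System.currentTimeMillis()
    body
    println(s"$label: ${System.currentTimeMillis() - in} ms")
  }

  def main(args: Array[String]): Unit = {
    val a = DenseVector.fill(100000)(1.0)
    val b = DenseVector.fill(100000)(2.0)
    // Cold: the first call pays for class loading of the operator
    // implementations and for JIT compilation.
    time("first dot")(a dot b)
    // Warm: the same call is typically far cheaper afterwards.
    time("second dot")(a dot b)
  }
}
```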
What's the status here? Can I merge this? I really want to release a fix for the SparseVector bug.
I am OK with this...moving from DenseVector/DenseMatrix to Array will make the code ugly.
Please let me know when you cut 0.11.2...
Tonight.
Memory optimizations are done to bring the optimize.linear.NNLS runtime closer to mllib NNLS and the optimize.proximal.QuadraticMinimizer default closer to blas.dposv.
NNLS: iterator pattern cleaned for speed, in-place gemv added, initialState API provided for state reuse; PowerMethod: specialized on DenseMatrix and DenseVector for speed; QuadraticMinimizer: iterator pattern cleaned for speed, memory optimization to bring runtime close to dposv.
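For readers skimming the PR, here is a minimal sketch of the state-reuse pattern the initialState API is meant to enable; the class and method names below are illustrative, not the actual signatures added in this change.

```scala
import breeze.linalg.{DenseMatrix, DenseVector}

// Hypothetical names: a State object that owns every buffer the solver needs.
class State(val x: DenseVector[Double], val grad: DenseVector[Double], var iter: Int)

class ReusableSolver(n: Int) {
  // Called once: the only place where memory is allocated.
  def initialize(): State =
    new State(DenseVector.zeros[Double](n), DenseVector.zeros[Double](n), 0)

  // Called per solve: writes into the buffers held by `state` instead of
  // allocating fresh workspace (the update below is a placeholder, not a
  // real minimization step).
  def minimize(h: DenseMatrix[Double], q: DenseVector[Double], state: State): State = {
    state.x := q
    state.iter += 1
    state
  }
}
```

A caller can then keep one State across many solves of the same size, so the hot loop never touches the allocator.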