improve SW performance by replacing functional reductions with imperative ones #965
Can one of the admins verify this patch? |
43ae08f to baba084
Jenkins, test this please |
Test FAILed.
Build result: FAILURE
GitHub pull request #965 of commit baba084 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/965/merge^{commit} # timeout=10
 > git branch -a --contains 8cc5c6d # timeout=10
 > git rev-parse remotes/origin/pr/965/merge^{commit} # timeout=10
Checking out Revision 8cc5c6d (origin/pr/965/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8cc5c6d1508118846a1ddc4f4dd3decbc17a7994
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
Thanks @noamBarkai! LGTM! I'll look at the failing tests a bit later. Always amusing to run into |
Jenkins, retest this please |
I saw a 40% speedup using this simple example on 36000 comparisons over 1024 executors. Curious, are you using SW for biological sequences? I ask because our current implementation does not support alphabets or substitution matrices.
Test FAILed.
Build result: FAILURE
GitHub pull request #965 of commit baba084 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/965/merge^{commit} # timeout=10
 > git branch -a --contains 8cc5c6d # timeout=10
 > git rev-parse remotes/origin/pr/965/merge^{commit} # timeout=10
Checking out Revision 8cc5c6d (origin/pr/965/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8cc5c6d1508118846a1ddc4f4dd3decbc17a7994
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
@heuermh - glad to hear you've reproduced the performance improvement. 👍 |
The error is
which, as you say, looks like an issue with the source formatting of SmithWaterman.scala. It's odd that it would only show up in a single build of the matrix. Check that your git repo is clean and then run |
The 2.10/1.5.2/2.6.0 configuration is the first build to run; if that fails, the whole build stops.
$ ./scripts/format-source
$ git diff .
diff --git a/adam-core/src/main/scala/org/bdgenomics/adam/algorithms/smithwaterman/SmithWaterman.scala b/adam-core/src/main/scala/org/bdgenomics/adam/algorithms/smithwaterman/SmithWaterman.scala
index d4e3930..83179a9 100644
--- a/adam-core/src/main/scala/org/bdgenomics/adam/algorithms/smithwaterman/SmithWaterman.scala
+++ b/adam-core/src/main/scala/org/bdgenomics/adam/algorithms/smithwaterman/SmithWaterman.scala
@@ -47,25 +47,25 @@ abstract class SmithWaterman(xSequence: String, ySequence: String) extends Seria
* @param matrix Matrix to score.
* @return Tuple of (i, j) coordinates.
*/
- private[smithwaterman] final def maxCoordinates(matrix: Array[Array[Double]]): (Int, Int) = {
- var xMax = 0
- var yMax = 0
- var max = Double.MinValue
- var x = 0
- while (x < matrix.length) {
- var y = 0
- while (y < matrix(x).length) {
- if (matrix(x)(y) >= max) {
- max = matrix(x)(y)
- xMax = x
- yMax = y
- }
- y += 1
+ private[smithwaterman] final def maxCoordinates(matrix: Array[Array[Double]]): (Int, Int) = {
+ var xMax = 0
+ var yMax = 0
+ var max = Double.MinValue
+ var x = 0
+ while (x < matrix.length) {
+ var y = 0
+ while (y < matrix(x).length) {
+ if (matrix(x)(y) >= max) {
+ max = matrix(x)(y)
+ xMax = x
+ yMax = y
}
- x += 1
+ y += 1
}
- (yMax, xMax)
+ x += 1
}
+ (yMax, xMax)
+ }
@noamBarkai could you run
… to improve performance
baba084 to eb8f6d4
sorry about that, done. |
Perfect, thank you. Will merge once Mr Jenkins gives his approval. |
Jenkins, retest this please |
Test PASSed. |
Thanks for the contribution! Merged in commit a505bb9 |
cool! thanks.
On Tue, Mar 22, 2016 at 9:00 PM, Michael L Heuer notifications@github.com
While running Smith-Waterman on large data sets over Spark, we noticed "hot spots" in Scala's for-loops. The change below therefore replaces the elegant functional calculations with far less elegant but slightly more efficient while-loops.
We've seen a ~30% improvement in performance with this change.
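As a rough illustration of the kind of rewrite described here, the sketch below puts a functional reduction next to the imperative `maxCoordinates` loop that appears in the diff in this thread. The functional variant and the object name `MaxCoordinatesSketch` are hypothetical and are not the code this PR removed; they only show the general pattern of a for-comprehension that builds a tuple per cell before reducing. Both versions assume a non-empty matrix.

```scala
object MaxCoordinatesSketch {

  // Hypothetical functional variant (NOT the original ADAM code):
  // builds a (score, x, y) tuple for every cell, then reduces.
  // Each cell costs a tuple allocation plus closure calls.
  def maxCoordinatesFunctional(matrix: Array[Array[Double]]): (Int, Int) = {
    val cells = for {
      x <- matrix.indices
      y <- matrix(x).indices
    } yield (matrix(x)(y), x, y)
    // `>=` keeps the later cell on ties, matching the loop below.
    val (_, xMax, yMax) =
      cells.reduceLeft((a, b) => if (b._1 >= a._1) b else a)
    (yMax, xMax)
  }

  // Imperative variant, following the while-loops shown in the diff:
  // no intermediate collection and no per-cell allocation.
  def maxCoordinatesImperative(matrix: Array[Array[Double]]): (Int, Int) = {
    var xMax = 0
    var yMax = 0
    var max = Double.MinValue
    var x = 0
    while (x < matrix.length) {
      var y = 0
      while (y < matrix(x).length) {
        if (matrix(x)(y) >= max) {
          max = matrix(x)(y)
          xMax = x
          yMax = y
        }
        y += 1
      }
      x += 1
    }
    (yMax, xMax)
  }
}
```

Both functions should agree on any non-empty input, including ties, where the last maximal cell in row-major order wins. Whether the while-loop version is faster, and by how much, depends on the JVM, the Scala version, and the matrix sizes, so it is worth measuring on your own workload.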